
What does the expression Select `(column1|column2|column3)?+.+` From Table mean in SQL?


I am trying to convert some SQL code into PySpark SQL. The select statement picks its columns from a table like this:

Select a.`(column1|column2|column3)?+.+`,trim(column c) from Table a;

I would like to understand what the expression a.`(column1|column2|column3)?+.+` resolves to and what it actually implies. How should I handle it when converting the SQL into PySpark?


Answer

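That is a way of selecting columns by regular expression. When a backtick-quoted identifier in the select list is treated as a regex, `(column1|column2|column3)?+.+` matches every column name except column1, column2 and column3, so the statement selects all columns of a other than those three. The possessive `?+` is what does the excluding: if a name starts with one of the three alternatives, the group consumes it and will not backtrack, and the trailing `.+` then needs at least one more character, which an exact name such as column1 does not have.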
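To see which names the pattern keeps, here is a minimal Python sketch. It rests on two assumptions: that Spark tests the regex against each whole column name, and that a negative lookahead is an acceptable stand-in for the possessive `?+` (Python's `re` only supports possessive quantifiers from 3.11 on). The column names in the sample list are hypothetical.

```python
import re

# Lookahead rewrite of `(column1|column2|column3)?+.+` (an assumption made
# for portability): reject a name that is exactly one of the three
# alternatives, accept any other non-empty name.
PATTERN = re.compile(r"(?!(?:column1|column2|column3)$).+")


def matching_columns(columns):
    """Return the column names that the pattern matches in full, in order."""
    return [name for name in columns if PATTERN.fullmatch(name)]


# Hypothetical table schema, for illustration only.
sample = ["column1", "column2", "column3", "column10", "column_c", "id"]
print(matching_columns(sample))  # ['column10', 'column_c', 'id']
```

Note that column10 is kept: only exact matches of the three alternatives are dropped, because any longer name still leaves at least one character for `.+`.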

It is Spark's equivalent of Hive's quoted identifiers. See also Spark's documentation.

Note that this behavior is disabled by default. To enable it, first run the following command (shown here in Scala):

spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true").show(false)
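On the PySpark side, one option is to keep the SQL text as it is and switch the flag on with `spark.conf.set("spark.sql.parser.quotedRegexColumnNames", "true")` before calling `spark.sql(...)`; `DataFrame.colRegex` also accepts a backtick-quoted regex. Another option is to drop the regex and build the column list explicitly. The sketch below takes the second route. The helper name `select_except`, the sample data, and the column name `column_c` (standing in for the `column c` of the question) are all hypothetical.

```python
def select_except(df, excluded, *extra):
    """Select every column of df whose name is not in excluded,
    followed by any extra column expressions supplied by the caller."""
    excluded = set(excluded)
    kept = [name for name in df.columns if name not in excluded]
    return df.select(*kept, *extra)


def demo():
    # Imported here so the helper above stays usable without PySpark installed.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical data; the real table from the question is not shown.
    df = spark.createDataFrame(
        [(1, 2, 3, "  padded  ", 9)],
        ["column1", "column2", "column3", "column_c", "id"],
    )
    result = select_except(
        df,
        ["column1", "column2", "column3"],
        F.trim(F.col("column_c")).alias("column_c_trimmed"),
    )
    result.show(truncate=False)
```

Listing the excluded names explicitly keeps the intent readable and does not depend on the session-level parser flag, at the cost of diverging from the original SQL text.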
source: stackoverflow.com