Package org.apache.spark.sql.util
Class SchemaUtils
Object
org.apache.spark.sql.util.SchemaUtils
Utils for handling schemas.
TODO: Merge this file with org.apache.spark.ml.util.SchemaUtils.
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
static void checkColumnNameDuplication(scala.collection.Seq<String> columnNames, boolean caseSensitiveAnalysis)
    Checks if input column names have duplicate identifiers.
static void checkColumnNameDuplication(scala.collection.Seq<String> columnNames, scala.Function2<String, String, Object> resolver)
    Checks if input column names have duplicate identifiers.
static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
    Checks if an input schema has duplicate column names.
static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String, String, Object> resolver)
    Checks if an input schema has duplicate column names.
static void checkTransformDuplication(scala.collection.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
    Checks if the partitioning transforms are being duplicated or not.
static String escapeMetaCharacters(String str)
    Returns the escaped string.
static scala.collection.Seq<String> explodeNestedFieldNames(StructType schema)
    Returns all column names in this schema as a flat list.
static scala.collection.Seq<Object> findColumnPosition(scala.collection.Seq<String> column, StructType schema, scala.Function2<String, String, Object> resolver)
    Returns the given column's ordinal within the given schema.
static scala.collection.Seq<String> getColumnName(scala.collection.Seq<Object> position, StructType schema)
    Gets the name of the column in the given position.
static scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.Seq<String> originalNames)
-
Constructor Details
-
SchemaUtils
public SchemaUtils()
-
-
Method Details
-
checkSchemaColumnNameDuplication
public static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
Checks if an input schema has duplicate column names. Throws an exception if a duplicate exists.
Parameters:
schema - schema to check
caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
-
checkSchemaColumnNameDuplication
public static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String, String, Object> resolver)
Checks if an input schema has duplicate column names. Throws an exception if a duplicate exists.
Parameters:
schema - schema to check
resolver - resolver used to determine if two identifiers are equal
-
checkColumnNameDuplication
public static void checkColumnNameDuplication(scala.collection.Seq<String> columnNames, scala.Function2<String, String, Object> resolver)
Checks if input column names have duplicate identifiers. Throws an exception if a duplicate exists.
Parameters:
columnNames - column names to check
resolver - resolver used to determine if two identifiers are equal
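To illustrate what a resolver-driven duplicate check can look like, here is a minimal Java sketch. It is not Spark's implementation: `ResolverDuplicateSketch`, `findDuplicates`, and `check` are hypothetical names, a `BiPredicate` stands in for the Scala `Function2` resolver, and `IllegalArgumentException` stands in for whatever exception Spark actually raises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of Spark: a resolver-driven duplicate check.
final class ResolverDuplicateSketch {

    // Returns every name that the resolver considers equal to an earlier name.
    static List<String> findDuplicates(List<String> columnNames,
                                       BiPredicate<String, String> resolver) {
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            for (int j = 0; j < i; j++) {
                if (resolver.test(columnNames.get(j), columnNames.get(i))) {
                    duplicates.add(columnNames.get(i));
                    break; // report each offending name once
                }
            }
        }
        return duplicates;
    }

    // Mirrors the documented contract: throw if any duplicate exists.
    // The exception type here is an assumption chosen for the sketch.
    static void check(List<String> columnNames,
                      BiPredicate<String, String> resolver) {
        List<String> duplicates = findDuplicates(columnNames, resolver);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                "Found duplicate column(s): " + duplicates);
        }
    }
}
```

With `String::equalsIgnoreCase` as the resolver, the names `id`, `ID`, `name` are reported as containing a duplicate; with `String::equals` they are not. The pairwise comparison is quadratic, which is acceptable for a sketch because an arbitrary resolver gives no key to hash on.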
-
checkColumnNameDuplication
public static void checkColumnNameDuplication(scala.collection.Seq<String> columnNames, boolean caseSensitiveAnalysis)
Checks if input column names have duplicate identifiers. Throws an exception if a duplicate exists.
Parameters:
columnNames - column names to check
caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
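When case sensitivity is a plain boolean, one common way to detect duplicates is to normalize each name before comparing. The Java sketch below shows that idea under stated assumptions: `CaseSensitivityDuplicateSketch` and `findDuplicates` are hypothetical names, and lower-casing with `Locale.ROOT` is an assumed normalization, not a statement about how Spark does it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spark: duplicate detection by normalization.
final class CaseSensitivityDuplicateSketch {

    // Returns the names whose (possibly lower-cased) form was already seen.
    static List<String> findDuplicates(List<String> columnNames,
                                       boolean caseSensitiveAnalysis) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String name : columnNames) {
            // Assumption: case-insensitive comparison means comparing lower-cased names.
            String key = caseSensitiveAnalysis ? name : name.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

For the input `a`, `A`, `b`, the sketch reports `A` as a duplicate when `caseSensitiveAnalysis` is `false` and reports nothing when it is `true`.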
-
explodeNestedFieldNames
public static scala.collection.Seq<String> explodeNestedFieldNames(StructType schema)
Returns all column names in this schema as a flat list. For example, a schema like:
  | - a
  | | - 1
  | | - 2
  | - b
  | - c
  | | - nest
  | | | - 3
will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
Parameters:
schema - (undocumented)
Returns:
(undocumented)
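The flattening described for explodeNestedFieldNames can be sketched as a depth-first, pre-order walk that joins field names with dots. The Java code below is an illustration only: `ExplodeNamesSketch` and its `Field` class are hypothetical stand-ins for a struct schema, not Spark types, and the sketch ignores any name quoting the real method may perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not part of Spark: flattening nested field names.
final class ExplodeNamesSketch {

    // Stand-in for a struct field: a name plus child fields (empty for leaves).
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Emits each field's dotted path, parents before their children.
    static List<String> explode(List<Field> fields) {
        List<String> out = new ArrayList<>();
        collect(fields, "", out);
        return out;
    }

    private static void collect(List<Field> fields, String prefix, List<String> out) {
        for (Field f : fields) {
            String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
            out.add(path);
            collect(f.children, path, out);
        }
    }

    // The example schema from the method description: a{1,2}, b, c{nest{3}}.
    static List<Field> exampleSchema() {
        return Arrays.asList(
            new Field("a", new Field("1"), new Field("2")),
            new Field("b"),
            new Field("c", new Field("nest", new Field("3"))));
    }
}
```

Calling `ExplodeNamesSketch.explode(ExplodeNamesSketch.exampleSchema())` yields `a`, `a.1`, `a.2`, `b`, `c`, `c.nest`, `c.nest.3`, matching the flattened list in the description.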
-
checkTransformDuplication
public static void checkTransformDuplication(scala.collection.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
Checks if the partitioning transforms are being duplicated or not. Throws an exception if a duplicate exists.
Parameters:
transforms - the partitioning transforms to check for duplicates
checkType - contextual information around the check, used in an exception message
isCaseSensitive - whether to be case sensitive when comparing column names
-
findColumnPosition
public static scala.collection.Seq<Object> findColumnPosition(scala.collection.Seq<String> column, StructType schema, scala.Function2<String, String, Object> resolver)
Returns the given column's ordinal within the given schema. The returned position has one entry per level of nesting of the column.
Parameters:
column - The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
schema - The current struct we are looking at.
resolver - The resolver to find the column.
Returns:
(undocumented)
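A rough Java sketch of the lookup described for findColumnPosition follows: walk the name parts one level at a time, record the ordinal found at each level, and descend into that field. `ColumnPositionSketch` and `Field` are hypothetical stand-ins for a struct schema, a `BiPredicate` replaces the Scala resolver, and the sketch handles nested structs only, so any array or map handling in the real method is out of scope here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical model, not part of Spark: locating a nested column by ordinals.
final class ColumnPositionSketch {

    // Stand-in for a struct field: a name plus child fields (empty for leaves).
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Returns one ordinal per name part; the list is as long as the column is nested.
    static List<Integer> findPosition(List<String> column,
                                      List<Field> schema,
                                      BiPredicate<String, String> resolver) {
        List<Integer> position = new ArrayList<>();
        List<Field> current = schema;
        for (String part : column) {
            int ordinal = -1;
            for (int i = 0; i < current.size(); i++) {
                if (resolver.test(current.get(i).name, part)) {
                    ordinal = i;
                    break;
                }
            }
            if (ordinal < 0) {
                // Exception type is an assumption made for this sketch.
                throw new IllegalArgumentException("Couldn't find column " + part);
            }
            position.add(ordinal);
            current = current.get(ordinal).children; // enter the nested field
        }
        return position;
    }

    // Sample schema used for illustration: a{1,2}, b, c{nest{3}}.
    static List<Field> exampleSchema() {
        return Arrays.asList(
            new Field("a", new Field("1"), new Field("2")),
            new Field("b"),
            new Field("c", new Field("nest", new Field("3"))));
    }
}
```

On the sample schema, the column `c`, `nest`, `3` resolves to the position 2, 0, 0, and `a`, `2` resolves to 0, 1, using zero-based ordinals.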
-
getColumnName
public static scala.collection.Seq<String> getColumnName(scala.collection.Seq<Object> position, StructType schema)
Gets the name of the column in the given position.
Parameters:
position - (undocumented)
schema - (undocumented)
Returns:
(undocumented)
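Since the parameters of getColumnName are undocumented, the following Java sketch only illustrates one plausible reading: the position is a list of ordinals, one per nesting level, and the result is the matching list of name parts. `ColumnNameSketch` and `Field` are hypothetical stand-ins for a struct schema, and zero-based ordinals are an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not part of Spark: mapping ordinals back to name parts.
final class ColumnNameSketch {

    // Stand-in for a struct field: a name plus child fields (empty for leaves).
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Follows each ordinal one level deeper and collects the field names on the way.
    static List<String> columnName(List<Integer> position, List<Field> schema) {
        List<String> names = new ArrayList<>();
        List<Field> current = schema;
        for (int ordinal : position) {
            Field field = current.get(ordinal);
            names.add(field.name);
            current = field.children;
        }
        return names;
    }

    // Sample schema used for illustration: a{1,2}, b, c{nest{3}}.
    static List<Field> exampleSchema() {
        return Arrays.asList(
            new Field("a", new Field("1"), new Field("2")),
            new Field("b"),
            new Field("c", new Field("nest", new Field("3"))));
    }
}
```

On the sample schema, the position 2, 0, 0 maps back to the name parts `c`, `nest`, `3`.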
-
restoreOriginalOutputNames
public static scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.Seq<String> originalNames)
-
escapeMetaCharacters
public static String escapeMetaCharacters(String str)
Parameters:
str - The string to be escaped.
Returns:
The escaped string.
-