Package org.apache.spark.sql.api.r

Class SQLUtils

Object
    org.apache.spark.sql.api.r.SQLUtils
Constructor Summary

Constructors
SQLUtils()
Method Summary
static ArrayType createArrayType(String elementType)

static org.apache.spark.sql.classic.Dataset<Row> createDF(RDD<byte[]> rdd, StructType schema, org.apache.spark.sql.classic.SparkSession sparkSession)

static StructField createStructField(String name, String dataType, boolean nullable)

static StructType createStructType(scala.collection.immutable.Seq<StructField> fields)

static org.apache.spark.sql.classic.Dataset<Row> dapply(org.apache.spark.sql.classic.Dataset<Row> df, byte[] func, byte[] packageNames, Object[] broadcastVars, StructType schema)
    The helper function for dapply() on the R side.

static Object[][] dfToCols(org.apache.spark.sql.classic.Dataset<Row> df)

static JavaRDD<byte[]> dfToRowRDD(org.apache.spark.sql.classic.Dataset<Row> df)

static org.apache.spark.sql.classic.Dataset<Row> gapply(org.apache.spark.sql.classic.RelationalGroupedDataset gd, byte[] func, byte[] packageNames, Object[] broadcastVars, StructType schema)
    The helper function for gapply() on the R side.

static JavaSparkContext getJavaSparkContext(org.apache.spark.sql.classic.SparkSession spark)

static org.apache.spark.sql.classic.SparkSession getOrCreateSparkSession(JavaSparkContext jsc, Map<Object, Object> sparkConfigMap, boolean enableHiveSupport)

static Map<String, String> getSessionConf(org.apache.spark.sql.classic.SparkSession spark)

static String[] getTableNames(org.apache.spark.sql.classic.SparkSession sparkSession, String databaseName)

static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)

static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()

static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)

static JavaRDD<byte[]> readArrowStreamFromFile(org.apache.spark.sql.classic.SparkSession sparkSession, String filename)
    R callable function to read a file in Arrow stream format and create an RDD using each serialized ArrowRecordBatch as a partition.

static Object readSqlObject(DataInputStream dis, char dataType)

static StructType SERIALIZED_R_DATA_SCHEMA()

static void setSparkContextSessionConf(org.apache.spark.sql.classic.SparkSession spark, Map<Object, Object> sparkConfigMap)

static org.apache.spark.sql.classic.Dataset<Row> toDataFrame(JavaRDD<byte[]> arrowBatchRDD, StructType schema, org.apache.spark.sql.classic.SparkSession sparkSession)
    R callable function to create a DataFrame from a JavaRDD of serialized ArrowRecordBatches.

static boolean writeSqlObject(DataOutputStream dos, Object obj)
Constructor Details
SQLUtils
public SQLUtils()
Method Details
getOrCreateSparkSession
public static org.apache.spark.sql.classic.SparkSession getOrCreateSparkSession(JavaSparkContext jsc, Map<Object, Object> sparkConfigMap, boolean enableHiveSupport)
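For orientation, here is a minimal, hypothetical sketch of driving this entry point from plain Java rather than from SparkR's JVM backend. The local master, app name, and configuration key are illustrative only, and getSessionConf's Map<String, String> return type is the one listed in the summary above.

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.r.SQLUtils;

public class RSessionSketch {
    public static void main(String[] args) {
        // SparkR normally owns the JavaSparkContext; a local one stands in here.
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setMaster("local[*]").setAppName("r-session-sketch"));

        // Session-level SQL config as it would be passed through from the R side.
        Map<Object, Object> sparkConfigMap = new HashMap<>();
        sparkConfigMap.put("spark.sql.shuffle.partitions", "4");

        // enableHiveSupport = false keeps the in-memory catalog.
        org.apache.spark.sql.classic.SparkSession spark =
            SQLUtils.getOrCreateSparkSession(jsc, sparkConfigMap, false);

        // Read the effective session configuration back.
        Map<String, String> conf = SQLUtils.getSessionConf(spark);
        System.out.println(conf.get("spark.sql.shuffle.partitions"));

        jsc.stop();
    }
}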
setSparkContextSessionConf
public static void setSparkContextSessionConf(org.apache.spark.sql.classic.SparkSession spark, Map<Object, Object> sparkConfigMap)

getSessionConf
public static Map<String, String> getSessionConf(org.apache.spark.sql.classic.SparkSession spark)
getJavaSparkContext
public static JavaSparkContext getJavaSparkContext(org.apache.spark.sql.classic.SparkSession spark)

createStructType
public static StructType createStructType(scala.collection.immutable.Seq<StructField> fields)

createStructField
public static StructField createStructField(String name, String dataType, boolean nullable)
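The schema helpers compose naturally. The sketch below is an assumed usage where the dataType strings ("integer", "array<string>", "string") are Spark SQL type names, which is how SparkR passes them; the field names are illustrative.

import java.util.Arrays;
import org.apache.spark.sql.api.r.SQLUtils;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.collection.immutable.Seq;
import scala.jdk.javaapi.CollectionConverters;

public class SchemaSketch {
    public static void main(String[] args) {
        // Each field is (name, dataType string, nullable).
        StructField id = SQLUtils.createStructField("id", "integer", false);
        StructField tags = SQLUtils.createStructField("tags", "array<string>", true);

        // createStructType takes an immutable Scala Seq; convert from a Java list.
        Seq<StructField> fields =
            CollectionConverters.asScala(Arrays.asList(id, tags)).toSeq();
        StructType schema = SQLUtils.createStructType(fields);
        System.out.println(schema.treeString());

        // createArrayType builds an ArrayType straight from an element type name.
        ArrayType strings = SQLUtils.createArrayType("string");
        System.out.println(strings.simpleString());
    }
}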
createDF
public static org.apache.spark.sql.classic.Dataset<Row> createDF(RDD<byte[]> rdd, StructType schema, org.apache.spark.sql.classic.SparkSession sparkSession)
dfToRowRDD
public static JavaRDD<byte[]> dfToRowRDD(org.apache.spark.sql.classic.Dataset<Row> df)

SERIALIZED_R_DATA_SCHEMA
public static StructType SERIALIZED_R_DATA_SCHEMA()
dapply
public static org.apache.spark.sql.classic.Dataset<Row> dapply(org.apache.spark.sql.classic.Dataset<Row> df, byte[] func, byte[] packageNames, Object[] broadcastVars, StructType schema)

The helper function for dapply() on the R side.

Parameters:
    df - (undocumented)
    func - (undocumented)
    packageNames - (undocumented)
    broadcastVars - (undocumented)
    schema - (undocumented)
Returns:
    (undocumented)
gapply
public static org.apache.spark.sql.classic.Dataset<Row> gapply(org.apache.spark.sql.classic.RelationalGroupedDataset gd, byte[] func, byte[] packageNames, Object[] broadcastVars, StructType schema)

The helper function for gapply() on the R side.

Parameters:
    gd - (undocumented)
    func - (undocumented)
    packageNames - (undocumented)
    broadcastVars - (undocumented)
    schema - (undocumented)
Returns:
    (undocumented)
dfToCols
public static Object[][] dfToCols(org.apache.spark.sql.classic.Dataset<Row> df)
readSqlObject
public static Object readSqlObject(DataInputStream dis, char dataType)

writeSqlObject
public static boolean writeSqlObject(DataOutputStream dos, Object obj)
getTableNames
public static String[] getTableNames(org.apache.spark.sql.classic.SparkSession sparkSession, String databaseName)
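A small, assumed end-to-end use: list the tables of the built-in "default" database from a locally created session. Any existing database name works in place of "default".

import java.util.HashMap;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.r.SQLUtils;

public class ListTablesSketch {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setMaster("local[*]").setAppName("list-tables"));
        org.apache.spark.sql.classic.SparkSession spark =
            SQLUtils.getOrCreateSparkSession(jsc, new HashMap<>(), false);

        // Print every table registered in the "default" database.
        for (String name : SQLUtils.getTableNames(spark, "default")) {
            System.out.println(name);
        }
        jsc.stop();
    }
}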
createArrayType
public static ArrayType createArrayType(String elementType)
readArrowStreamFromFile
public static JavaRDD<byte[]> readArrowStreamFromFile(org.apache.spark.sql.classic.SparkSession sparkSession, String filename)

R callable function to read a file in Arrow stream format and create an RDD using each serialized ArrowRecordBatch as a partition.

Parameters:
    sparkSession - (undocumented)
    filename - (undocumented)
Returns:
    (undocumented)
toDataFrame
public static org.apache.spark.sql.classic.Dataset<Row> toDataFrame(JavaRDD<byte[]> arrowBatchRDD, StructType schema, org.apache.spark.sql.classic.SparkSession sparkSession)

R callable function to create a DataFrame from a JavaRDD of serialized ArrowRecordBatches.

Parameters:
    arrowBatchRDD - (undocumented)
    schema - (undocumented)
    sparkSession - (undocumented)
Returns:
    (undocumented)
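readArrowStreamFromFile and toDataFrame form a pipeline: the first turns a file of serialized ArrowRecordBatches into a JavaRDD<byte[]> with one batch per partition, the second decodes that RDD against a schema. The sketch below assumes /tmp/batches.arrow is a hypothetical, pre-existing Arrow IPC stream file whose contents match the schema; SparkR produces such files when transferring R data.frames with Arrow enabled.

import java.util.HashMap;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.r.SQLUtils;
import org.apache.spark.sql.types.StructType;

public class ArrowReadSketch {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setMaster("local[*]").setAppName("arrow-read"));
        org.apache.spark.sql.classic.SparkSession spark =
            SQLUtils.getOrCreateSparkSession(jsc, new HashMap<>(), false);

        // Hypothetical input: an Arrow IPC stream whose schema matches below.
        String path = "/tmp/batches.arrow";
        StructType schema = new StructType()
            .add("id", "integer")
            .add("name", "string");

        // One serialized ArrowRecordBatch per partition.
        JavaRDD<byte[]> batches = SQLUtils.readArrowStreamFromFile(spark, path);
        org.apache.spark.sql.classic.Dataset<Row> df =
            SQLUtils.toDataFrame(batches, schema, spark);
        df.show();

        jsc.stop();
    }
}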
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()

org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)

LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)