Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions; a short sketch follows the package list below.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package sql

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Definition Classes
    spark
  • package api

    Contains API classes that are specific to a single language (i.e. Java).

    Definition Classes
    sql
  • package artifact
    Definition Classes
    sql
  • package avro
    Definition Classes
    sql
  • package catalog
    Definition Classes
    sql
  • package catalyst
    Definition Classes
    sql
  • package columnar
    Definition Classes
    sql
  • package connector
    Definition Classes
    sql
  • package expressions
    Definition Classes
    sql
  • package jdbc
    Definition Classes
    sql
  • package sources

    A set of APIs for adding data sources to Spark SQL.

    Definition Classes
    sql
  • package streaming
    Definition Classes
    sql
  • package types

    Contains a type system for attributes produced by relations, including complex types like structs, arrays and maps.

    Definition Classes
    sql
  • package util
    Definition Classes
    sql
  • package vectorized
    Definition Classes
    sql
  • ArrowColumnVector
  • ColumnVector
  • ColumnarArray
  • ColumnarBatch
  • ColumnarBatchRow
  • ColumnarMap
  • ColumnarRow
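
The implicit conversions mentioned in the package spark description above can be illustrated with a short, hypothetical sketch (the app name and local master are placeholders): groupByKey comes from org.apache.spark.rdd.PairRDDFunctions and is available here only because the RDD's element type is a pair.

  import org.apache.spark.{SparkConf, SparkContext}

  object PairRddSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setMaster("local[*]").setAppName("pair-rdd-sketch")
      val sc = new SparkContext(conf)
      // RDD[(Int, String)]: the pair element type triggers the implicit
      // conversion to PairRDDFunctions, which defines groupByKey.
      val pairs = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c")))
      pairs.groupByKey().collect().foreach(println)
      sc.stop()
    }
  }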
org.apache.spark.sql

package vectorized

Type Members

  1. class ArrowColumnVector extends ColumnVector

    A column vector backed by Apache Arrow.

    Annotations
    @DeveloperApi()
  2. abstract class ColumnVector extends AutoCloseable

    An interface representing in-memory columnar data in Spark. This interface defines the main APIs to access the data, as well as their batched versions. The batched versions are considered to be faster and preferable whenever possible.

    Most of the APIs take the rowId as a parameter. This is the batch-local, 0-based row id for values in this ColumnVector.

    Spark only calls the get method that matches the data type of this ColumnVector, e.g. if it's int type, Spark is guaranteed to only call #getInt(int) or #getInts(int, int).

    ColumnVector supports all the data types, including nested types. To handle nested types, a ColumnVector can have children, forming a tree structure. Please refer to #getStruct(int), #getArray(int) and #getMap(int) for the details of how to implement nested types.

    ColumnVector is expected to be reused during the entire data loading process, to avoid allocating memory again and again.

    ColumnVector is meant to maximize CPU efficiency, not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible. (A usage sketch appears at the end of this page.)

    Annotations
    @Evolving()
  3. final class ColumnarArray extends ArrayData

    Array abstraction in ColumnVector.

    Annotations
    @Evolving()
  4. class ColumnarBatch extends AutoCloseable

    This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this batch so that Spark can access the data row by row. Instances of it are meant to be reused during the entire data loading process. A data source may extend this class with customized logic. (A usage sketch appears at the end of this page.)

    Annotations
    @DeveloperApi()
  5. final class ColumnarBatchRow extends InternalRow

    This class wraps an array of ColumnVector and provides a row view.

    Annotations
    @DeveloperApi()
    Since

    3.3.0

  6. final class ColumnarMap extends MapData

    Map abstraction in ColumnVector.

  7. final class ColumnarRow extends InternalRow

    Row abstraction in ColumnVector.

    Annotations
    @Evolving()
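
The access pattern described for ColumnVector above can be sketched in a few lines. The following is a minimal, hypothetical example (the vector name, row count and values are placeholders): it wraps an Arrow IntVector in an ArrowColumnVector and reads it back through the typed get methods, closing the column to release the Arrow buffers.

  import org.apache.arrow.memory.RootAllocator
  import org.apache.arrow.vector.IntVector
  import org.apache.spark.sql.vectorized.ArrowColumnVector

  object ArrowColumnVectorSketch {
    def main(args: Array[String]): Unit = {
      val allocator = new RootAllocator(Long.MaxValue)
      val ints = new IntVector("ints", allocator) // placeholder name
      ints.allocateNew(3)
      ints.setSafe(0, 10)
      ints.setNull(1)
      ints.setSafe(2, 30)
      ints.setValueCount(3)

      // The column is int-typed, so Spark would only call getInt/getInts on it.
      val column = new ArrowColumnVector(ints)
      for (rowId <- 0 until 3) {
        if (column.isNullAt(rowId)) println(s"row $rowId: null")
        else println(s"row $rowId: ${column.getInt(rowId)}")
      }
      column.close()    // also releases the underlying Arrow buffers
      allocator.close()
    }
  }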

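Likewise, a minimal sketch of the ColumnarBatch row view described above, assuming an Array[ColumnVector] of two int columns with three rows each (for instance, two Arrow-backed columns built as in the previous sketch):

  import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

  object ColumnarBatchSketch {
    // `cols` is an assumed input: two int columns, three rows each.
    def printBatch(cols: Array[ColumnVector]): Unit = {
      val batch = new ColumnarBatch(cols)
      batch.setNumRows(3)

      val rows = batch.rowIterator() // java.util.Iterator[InternalRow]
      while (rows.hasNext) {
        val row = rows.next()
        val a = if (row.isNullAt(0)) "null" else row.getInt(0).toString
        val b = if (row.isNullAt(1)) "null" else row.getInt(1).toString
        println(s"($a, $b)")
      }
      batch.close() // closes every wrapped ColumnVector
    }
  }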