Package org.apache.spark.sql.columnar

Class SimpleMetricsCachedBatchSerializer

Object
    org.apache.spark.sql.columnar.SimpleMetricsCachedBatchSerializer

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, CachedBatchSerializer, scala.Serializable

public abstract class SimpleMetricsCachedBatchSerializer
extends Object
implements CachedBatchSerializer, org.apache.spark.internal.Logging
Provides basic filtering for CachedBatchSerializer implementations.

To extend this class, every batch produced by your serializer must be an instance of SimpleMetricsCachedBatch.

This class does not calculate the metrics that are stored in the batches; that is up to each implementation. The required metrics are really just min and max values, and even those are optional, especially for complex types. Because the metrics are simple, and because the data will likely be compressed as well, each implementation is left to decide on the most efficient way to calculate them, possibly combining the calculation with compression passes over the same data.
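As a rough illustration, here is a minimal Scala sketch of a batch type that satisfies this requirement. The name MyCachedBatch and the byte-buffer payload are hypothetical, not part of the API:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.columnar.SimpleMetricsCachedBatch

// A hypothetical batch type for a custom serializer. The only hard requirement
// for extending SimpleMetricsCachedBatchSerializer is that every batch mixes in
// SimpleMetricsCachedBatch, i.e. exposes a `stats` InternalRow carrying the
// (optional) per-column metrics that buildFilter consults.
case class MyCachedBatch(
    numRows: Int,                // required by CachedBatch
    buffers: Array[Array[Byte]], // illustrative payload: one buffer per column
    stats: InternalRow)          // the min/max (and related) metrics row
  extends SimpleMetricsCachedBatch {
  // Illustrative size estimate; an implementation may compute this differently.
  override def sizeInBytes: Long = buffers.map(_.length.toLong).sum
}
```

How the min/max values are computed and written into `stats` is up to the implementation, for example during the same pass that compresses the column buffers.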
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary

Constructors
SimpleMetricsCachedBatchSerializer()

Method Summary
Modifier and Type: scala.Function2<Object, scala.collection.Iterator<CachedBatch>, scala.collection.Iterator<CachedBatch>>
Method: buildFilter(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> predicates, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> cachedAttributes)
Description: Builds a function that can be used to filter batches prior to being decompressed.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.columnar.CachedBatchSerializer
convertCachedBatchToColumnarBatch, convertCachedBatchToInternalRow, convertColumnarBatchToCachedBatch, convertInternalRowToCachedBatch, supportsColumnarInput, supportsColumnarOutput, vectorTypes
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
Constructor Details

SimpleMetricsCachedBatchSerializer
public SimpleMetricsCachedBatchSerializer()

Method Details

buildFilter
public scala.Function2<Object, scala.collection.Iterator<CachedBatch>, scala.collection.Iterator<CachedBatch>> buildFilter(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> predicates, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> cachedAttributes)

Description copied from interface: CachedBatchSerializer
Builds a function that can be used to filter batches prior to being decompressed. In most cases, extending SimpleMetricsCachedBatchSerializer will provide the necessary filter logic; you need only supply the metrics for it to work. SimpleMetricsCachedBatch provides the APIs to hold those metrics and documents the metrics used, really just min and max. Note that this function is intended to skip batches that are not needed; the actual filtering of individual rows is handled later.

Specified by:
buildFilter in interface CachedBatchSerializer

Parameters:
predicates - the set of expressions to use for filtering.
cachedAttributes - the schema/attributes of the cached data. This can be helpful if you don't store the schema with the data.

Returns:
a function that takes the partition id and the iterator of batches in that partition, and returns an iterator of the batches that should be decompressed.
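For context, a hedged sketch of how the returned function might be applied per partition. The helper name, parameters, and RDD plumbing here are assumptions for illustration, not part of this API:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.columnar.{CachedBatch, SimpleMetricsCachedBatchSerializer}

// Sketch: apply the batch-level filter to each partition before decompressing.
// `serializer`, `predicates`, `cachedAttributes`, and `cachedRDD` stand in for
// values an implementation would already have in scope.
def filterBatches(
    serializer: SimpleMetricsCachedBatchSerializer,
    predicates: Seq[Expression],
    cachedAttributes: Seq[Attribute],
    cachedRDD: RDD[CachedBatch]): RDD[CachedBatch] = {
  val filterFunc = serializer.buildFilter(predicates, cachedAttributes)
  // The returned function takes the partition id and that partition's batch
  // iterator, yielding only batches whose stats might satisfy the predicates.
  cachedRDD.mapPartitionsWithIndex { (partitionId, batches) =>
    filterFunc(partitionId, batches)
  }
}
```

Batches dropped here are never decompressed; any rows in the surviving batches that fail the predicates are filtered out by later stages of the scan.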