Class JavaStreamingContext

Object
org.apache.spark.streaming.api.java.JavaStreamingContext
All Implemented Interfaces:
Closeable, AutoCloseable

public class JavaStreamingContext extends Object implements Closeable
Deprecated.
This is deprecated as of Spark 3.4.0. There are no longer updates to DStream and it's a legacy project. There is a newer and easier to use streaming engine in Spark called Structured Streaming. You should use Spark Structured Streaming for your streaming applications.
A Java-friendly version of StreamingContext which is the main entry point for Spark Streaming functionality. It provides methods to create JavaDStream and JavaPairDStream from input sources. The internal org.apache.spark.api.java.JavaSparkContext (see core Spark documentation) can be accessed using context.sparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTermination() allows the current thread to wait for the termination of a context by stop() or by an exception.
  • Constructor Details

    • JavaStreamingContext

      public JavaStreamingContext(StreamingContext ssc)
      Deprecated.
    • JavaStreamingContext

      public JavaStreamingContext(String master, String appName, Duration batchDuration)
      Deprecated.
      Create a StreamingContext.
      Parameters:
      master - Name of the Spark Master
      appName - Name to be used when registering with the scheduler
      batchDuration - The time interval at which streaming data will be divided into batches
    • JavaStreamingContext

      public JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String jarFile)
      Deprecated.
      Create a StreamingContext.
      Parameters:
      master - Name of the Spark Master
      appName - Name to be used when registering with the scheduler
      batchDuration - The time interval at which streaming data will be divided into batches
      sparkHome - The SPARK_HOME directory on the worker nodes
      jarFile - JAR file containing job code, to ship to cluster. This can be a path on the local file system or an HDFS, HTTP, HTTPS, or FTP URL.
    • JavaStreamingContext

      public JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String[] jars)
      Deprecated.
      Create a StreamingContext.
      Parameters:
      master - Name of the Spark Master
      appName - Name to be used when registering with the scheduler
      batchDuration - The time interval at which streaming data will be divided into batches
      sparkHome - The SPARK_HOME directory on the worker nodes
      jars - Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.
    • JavaStreamingContext

      public JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String[] jars, Map<String,String> environment)
      Deprecated.
      Create a StreamingContext.
      Parameters:
      master - Name of the Spark Master
      appName - Name to be used when registering with the scheduler
      batchDuration - The time interval at which streaming data will be divided into batches
      sparkHome - The SPARK_HOME directory on the worker nodes
      jars - Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.
      environment - Environment variables to set on worker nodes
    • JavaStreamingContext

      public JavaStreamingContext(JavaSparkContext sparkContext, Duration batchDuration)
      Deprecated.
      Create a JavaStreamingContext using an existing JavaSparkContext.
      Parameters:
      sparkContext - The underlying JavaSparkContext to use
      batchDuration - The time interval at which streaming data will be divided into batches
    • JavaStreamingContext

      public JavaStreamingContext(SparkConf conf, Duration batchDuration)
      Deprecated.
      Create a JavaStreamingContext using a SparkConf configuration.
      Parameters:
      conf - A Spark application configuration
      batchDuration - The time interval at which streaming data will be divided into batches
    • JavaStreamingContext

      public JavaStreamingContext(String path)
      Deprecated.
      Recreate a JavaStreamingContext from a checkpoint file.
      Parameters:
      path - Path to the directory that was specified as the checkpoint directory
    • JavaStreamingContext

      public JavaStreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf)
      Deprecated.
      Re-creates a JavaStreamingContext from a checkpoint file.
      Parameters:
      path - Path to the directory that was specified as the checkpoint directory

      hadoopConf - (undocumented)
  • Method Details

    • getOrCreate

      public static JavaStreamingContext getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc)
      Deprecated.
      Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the provided factory will be used to create a JavaStreamingContext.

      Parameters:
      checkpointPath - Checkpoint directory used in an earlier JavaStreamingContext program
      creatingFunc - Function to create a new JavaStreamingContext
      Returns:
      (undocumented)
    • getOrCreate

      public static JavaStreamingContext getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf)
      Deprecated.
      Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the provided factory will be used to create a JavaStreamingContext.

      Parameters:
      checkpointPath - Checkpoint directory used in an earlier StreamingContext program
      creatingFunc - Function to create a new JavaStreamingContext
      hadoopConf - Hadoop configuration if necessary for reading from any HDFS compatible file system
      Returns:
      (undocumented)
    • getOrCreate

      public static JavaStreamingContext getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)
      Deprecated.
      Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the provided factory will be used to create a JavaStreamingContext.

      Parameters:
      checkpointPath - Checkpoint directory used in an earlier StreamingContext program
      creatingFunc - Function to create a new JavaStreamingContext
      hadoopConf - Hadoop configuration if necessary for reading from any HDFS compatible file system
      createOnError - Whether to create a new JavaStreamingContext if there is an error in reading checkpoint data.
      Returns:
      (undocumented)
    • jarOfClass

      public static String[] jarOfClass(Class<?> cls)
      Deprecated.
      Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.
      Parameters:
      cls - (undocumented)
      Returns:
      (undocumented)
    • union

      public <T> JavaDStream<T> union(JavaDStream<T>... jdstreams)
      Deprecated.
      Create a unified DStream from multiple DStreams of the same type and same slide duration.
      Parameters:
      jdstreams - (undocumented)
      Returns:
      (undocumented)
    • union

      public <K, V> JavaPairDStream<K,V> union(JavaPairDStream<K,V>... jdstreams)
      Deprecated.
      Create a unified DStream from multiple DStreams of the same type and same slide duration.
      Parameters:
      jdstreams - (undocumented)
      Returns:
      (undocumented)
    • ssc

      public StreamingContext ssc()
      Deprecated.
    • sparkContext

      public JavaSparkContext sparkContext()
      Deprecated.
      The underlying SparkContext
    • socketTextStream

      public JavaReceiverInputDStream<String> socketTextStream(String hostname, int port, StorageLevel storageLevel)
      Deprecated.
      Create an input stream from network source hostname:port. Data is received using a TCP socket and the receive bytes is interpreted as UTF8 encoded \n delimited lines.
      Parameters:
      hostname - Hostname to connect to for receiving data
      port - Port to connect to for receiving data
      storageLevel - Storage level to use for storing the received objects
      Returns:
      (undocumented)
    • socketTextStream

      public JavaReceiverInputDStream<String> socketTextStream(String hostname, int port)
      Deprecated.
      Create an input stream from network source hostname:port. Data is received using a TCP socket and the receive bytes is interpreted as UTF8 encoded \n delimited lines. Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2.
      Parameters:
      hostname - Hostname to connect to for receiving data
      port - Port to connect to for receiving data
      Returns:
      (undocumented)
    • socketStream

      public <T> JavaReceiverInputDStream<T> socketStream(String hostname, int port, Function<InputStream,Iterable<T>> converter, StorageLevel storageLevel)
      Deprecated.
      Create an input stream from network source hostname:port. Data is received using a TCP socket and the receive bytes it interpreted as object using the given converter.
      Parameters:
      hostname - Hostname to connect to for receiving data
      port - Port to connect to for receiving data
      converter - Function to convert the byte stream to objects
      storageLevel - Storage level to use for storing the received objects
      Returns:
      (undocumented)
    • textFileStream

      public JavaDStream<String> textFileStream(String directory)
      Deprecated.
      Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored. The text files must be encoded as UTF-8.

      Parameters:
      directory - HDFS directory to monitor for new file
      Returns:
      (undocumented)
    • binaryRecordsStream

      public JavaDStream<byte[]> binaryRecordsStream(String directory, int recordLength)
      Deprecated.
      Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files with fixed record lengths, yielding byte arrays

      Parameters:
      directory - HDFS directory to monitor for new files
      recordLength - The length at which to split the records

      Returns:
      (undocumented)
      Note:
      We ensure that the byte array for each record in the resulting RDDs of the DStream has the provided record length.
    • rawSocketStream

      public <T> JavaReceiverInputDStream<T> rawSocketStream(String hostname, int port, StorageLevel storageLevel)
      Deprecated.
      Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.
      Parameters:
      hostname - Hostname to connect to for receiving data
      port - Port to connect to for receiving data
      storageLevel - Storage level to use for storing the received objects
      Returns:
      (undocumented)
    • rawSocketStream

      public <T> JavaReceiverInputDStream<T> rawSocketStream(String hostname, int port)
      Deprecated.
      Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.
      Parameters:
      hostname - Hostname to connect to for receiving data
      port - Port to connect to for receiving data
      Returns:
      (undocumented)
    • fileStream

      public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> JavaPairInputDStream<K,V> fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass)
      Deprecated.
      Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
      Parameters:
      directory - HDFS directory to monitor for new file
      kClass - class of key for reading HDFS file
      vClass - class of value for reading HDFS file
      fClass - class of input format for reading HDFS file
      Returns:
      (undocumented)
    • fileStream

      public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> JavaPairInputDStream<K,V> fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly)
      Deprecated.
      Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
      Parameters:
      directory - HDFS directory to monitor for new file
      kClass - class of key for reading HDFS file
      vClass - class of value for reading HDFS file
      fClass - class of input format for reading HDFS file
      filter - Function to filter paths to process
      newFilesOnly - Should process only new files and ignore existing files in the directory
      Returns:
      (undocumented)
    • fileStream

      public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> JavaPairInputDStream<K,V> fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf)
      Deprecated.
      Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
      Parameters:
      directory - HDFS directory to monitor for new file
      kClass - class of key for reading HDFS file
      vClass - class of value for reading HDFS file
      fClass - class of input format for reading HDFS file
      filter - Function to filter paths to process
      newFilesOnly - Should process only new files and ignore existing files in the directory
      conf - Hadoop configuration
      Returns:
      (undocumented)
    • queueStream

      public <T> JavaDStream<T> queueStream(Queue<JavaRDD<T>> queue)
      Deprecated.
      Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

      Parameters:
      queue - Queue of RDDs
      Returns:
      (undocumented)
      Note:
      1. Changes to the queue after the stream is created will not be recognized. 2. Arbitrary RDDs can be added to queueStream, there is no way to recover data of those RDDs, so queueStream doesn't support checkpointing.
    • queueStream

      public <T> JavaInputDStream<T> queueStream(Queue<JavaRDD<T>> queue, boolean oneAtATime)
      Deprecated.
      Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

      Parameters:
      queue - Queue of RDDs
      oneAtATime - Whether only one RDD should be consumed from the queue in every interval
      Returns:
      (undocumented)
      Note:
      1. Changes to the queue after the stream is created will not be recognized. 2. Arbitrary RDDs can be added to queueStream, there is no way to recover data of those RDDs, so queueStream doesn't support checkpointing.
    • queueStream

      public <T> JavaInputDStream<T> queueStream(Queue<JavaRDD<T>> queue, boolean oneAtATime, JavaRDD<T> defaultRDD)
      Deprecated.
      Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

      Parameters:
      queue - Queue of RDDs
      oneAtATime - Whether only one RDD should be consumed from the queue in every interval
      defaultRDD - Default RDD is returned by the DStream when the queue is empty
      Returns:
      (undocumented)
      Note:
      1. Changes to the queue after the stream is created will not be recognized. 2. Arbitrary RDDs can be added to queueStream, there is no way to recover data of those RDDs, so queueStream doesn't support checkpointing.

    • receiverStream

      public <T> JavaReceiverInputDStream<T> receiverStream(Receiver<T> receiver)
      Deprecated.
      Create an input stream with any arbitrary user implemented receiver. Find more details at: https://spark.apache.org/docs/latest/streaming-custom-receivers.html
      Parameters:
      receiver - Custom implementation of Receiver
      Returns:
      (undocumented)
    • union

      public <T> JavaDStream<T> union(scala.collection.Seq<JavaDStream<T>> jdstreams)
      Deprecated.
      Create a unified DStream from multiple DStreams of the same type and same slide duration.
      Parameters:
      jdstreams - (undocumented)
      Returns:
      (undocumented)
    • union

      public <K, V> JavaPairDStream<K,V> union(scala.collection.Seq<JavaPairDStream<K,V>> jdstreams)
      Deprecated.
      Create a unified DStream from multiple DStreams of the same type and same slide duration.
      Parameters:
      jdstreams - (undocumented)
      Returns:
      (undocumented)
    • transform

      public <T> JavaDStream<T> transform(List<JavaDStream<?>> dstreams, Function2<List<JavaRDD<?>>,Time,JavaRDD<T>> transformFunc)
      Deprecated.
      Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. The order of the JavaRDDs in the transform function parameter will be the same as the order of corresponding DStreams in the list.

      Parameters:
      dstreams - (undocumented)
      transformFunc - (undocumented)
      Returns:
      (undocumented)
      Note:
      For adding a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().
    • transformToPair

      public <K, V> JavaPairDStream<K,V> transformToPair(List<JavaDStream<?>> dstreams, Function2<List<JavaRDD<?>>,Time,JavaPairRDD<K,V>> transformFunc)
      Deprecated.
      Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. The order of the JavaRDDs in the transform function parameter will be the same as the order of corresponding DStreams in the list.

      Parameters:
      dstreams - (undocumented)
      transformFunc - (undocumented)
      Returns:
      (undocumented)
      Note:
      For adding a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().
    • checkpoint

      public void checkpoint(String directory)
      Deprecated.
      Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. The graph will be checkpointed every batch interval.
      Parameters:
      directory - HDFS-compatible directory where the checkpoint data will be reliably stored
    • remember

      public void remember(Duration duration)
      Deprecated.
      Sets each DStreams in this context to remember RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration of duration and releases them for garbage collection. This method allows the developer to specify how long to remember the RDDs ( if the developer wishes to query old data outside the DStream computation).
      Parameters:
      duration - Minimum duration that each DStream should remember its RDDs
    • addStreamingListener

      public void addStreamingListener(StreamingListener streamingListener)
      Deprecated.
      Add a StreamingListener object for receiving system events related to streaming.
      Parameters:
      streamingListener - (undocumented)
    • getState

      public StreamingContextState getState()
      Deprecated.
      :: DeveloperApi ::

      Return the current state of the context. The context can be in three possible states -

      • StreamingContextState.INITIALIZED - The context has been created, but not been started yet. Input DStreams, transformations and output operations can be created on the context.
      • StreamingContextState.ACTIVE - The context has been started, and been not stopped. Input DStreams, transformations and output operations cannot be created on the context.
      • StreamingContextState.STOPPED - The context has been stopped and cannot be used any more.
      Returns:
      (undocumented)
    • start

      public void start()
      Deprecated.
      Start the execution of the streams.
    • awaitTermination

      public void awaitTermination() throws InterruptedException
      Deprecated.
      Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
      Throws:
      InterruptedException
    • awaitTerminationOrTimeout

      public boolean awaitTerminationOrTimeout(long timeout) throws InterruptedException
      Deprecated.
      Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.

      Parameters:
      timeout - time to wait in milliseconds
      Returns:
      true if it's stopped; or throw the reported error during the execution; or false if the waiting time elapsed before returning from the method.
      Throws:
      InterruptedException
    • stop

      public void stop()
      Deprecated.
      Stop the execution of the streams. Will stop the associated JavaSparkContext as well.
    • stop

      public void stop(boolean stopSparkContext)
      Deprecated.
      Stop the execution of the streams.
      Parameters:
      stopSparkContext - Stop the associated SparkContext or not
    • stop

      public void stop(boolean stopSparkContext, boolean stopGracefully)
      Deprecated.
      Stop the execution of the streams.
      Parameters:
      stopSparkContext - Stop the associated SparkContext or not
      stopGracefully - Stop gracefully by waiting for the processing of all received data to be completed
    • close

      public void close()
      Deprecated.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable