Class Observation

Object
org.apache.spark.sql.Observation

public class Observation extends Object
Helper class to simplify usage of Dataset.observe(String, Column, Column*):


   // Observe row count (rows) and highest id (maxid) in the Dataset while writing it
   val observation = Observation("my metrics")
   val observed_ds = ds.observe(observation, count(lit(1)).as("rows"), max($"id").as("maxid"))
   observed_ds.write.parquet("ds.parquet")
   val metrics = observation.get
 

This collects the metrics while the first action is executed on the observed dataset. Subsequent actions do not modify the metrics returned by get(). Retrieval of the metric via get() blocks until the first action has finished and metrics become available.

This class does not support streaming datasets.

param: name name of the metric

Since:
3.3.0
  • Constructor Details

    • Observation

      public Observation(String name)
    • Observation

      public Observation()
      Create an Observation instance without providing a name. This generates a random name.
  • Method Details

    • apply

      public static Observation apply()
      Observation constructor for creating an anonymous observation.
      Returns:
      (undocumented)
    • apply

      public static Observation apply(String name)
      Observation constructor for creating a named observation.
      Parameters:
      name - (undocumented)
      Returns:
      (undocumented)
    • name

      public String name()
    • get

      public scala.collection.immutable.Map<String,?> get() throws InterruptedException
      (Scala-specific) Get the observed metrics. This waits for the observed dataset to finish its first action. Only the result of the first action is available. Subsequent actions do not modify the result.

      Returns:
      the observed metrics as a Map[String, Any]
      Throws:
      InterruptedException - interrupted while waiting
    • getAsJava

      public Map<String,Object> getAsJava() throws InterruptedException
      (Java-specific) Get the observed metrics. This waits for the observed dataset to finish its first action. Only the result of the first action is available. Subsequent actions do not modify the result.

      Returns:
      the observed metrics as a java.util.Map[String, Object]
      Throws:
      InterruptedException - interrupted while waiting