class StreamingKMeansModel extends KMeansModel with Logging
StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.
The update algorithm uses the "mini-batch" KMeans rule, generalized to incorporate forgetfulness (i.e. decay). The update rule (for each cluster) is:
$$ \begin{align} c_{t+1} &= [(c_t * n_t * a) + (x_t * m_t)] / [n_t * a + m_t] \\ n_{t+1} &= n_t * a + m_t \end{align} $$
Where c_t is the previously estimated centroid for that cluster, n_t is the number of points assigned to it thus far, x_t is the centroid estimated on the current batch, and m_t is the number of points assigned to that centroid in the current batch.
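The update rule can be sketched for a single cluster in plain Scala, without Spark. This is a minimal illustration, not the library's implementation: the names `DecayedUpdate` and `updateCluster` are hypothetical, and the sketch assumes the centroid is normalized by the updated weight `n_t * a + m_t`, so that the result is a weighted average of the old centroid and the batch centroid.

```scala
object DecayedUpdate {
  /** One decayed mini-batch update for a single cluster (hypothetical helper).
    *
    * @param c centroid estimated so far (c_t)
    * @param n weight accumulated so far (n_t)
    * @param x centroid of the points assigned to this cluster in the current batch (x_t)
    * @param m number of points assigned to this cluster in the current batch (m_t)
    * @param a decay factor in [0, 1]
    * @return the updated centroid c_{t+1} and the updated weight n_{t+1}
    */
  def updateCluster(
      c: Array[Double], n: Double,
      x: Array[Double], m: Double,
      a: Double): (Array[Double], Double) = {
    require(c.length == x.length, "centroids must have the same dimension")
    val newWeight = n * a + m // n_{t+1} = n_t * a + m_t
    if (newWeight <= 0.0) {
      // Nothing to average over: keep the old centroid.
      (c.clone(), 0.0)
    } else {
      // Assumption: normalize by the updated weight so the result is a weighted average.
      val newCentroid = c.indices.map(i => (c(i) * n * a + x(i) * m) / newWeight).toArray
      (newCentroid, newWeight)
    }
  }
}
```

With c_t = (0, 0), n_t = 10, x_t = (1, 1), m_t = 10 and a = 1, this gives c_{t+1} = (0.5, 0.5) and n_{t+1} = 20.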
The decay factor 'a' scales the contribution of the clusters as estimated thus far: it is applied as a discount weighting on the current estimate whenever new incoming data is evaluated. If a=1, all batches are weighted equally. If a=0, new centroids are determined entirely by recent data. Lower values correspond to more forgetting.
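The two extremes can be checked by substituting into the update rule. The worked case below assumes the centroid is normalized by the updated weight n_t * a + m_t, and that m_t > 0 in the a = 0 case:

$$ \begin{align} a = 1: \quad c_{t+1} &= \frac{c_t n_t + x_t m_t}{n_t + m_t}, \quad n_{t+1} = n_t + m_t \\ a = 0: \quad c_{t+1} &= \frac{x_t m_t}{m_t} = x_t, \quad n_{t+1} = m_t \end{align} $$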
Decay can optionally be specified by a half life and an associated time unit. The time unit can be either a batch of data or a single data point. For data that arrived at time t, the half life h is defined such that at time t + h the discount applied to the data from t is 0.5. The definition remains the same whether the time unit is given as batches or points.
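The half-life definition implies a^h = 0.5, so the decay factor is a = 0.5^(1/h). A minimal sketch in plain Scala follows; the names `HalfLife`, `decayFactor` and `discount` are hypothetical, and the sketch assumes the discount compounds once per elapsed time unit (batch or point).

```scala
object HalfLife {
  /** Decay factor a such that a weight is halved after `halfLife` time units: a^halfLife = 0.5. */
  def decayFactor(halfLife: Double): Double = {
    require(halfLife > 0.0, "half life must be positive")
    math.exp(math.log(0.5) / halfLife)
  }

  /** Discount applied to older data after `elapsed` time units (batches or points). */
  def discount(a: Double, elapsed: Double): Double = math.pow(a, elapsed)
}
```

For example, with a half life of 5 batches, data that arrived 5 batches ago is discounted by about 0.5, and data from 10 batches ago by about 0.25.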
- Annotations
- @Since("1.2.0")
- Source
- StreamingKMeans.scala
Instance Constructors
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- Logging
 
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
 
- final def ##: Int
- Definition Classes
- AnyRef → Any
 
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
 
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
 
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
 
- val clusterCenters: Array[Vector]
- Definition Classes
- StreamingKMeansModel → KMeansModel
- Annotations
- @Since("1.2.0")
 
- val clusterWeights: Array[Double]
- Annotations
- @Since("1.2.0")
 
- def computeCost(data: RDD[Vector]): Double
  Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
- Definition Classes
- KMeansModel
- Annotations
- @Since("0.8.0")
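The quantity returned by computeCost can be illustrated without Spark. The sketch below is not the library's implementation, which operates on an RDD: it assumes squared Euclidean distance, and the names `KMeansCost`, `squaredDistance` and `cost` are hypothetical.

```scala
object KMeansCost {
  /** Squared Euclidean distance between two points of equal dimension. */
  def squaredDistance(p: Array[Double], q: Array[Double]): Double = {
    require(p.length == q.length, "points must have the same dimension")
    p.zip(q).map { case (pi, qi) => (pi - qi) * (pi - qi) }.sum
  }

  /** Sum over all points of the squared distance to the nearest center. */
  def cost(points: Seq[Array[Double]], centers: Seq[Array[Double]]): Double = {
    require(centers.nonEmpty, "at least one center is required")
    points.map(p => centers.map(c => squaredDistance(p, c)).min).sum
  }
}
```

With centers (0, 0) and (10, 10), the points (1, 0), (9, 10) and (0, 2) contribute 1, 1 and 4 respectively, for a total cost of 6.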
 
- val distanceMeasure: String
- Definition Classes
- KMeansModel
- Annotations
- @Since("2.4.0")
 
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
 
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
 
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
 
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
 
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
 
- def k: Int
  Total number of clusters.
- Definition Classes
- KMeansModel
- Annotations
- @Since("0.8.0")
 
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
 
- def logBasedOnLevel(level: Level)(f: => MessageWithContext): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
 
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
 
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
- def predict(points: JavaRDD[Vector]): JavaRDD[Integer]
  Maps given points to their cluster indices.
- Definition Classes
- KMeansModel
- Annotations
- @Since("1.0.0")
 
- def predict(points: RDD[Vector]): RDD[Int]
  Maps given points to their cluster indices.
- Definition Classes
- KMeansModel
- Annotations
- @Since("1.0.0")
 
- def predict(point: Vector): Int
  Returns the cluster index that a given point belongs to.
- Definition Classes
- KMeansModel
- Annotations
- @Since("0.8.0")
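A single-point prediction can be sketched as follows, for example in spark-shell with spark-mllib on the classpath. The sketch assumes the constructor takes the cluster centers followed by the cluster weights, matching the clusterCenters and clusterWeights members; the centers, weights and query point are made-up illustration values.

```scala
import org.apache.spark.mllib.clustering.StreamingKMeansModel
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical two-cluster model with hand-picked centers and equal weights.
// Assumption: the constructor signature is (clusterCenters, clusterWeights).
val model = new StreamingKMeansModel(
  Array(Vectors.dense(0.0, 0.0), Vectors.dense(10.0, 10.0)),
  Array(1.0, 1.0))

// Index into clusterCenters of the center nearest to the query point.
val idx = model.predict(Vectors.dense(9.0, 8.5))
```

The returned index refers to a position in clusterCenters, so here the point (9.0, 8.5) should map to the second center.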
 
- def save(sc: SparkContext, path: String): Unit
  Save this model to the given path. This saves:
  - human-readable (JSON) model metadata to path/metadata/
  - Parquet formatted data to path/data/

  The model may be loaded using Loader.load.
  - sc: Spark context used to save model data.
  - path: Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
- Definition Classes
- KMeansModel → Saveable
- Annotations
- @Since("1.4.0")
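A save-and-reload round trip might look like the sketch below. The helper name `saveAndReload` is hypothetical, and the sketch assumes that `KMeansModel.load` is the matching loader; because save is inherited from KMeansModel, the reloaded object is typed as a plain KMeansModel rather than a StreamingKMeansModel.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{KMeansModel, StreamingKMeansModel}

/** Save `model` under `path`, then read it back (hypothetical helper). */
def saveAndReload(sc: SparkContext, model: StreamingKMeansModel, path: String): KMeansModel = {
  // Throws if the directory at `path` already exists.
  model.save(sc, path)
  // Assumption: the companion loader of KMeansModel reads what save wrote.
  KMeansModel.load(sc, path)
}
```

Since the target directory must not already exist, callers would typically pass a fresh path on every save.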
 
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
 
- def toPMML(): String
  Export the model to a String in PMML format.
- Definition Classes
- PMMLExportable
- Annotations
- @Since("1.4.0")
 
- def toPMML(outputStream: OutputStream): Unit
  Export the model to the OutputStream in PMML format.
- Definition Classes
- PMMLExportable
- Annotations
- @Since("1.4.0")
 
- def toPMML(sc: SparkContext, path: String): Unit
  Export the model to a directory on a distributed file system in PMML format.
- Definition Classes
- PMMLExportable
- Annotations
- @Since("1.4.0")
 
- def toPMML(localPath: String): Unit
  Export the model to a local file in PMML format.
- Definition Classes
- PMMLExportable
- Annotations
- @Since("1.4.0")
 
- def toString(): String
- Definition Classes
- AnyRef → Any
 
- val trainingCost: Double
- Definition Classes
- KMeansModel
- Annotations
- @Since("2.4.0")
 
- def update(data: RDD[Vector], decayFactor: Double, timeUnit: String): StreamingKMeansModel
  Perform a k-means update on a batch of data.
- Annotations
- @Since("1.2.0")
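Applying update over several batches can be sketched as a fold. The helper name `updateAll` is hypothetical, and the sketch assumes that "batches" is an accepted value for timeUnit (with "points" as the alternative).

```scala
import org.apache.spark.mllib.clustering.StreamingKMeansModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

/** Fold a sequence of batches into the model, one k-means update per batch. */
def updateAll(
    initial: StreamingKMeansModel,
    batches: Seq[RDD[Vector]],
    decayFactor: Double): StreamingKMeansModel =
  batches.foldLeft(initial) { (model, batch) =>
    // Assumption: "batches" is the time-unit string for per-batch decay.
    model.update(batch, decayFactor, "batches")
  }
```

With a decay factor of 1.0, every batch carries equal weight, so each cluster weight grows by the number of points assigned to that cluster in each batch.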
 
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
 
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
- def withLogContext(context: Map[String, String])(body: => Unit): Unit
- Attributes
- protected
- Definition Classes
- Logging
 
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
- (Since version 9)