class StreamingKMeansModel extends KMeansModel with Logging
StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.
The update algorithm uses the "mini-batch" KMeans rule, generalized to incorporate forgetfulness (i.e. decay). The update rule (for each cluster) is:
$$ \begin{align} c_{t+1} &= [(c_t * n_t * a) + (x_t * m_t)] / [n_t * a + m_t] \\ n_{t+1} &= n_t * a + m_t \end{align} $$
Where c_t is the previously estimated centroid for that cluster, n_t is the number of points assigned to it thus far, x_t is the centroid estimated on the current batch, and m_t is the number of points assigned to that centroid in the current batch.
The decay factor 'a' scales the contribution of the clusters as estimated thus far: it is applied as a discount weighting on the existing cluster estimates when new incoming data is evaluated. If a=1, all batches are weighted equally. If a=0, new centroids are determined entirely by recent data. Lower values correspond to more forgetting.
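As an illustration of the update rule, here is a minimal plain-Scala sketch for a single cluster. It is not Spark's internal code: the names `DecayedUpdate`, `Cluster`, and `update` are hypothetical, and the sketch assumes the new centroid is normalised by the updated weight n_t * a + m_t so that it stays a weighted average of the old centroid and the batch centroid.

```scala
// Hypothetical sketch of the decayed mini-batch update for one cluster.
object DecayedUpdate {
  /** State of one cluster: centroid c_t and weight n_t. */
  final case class Cluster(center: Array[Double], weight: Double)

  /**
   * Applies one decayed update.
   * @param batchMean  x_t, centroid of the batch points assigned to this cluster
   * @param batchCount m_t, number of batch points assigned to this cluster
   * @param a          decay factor in [0, 1]
   */
  def update(c: Cluster, batchMean: Array[Double], batchCount: Double, a: Double): Cluster = {
    require(c.center.length == batchMean.length, "dimension mismatch")
    val discounted = c.weight * a           // n_t * a
    val newWeight = discounted + batchCount // n_{t+1} = n_t * a + m_t
    if (newWeight == 0.0) {
      // Nothing to average: keep the old centroid with zero weight.
      Cluster(c.center, newWeight)
    } else {
      val newCenter = c.center.zip(batchMean).map { case (ct, xt) =>
        (ct * discounted + xt * batchCount) / newWeight
      }
      Cluster(newCenter, newWeight)
    }
  }
}
```

With a = 1 the old centroid and the batch centroid are averaged in proportion to their point counts; with a = 0 the old centroid is discarded and the batch centroid is returned unchanged.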
Decay can optionally be specified by a half life and an associated time unit. The time unit can be either a batch of data or a single data point. For data that arrived at time t, the half life h is defined such that at time t + h the discount applied to the data from t is 0.5. The definition is the same whether the time unit is batches or points.
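The half-life definition implies a^h = 0.5, so a = 0.5^(1/h). The plain-Scala sketch below works through that arithmetic. The names `HalfLife`, `decayFactor`, and `discount` are hypothetical. Treating a batch as one elapsed time unit when the unit is batches, and as m elapsed units when the unit is points and the batch holds m points, is an assumption read off the definition rather than a statement about Spark's internals.

```scala
// Hypothetical helper: converts a half life into a decay factor.
object HalfLife {
  /** Decay factor a such that a^halfLife == 0.5. */
  def decayFactor(halfLife: Double): Double = {
    require(halfLife > 0.0, "half life must be positive")
    math.exp(math.log(0.5) / halfLife)
  }

  /** Discount applied to existing weights after `elapsed` time units. */
  def discount(a: Double, elapsed: Double): Double = math.pow(a, elapsed)
}
```

For example, with a half life of 5 batches, data that is 5 batches old is weighted by about 0.5 and data that is 10 batches old by about 0.25.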
- Annotations
 - @Since( "1.2.0" )
 - Source
 - StreamingKMeans.scala
 
- StreamingKMeansModel
 - Logging
 - KMeansModel
 - PMMLExportable
 - Serializable
 - Serializable
 - Saveable
 - AnyRef
 - Any
 
Instance Constructors
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
- final def ##(): Int
- Definition Classes
 - AnyRef → Any
 
- final def ==(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
- final def asInstanceOf[T0]: T0
- Definition Classes
 - Any
 
- def clone(): AnyRef
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... ) @native() @IntrinsicCandidate()
 
- val clusterCenters: Array[Vector]
- Definition Classes
 - StreamingKMeansModel → KMeansModel
 - Annotations
 - @Since( "1.2.0" )
 
- val clusterWeights: Array[Double]
- Annotations
 - @Since( "1.2.0" )
 
- def computeCost(data: RDD[Vector]): Double
 Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "0.8.0" )
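To make the cost definition concrete, here is a minimal plain-Scala sketch that sums each point's squared distance to its nearest center. It assumes squared Euclidean distance and in-memory collections. The names `KMeansCost`, `sqDist`, and `cost` are hypothetical and this is not the library's implementation, which operates on an RDD.

```scala
// Hypothetical in-memory illustration of the k-means cost.
object KMeansCost {
  /** Squared Euclidean distance between two equal-length vectors. */
  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  /** Sum over all points of the squared distance to the nearest center. */
  def cost(points: Seq[Array[Double]], centers: Seq[Array[Double]]): Double = {
    require(centers.nonEmpty, "at least one center is required")
    points.map(p => centers.map(c => sqDist(p, c)).min).sum
  }
}
```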
 
- val distanceMeasure: String
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "2.4.0" )
 
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef
 
- def equals(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
- final def getClass(): Class[_]
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native() @IntrinsicCandidate()
 
- def hashCode(): Int
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native() @IntrinsicCandidate()
 
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- final def isInstanceOf[T0]: Boolean
- Definition Classes
 - Any
 
- def isTraceEnabled(): Boolean
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def k: Int
 Total number of clusters.
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "0.8.0" )
 
- def log: Logger
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logDebug(msg: ⇒ String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logError(msg: ⇒ String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logInfo(msg: ⇒ String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logName: String
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logTrace(msg: ⇒ String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- def logWarning(msg: ⇒ String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef
 
- final def notify(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @native() @IntrinsicCandidate()
 
- final def notifyAll(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @native() @IntrinsicCandidate()
 
- def predict(points: JavaRDD[Vector]): JavaRDD[Integer]
 Maps given points to their cluster indices.
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "1.0.0" )
 
- def predict(points: RDD[Vector]): RDD[Int]
 Maps given points to their cluster indices.
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "1.0.0" )
 
- def predict(point: Vector): Int
 Returns the cluster index that a given point belongs to.
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "0.8.0" )
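As a rough illustration of what a cluster index means here, the plain-Scala sketch below returns the index of the center nearest to a point. It assumes squared Euclidean distance and breaks ties in favour of the lower index. The names `NearestCenter`, `sqDist`, and `predict` are hypothetical, and this is not the library's implementation.

```scala
// Hypothetical in-memory illustration of nearest-center assignment.
object NearestCenter {
  /** Squared Euclidean distance between two equal-length vectors. */
  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  /** Index of the center closest to `point`. */
  def predict(point: Array[Double], centers: IndexedSeq[Array[Double]]): Int = {
    require(centers.nonEmpty, "at least one center is required")
    centers.indices.minBy(i => sqDist(point, centers(i)))
  }
}
```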
 
- def save(sc: SparkContext, path: String): Unit
 Save this model to the given path.
 This saves:
 - human-readable (JSON) model metadata to path/metadata/
 - Parquet formatted data to path/data/
 The model may be loaded using Loader.load.
- sc
 Spark context used to save model data.
- path
 Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
- Definition Classes
 - KMeansModel → Saveable
 - Annotations
 - @Since( "1.4.0" )
 
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
 - AnyRef
 
- def toPMML(): String
 Export the model to a String in PMML format.
- Definition Classes
 - PMMLExportable
 - Annotations
 - @Since( "1.4.0" )
 
- def toPMML(outputStream: OutputStream): Unit
 Export the model to the OutputStream in PMML format.
- Definition Classes
 - PMMLExportable
 - Annotations
 - @Since( "1.4.0" )
 
- def toPMML(sc: SparkContext, path: String): Unit
 Export the model to a directory on a distributed file system in PMML format.
- Definition Classes
 - PMMLExportable
 - Annotations
 - @Since( "1.4.0" )
 
- def toPMML(localPath: String): Unit
 Export the model to a local file in PMML format.
- Definition Classes
 - PMMLExportable
 - Annotations
 - @Since( "1.4.0" )
 
- def toString(): String
- Definition Classes
 - AnyRef → Any
 
- val trainingCost: Double
- Definition Classes
 - KMeansModel
 - Annotations
 - @Since( "2.4.0" )
 
- def update(data: RDD[Vector], decayFactor: Double, timeUnit: String): StreamingKMeansModel
 Perform a k-means update on a batch of data.
- Annotations
 - @Since( "1.2.0" )
 
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
- final def wait(arg0: Long): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... ) @native()
 
- final def wait(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 
Deprecated Value Members
- def finalize(): Unit
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( classOf[java.lang.Throwable] ) @Deprecated
 - Deprecated