# Multivariate Gaussian Mixture Model (GMM)

`spark.gaussianMixture.Rd`

Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to R's
`mvnormalmixEM()`. Users can call `summary` to print a summary of the fitted model,
`predict` to make predictions on new data, and `write.ml`/`read.ml` to save/load fitted models.

## Usage

```
spark.gaussianMixture(data, formula, ...)
# S4 method for SparkDataFrame,formula
spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)
# S4 method for GaussianMixtureModel
summary(object)
# S4 method for GaussianMixtureModel
predict(object, newData)
# S4 method for GaussianMixtureModel,character
write.ml(object, path, overwrite = FALSE)
```

## Arguments

- data
a SparkDataFrame for training.

- formula
a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the formula has no response variable in spark.gaussianMixture: the left-hand side is left empty, as in `~ V1 + V2`.

- ...
additional arguments passed to the method.

- k
number of independent Gaussians in the mixture model.

- maxIter
maximum number of iterations.

- tol
the convergence tolerance.

- object
a fitted Gaussian mixture model.

- newData
a SparkDataFrame for testing.

- path
the directory where the model is saved.

- overwrite
whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.

## Value

`spark.gaussianMixture` returns a fitted multivariate Gaussian mixture model.

`summary` returns a summary of the fitted model, which is a list.
The list includes the model's `lambda` (lambda), `mu` (mu), `sigma` (sigma), `loglik` (loglik), and `posterior` (posterior).

`predict` returns a SparkDataFrame containing predicted labels in a column named
"prediction".
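To make the summary fields more concrete, the following base R sketch shows how a posterior and a predicted component can be derived from mixture parameters. It assumes the usual reading in which `lambda` holds the mixing weights, `mu` the component means, and `sigma` the component covariance matrices. The parameter values and the helper names `dmvn` and `posterior_at` are hypothetical and chosen for illustration; this is not SparkR's implementation and does not require a Spark session.

```r
# Hypothetical parameters for a two-component, two-dimensional mixture
# (illustrative values, not output from spark.gaussianMixture).
lambda <- c(0.4, 0.6)              # mixing weights, summing to 1
mu     <- list(c(0, 0), c(3, 4))   # one mean vector per component
sigma  <- list(diag(2), diag(2))   # one covariance matrix per component

# Density of a multivariate normal distribution at a single point x.
dmvn <- function(x, mean, cov) {
  d <- length(mean)
  diff <- x - mean
  quad <- drop(t(diff) %*% solve(cov) %*% diff)
  exp(-0.5 * quad) / sqrt((2 * pi)^d * det(cov))
}

# Posterior probability of each component given x (Bayes' rule):
# weight each component density by its mixing weight, then normalize.
posterior_at <- function(x) {
  weighted <- mapply(function(w, m, s) w * dmvn(x, m, s), lambda, mu, sigma)
  weighted / sum(weighted)
}

x <- c(2.5, 3.5)
p <- posterior_at(x)

# The most probable component for x (one-based index in this sketch).
component <- which.max(p)
print(p)
print(component)
```

The point `x` lies much closer to the second mean than to the first, so the second component receives almost all of the posterior mass.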

## Note

spark.gaussianMixture since 2.1.0

summary(GaussianMixtureModel) since 2.1.0

predict(GaussianMixtureModel) since 2.1.0

write.ml(GaussianMixtureModel, character) since 2.1.0

## Examples

```
if (FALSE) {
sparkR.session()
library(mvtnorm)
set.seed(100)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)
df <- createDataFrame(as.data.frame(data))
model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
summary(model)
# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "V1", "prediction"))
# save fitted model to input path
path <- "path/to/model"
write.ml(model, path)
# can also read back the saved model and print
savedModel <- read.ml(path)
summary(savedModel)
}
```