SparseVector

class pyspark.mllib.linalg.SparseVector(size: int, *args: Union[bytes, Tuple[int, float], Iterable[float], Iterable[Tuple[int, float]], Dict[int, float]])

A simple sparse vector class for passing data to MLlib. Users may alternatively pass SciPy’s scipy.sparse data types.
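The constructor signature accepts several argument shapes after size. The plain-Python sketch below shows how the documented forms (parallel index and value sequences, a dict, or (index, value) pairs) can all describe the same vector. The helper normalize_sparse_args is hypothetical, it is not PySpark's implementation, it does not handle the bytes form, and the strictly-increasing and in-range index checks are assumptions of the sketch.

```python
from typing import Dict, Iterable, List, Tuple, Union


# Hypothetical helper: normalizes the documented constructor argument forms
# into (size, indices, values). Not PySpark's code; the bytes form is skipped.
def normalize_sparse_args(
    size: int, *args: Union[Dict[int, float], Iterable]
) -> Tuple[int, List[int], List[float]]:
    if len(args) == 1:
        # One argument: a dict {index: value} or an iterable of (index, value) pairs.
        pairs = args[0].items() if isinstance(args[0], dict) else args[0]
        ordered = sorted((int(i), float(v)) for i, v in pairs)
        indices = [i for i, _ in ordered]
        values = [v for _, v in ordered]
    elif len(args) == 2:
        # Two arguments: parallel sequences of indices and values.
        indices = [int(i) for i in args[0]]
        values = [float(v) for v in args[1]]
        if len(indices) != len(values):
            raise ValueError("index and value arrays not same length")
    else:
        raise TypeError("expected indices and values, a dict, or (index, value) pairs")
    # Assumption of this sketch: indices are strictly increasing and in [0, size).
    for prev, nxt in zip(indices, indices[1:]):
        if prev >= nxt:
            raise ValueError("indices %d and %d are not strictly increasing" % (prev, nxt))
    if indices and not (0 <= indices[0] and indices[-1] < size):
        raise ValueError("index out of range for size %d" % size)
    return size, indices, values


# All three forms describe the same 4-element vector (0, 3, 0, 4).
print(normalize_sparse_args(4, [1, 3], [3.0, 4.0]))
print(normalize_sparse_args(4, {3: 4.0, 1: 3.0}))
print(normalize_sparse_args(4, [(3, 4.0), (1, 3.0)]))
```

Sorting the dict and pair forms lets callers supply entries in any order, while the two-sequence form is taken as given and only validated.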

Methods

asML()

Convert this vector to the new mllib-local representation.

dot(other)

Dot product with a SparseVector or 1- or 2-dimensional NumPy array.

norm(p)

Calculates the norm of a SparseVector.

numNonzeros()

Number of nonzero elements.

parse(s)

Parse a string representation back into a SparseVector.

squared_distance(other)

Squared distance from a SparseVector or 1-dimensional NumPy array.

toArray()

Returns a copy of this SparseVector as a 1-dimensional NumPy array.

Methods Documentation

asML() → pyspark.ml.linalg.SparseVector

Convert this vector to the new mllib-local representation. This does NOT copy the data; it copies references.

New in version 2.0.0.

Returns
pyspark.ml.linalg.SparseVector
dot(other: Iterable[float]) → numpy.float64

Dot product with a SparseVector or 1- or 2-dimensional NumPy array.

Examples

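>>> import array
>>> import numpy as np
>>> from pyspark.mllib.linalg import DenseVector, SparseVector
>>> a = SparseVector(4, [1, 3], [3.0, 4.0])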
>>> a.dot(a)
25.0
>>> a.dot(array.array('d', [1., 2., 3., 4.]))
22.0
>>> b = SparseVector(4, [2], [1.0])
>>> a.dot(b)
0.0
>>> a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
array([ 22.,  22.])
>>> a.dot([1., 2., 3.])
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(np.array([1., 2.]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(DenseVector([1., 2.]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(np.zeros((3, 2)))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
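The examples above show that only the stored entries of a sparse vector matter for a dot product, and that a 2-dimensional operand yields one result per column. The plain-Python sketch below illustrates that computation; sparse_dot_dense and sparse_dot_matrix are hypothetical helpers, not PySpark's implementation, and they raise ValueError where the library raises AssertionError.

```python
from typing import List, Sequence


# Hypothetical helper: dot product of a sparse vector (size, indices, values)
# with a dense sequence. Only stored (index, value) pairs contribute.
def sparse_dot_dense(size: int, indices: Sequence[int], values: Sequence[float],
                     dense: Sequence[float]) -> float:
    if len(dense) != size:
        raise ValueError("dimension mismatch")
    return sum((v * dense[i] for i, v in zip(indices, values)), 0.0)


# Hypothetical helper: treats a 2-D operand as a size x k matrix (list of rows)
# and returns one dot product per column, using only the rows at stored indices.
def sparse_dot_matrix(size: int, indices: Sequence[int], values: Sequence[float],
                      rows: Sequence[Sequence[float]]) -> List[float]:
    if len(rows) != size:
        raise ValueError("dimension mismatch")
    ncols = len(rows[0]) if rows else 0
    result = [0.0] * ncols
    for i, v in zip(indices, values):
        for c in range(ncols):
            result[c] += v * rows[i][c]
    return result


# Same vector as `a` in the examples: (0, 3, 0, 4).
print(sparse_dot_dense(4, [1, 3], [3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
print(sparse_dot_matrix(4, [1, 3], [3.0, 4.0], [[1, 1], [2, 2], [3, 3], [4, 4]]))
```

The cost is proportional to the number of stored entries rather than to size, which is the point of a sparse representation.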
norm(p: NormType) → numpy.float64

Calculates the norm of a SparseVector.

Examples

>>> a = SparseVector(4, [0, 1], [3., -4.])
>>> a.norm(1)
7.0
>>> a.norm(2)
5.0
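For p ≥ 1, implicit zeros add nothing to the sum of |x_i|^p, so a p-norm can be computed from the stored values alone. The sketch below illustrates this with a hypothetical sparse_norm helper; it is not PySpark's implementation and, as a simplifying assumption, only handles p ≥ 1 and p = infinity.

```python
import math
from typing import Sequence


# Hypothetical helper: p-norm computed over the stored values of a sparse
# vector. Restricted to p >= 1 or p = math.inf in this sketch.
def sparse_norm(values: Sequence[float], p: float) -> float:
    if p == math.inf:
        # Max-norm: the largest absolute stored value (0.0 for an empty vector).
        return max((abs(v) for v in values), default=0.0)
    if p < 1:
        raise ValueError("this sketch only handles p >= 1 or infinity")
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


values = [3.0, -4.0]  # stored values of SparseVector(4, [0, 1], [3., -4.])
print(sparse_norm(values, 1))
print(sparse_norm(values, 2))
print(sparse_norm(values, math.inf))
```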
numNonzeros() → int

Number of nonzero elements. This scans all active values and counts the nonzero ones.
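Because stored (active) entries may hold explicit zeros, the count of nonzero elements can be smaller than the number of stored indices. The sketch below uses a hypothetical count_nonzeros helper and a made-up example vector to show the distinction; it is an illustration, not PySpark's implementation.

```python
from typing import Sequence


# Hypothetical helper: counts stored values that are not zero.
def count_nonzeros(values: Sequence[float]) -> int:
    return sum(1 for v in values if v != 0.0)


# Made-up vector of size 5 with 4 stored entries, two of them explicit zeros.
indices = [0, 1, 3, 4]
values = [0.0, 2.5, 0.0, -1.0]
print(len(indices), count_nonzeros(values))
```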

static parse(s: str) → pyspark.mllib.linalg.SparseVector

Parse a string representation back into a SparseVector.

Examples

>>> SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] )')
SparseVector(4, {0: 4.0, 1: 5.0})
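The example above shows that the parser accepts a "(size, [indices], [values])" string with irregular whitespace. The sketch below parses that format in plain Python; parse_sparse is a hypothetical helper that returns a (size, indices, values) tuple rather than a SparseVector, and it is not PySpark's parser.

```python
from typing import List, Tuple


# Hypothetical helper: parses "(size, [i0, i1, ...], [v0, v1, ...])",
# tolerating extra whitespace. Returns (size, indices, values).
def parse_sparse(s: str) -> Tuple[int, List[int], List[float]]:
    s = s.strip()
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError("expected a parenthesized tuple: %r" % s)
    inner = s[1:-1]
    # The size is everything before the first comma.
    comma = inner.index(",")
    size = int(inner[:comma])
    rest = inner[comma + 1:]
    # Locate the two bracketed lists: indices first, then values.
    i_start = rest.index("[")
    i_end = rest.index("]", i_start)
    v_start = rest.index("[", i_end)
    v_end = rest.index("]", v_start)
    # int() and float() ignore surrounding whitespace; empty items are skipped.
    indices = [int(t) for t in rest[i_start + 1:i_end].split(",") if t.strip()]
    values = [float(t) for t in rest[v_start + 1:v_end].split(",") if t.strip()]
    if len(indices) != len(values):
        raise ValueError("index and value lists differ in length")
    return size, indices, values


print(parse_sparse(" (4, [0,1 ],[ 4.0,5.0] )"))
```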
squared_distance(other: Iterable[float]) → numpy.float64

Squared distance from a SparseVector or 1-dimensional NumPy array.

Examples

>>> import array
>>> import numpy as np
>>> from pyspark.mllib.linalg import SparseVector
>>> a = SparseVector(4, [1, 3], [3.0, 4.0])
>>> a.squared_distance(a)
0.0
>>> a.squared_distance(array.array('d', [1., 2., 3., 4.]))
11.0
>>> a.squared_distance(np.array([1., 2., 3., 4.]))
11.0
>>> b = SparseVector(4, [2], [1.0])
>>> a.squared_distance(b)
26.0
>>> b.squared_distance(a)
26.0
>>> b.squared_distance([1., 2.])
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> b.squared_distance(SparseVector(3, [1,], [1.0,]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
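When both operands are sparse, the squared distance can be computed in one pass by walking the two sorted index lists together, as in the merge step of merge sort: indices present on only one side are paired with an implicit zero. The sketch below illustrates that approach with a hypothetical sparse_squared_distance helper; it is not PySpark's implementation, and it raises ValueError where the library raises AssertionError.

```python
from typing import Sequence


# Hypothetical helper: squared Euclidean distance between two sparse vectors,
# each given as (size, indices, values) with strictly increasing indices.
def sparse_squared_distance(size_a: int, ia: Sequence[int], va: Sequence[float],
                            size_b: int, ib: Sequence[int], vb: Sequence[float]) -> float:
    if size_a != size_b:
        raise ValueError("dimension mismatch")
    i = j = 0
    total = 0.0
    while i < len(ia) and j < len(ib):
        if ia[i] == ib[j]:      # index stored in both vectors: subtract
            d = va[i] - vb[j]
            i += 1
            j += 1
        elif ia[i] < ib[j]:     # only in a: b has an implicit zero here
            d = va[i]
            i += 1
        else:                   # only in b: a has an implicit zero here
            d = vb[j]
            j += 1
        total += d * d
    # Any entries left over in either list are paired with implicit zeros.
    total += sum(v * v for v in va[i:])
    total += sum(v * v for v in vb[j:])
    return total


# a = (0, 3, 0, 4) and b = (0, 0, 1, 0), as in the examples.
print(sparse_squared_distance(4, [1, 3], [3.0, 4.0], 4, [2], [1.0]))
```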
toArray() → numpy.ndarray

Returns a copy of this SparseVector as a 1-dimensional NumPy array.
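Converting to a dense representation amounts to scattering the stored values into a zero-filled buffer of length size. The sketch below shows this with a hypothetical to_dense helper that returns a Python list so it needs only the standard library; the method itself returns a NumPy array.

```python
from typing import List, Sequence


# Hypothetical helper: scatter stored values into a fresh zero-filled list.
# The result is a new object, so modifying it does not affect the inputs.
def to_dense(size: int, indices: Sequence[int], values: Sequence[float]) -> List[float]:
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


print(to_dense(4, [1, 3], [3.0, 4.0]))
```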