pyspark.pandas.Series.dot

Series.dot(other: Union[Series, pyspark.pandas.frame.DataFrame]) → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, pyspark.pandas.series.Series][source]

Compute the dot product between the Series and the columns of other.

This method computes the dot product between the Series and another Series, or between the Series and each column of a DataFrame.

It can also be called using self @ other in Python >= 3.5.

Note

This API behaves differently from pandas when the indexes of the two Series are not aligned and the config ‘compute.eager_check’ is False. pandas raises an exception, whereas pandas-on-Spark proceeds permissively, treating the mismatched index labels as NaN and ignoring them.

>>> pdf1 = pd.Series([1, 2, 3], index=[0, 1, 2])
>>> pdf2 = pd.Series([1, 2, 3], index=[0, 1, 3])
>>> pdf1.dot(pdf2)
Traceback (most recent call last):
...
ValueError: matrices are not aligned
>>> psdf1 = ps.Series([1, 2, 3], index=[0, 1, 2])
>>> psdf2 = ps.Series([1, 2, 3], index=[0, 1, 3])
>>> with ps.option_context("compute.eager_check", False):
...     psdf1.dot(psdf2)  
...
5
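The result 5 comes from summing products only over the shared index labels 0 and 1 (1·1 + 2·2); a dependency-free sketch of that index alignment, using plain dicts to stand in for the two Series (not the actual pandas-on-Spark implementation):

```python
# Index -> value mappings mirroring psdf1 and psdf2 above
a = {0: 1, 1: 2, 2: 3}
b = {0: 1, 1: 2, 3: 3}

# Only labels present in both indexes contribute; the rest are
# effectively treated as NaN and dropped from the sum.
shared = a.keys() & b.keys()
result = sum(a[k] * b[k] for k in sorted(shared))
print(result)  # 5
```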
Parameters
other : Series, DataFrame

The other object whose columns to compute the dot product with.

Returns
scalar, Series

Return the dot product of the Series and other if other is a Series, or a Series holding the dot product of the Series and each column of other if other is a DataFrame.

Notes

The Series and other must share the same index, whether other is a Series or a DataFrame.

Examples

>>> s = ps.Series([0, 1, 2, 3])
>>> s.dot(s)
14
>>> s @ s
14
>>> psdf = ps.DataFrame({'x': [0, 1, 2, 3], 'y': [0, -1, -2, -3]})
>>> psdf
   x  y
0  0  0
1  1 -1
2  2 -2
3  3 -3
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     s.dot(psdf)
...
x    14
y   -14
dtype: int64
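The DataFrame case applies the same reduction once per column: each entry of the result is the dot product of the Series with one column. A dependency-free sketch of what `s.dot(psdf)` computes above, using plain lists and dicts in place of the actual Series and DataFrame:

```python
# Values of the Series s and the columns of psdf from the example
s = [0, 1, 2, 3]
columns = {"x": [0, 1, 2, 3], "y": [0, -1, -2, -3]}

# Dot the Series values against each column's values in turn
result = {name: sum(a * b for a, b in zip(s, col))
          for name, col in columns.items()}
print(result)  # {'x': 14, 'y': -14}
```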