pyspark.pandas.DataFrame.plot.bar

plot.bar(x=None, y=None, **kwds)

Vertical bar plot.

A bar plot is a plot that presents categorical data with rectangular bars with lengths proportional to the values that they represent. A bar plot shows comparisons among discrete categories. One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.

Parameters
x : label or position, optional

Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.

y : label or position, optional

Allows plotting of one column versus another. If not specified, all numerical columns are used.

**kwds : optional

Additional keyword arguments are documented in pyspark.pandas.Series.plot() or pyspark.pandas.DataFrame.plot().

Returns
plotly.graph_objs.Figure

Returns a custom object when backend != plotly. Returns an ndarray when subplots=True (matplotlib only).

Examples

Basic plot.

For Series:

>>> import pyspark.pandas as ps
>>> s = ps.Series([1, 3, 2])
>>> s.plot.bar()

For DataFrame:

>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
>>> df.plot.bar(x='lab', y='val')  

Plot a whole DataFrame as a bar plot. Each column is drawn in a distinct color, and the columns are stacked at each position along the horizontal axis.

>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> df.plot.bar()  

Instead of stacking, the figure can be split by column with plotly APIs.

>>> from plotly.subplots import make_subplots
>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> fig = (make_subplots(rows=2, cols=1)
...        .add_trace(df.plot.bar(y='speed').data[0], row=1, col=1)
...        .add_trace(df.plot.bar(y='lifespan').data[0], row=2, col=1))
>>> fig  

Plot a single column.

>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> df.plot.bar(y='speed')  

Plot with a column, rather than the index, on the x-axis.

>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> df.plot.bar(x='lifespan')