pyspark.pandas.DataFrame.stack
DataFrame.stack()
Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
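
The single-level case can be pictured without Spark at all. The following is only a conceptual sketch in plain Python, not how pyspark.pandas implements `stack`; the helper name `stack_single_level` and the dict-based stand-in for a Series are assumptions made for illustration.

```python
def stack_single_level(index, columns, rows):
    """Model stacking a frame with single-level columns (illustrative only).

    Returns a dict keyed by (row_label, column_label), standing in for a
    Series whose two-level index takes its inner level from the columns.
    """
    stacked = {}
    for row_label, row in zip(index, rows):
        for col_label, value in zip(columns, row):
            # Each column label moves into the inner level of the key.
            stacked[(row_label, col_label)] = value
    # Sort by key to mirror calling .sort_index() on the stacked result.
    return dict(sorted(stacked.items()))


series_like = stack_single_level(
    index=["cat", "dog"],
    columns=["weight", "height"],
    rows=[[0, 1], [2, 3]],
)
for key, value in series_like.items():
    print(key, value)
```

Each cell of the original table becomes one entry, so a 2 x 2 frame yields four entries keyed by (row label, column label).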
The new index levels are sorted.

Returns
- DataFrame or Series
- Stacked dataframe or series. 
 
See also
- DataFrame.unstack
- Unstack prescribed level(s) from index axis onto column axis. 
- DataFrame.pivot
- Reshape dataframe from long format to wide format. 
- DataFrame.pivot_table
- Create a spreadsheet-style pivot table as a DataFrame. 

Notes

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

Examples

Single level columns

>>> df_single_level_cols = ps.DataFrame([[0, 1], [2, 3]],
...                                     index=['cat', 'dog'],
...                                     columns=['weight', 'height'])

Stacking a dataframe with a single level column axis returns a Series:

>>> df_single_level_cols
     weight  height
cat       0       1
dog       2       3
>>> df_single_level_cols.stack().sort_index()
cat  height    1
     weight    0
dog  height    3
     weight    2
dtype: int64

Multi level columns: simple case

>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
>>> df_multi_level_cols1 = ps.DataFrame([[1, 2], [2, 4]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol1)

Stacking a dataframe with a multi-level column axis:

>>> df_multi_level_cols1
    weight
        kg pounds
cat      1      2
dog      2      4
>>> df_multi_level_cols1.stack().sort_index()
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

Missing values

>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('height', 'm')])
>>> df_multi_level_cols2 = ps.DataFrame([[1.0, 2.0], [3.0, 4.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack().sort_index()
        weight  height
cat kg     1.0     NaN
    m      NaN     2.0
dog kg     3.0     NaN
    m      NaN     4.0
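
To see why the missing-values example grows from four cells to eight, it can help to model the reshaping by hand. This is a rough plain-Python sketch under stated assumptions, not the pyspark.pandas implementation: the helper name `stack_innermost_level` and the nested-dict representation are hypothetical, and only two-level columns are handled.

```python
import math


def stack_innermost_level(index, columns, rows):
    """Model stacking the innermost level of two-level columns (illustrative).

    `columns` is a list of (outer, inner) tuples. The result maps each
    (row_label, inner) pair to a dict of outer -> value, using NaN wherever
    the original frame had no (outer, inner) column.
    """
    # Remaining column labels: the distinct outer labels, in original order.
    outers = []
    for outer, _ in columns:
        if outer not in outers:
            outers.append(outer)
    # Labels of the new inner-most index level.
    inners = sorted({inner for _, inner in columns})

    stacked = {}
    for row_label, row in zip(index, rows):
        cells = dict(zip(columns, row))
        for inner in inners:
            stacked[(row_label, inner)] = {
                outer: cells.get((outer, inner), math.nan) for outer in outers
            }
    # Sort by key to mirror calling .sort_index() on the stacked result.
    return dict(sorted(stacked.items()))


result = stack_innermost_level(
    index=["cat", "dog"],
    columns=[("weight", "kg"), ("height", "m")],
    rows=[[1.0, 2.0], [3.0, 4.0]],
)
for key, row in result.items():
    print(key, row)
```

Because `('weight', 'm')` and `('height', 'kg')` never existed as columns, half of the eight cells in the modeled result are NaN, matching the pattern in the output above.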