pyspark.pandas.DataFrame.pivot
- DataFrame.pivot(index=None, columns=None, values=None)
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a "pivot" table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation.
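The reshaping described above can be pictured with a small plain-Python model. This is only a conceptual sketch, not how pandas-on-Spark implements pivot: the helper name `pivot_rows` and the list-of-dicts input are assumptions made for illustration.

```python
# Conceptual model only: plain Python, not the pandas-on-Spark implementation.
def pivot_rows(rows, index, columns, values):
    """Reshape a list of row dicts into {index_value: {column_value: value}}."""
    # Unique values of the `columns` field become the new column labels.
    col_labels = sorted({row[columns] for row in rows})
    table = {}
    for row in rows:
        # Each index value gets one full row; cells without source data stay None.
        cells = table.setdefault(row[index], dict.fromkeys(col_labels))
        cells[row[columns]] = row[values]  # plain placement, no aggregation
    return col_labels, table


rows = [
    {"foo": "one", "bar": "A", "baz": 1},
    {"foo": "one", "bar": "B", "baz": 2},
    {"foo": "two", "bar": "A", "baz": 4},
    {"foo": "two", "bar": "C", "baz": 6},
]
col_labels, table = pivot_rows(rows, index="foo", columns="bar", values="baz")
```

Here the unique values of `bar` form the column axis and the unique values of `foo` form the row axis. Combinations that never occur in the input are left empty, which a DataFrame would show as NaN.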
- Parameters
- index : string, optional
Column to use to make new frame's index. If None, uses existing index.
- columns : string
Column to use to make new frame's columns.
- values : string, object or a list of the previous
Column(s) to use for populating new frame's values.
- Returns
- DataFrame
Returns reshaped DataFrame.
See also
DataFrame.pivot_table
Generalization of pivot that can handle duplicate values for one index/column pair.
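To illustrate what "handling duplicate values" means, the following plain-Python sketch groups every value that falls on the same index/column pair and reduces each group with an aggregation function. It is a conceptual model, not the pivot_table implementation: the helper `pivot_table_rows`, its `aggfunc` default, and the sample rows are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


# Conceptual model only: not the pandas-on-Spark pivot_table implementation.
def pivot_table_rows(rows, index, columns, values, aggfunc=mean):
    """Return {(index_value, column_value): aggregated_value}."""
    # Collect every value that lands on the same (index, column) pair.
    groups = defaultdict(list)
    for row in rows:
        groups[(row[index], row[columns])].append(row[values])
    # Reduce each group to a single cell with the aggregation function.
    return {pair: aggfunc(vals) for pair, vals in groups.items()}


dup_rows = [
    {"foo": "one", "bar": "A", "baz": 1},
    {"foo": "one", "bar": "A", "baz": 2},  # same (foo, bar) pair as the row above
    {"foo": "two", "bar": "B", "baz": 3},
]
summed = pivot_table_rows(dup_rows, "foo", "bar", "baz", aggfunc=sum)
averaged = pivot_table_rows(dup_rows, "foo", "bar", "baz")
```

Because each cell is computed from the whole group, duplicate index/column pairs are well defined here, whereas pivot has no aggregation step.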
Examples
>>> df = ps.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']},
...                   columns=['foo', 'bar', 'baz', 'zoo'])
>>> df
   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t

>>> df.pivot(index='foo', columns='bar', values='baz').sort_index()
...
bar  A  B  C
foo
one  1  2  3
two  4  5  6

>>> df.pivot(columns='bar', values='baz').sort_index()
bar    A    B    C
0    1.0  NaN  NaN
1    NaN  2.0  NaN
2    NaN  NaN  3.0
3    4.0  NaN  NaN
4    NaN  5.0  NaN
5    NaN  NaN  6.0
Note that pandas raises a ValueError when duplicated values are found, whereas pandas-on-Spark's pivot still succeeds, using the first value it meets during the operation. Because pivot is an expensive operation, executing permissively is preferred over failing fast when processing large data.
>>> df = ps.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]}, columns=['foo', 'bar', 'baz'])
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4

>>> df.pivot(index='foo', columns='bar', values='baz').sort_index()
...
bar    A    B    C
foo
one  1.0  NaN  NaN
two  NaN  3.0  4.0
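The difference between the permissive and the fail-fast behavior on duplicates can be sketched in plain Python. This is a conceptual model under stated assumptions, not the code either library runs: the helper `place_cells` and its `on_duplicate` switch are hypothetical, and the sketch visits rows in list order, which a distributed engine may not guarantee.

```python
# Conceptual model only: `place_cells` and `on_duplicate` are hypothetical names.
def place_cells(rows, index, columns, values, on_duplicate="first"):
    """Return {(index_value, column_value): value} for the pivoted cells."""
    cells = {}
    for row in rows:
        pair = (row[index], row[columns])
        if pair in cells:
            if on_duplicate == "raise":
                # Fail-fast behavior, as described for pandas.
                raise ValueError(f"duplicate entry for {pair}")
            continue  # Permissive behavior: keep the value seen first.
        cells[pair] = row[values]
    return cells


rows = [
    {"foo": "one", "bar": "A", "baz": 1},
    {"foo": "one", "bar": "A", "baz": 2},  # duplicate (foo, bar) pair
    {"foo": "two", "bar": "B", "baz": 3},
    {"foo": "two", "bar": "C", "baz": 4},
]

first_wins = place_cells(rows, "foo", "bar", "baz")

try:
    place_cells(rows, "foo", "bar", "baz", on_duplicate="raise")
    strict_failed = False
except ValueError:
    strict_failed = True
```

In the permissive mode the cell for ('one', 'A') keeps the value 1 and the later value 2 is dropped silently, which matches the output shown above. When silent loss of values is not acceptable, checking for duplicates beforehand or using DataFrame.pivot_table with an explicit aggregation is the safer route.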
It also supports multi-index and multi-index columns.
>>> df.columns = pd.MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), ('b', 'baz')])
>>> df = df.set_index(('a', 'bar'), append=True)
>>> df
              a   b
            foo baz
  (a, bar)
0 A         one   1
1 A         one   2
2 B         two   3
3 C         two   4

>>> df.pivot(columns=('a', 'foo'), values=('b', 'baz')).sort_index()
...
('a', 'foo')  one  two
  (a, bar)
0 A           1.0  NaN
1 A           2.0  NaN
2 B           NaN  3.0
3 C           NaN  4.0