pyarrow.table

pyarrow.table(data, names=None, schema=None, metadata=None, nthreads=None)

Create a pyarrow.Table from a Python data structure or sequence of arrays.

Parameters:
data : dict, list, pandas.DataFrame, Arrow-compatible table

A mapping of strings to Arrays or Python lists, a list of arrays or chunked arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (that is, one with an __arrow_c_array__, __arrow_c_device_array__, or __arrow_c_stream__ method).

names : list, default None

Column names if a list of arrays is passed as data. Mutually exclusive with the 'schema' argument.

schema : Schema, default None

The expected schema of the Arrow Table. If not passed, it will be inferred from the data. Mutually exclusive with the 'names' argument. If passed, the output will have exactly this schema: when data is a dict or DataFrame, an error is raised if a schema column is not found in the data, and additional data not specified in the schema is ignored.

metadata : dict or Mapping, default None

Optional metadata for the schema (only valid if schema is not passed).

nthreads : int, default None

For pandas.DataFrame inputs: if greater than 1, convert columns to Arrow in parallel using the indicated number of threads. By default, this follows pyarrow.cpu_count() (may use up to the system CPU count of threads).

Returns:
Table

Examples

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]

Construct a Table from a Python dictionary:

>>> pa.table({"n_legs": n_legs, "animals": animals})
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]

Construct a Table from arrays:

>>> pa.table([n_legs, animals], names=names)
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]

Construct a Table from arrays with metadata:

>>> my_metadata = {"n_legs": "Number of legs per animal"}
>>> pa.table([n_legs, animals], names=names, metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'

Construct a Table from a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.table(df)
pyarrow.Table
year: int64
n_legs: int64
animals: string
----
year: [[2020,2022,2019,2021]]
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]

Construct a Table from a pandas DataFrame with a pyarrow schema:

>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.table(df, my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: '{"index_columns": [], "column_indexes": [{"name": null, ...

Construct a Table from chunked arrays:

>>> n_legs = pa.chunked_array([[2, 2, 4], [4, 5, 100]])
>>> animals = pa.chunked_array([["Flamingo", "Parrot", "Dog"], ["Horse", "Brittle stars", "Centipede"]])
>>> table = pa.table([n_legs, animals], names=names)
>>> table
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,2,4],[4,5,100]]
animals: [["Flamingo","Parrot","Dog"],["Horse","Brittle stars","Centipede"]]