pyarrow.record_batch

pyarrow.record_batch(data, names=None, schema=None, metadata=None)

Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.

Parameters:
data : dict, list, pandas.DataFrame, Arrow-compatible table

A mapping of strings to Arrays or Python lists, a list of Arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (that is, one with an __arrow_c_array__ or __arrow_c_device_array__ method).

names : list, default None

Column names, used when a list of arrays is passed as data. Mutually exclusive with the 'schema' argument.

schema : Schema, default None

The expected schema of the RecordBatch. If not passed, it is inferred from the data. Mutually exclusive with the 'names' argument.

metadata : dict or Mapping, default None

Optional metadata for the schema. Only used if schema is not passed.

Returns:
RecordBatch

Examples

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]

Creating a RecordBatch from a Python dictionary:

>>> pa.record_batch({"n_legs": n_legs, "animals": animals})
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch({"n_legs": n_legs, "animals": animals}).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede

Creating a RecordBatch from a list of arrays with names:

>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]

Creating a RecordBatch from a list of arrays with names and metadata:

>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'

Creating a RecordBatch from a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
----
year: [2020,2022,2021,2022]
month: [3,5,7,9]
day: [1,5,9,13]
n_legs: [2,4,5,100]
animals: ["Flamingo","Horse","Brittle stars","Centipede"]
>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede

Creating a RecordBatch from a pandas DataFrame with schema:

>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, schema=my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, schema=my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede