Use the PyArrow library to analyze data

Use PyArrow to read and analyze query results from InfluxDB Cloud Serverless. The PyArrow library provides efficient computation, aggregation, serialization, and conversion of Arrow format data.

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process and move data fast.

The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.

Install prerequisites

The examples in this guide assume that you use a Python virtual environment and the InfluxDB 3 influxdb3-python Python client library. For more information, see how to get started using Python to query InfluxDB.

Installing influxdb3-python also installs the pyarrow library that provides Python bindings for Apache Arrow.
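
One way to confirm that both packages are available in the active environment is to look up their installed versions with the standard library. This is only a sketch: installed_version is a hypothetical helper, and the distribution names are assumed to match the package names used in this guide.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Report the version of each package, or note that it is missing.
for name in ("influxdb3-python", "pyarrow"):
    print(name, installed_version(name) or "not installed")
```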

Use PyArrow to read query results

The following example shows how to use influxdb3-python and pyarrow to query InfluxDB and view Arrow data as a PyArrow Table.

  1. In your editor, copy and paste the following sample code to a new file, for example, pyarrow-example.py:

    # pyarrow-example.py

    from influxdb_client_3 import InfluxDBClient3

    def querySQL():

      # Instantiate an InfluxDB client configured for a bucket
      client = InfluxDBClient3(
        "https://cloud2.influxdata.com",
        database="BUCKET_NAME",
        token="API_TOKEN"
      )

      # Execute the query to retrieve all record batches in the stream,
      # formatted as a PyArrow Table.
      table = client.query(
        '''SELECT * FROM home WHERE time >= now() - INTERVAL '90 days' ORDER BY time'''
      )

      client.close()

      # Return the table so that the caller can print it
      return table

    print(querySQL())

  2. Replace the following configuration values:

    • API_TOKEN: An InfluxDB token with read permissions on the buckets you want to query.
    • BUCKET_NAME: The name of the InfluxDB bucket to query.
  3. In your terminal, use the Python interpreter to run the file:

    python pyarrow-example.py

The InfluxDBClient3.query() method sends the query request, and then returns a pyarrow.Table that contains all the Arrow record batches from the response stream.

Next, use PyArrow to analyze data.

Use PyArrow to analyze data

Group and aggregate data

With a pyarrow.Table, you can use values in a column as keys for grouping.

The following example shows how to query InfluxDB, and then use PyArrow to group the table data and calculate an aggregate value for each group:

# pyarrow-example.py

from influxdb_client_3 import InfluxDBClient3

def querySQL():

  # Instantiate an InfluxDB client configured for a bucket
  client = InfluxDBClient3(
    "https://cloud2.influxdata.com",
    database="BUCKET_NAME",
    token="API_TOKEN"
  )

  # Execute the query to retrieve data
  # formatted as a PyArrow Table
  table = client.query(
    '''SELECT * FROM home WHERE time >= now() - INTERVAL '90 days' ORDER BY time'''
  )

  client.close()

  return table

table = querySQL()

# Use PyArrow to aggregate data
print(table.group_by('room').aggregate([('temp', 'mean')]))

Replace the following:

  • API_TOKEN: An InfluxDB token with read permissions on the buckets you want to query.
  • BUCKET_NAME: The name of the InfluxDB bucket to query.

For more detail and examples, see the PyArrow documentation and the Apache Arrow Python Cookbook.

