DataFusion evaluates -0.0 >= 0.0 as false #22490

@Fly-a-Kite

Describe the bug

Summary

DataFusion 53.0.0 evaluates -0.0 >= 0.0 as false.

Python and DuckDB evaluate the same comparison as true. The incorrect result also propagates to filters that wrap the comparison in IS TRUE / IS NOT TRUE.

Environment

  • datafusion==53.0.0
  • pyarrow==24.0.0
  • duckdb==1.5.3
  • Python 3.12.3

Reproduction

#!/usr/bin/env python3

from __future__ import annotations

import datafusion
import duckdb
import pyarrow as pa
from datafusion import SessionContext


def main() -> None:
    python_cmp = (-0.0 >= 0.0)
    duckdb_cmp = duckdb.connect(database=":memory:").execute(
        "SELECT (-0.0 >= 0.0)"
    ).fetchone()[0]

    ctx = SessionContext()
    batch = pa.RecordBatch.from_pylist(
        [{"id": 1, "y": 0.0}],
        schema=pa.schema([
            pa.field("id", pa.int64()),
            pa.field("y", pa.float64(), nullable=True),
        ]),
    )
    ctx.register_record_batches("t0", [[batch]])

    diagnostic = ctx.sql(
        "SELECT id, y * -1 AS m_0, (y * -1) >= 0.0 AS cmp FROM t0"
    ).to_pandas()

    filtered = ctx.sql(
        "SELECT id, y * -1 AS m_0 "
        "FROM t0 "
        "WHERE NOT (((y * -1) >= 0.0) IS TRUE)"
    ).to_pandas()

    print(f"datafusion={getattr(datafusion, '__version__', 'unknown')}")
    print(f"pyarrow={pa.__version__}")
    print(f"duckdb={duckdb.__version__}")

    print(f"python comparison: {python_cmp!r}")
    print(f"duckdb comparison: {duckdb_cmp!r}")

    print("datafusion diagnostic:")
    print(diagnostic)

    print("datafusion filtered rows:")
    print(filtered)

    assert python_cmp is True
    assert duckdb_cmp is True
    assert bool(diagnostic.iloc[0]["cmp"]) is True
    assert len(filtered) == 0


if __name__ == "__main__":
    main()

Expected behavior

With y = 0.0, y * -1 is -0.0. IEEE 754 comparison treats -0.0 and 0.0 as equal, so -0.0 >= 0.0 should be true.

Therefore this filter should remove the row:

WHERE NOT (((y * -1) >= 0.0) IS TRUE)

Python and DuckDB agree with this expectation.
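
For reference, here is a small stdlib-only sketch of the signed-zero behavior behind this expectation. It shows that -0.0 and 0.0 differ only in the sign bit and still compare equal numerically. The helper `total_order_key` is hypothetical and added purely for illustration: it orders floats by their bit pattern (in the spirit of the IEEE 754 totalOrder predicate), which is one way a comparison could end up placing -0.0 strictly below 0.0. Whether DataFusion does anything like this internally is an assumption, not something this report establishes.

```python
import math
import struct

# Same arithmetic as `y * -1` with y = 0.0 in the reproduction.
neg_zero = 0.0 * -1
pos_zero = 0.0


def sign_bit(x: float) -> int:
    """Return 1 if the IEEE 754 sign bit of x is set, else 0."""
    return 1 if math.copysign(1.0, x) < 0 else 0


def total_order_key(x: float) -> int:
    """Hypothetical helper: map a float64 to an unsigned integer whose
    ordering follows the bit pattern (sign first, then magnitude)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits >> 63:
        # Negative values: flip every bit so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Non-negative values: set the top bit so they sort above all negatives.
    return bits | (1 << 63)


# Numeric (IEEE 754) comparison: the two zeros are equal.
numeric_ge = neg_zero >= pos_zero

# Bit-pattern ordering: -0.0 sorts strictly below 0.0.
bitwise_ge = total_order_key(neg_zero) >= total_order_key(pos_zero)

print(f"sign bits: {sign_bit(neg_zero)} vs {sign_bit(pos_zero)}")
print(f"numeric  -0.0 >= 0.0: {numeric_ge}")
print(f"bitwise  -0.0 >= 0.0: {bitwise_ge}")
```

The numeric comparison matches what Python and DuckDB return above; the bit-pattern comparison matches the value DataFusion reports in the diagnostic output.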

Actual behavior

DataFusion reports the comparison as false and keeps the row.

Observed output:

python comparison: True
duckdb comparison: True

datafusion diagnostic:
   id  m_0    cmp
0   1 -0.0  False

datafusion filtered rows:
   id  m_0
0   1 -0.0

Labels: bug

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions