Describe the bug
Summary
DataFusion 53.0.0 evaluates -0.0 >= 0.0 as false.
Python and DuckDB evaluate the same comparison as true. This also affects filters using IS TRUE / IS NOT TRUE.
Environment
datafusion==53.0.0
pyarrow==24.0.0
duckdb==1.5.3
Python 3.12.3
Reproduction
#!/usr/bin/env python3
from __future__ import annotations

import datafusion
import duckdb
import pyarrow as pa
from datafusion import SessionContext


def main() -> None:
    # Reference results from Python and DuckDB.
    python_cmp = (-0.0 >= 0.0)
    duckdb_cmp = duckdb.connect(database=":memory:").execute(
        "SELECT (-0.0 >= 0.0)"
    ).fetchone()[0]

    # Single-row table with y = 0.0, so that y * -1 evaluates to -0.0.
    ctx = SessionContext()
    batch = pa.RecordBatch.from_pylist(
        [{"id": 1, "y": 0.0}],
        schema=pa.schema([
            pa.field("id", pa.int64()),
            pa.field("y", pa.float64(), nullable=True),
        ]),
    )
    ctx.register_record_batches("t0", [[batch]])

    # Show the computed value and the comparison result side by side.
    diagnostic = ctx.sql(
        "SELECT id, y * -1 AS m_0, (y * -1) >= 0.0 AS cmp FROM t0"
    ).to_pandas()

    # The same comparison used as a filter; the row should be removed.
    filtered = ctx.sql(
        "SELECT id, y * -1 AS m_0 "
        "FROM t0 "
        "WHERE NOT (((y * -1) >= 0.0) IS TRUE)"
    ).to_pandas()

    print(f"datafusion={getattr(datafusion, '__version__', 'unknown')}")
    print(f"pyarrow={pa.__version__}")
    print(f"duckdb={duckdb.__version__}")
    print(f"python comparison: {python_cmp!r}")
    print(f"duckdb comparison: {duckdb_cmp!r}")
    print("datafusion diagnostic:")
    print(diagnostic)
    print("datafusion filtered rows:")
    print(filtered)

    assert python_cmp is True
    assert duckdb_cmp is True
    assert bool(diagnostic.iloc[0]["cmp"]) is True
    assert len(filtered) == 0


if __name__ == "__main__":
    main()
Expected behavior
For y = 0.0, y * -1 is -0.0. IEEE 754 comparison treats -0.0 and 0.0 as equal, so -0.0 >= 0.0 should be true.
Therefore this filter should remove the row:
WHERE NOT (((y * -1) >= 0.0) IS TRUE)
Python and DuckDB agree with this expectation.
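The expectation can be checked in plain Python without any query engine. The sketch below shows that `0.0 * -1` really is negative zero, and that a value comparison still treats it as equal to `0.0`. For contrast, it also orders the two zeros by bit pattern in the style of the IEEE 754 totalOrder predicate, which places -0.0 before 0.0. The helpers `float_bits` and `total_order_key` are hypothetical names introduced only for this illustration. Whether DataFusion's result comes from an ordering of this kind is an assumption, not something this report establishes.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the raw IEEE 754 binary64 bit pattern of x as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def total_order_key(x: float) -> int:
    """Hypothetical helper: map a float to an int that sorts like totalOrder."""
    bits = float_bits(x)
    if bits >> 63:
        # Sign bit set: flip every bit so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set it so all non-negative values sort above negatives.
    return bits | 0x8000000000000000


neg_zero = 0.0 * -1  # same arithmetic as y * -1 with y = 0.0

# Value comparison, as Python and DuckDB perform it.
value_cmp = neg_zero >= 0.0

# The sign bit is set even though the value compares equal to 0.0.
sign_is_negative = math.copysign(1.0, neg_zero) < 0

# Bit-pattern ordering distinguishes the two zeros.
bitwise_cmp = total_order_key(neg_zero) >= total_order_key(0.0)

print(f"value comparison:   {value_cmp}")
print(f"sign is negative:   {sign_is_negative}")
print(f"bitwise comparison: {bitwise_cmp}")
```

The value comparison is the behavior this issue expects from `>=` in SQL. The bit-pattern ordering is shown only as one way a `false` result could arise.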
Actual behavior
DataFusion reports the comparison as false and keeps the row.
Observed output:
python comparison: True
duckdb comparison: True
datafusion diagnostic:
   id  m_0    cmp
0   1 -0.0  False
datafusion filtered rows:
   id  m_0
0   1 -0.0
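The filtered output follows directly from the comparison result. As a minimal sketch of standard SQL three-valued logic (not DataFusion code), the model below represents NULL as `None` and shows which comparison outcomes let a row pass `WHERE NOT (cmp IS TRUE)`. The function names `is_true` and `row_kept` are hypothetical.

```python
from typing import Optional


def is_true(value: Optional[bool]) -> bool:
    """Model SQL `value IS TRUE`: true only for TRUE, false for FALSE and NULL."""
    return value is True


def row_kept(cmp_result: Optional[bool]) -> bool:
    """Model the filter `WHERE NOT (cmp IS TRUE)` for one row."""
    return not is_true(cmp_result)


# Map each possible comparison outcome to whether the row survives the filter.
kept_by_outcome = {outcome: row_kept(outcome) for outcome in (True, False, None)}

for outcome, kept in kept_by_outcome.items():
    print(f"cmp={outcome!r:5} -> row kept: {kept}")
```

With the expected comparison result (`True`) the row is dropped. With the observed result (`False`) it is kept, which matches the single row in the filtered output.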