Skip to content

#4202 fix: [postgres] advertise scalar columns with native OIDs#4208

Merged
robfrank merged 7 commits into
mainfrom
fix/4202-postgres-scalar-type-fidelity
May 20, 2026
Merged

#4202 fix: [postgres] advertise scalar columns with native OIDs#4208
robfrank merged 7 commits into
mainfrom
fix/4202-postgres-scalar-type-fidelity

Conversation

@robfrank
Copy link
Copy Markdown
Collaborator

Summary

  • Extends PR #4200 fix: [postgres] advertise Long scalar columns as INT8 to fix id() IN $array round-trip #4201 which limited the wire-level type fidelity fix to LONG/INT8. PostgresNetworkExecutor.getColumns() now passes through all native scalar types with their proper Postgres OIDs instead of collapsing to VARCHAR.
  • Adds a new TIMESTAMP (OID 1114) PostgresType entry so LocalDateTime values (and DATETIME schema columns) round-trip as native timestamps, not date strings.
  • Fixes related serialization bugs surfaced by the audit: BOOL now uses canonical t/f, DATE is emitted as YYYY-MM-DD (was timestamp format), and the DATE size field is corrected to 4 bytes.

Why

Postgres clients (pgjdbc, psycopg) choose a deserializer based on the announced OID. Advertising VARCHAR for typed scalar columns causes values to round-trip as strings - the exact class of bug fixed for Long/id() in #4201 - which silently breaks numeric, boolean, and temporal parameter comparisons against client-side types.

Changes

postgresw/src/main/java/com/arcadedb/postgres/PostgresType.java

  • Added TIMESTAMP(1114, LocalDateTime.class, 8, ...) enum entry with text parser and binary (microseconds since 2000-01-01T00:00:00Z) deserialization
  • getTypeForValue(LocalDateTime) -> TIMESTAMP (was DATE)
  • getTypeFromArcade(DATETIME) -> TIMESTAMP (was DATE)
  • serializeAsText for Boolean emits canonical t/f
  • serializeAsText for Date emits YYYY-MM-DD (was timestamp format, wrong for DATE OID 1082)
  • DATE.size corrected from 8 to 4 (matches pg_type binary size)
  • New isNativeScalarType() helper

postgresw/src/main/java/com/arcadedb/postgres/PostgresNetworkExecutor.java

  • getColumns() condition changed from pgType.isArrayType() || pgType == LONG to pgType.isArrayType() || pgType.isNativeScalarType() - covers SMALLINT, INTEGER, LONG, REAL, DOUBLE, CHAR, BOOLEAN, DATE, TIMESTAMP

Test plan

  • PostgresWJdbcIT.scalarColumnTypeFidelity (new) - Cypher RETURN path asserts int4/float8/float4/bool column-type announcement and round-trip values
  • PostgresWJdbcIT.temporalColumnTypeFidelity (new) - schema-based SELECT asserts timestamp column type and Timestamp round-trip
  • PostgresTypeTest (160 unit tests, 4 updated to reflect new behavior)
  • PostgresWJdbcIT (31 tests pass, 1 pre-existing skip)
  • PostgresProtocolIT + TomcatConnectionPoolPostgresWJdbcIT (73 tests pass)
  • No regressions in existing date/timestamp round-trip tests (dateTimeSerializationFormat, queryVertices)

Closes #4202

🤖 Generated with Claude Code

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 11, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 38 complexity

Metric Results
Complexity 38

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude
Copy link
Copy Markdown

claude Bot commented May 11, 2026

Code Review - PR #4208: Postgres scalar column OID fidelity

Overview

This PR correctly extends PR #4201's type-fidelity fix from just LONG to all native scalar types. The motivation is sound: advertising VARCHAR for typed columns causes silent round-trip failures in Postgres clients. The TIMESTAMP enum entry, isNativeScalarType() helper, and BOOL text canonicalization (t/f) are all good additions.


Bugs introduced by this PR

1. DATE binary deserialization - reads 8 bytes after size corrected to 4

DATE.size was fixed from 8 to 4 (correct), but deserializeBinary still calls buffer.getLong():

case DATE -> new Date(buffer.getLong()); // reads 8 bytes - wrong, binary DATE is int32

PostgreSQL binary DATE is int32 days since 2000-01-01, not milliseconds since Unix epoch. Now that DATE is in isNativeScalarType() clients may actually send binary DATE parameters, which will over-read the buffer and corrupt subsequent reads. The fix needs both the width and epoch conversion:

case DATE -> {
    final int daysSince2000 = buffer.getInt(); // 4 bytes
    yield new Date((POSTGRES_EPOCH_DAYS + daysSince2000) * 86_400_000L);
}
// with: private static final long POSTGRES_EPOCH_DAYS = 10957L;

2. DATE text parser - will throw NumberFormatException for client-sent dates

DATE(1082, Date.class, 4, value -> new Date(Long.parseLong(value))),

This enum lambda is the client-to-server parser. Before this PR, DATE was collapsed to VARCHAR, so clients never sent date-typed parameters. Now that isNativeScalarType() includes DATE, clients will send "2024-05-19" as a text parameter - and Long.parseLong("2024-05-19") throws NumberFormatException. The parser needs to handle ISO_LOCAL_DATE format.

3. BOOLEAN text parser - canonical "t" is not recognized as true

BOOLEAN(16, Boolean.class, 1, value -> value.equalsIgnoreCase("true")),

PostgreSQL clients send "t"/"f" for boolean text parameters, not "true"/"false". Since BOOLEAN was not in isNativeScalarType() before this PR, this was never exercised. Now it will silently parse client-sent "t" as false. Should be:

value -> value.equalsIgnoreCase("true") || value.equals("t")

Code quality

4. Magic epoch constant as a local variable

case TIMESTAMP -> {
    final long microseconds = buffer.getLong();
    final long POSTGRES_EPOCH_SECONDS = 946684800L; // defined inside switch case
    ...
}

POSTGRES_EPOCH_SECONDS should be a private static final long at the class level - both for reuse (the DATE fix above will need a sibling constant) and because local final constants in naming convention SCREAMING_SNAKE_CASE are unusual in Java.


Test coverage gaps

  • No binary round-trip test for DATE after the size correction - this is the case most likely to regress silently.
  • No test for BOOLEAN parameter binding from client side (only checks server-to-client direction with rs.getBoolean()).
  • The integration tests create vertex types (SCALAR_COL_TYPES, TEMPORAL_COL_TYPES) but have no explicit cleanup/teardown. This may leave state if an assertion fails mid-test. Check whether other tests in the class follow a teardown pattern.

Minor

  • The new test methods have multi-line Javadoc blocks. Per project convention (CLAUDE.md), comments should be one short line maximum - the doc content is better expressed by a descriptive method name.
  • TIMESTAMPTZ (OID 1184) is not handled - ZonedDateTime/OffsetDateTime values would still fall through to VARCHAR. Fine to leave for a follow-up, but worth tracking as an issue.

Summary

The direction is right and the fixes for serialization text format are correct. The three bugs above (DATE binary width, DATE text parser, BOOLEAN text parser) are new regressions introduced by adding these types to isNativeScalarType() without updating their corresponding parsers. These need to be fixed before merging.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the TIMESTAMP type and ensures that native scalar types, such as numeric, boolean, and temporal values, are advertised to PostgreSQL clients using their native OIDs instead of being treated as VARCHAR. This change prevents silent type mismatches during round-trips. Key updates include new serialization logic for booleans and dates, and the addition of integration tests for type fidelity. However, the DATE type implementation contains critical bugs: the text parser still expects a long instead of the new YYYY-MM-DD format, and the binary deserializer attempts to read 8 bytes for a field now defined as 4 bytes, which will cause runtime exceptions.

CHAR(18, Character.class, 1, value -> value.charAt(0)),
BOOLEAN(16, Boolean.class, 1, value -> value.equalsIgnoreCase("true")),
DATE(1082, Date.class, 8, value -> new Date(Long.parseLong(value))),
DATE(1082, Date.class, 4, value -> new Date(Long.parseLong(value))),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The textParser for DATE still uses Long.parseLong(value), which will fail now that serializeAsText (line 272) has been updated to produce YYYY-MM-DD strings. This parser must be updated to handle the new date string format. Additionally, note that the binary deserializer in deserializeBinary (line 509) still calls buffer.getLong(), which will cause a BufferUnderflowException now that the DATE size has been reduced to 4 bytes; it should be updated to use buffer.getInt() and handle the PostgreSQL epoch (2000-01-01).

Suggested change
DATE(1082, Date.class, 4, value -> new Date(Long.parseLong(value))),
DATE(1082, Date.class, 4, value -> java.util.Date.from(java.time.LocalDate.parse(value).atStartOfDay(java.time.ZoneOffset.UTC).toInstant())),

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 27873a4. The DATE textParser now calls parseDateText() which accepts ISO YYYY-MM-DD (Postgres's canonical text form) with an epoch-millis fallback for legacy callers. The binary path also now reads int32 days since 2000-01-01 and converts to Unix epoch via the new POSTGRES_EPOCH_DAYS class constant. Added round-trip tests serializeAsBinaryDateRoundTrip and deserializeTextDateIsoFormat.

case REAL -> buffer.getFloat();
case DOUBLE -> buffer.getDouble();
case DATE -> new Date(buffer.getLong());
case TIMESTAMP -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While adding TIMESTAMP support here, please note that the DATE case in the preceding line (line 509) is now incorrect because its size was changed to 4 bytes in this PR (line 56). Currently, it still calls buffer.getLong(), which will cause a BufferUnderflowException when a client sends a 4-byte binary date parameter. It also needs to be adjusted to use the PostgreSQL epoch (2000-01-01) instead of the Java epoch (1970-01-01).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 27873a4. DATE binary deserializer now reads int32 days and converts from 2000-01-01 to a java.util.Date via POSTGRES_EPOCH_DAYS. Updated deserializeBinaryDate test to match. The class-level epoch constants (POSTGRES_EPOCH_SECONDS, POSTGRES_EPOCH_DAYS) replace the previous local final in the TIMESTAMP case.

robfrank added a commit that referenced this pull request May 11, 2026
…N text parser, binary result format

Review of PR #4208 surfaced three deserializer regressions introduced by adding DATE, TIMESTAMP, and
BOOLEAN to isNativeScalarType() without updating their parsers, plus a stale magic constant. Also
adds binary result-format serialization so asyncpg (which requests binary results in Bind for every
native OID) works against the newly-advertised column types.

- DATE binary deserialization now reads int32 days since 2000-01-01 (was reading 8 bytes as a Unix
  millis long, which over-read after DATE.size was corrected to 4).
- DATE text parser accepts ISO "YYYY-MM-DD" (clients send this format now that DATE is advertised
  natively) with a legacy epoch-millis fallback.
- BOOLEAN text parser accepts t/f/1/0/y/yes/n/no/on/off alongside the verbose forms - asyncpg and
  pgjdbc both send the canonical single-char form.
- POSTGRES_EPOCH_SECONDS / POSTGRES_EPOCH_DAYS hoisted to class-level constants and reused.
- PostgresType.serializeAsBinary() added for INT2/4/8, FLOAT4/8, BOOL, CHAR, DATE, TIMESTAMP.
- writeRowDescription and writeDataRows now honor portal.resultFormats from the Bind message; falls
  back to text format when no portal context or when the type has no binary mapping.
- e2e-python tests updated to expect native int values for INTEGER/LONG columns (previously the
  tests asserted string forms because everything was advertised as VARCHAR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robfrank
Copy link
Copy Markdown
Collaborator Author

Addressed in 27873a4. Going through the items:

1. DATE binary deserialization (8 bytes after size corrected to 4) - Fixed. Reader now consumes int32 days and converts from 2000-01-01 via POSTGRES_EPOCH_DAYS.

2. DATE text parser (Long.parseLong("2024-05-19") would throw) - Fixed. parseDateText() accepts ISO YYYY-MM-DD with an epoch-millis fallback.

3. BOOLEAN text parser ("t" not recognized) - Fixed. parseBooleanText() accepts t/f/1/0/y/yes/n/no/on/off plus the verbose forms.

4. Magic epoch constant - Hoisted POSTGRES_EPOCH_SECONDS and POSTGRES_EPOCH_DAYS to class-level private static final constants.

Test coverage gaps - Added: serializeAsBinaryDateRoundTrip, serializeAsBinaryTimestampRoundTrip, serializeAsBinaryInteger, serializeAsBinaryDouble, serializeAsBinaryBoolean, serializeAsTextBoolean, deserializeTextDateIsoFormat, deserializeBinaryTimestamp, extended deserializeTextBoolean with canonical forms.

Bonus fix surfaced by python-e2e CI failure - asyncpg requests binary results in Bind for every native OID it has a binary codec for (INT2/4/8, FLOAT4/8, BOOL, DATE, TIMESTAMP, ...). The server was always ignoring portal.resultFormats and emitting text, so asyncpg's binary decoder underflowed on text bytes. Now writeRowDescription and writeDataRows honor the per-column format codes, and PostgresType.serializeAsBinary provides the binary encoders for all native scalar types. Falls back to text for types without a binary mapping (VARCHAR, JSON, arrays). Updated the two python-e2e tests that asserted the old VARCHAR string behavior for integer columns.

Minor items:

  • Javadoc style: kept multi-line javadocs to match the existing pattern in PostgresWJdbcIT (other methods like cypherWithArrayParameterInClause use multi-line blocks for the same reason - explaining the wire-level invariant under test). Happy to convert to single-line if you'd prefer the stricter reading of CLAUDE.md.
  • TIMESTAMPTZ (OID 1184) - tracked as a follow-up; this PR is already large enough.

Full unit test suite: 168/168 pass. Integration tests: 31 + 72 + 1 = 104/104 pass.

@claude
Copy link
Copy Markdown

claude Bot commented May 11, 2026

Review: PR #4208 - Postgres Scalar Column OID Fidelity

This PR is well-reasoned and extends the wire-level type fidelity fix from #4201 to all native scalar types. The follow-up fixes in commit 27873a4bd resolved the three critical bugs flagged earlier (DATE binary byte-width, DATE text parser, BOOLEAN text parser). Two issues remain before merge.


Issues

1. CHAR binary serialize/deserialize width mismatch (regression - blocks merge)

serializeAsBinary for CHAR writes 1 byte:

typeBuffer.putInt(1);
typeBuffer.putByte((byte) c);   // 1 byte

But deserializeBinary reads 2 bytes:

case CHAR -> buffer.getChar();  // buffer.getChar() = 2 bytes in Java (UTF-16 char)

This is a real regression introduced by this PR: before it, CHAR was advertised as VARCHAR so the binary path was never reached. Now that CHAR is in isNativeScalarType(), clients will request binary CHAR values, the deserializer will read 2 bytes for a 1-byte value, and subsequent field reads in the row will be offset/corrupt.

Fix: change the deserialize side to (char) buffer.get() (single byte, matching PostgreSQL's internal "char" OID 18 binary format).

A test for CHAR binary round-trip would have caught this - please add one.

2. Misleading comment on TIMESTAMP text format (cosmetic)

Line ~300 has the comment "YYYY-MM-DD HH:MM:SS.ffffff" - MM means months in Java's DateTimeFormatter, not minutes. Should be HH:mm:ss.SSSSSS. Minor, but it will confuse future readers.


Minor observations

  • TIMESTAMP text parser robustness: The parser value -> LocalDateTime.parse(value.replace(' ', 'T')) works for the server-emitted "yyyy-MM-dd HH:mm:ss.SSSSSS" format, but will throw for a client-sent "2024-01-01 12:30:45" (no fractional seconds). A defensive fallback similar to the parseDateText pattern would improve robustness, though in practice the server always emits microseconds.

  • Byte/BYTE OID inconsistency (pre-existing, not introduced here): getTypeForValue maps a Byte Java instance to INTEGER (OID 23), while getTypeFromArcade(BYTE) maps to SMALLINT (OID 21). Worth noting for a follow-up.


What's good

  • isNativeScalarType() helper is clean and self-documenting.
  • resolveResultFormat() correctly implements all three Postgres Bind protocol cases (empty list, single entry for all, per-column).
  • parseBooleanText() and parseDateText() are well-factored with good edge case handling.
  • POSTGRES_EPOCH_SECONDS/POSTGRES_EPOCH_DAYS hoisted to class-level constants - correct.
  • Pre-2000 timestamp binary handling (the nanos < 0 borrow logic) is correct.
  • Test coverage is comprehensive for the happy paths. The unit tests for binary numeric/boolean serialization, date/timestamp round-trips, and text boolean serialization are all good additions.

The CHAR binary width mismatch is the only blocker. Everything else is solid.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 63.88889% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.68%. Comparing base (d4c3c5a) to head (1e89810).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
.../main/java/com/arcadedb/postgres/PostgresType.java 63.24% 27 Missing and 16 partials ⚠️
...com/arcadedb/postgres/PostgresNetworkExecutor.java 66.66% 7 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #4208    +/-   ##
========================================
  Coverage   64.67%   64.68%            
========================================
  Files        1644     1644            
  Lines      127266   127408   +142     
  Branches    27277    27299    +22     
========================================
+ Hits        82312    82416   +104     
- Misses      33433    33449    +16     
- Partials    11521    11543    +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

robfrank and others added 2 commits May 19, 2026 20:40
…, FLOAT8, FLOAT4, BOOL, TIMESTAMP)

Extends PR #4201 which limited the wire-level type fidelity fix to LONG/INT8. PostgresNetworkExecutor.getColumns()
now passes through all native scalar types (Integer, Short, Byte, Float, Double, Boolean, Character, Date,
LocalDateTime) with their proper Postgres OIDs instead of collapsing to VARCHAR. Clients (pgjdbc, psycopg) use the
announced OID to choose a deserializer; VARCHAR for typed columns caused values to round-trip as strings and
silently broke typed parameter comparisons.

- Added TIMESTAMP (OID 1114) entry for LocalDateTime with text + binary deserialization
- getTypeForValue(LocalDateTime) and getTypeFromArcade(DATETIME) now return TIMESTAMP
- serializeAsText emits canonical "t"/"f" for Boolean and "YYYY-MM-DD" for Date (was timestamp-format)
- DATE size corrected from 8 to 4 to match pg_type binary size
- New isNativeScalarType() drives the getColumns() condition

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…N text parser, binary result format

Review of PR #4208 surfaced three deserializer regressions introduced by adding DATE, TIMESTAMP, and
BOOLEAN to isNativeScalarType() without updating their parsers, plus a stale magic constant. Also
adds binary result-format serialization so asyncpg (which requests binary results in Bind for every
native OID) works against the newly-advertised column types.

- DATE binary deserialization now reads int32 days since 2000-01-01 (was reading 8 bytes as a Unix
  millis long, which over-read after DATE.size was corrected to 4).
- DATE text parser accepts ISO "YYYY-MM-DD" (clients send this format now that DATE is advertised
  natively) with a legacy epoch-millis fallback.
- BOOLEAN text parser accepts t/f/1/0/y/yes/n/no/on/off alongside the verbose forms - asyncpg and
  pgjdbc both send the canonical single-char form.
- POSTGRES_EPOCH_SECONDS / POSTGRES_EPOCH_DAYS hoisted to class-level constants and reused.
- PostgresType.serializeAsBinary() added for INT2/4/8, FLOAT4/8, BOOL, CHAR, DATE, TIMESTAMP.
- writeRowDescription and writeDataRows now honor portal.resultFormats from the Bind message; falls
  back to text format when no portal context or when the type has no binary mapping.
- e2e-python tests updated to expect native int values for INTEGER/LONG columns (previously the
  tests asserted string forms because everything was advertised as VARCHAR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robfrank robfrank force-pushed the fix/4202-postgres-scalar-type-fidelity branch from 27873a4 to de5d2c6 Compare May 19, 2026 18:40
@claude
Copy link
Copy Markdown

claude Bot commented May 19, 2026

Code Review

Overview

This PR correctly extends the PostgreSQL wire-protocol type-fidelity fix from #4201. Instead of only mapping LONG/INT8, it now advertises all native scalar OIDs (numeric, boolean, temporal) so Postgres clients deserialize native values instead of strings. It also adds a TIMESTAMP type, fixes BOOL text serialization, fixes DATE text/binary format, corrects DATE.size, and wires in binary-format support via the Bind message's result-format codes.

The change is well-motivated and the implementation is mostly solid. A few items worth discussing before merge:


Potential Bugs

1. TIMESTAMP text parser does not handle timezone offsets or sub-second fractions in all formats

TIMESTAMP(1114, LocalDateTime.class, 8, value -> LocalDateTime.parse(value.replace(' ', 'T'))),

LocalDateTime.parse(String) uses ISO_LOCAL_DATE_TIME, which accepts 2024-06-15T12:30:45 and 2024-06-15T12:30:45.123456 but will throw DateTimeParseException for a value like 2024-06-15 12:30:45+00 (timezone suffix) that a real Postgres server or client could send. Consider wrapping in a try-catch with a broader formatter fallback, or normalising the string more carefully before parsing.

2. Negative-epoch pre-2000 timestamps in binary deserialization are correct, but not tested

The nanos-correction block in deserializeBinary(TIMESTAMP) handles negative microseconds:

int nanos = (int) (microseconds % 1_000_000L) * 1000;
if (nanos < 0) {
  secs -= 1;
  nanos += 1_000_000_000;
}

The arithmetic is correct, but there is no test for a pre-2000 timestamp (negative microseconds). A round-trip test for e.g. 1999-12-31T23:59:59.999999 would give confidence the branch is exercised.

3. parseBooleanText(null) silently returns false

When a column value is null, returning false instead of null could cause subtle data-integrity issues - a null boolean field would appear as false to the client. The other type parsers throw on null. Consider returning null here and letting the caller handle the null case via the existing -1 length prefix in writeString.

4. parseDateText fallback to Long.parseLong is surprising

The comment says "legacy callers" but there is no code path that would produce a bare epoch-millis string under the new wire format. If it is truly dead, remove it. If it must stay for safety, a log warning would help diagnose unexpected inputs rather than silently producing a wrong date.


Minor Issues

5. resolveResultFormat is package-private but could be private

The method is only used within PostgresNetworkExecutor. Making it private reduces surface area. Same for the no-arg overloads of writeRowDescription/writeDataRows - if the only callers are now the new overloads, they could be removed to eliminate dead code.

6. CHAR binary serialization truncates non-ASCII

typeBuffer.putByte((byte) c);

PostgreSQL OID 18 ("char") is indeed a single byte, so this is technically correct per the spec. However, a comment noting this limitation would prevent future confusion.

7. Column-index comment style in writeRowDescription

The comment block removed at line 117-118 was the only documentation of the format-code field. The new comment is good, but it would be even clearer to note which Postgres message spec paragraph this comes from (Frontend/Backend Protocol - RowDescription).


Test Coverage

Good additions:

  • Binary round-trip tests for DATE and TIMESTAMP
  • Text serialization tests for BOOLEAN
  • Integration tests scalarColumnTypeFidelity and temporalColumnTypeFidelity

Missing:

  • Pre-2000 (negative-epoch) TIMESTAMP binary round-trip (see point 2 above)
  • resolveResultFormat unit test covering: null list, empty list, single-entry list, per-column list, colIndex >= list.size()
  • parseBooleanText(null) behavior documented/tested

Performance

No concerns. The resolveResultFormat lookup is O(1) per column and the switch in serializeAsBinary is compiled to a table dispatch. Both are on the data-serialization hot path but are trivially cheap compared to I/O.


Summary

The core approach is correct and the fix is necessary for proper Postgres client compatibility. The main items to address before merge are the TIMESTAMP text-parser edge cases (point 1) and the parseBooleanText(null) silent-false behaviour (point 3). The pre-2000 timestamp test (point 2) is strongly recommended. The rest are minor polish.

🤖 Generated with Claude Code

… parser hardening

Address PR #4208 review feedback (May 11 + May 19) and python-e2e failure surfaced
in CI run 26117714838:

PostgresType.java
- serializeAsBinary defensive coercion for SMALLINT/INTEGER/LONG/REAL/DOUBLE/
  BOOLEAN/DATE/TIMESTAMP via toNumber/toBooleanValue/toLocalDateValue/
  toLocalDateTimeValue helpers. Schemaless documents can hold mixed runtime
  types per column; the wire announces one format code per column for the
  whole result, so the serializer must coerce rather than throw. Fixes the
  ClassCastException seen in e2e-python test_parameterized_insert when value
  was stored as String "300" but column was announced as INT8 from a prior
  row's Long.
- CHAR binary deserializer reads 1 byte (Postgres "char" OID 18 is single
  byte) to match serializer; previously read 2 bytes via buffer.getChar().
- TIMESTAMP text parser now accepts ISO_LOCAL_DATE_TIME plus
  ISO_OFFSET_DATE_TIME suffix (e.g. "2024-06-15 12:30:45+00:00").
- parseBooleanText throws on null and unknown values; accepts canonical
  t/f/true/false/0/1/y/n/yes/no/on/off only.
- parseDateText: removed dead Long.parseLong legacy fallback.
- TIMESTAMP text-format comment corrected from "HH:MM:SS.ffffff" (months)
  to "HH:mm:ss.SSSSSS" (minutes).

PostgresTypeTest.java (+21 tests, 204 unit tests pass)
- deserializeBinaryChar updated to 1-byte spec; serializeAsBinaryCharRoundTrip
  added.
- serializeAsBinary{Integer,Long,Double,Boolean,Date,Timestamp}Coerces*
  cover mixed-runtime-type round-trips.
- serializeAsBinaryTimestampPre2000RoundTrip exercises the nanos<0 borrow
  branch in deserializeBinary.
- parseBooleanText{NullThrows,AcceptsCanonicalForms,RejectsUnknown} +
  parseTimestampTextAcceptsOffset + parseDateTextRejectsNonIso.

e2e-python
- # nosec B101 on the two delta-modified asserts to clear Codacy churn
  (Bandit flags assert in compiled bytecode; pytest convention).
@claude
Copy link
Copy Markdown

claude Bot commented May 19, 2026

Code Review

Overview

This PR extends the PostgreSQL wire-level type fidelity fix from #4201 (which only covered LONG/INT8) to all native scalar types. It adds TIMESTAMP (OID 1114) for LocalDateTime, fixes BOOL serialization to canonical t/f, corrects DATE to emit YYYY-MM-DD (not a timestamp string), fixes DATE binary size from 8 to 4 bytes, and wires portal.resultFormats through to writeRowDescription/writeDataRows so binary-format Bind requests are honoured.

The change is well-motivated: Postgres clients (pgjdbc, psycopg) select a deserializer from the announced OID, so advertising VARCHAR for typed scalar columns silently breaks numeric and boolean comparisons.


Positive Aspects

  • Excellent test coverage: 160+ unit tests, pre-2000 timestamp edge case covered, round-trip tests for every new binary path, coercion tests for schemaless documents.
  • resolveResultFormat correctly mirrors the three-case Postgres Bind protocol rule (empty = text, single entry = all columns, otherwise per-column).
  • CHAR binary deserialization is correctly changed from buffer.getChar() (2-byte UTF-16) to (char) buffer.get() (1-byte), which was a genuine protocol bug.
  • Pre-2000 timestamp negative-microsecond borrow is correctly handled and explicitly tested.
  • The default -> serializeAsText(...) fallback in serializeAsBinary is a sound degradation strategy for arrays and strings.

Issues

1. Timezone-sensitive assertion in temporalColumnTypeFidelity

assertThat(ts.toString()).startsWith("2024-06-15 12:30:45");

Timestamp.toString() formats in the JVM's local timezone, not UTC. On a machine with a non-UTC offset this assertion will fail even though the round-trip is correct. Prefer:

assertThat(ts.toLocalDateTime()).isEqualTo(LocalDateTime.of(2024, 6, 15, 12, 30, 45));

2. Missing IF NOT EXISTS in scalarColumnTypeFidelity

st.execute("CREATE VERTEX TYPE SCALAR_COL_TYPES");

If the type already exists (e.g., from a previous failed run that left state behind), this throws and the test fails with a confusing error. Add IF NOT EXISTS:

st.execute("CREATE VERTEX TYPE IF NOT EXISTS SCALAR_COL_TYPES");

(The CREATE PROPERTY lines already use IF NOT EXISTS, so this is just the type creation line that's missing it.)

3. Breaking change in DATE binary deserialization

The old deserializeBinary for DATE read 8 bytes (buffer.getLong()), treating the value as epoch milliseconds. The new code reads 4 bytes (buffer.getInt()), treating it as days since 2000-01-01. This is the correct Postgres protocol format, but any client that previously sent binary DATE parameters using the (broken) 8-byte path will now be mis-decoded. Worth a sentence in the PR description or a migration note, since this is a wire-level compatibility break even though the old behavior was wrong.

4. Arrays sent as text when binary format is requested

default -> serializeAsText(pgType, typeBuffer, value);

When a client sends a Bind message requesting binary result formats (format code = 1) for a column that resolves to an array type, the server silently downgrades to text format. This is technically a protocol violation - the announced format code in RowDescription says binary (from resolveResultFormat in writeRowDescription), but the data row is text. A client may misparse the result. Either:

  • advertise text format (0) in RowDescription for array columns regardless of what the client requested, or
  • log a warning so this is detectable.

5. Nano precision loss in TIMESTAMP serialization

final long secsFromPgEpoch = ldt.toEpochSecond(ZoneOffset.UTC) - POSTGRES_EPOCH_SECONDS;
typeBuffer.putLong(secsFromPgEpoch * 1_000_000L + ldt.getNano() / 1000L);

ldt.getNano() / 1000L truncates sub-microsecond nanoseconds (i.e., drops the last 3 digits). This is correct because PostgreSQL's timestamp type has microsecond resolution, not nanosecond. However, the round-trip test uses 123_456_000 nanos (clean microseconds), so the truncation isn't exercised. Consider adding a test with 123_456_789 nanos to document that sub-microsecond precision is intentionally lost on the wire.


Minor Notes

  • POSTGRES_EPOCH_DAYS = 10957L - extra alignment spaces are inconsistent with the surrounding constant declarations.
  • parseBooleanText throws PostgresProtocolException for null input, but the caller (deserializeBinary) already guards against null via the value == null check at the top of serializeAsBinary. For the text path in deserializeText, an empty byte array decodes to "" which falls to the default throw - this is tested and correct, but the comment // empty string after charset decode goes to switch in the test is helpful.
  • The self-passing convention columnType.serializeAsBinary(columnType, ...) is pre-existing but odd. No change needed here.

Summary

This is a solid, protocol-compliant fix with good coverage. The four items above are worth addressing before merge, with items 1, 2, and 4 being the most important to fix.

@robfrank
Copy link
Copy Markdown
Collaborator Author

Round 3 review-iteration in fb5d16170 addresses the May 19 review plus a python-e2e regression surfaced in CI run 26117714838.

Python-e2e test_parameterized_insert (was failing with String cannot be cast to Number)

Root cause: getColumns() derives each column's PostgresType from the first row's runtime value type, but schemaless DOCUMENT TYPE rows can hold mixed types per column. The wire protocol fixes one format code per column for the whole result, so serializeAsBinary cannot fall back to text per row - it must coerce. serializeAsBinary now goes through toNumber / toBooleanValue / toLocalDateValue / toLocalDateTimeValue helpers that accept the announced type's native Java class, a Number/Boolean/Character for numerics, a Date/LocalDate/LocalDateTime for temporals, or fall back to parsing value.toString() via the existing text parsers. Regression tests: serializeAsBinary{Integer,Long,Double}CoercesNumericString, serializeAsBinaryBooleanCoercesString, serializeAsBinaryBooleanCoercesNumber, serializeAsBinary{Date,Timestamp}Coerces*.

CHAR binary width mismatch (claude[bot] May 11 review)

serializeAsBinary writes 1 byte (putByte((byte) c)) but deserializeBinary was reading buffer.getChar() = 2 bytes. Fixed to (char) buffer.get() per Postgres "char" OID 18 spec. deserializeBinaryChar updated to a 1-byte payload; new serializeAsBinaryCharRoundTrip covers the write/read pair.

TIMESTAMP text parser (May 19 review #1)

Promoted the inline lambda to parseTimestampText helper that first attempts ISO_LOCAL_DATE_TIME (handles 2024-06-15 12:30:45 and 2024-06-15T12:30:45.123456) and falls back to ISO_OFFSET_DATE_TIME for offset-suffixed values like 2024-06-15 12:30:45+00:00. New parseTimestampTextAcceptsOffset test.

parseBooleanText(null) silent-false (May 19 review #3)

Now throws PostgresProtocolException on null and on unrecognized strings, consistent with parseDateText. Accepts only canonical t/f/true/false/0/1/y/n/yes/no/on/off. New parseBooleanText{NullThrows,AcceptsCanonicalForms,RejectsUnknown} tests.

parseDateText epoch-millis fallback (May 19 review #4)

Removed. There are no live callers; per CLAUDE.md "no speculative code." New parseDateTextRejectsNonIso asserts the previously-allowed "1716138311000" form now throws DateTimeParseException.

Pre-2000 TIMESTAMP binary round-trip (May 19 review #2)

serializeAsBinaryTimestampPre2000RoundTrip covers 1999-12-31T23:59:59.999999, exercising the nanos < 0 borrow branch in deserializeBinary.

Comment fix (claude[bot] May 11 review)

"YYYY-MM-DD HH:MM:SS.ffffff" (MM = months) -> "yyyy-MM-dd HH:mm:ss.SSSSSS" (mm = minutes).

Visibility / overloads (May 19 review #5)

resolveResultFormat was already private static - no change needed. The no-arg writeRowDescription(columns) and writeDataRows(resultSet, columns) overloads are called from the simple-query path (no Bind message, hence no resultFormats), so they're kept as legitimate convenience wrappers, not dead code.

Codacy 2 high-severity findings

Both are Bandit B101 (assert in test code) on the two delta-modified asserts in e2e-python/tests/test_*.py. The files already contain dozens of asserts because that's the pytest convention; the bot flagged the deltas as "new" against the prior line content. Added # nosec B101 on the two changed lines.

Test totals: 204 unit tests pass (was 183), +21 new/updated. Postgresw IT: 104 pass, 1 pre-existing skip.

CI status on fb5d16170: python-e2e, Codacy, integration-tests, slow-unit-tests, all CodeQL analyses green. The single unit-tests failure is QueryEngineManagerPoolTest.submittedTasksRunOnPoolThread - a pool-thread scheduling race (1L not equal to 1L) unrelated to this PR and different from the prior run's ScriptExecutionTest.commitRetryMultiThreadsSQLIncrementRepeatableRead flake.

Tracked as follow-up (out of scope, deliberately)

  • TIMESTAMPTZ (OID 1184) for ZonedDateTime / OffsetDateTime.
  • Pre-existing Byte/BYTE OID inconsistency between getTypeForValue (-> INTEGER) and getTypeFromArcade(BYTE) (-> SMALLINT).

…t fragility

claude[bot] PR #4208 review feedback (May 19):

PostgresType / PostgresNetworkExecutor (item 4 - protocol violation)
- Add PostgresType.hasBinaryEncoding(): false for ARRAY_* (no binary encoder
  implemented), true for everything else including VARCHAR/TEXT/BPCHAR/JSON
  where text and binary wire bytes are identical.
- Add PostgresNetworkExecutor.effectiveResultFormat() that forces text format
  (0) for columns whose type lacks a binary encoder, regardless of what the
  client requested in Bind.
- writeRowDescription and writeDataRows now go through effectiveResultFormat
  so the announced format code in RowDescription always agrees with the bytes
  written in DataRow. Previously, a client requesting binary on an array
  column got "binary" announced but text bytes - a protocol violation that
  would mis-decode on the client side.

PostgresTypeTest (items 4 + 5)
- hasBinaryEncodingScalarTypesTrue, hasBinaryEncodingArrayTypesFalse cover
  the new helper.
- serializeAsBinaryTimestampTruncatesSubMicrosecondNanos documents that
  Postgres timestamp is microsecond-resolution by design - input 123_456_789
  nanos round-trips as 123_456_000.

PostgresWJdbcIT (items 1 + 2)
- temporalColumnTypeFidelity asserts ts.toLocalDateTime() instead of
  ts.toString().startsWith(...) - Timestamp.toString() formats in the JVM
  default timezone and would fail on non-UTC hosts.
- scalarColumnTypeFidelity and temporalColumnTypeFidelity now use the
  CREATE VERTEX TYPE <name> IF NOT EXISTS suffix syntax (ArcadeDB's
  grammar puts IF NOT EXISTS after the type name, not before).

PostgresType (minor alignment)
- Removed extra spaces around the = on POSTGRES_EPOCH_DAYS to match the
  surrounding constant declarations.

Item 3 (DATE binary wire-format change) intentionally not changed: the old
8-byte epoch-millis path was a protocol-spec violation. Any compliant client
followed Postgres OID 1082 = int32 days since 2000-01-01, which is what the
new code does. There is no compatibility shim to write.

Tests: 207 unit tests pass (+3), 104 IT pass (1 pre-existing skip).
@robfrank
Copy link
Copy Markdown
Collaborator Author

Round-4 fixes in 650d9ff67 address the May 19 review.

Item 4 (array + binary format mismatch — protocol violation, real bug)

Verified against the code. writeRowDescription was emitting whatever format code the client requested via resolveResultFormat for every column, including arrays, but serializeAsBinary had no array encoder and fell back to text. RowDescription advertised binary, DataRow shipped text bytes - clients would misparse.

Fix: added PostgresType.hasBinaryEncoding() (returns false for ARRAY_*, true for the rest including VARCHAR/TEXT/BPCHAR/JSON where text and binary bytes are identical), and PostgresNetworkExecutor.effectiveResultFormat() that forces format 0 when hasBinaryEncoding() is false. Both writeRowDescription and writeDataRows route through the helper so the announced format and the written bytes always agree, regardless of what the client asked for in Bind.

Tests hasBinaryEncodingScalarTypesTrue and hasBinaryEncodingArrayTypesFalse cover the helper; existing array IT (floatArrayPropertyRoundTrip, dateTimeArrayPropertyRoundTrip) continues to pass with the new path.

Item 5 (sub-microsecond truncation documentation)

Added serializeAsBinaryTimestampTruncatesSubMicrosecondNanos - LocalDateTime with 123_456_789 nanos round-trips as 123_456_000, documenting that Postgres timestamp is microsecond-resolution by design.

Item 1 (Timestamp.toString timezone-sensitive assertion)

Switched temporalColumnTypeFidelity from ts.toString().startsWith("2024-06-15 12:30:45") to ts.toLocalDateTime().isEqualTo(LocalDateTime.of(2024, 6, 15, 12, 30, 45)). The toString path uses the JVM default timezone and would fail on non-UTC hosts even when the round-trip is correct.

Item 2 (CREATE VERTEX TYPE IF NOT EXISTS)

Added IF NOT EXISTS to both new fidelity tests. Worth noting that ArcadeDB SQL puts the clause AFTER the type name (CREATE VERTEX TYPE <name> IF NOT EXISTS), not before - the prefix form is a parse error. Fixed accordingly.

Item 3 (DATE binary wire-format change) — pushing back

Calling this a "wire-level compatibility break worth a migration note" overstates it. The old code read buffer.getLong() for a 4-byte DATE.size field, which is structurally wrong - any compliant Postgres client follows OID 1082 = int32 days since 2000-01-01, and there's no way to "send a binary DATE the old way" because there was no path that could ever have produced a buffer the old reader would correctly decode. The fix is correctness; there is no compat surface to preserve.

Minor alignment

Removed the extra spaces around POSTGRES_EPOCH_DAYS = 10957L.

Self-passing convention columnType.serializeAsBinary(columnType, ...)

Acknowledged - pre-existing pattern, intentionally kept as-is.

Results

207 unit tests pass (+3 over previous round), 104 IT pass (1 pre-existing skip).

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Code Review

Overview

This PR extends the PostgreSQL wire-protocol type-fidelity work from #4201, advertising all native scalar types (numeric, boolean, temporal) with proper Postgres OIDs instead of collapsing to VARCHAR. It also adds binary response support keyed off Bind-message format codes, and fixes pre-existing bugs in DATE/BOOL serialization. The approach is correct and the fix is a real protocol compliance improvement - clients that depend on OID-based deserializer selection will silently break without this.


Correctness

Date/Timestamp epoch math - correct. POSTGRES_EPOCH_DAYS = 10957L matches LocalDate.of(2000,1,1).toEpochDay(). The pre-2000 borrow (nanos < 0 branch in deserializeBinary) is well-exercised by serializeAsBinaryTimestampPre2000RoundTrip.

CHAR binary fix - fixing from buffer.getChar() (2 bytes) to (char) buffer.get() (1 byte) is correct for Postgres OID 18 which is a single-byte ASCII type.

DATE size - 8 -> 4 is the right fix; Postgres binary DATE is int32 days.

resolveResultFormat boundary - correctly handles the three Postgres Bind-message rules (empty = text, one entry = all columns, N entries = per-column), with a safe fallback to text for out-of-bounds.

Sub-microsecond truncation - the test serializeAsBinaryTimestampTruncatesSubMicrosecondNanos explicitly documents the precision loss. Good.


Issues

Style violations (CLAUDE.md: "don't use fully qualified names if possible, always import the class and just use the name")

PostgresType.java uses two fully-qualified names that should be imports:

// toNumber()
return new java.math.BigDecimal(value.toString());   // should be: import java.math.BigDecimal

// parseTimestampText()
return java.time.OffsetDateTime.parse(iso).toLocalDateTime();  // should be: import java.time.OffsetDateTime

Boolean text parsing - case sensitivity gap

parseBooleanText calls value.toLowerCase() before the switch, so "T", "YES", "ON" are accepted. This is fine (psql accepts them), but note the mapping is slightly broader than what Postgres itself emits (t/f). Not a bug, but worth documenting in the switch comment.

toLocalDateTimeValue - missing OffsetDateTime branch

If a caller ever passes an OffsetDateTime (e.g., from a Polyglot/scripting context), the method falls through to the catch-all throw. This is low risk but inconsistent with how parseTimestampText handles the offset path. Consider adding:

if (value instanceof OffsetDateTime odt)
    return odt.toLocalDateTime();

hasBinaryEncoding() contract vs. implementation for VARCHAR/JSON

hasBinaryEncoding() returns true for VARCHAR/JSON on the basis that "their text and binary wire formats are identical raw bytes." This is true for the current serializeAsBinary fallback, but it means a client that sends Bind(resultFormats=[1]) will receive RowDescription with format=1 and DataRow bytes that are technically text format. This is functionally harmless today but could confuse strict clients in the future. Consider either:

  • Making the javadoc explicit that "binary format" here means "raw UTF-8 bytes, same as text", or
  • Returning false for these types (force text; they're strings anyway).

Integration test cleanup

scalarColumnTypeFidelity creates types and data with IF NOT EXISTS but does not clean up in a finally block. If the test fails mid-way, leftover schema can affect subsequent runs. The existing tests in PostgresWJdbcIT use @AfterEach / tearDown for cleanup - the new tests should follow the same pattern.


Performance

  • toNumber() BigDecimal fallback allocates an object on the string-coercion path; this is acceptable as it only fires for schema-mismatched rows (schemaless documents whose runtime type differs from the announced column type).
  • effectiveResultFormat is called once per cell per row - two int comparisons, no allocation. Fine.
  • No concerns.

Test Coverage

Excellent. Highlights:

  • Round-trip tests for all new binary encoders
  • Pre-2000 timestamp (negative microsecond) coverage
  • Sub-microsecond truncation documented and tested
  • Integration tests via real JDBC connection covering both Cypher (value-based type detection) and SQL (schema-based type detection) paths
  • Python e2e tests updated to match the new native-type behavior

The only gap noted above is cleanup in the two new PostgresWJdbcIT tests.


Summary

Two required fixes before merge:

  1. Import java.math.BigDecimal and java.time.OffsetDateTime instead of using FQNs (CLAUDE.md style rule)
  2. Add teardown for the schema types created in scalarColumnTypeFidelity and temporalColumnTypeFidelity

Everything else is either correct or a minor design note. The core logic is sound and well-tested.

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Code Review - PR #4208: Postgres native OID advertising for all scalar types

Overview

This PR correctly generalizes the type-fidelity fix from PR #4201 (LONG-only) to all native scalar types. The core problem is real: advertising VARCHAR for typed columns causes silent round-trip failures because Postgres clients (pgjdbc, psycopg) choose a deserializer based on the announced OID. All three fixes (BOOL canonical form, DATE format, TIMESTAMP OID) address distinct, genuine bugs.


What the previous Claude review flagged (now fixed)

The existing comment on this PR identified three bugs in an earlier iteration. This diff addresses all of them:

  • DATE binary deserialization now reads int32 days via buffer.getInt() and applies the 2000-01-01 epoch offset (was buffer.getLong() reading raw milliseconds)
  • DATE text parser now uses LocalDate.parse() via parseDateText() (was Long.parseLong())
  • BOOLEAN text parser now accepts canonical t/f via parseBooleanText() (was only equalsIgnoreCase("true"))

Remaining issues

1. Static import of parse reduces readability

import static java.time.OffsetDateTime.parse;
// ...
return parse(iso).toLocalDateTime();

The bare parse(iso) in parseTimestampText is ambiguous on first read - a reader needs to trace the import to know it refers to OffsetDateTime, not LocalDateTime. Since this is a single call site, drop the static import and write OffsetDateTime.parse(iso).toLocalDateTime() directly.

2. toLocalDateValue String branch creates unnecessary intermediate Date

if (value instanceof String s)
  return parseDateText(s).toInstant().atZone(ZoneOffset.UTC).toLocalDate();

parseDateText does Date.from(LocalDate.parse(s)...) - it converts LocalDate -> Date. Then this line converts Date -> Instant -> LocalDate. Two extra object allocations. Direct replacement:

if (value instanceof String s)
  return LocalDate.parse(s);

3. colIndex ordering assumption - should be documented

Both writeRowDescription and writeDataRows independently iterate the same columns map, incrementing a local colIndex to look up format codes from resultFormats. This is only correct if the map maintains insertion order (i.e., it is a LinkedHashMap). The type is not visible in the diff. If it ever changes to HashMap, format codes silently get assigned to the wrong columns in binary mode - a subtle correctness hazard. A short comment asserting the ordering invariant (or using a List<Map.Entry<...>> snapshot to make it structural) would prevent a future regression.


Correctness highlights

  • Timestamp binary encoding/decoding is correct. The pre-2000 borrow branch is exercised by serializeAsBinaryTimestampPre2000RoundTrip. The sub-microsecond truncation test documents intentional precision loss matching PostgreSQL's microsecond ceiling.
  • hasBinaryEncoding() returning true for VARCHAR/JSON is correct because those types' text and binary wire formats are identical raw bytes. The Javadoc on the method explains this well.
  • effectiveResultFormat short-circuits arrays to text before checking resolveResultFormat, so a Bind message requesting all-columns-binary still forces text for array columns - correct behavior.
  • CHAR binary round-trip: changed from buffer.getChar() (2 bytes, wrong) to (char) buffer.get() (1 byte, correct for OID 18).

Test coverage

Excellent. The PR adds:

  • 16 new unit tests in PostgresTypeTest covering binary serialization, round-trips, coercion, and text-parser edge cases
  • 2 integration tests in PostgresWJdbcIT covering the value-based (scalarColumnTypeFidelity) and schema-based (temporalColumnTypeFidelity) column-type announcement paths
  • Updated e2e Python tests to expect native int/bool types instead of strings

The coercion tests for schemaless documents (a string arriving in a column declared as INTEGER) are a nice defensive addition.


Summary

The three bugs from the previous review are fixed and the implementation is correct. Issues 1 and 2 are minor quality improvements; issue 3 is the only one that could cause a latent bug if the map type ever changes. Overall this is solid work and in good shape to merge after addressing the static import and the ordering invariant documentation.

Generated with Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Code Review

Overview

This PR extends the PostgreSQL wire-level type fidelity work from #4201, advertising all native scalar types (numeric, boolean, temporal) with their correct Postgres OIDs instead of collapsing to VARCHAR. It also adds proper TIMESTAMP support for LocalDateTime/DATETIME, fixes the BOOL canonical text format, corrects the DATE binary wire size (8 -> 4 bytes), and introduces binary format negotiation from Bind message result-format codes. The changes are well-scoped and the test coverage is comprehensive.


Correctness - All Good

  • TIMESTAMP binary encoding: microseconds since 2000-01-01T00:00:00Z is the correct Postgres TIMESTAMP wire format.
  • DATE binary fix: Changing from getLong() (8 bytes) to getInt() (4 bytes as days since 2000-01-01) is correct. The old code was silently misreading the wire.
  • BOOL text "t"/"f": Correct canonical Postgres text format. The old value.toString() on a Boolean would have produced "true"/"false" which some client parsers reject.
  • CHAR binary fix: Old buffer.getChar() read 2 bytes (Java UTF-16); new (char) buffer.get() reads 1 byte. OID 18 is indeed a single byte on the wire.
  • Pre-2000 timestamp borrow path: The nanos-borrowing logic in deserializeBinary TIMESTAMP is correct and the serializeAsBinaryTimestampPre2000RoundTrip test covers the exact branch.
  • POSTGRES_EPOCH constants: POSTGRES_EPOCH_DAYS=10957 and POSTGRES_EPOCH_SECONDS=946684800 are both correct.

Issues and Suggestions

1. CHAR (OID 18) in isNativeScalarType() may surprise pgjdbc clients

OID 18 is Postgres's internal "char" type (single-byte, not CHAR(n)). pgjdbc maps OID 18 to java.sql.Types.OTHER, not Types.CHAR, so JDBC callers doing rs.getString() on a CHAR column will get an unexpected type code in ResultSetMetaData.getColumnType(). If Character-typed ArcadeDB properties are rare in practice, this may not matter - but it is a silent behaviour change for any schema that has CHAR properties. Consider whether CHAR should stay excluded from isNativeScalarType() until a JDBC-level test exercises it, or add a note in the Javadoc.

2. Nanos multiplication silently truncates as int - worth an explicit cast

// PostgresNetworkExecutor line ~451 (deserializeBinary TIMESTAMP)
int nanos = (int) (microseconds % 1_000_000L) * 1000;

The cast to int happens before the * 1000 multiply, so the arithmetic is done in int. The value is always in range (-999_999 * 1000 = -999_999_000, which fits int), so there is no actual overflow. But the precedence is subtle enough that a future reader might misread it as (int)(x * 1000). A small clarification:

int nanos = (int) ((microseconds % 1_000_000L) * 1000L);
// or
long microRemainder = microseconds % 1_000_000L;
int nanos = (int) (microRemainder * 1000L);

3. getTypeForValue array branch now throws instead of falling through - document the intent

The refactored switch:

return switch (val) {
  case int[] ints -> PostgresType.ARRAY_INT;
  ...
  default -> throw new IllegalStateException("Unexpected value: " + val);
};

The old chain fell through to return PostgresType.VARCHAR for unknown array types (e.g., Long[]). The new code throws. This is the more correct behaviour (VARCHAR for a Long[] was silently wrong), but it is a runtime-visible change if any caller passes a Long[] or Integer[]. A brief comment on the default case or a test for Long[] -> exception would make the intent explicit.

4. Breaking change to DATE text deserialization - document it at the caller level

// was: value -> new Date(Long.parseLong(value))
// now: value -> parseDateText(value)   (ISO "YYYY-MM-DD" only)

The test parseDateTextRejectsNonIso explicitly documents that epoch-millis strings are no longer accepted. That is the right approach. The only risk is if there is any other path in ArcadeDB that sends a DATE value as an epoch-millis string over the Postgres wire (e.g., internal SQL emitter, a migration path). A quick grep for any Date -> text serialization going through the old Long.parseLong path would confirm there is no hidden caller.

5. hasBinaryEncoding() semantics are slightly misleading

hasBinaryEncoding() returns !isArrayType(), which means VARCHAR, TEXT, and JSON also return true. The Javadoc explains that their text and binary wire bytes are identical, so this is intentional. The name could mislead someone into thinking binary serialisation is distinct. Consider naming it supportsBinaryFormatCode() or keeping the current name but strengthening the Javadoc to say "returns true when serializeAsBinary is safe to call, including for types whose binary format is identical to text". Current comment is already reasonable - just flagging.


Test Coverage

Coverage is excellent:

  • Binary round-trips for every new scalar type
  • Pre-2000 timestamp borrow path
  • Sub-microsecond truncation
  • Coercion from string/numeric mismatches (schemaless document scenario)
  • Integration tests scalarColumnTypeFidelity and temporalColumnTypeFidelity cover the two column-type-detection paths (value-based and schema-based)
  • E2E Python tests updated to expect native types instead of strings

The removal of @Disabled void waitForConnectionFromExternal is good cleanup.


Minor Nits

  • The double-space alignment of the new constants (POSTGRES_TIMESTAMP_FORMAT, MAX_ARRAY_DIMENSIONS, etc.) was introduced in #4200 fix: [postgres] advertise Long scalar columns as INT8 to fix id() IN $array round-trip #4201 but is still inconsistent with the surrounding enum field formatting. Not worth blocking over.
  • resolveResultFormat and effectiveResultFormat are private static - correct scope.
  • The overloaded writeRowDescription(columns) / writeDataRows(columns) no-arg-formats delegates are fine for backward compatibility within the class.

Summary

The implementation is correct, the protocol constants are verified, and the test suite is thorough including edge cases (pre-2000 dates, negative nanos, coercion). The main action items before merging are:

  1. Decide whether CHAR (OID 18) should be in isNativeScalarType() given pgjdbc type mapping.
  2. Clarify the nanos int-cast in the TIMESTAMP binary deserializer.
  3. Optionally audit for any internal DATE-as-epoch-millis emitters to confirm the text parser change has no hidden callers.

🤖 Generated with Claude Code

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 20, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 44 complexity

Metric Results
Complexity 44

View in Codacy

🟢 Coverage 76.39% diff coverage · -7.47% coverage variation

Metric Results
Coverage variation -7.47% coverage variation
Diff coverage 76.39% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (d4c3c5a) 127266 93862 73.75%
Head commit (1e89810) 159086 (+31820) 105451 (+11589) 66.29% (-7.47%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4208) 144 110 76.39%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Code Review - PR #4208

Correctness Verification

The implementation is correct on the current diff. Key fixes confirmed:

  • DATE binary: reads 4 bytes as int32 days since 2000-01-01 (POSTGRES_EPOCH_DAYS=10957 is correct). The old getLong() 8-byte read is gone.
  • BOOL text: parseBooleanText now accepts t/f/yes/no/on/off/1/0 - all canonical Postgres forms.
  • DATE text parser: uses LocalDate.parse(value) - no more Long.parseLong which would throw for ISO date strings.
  • CHAR binary: changed from buffer.getChar() (2 bytes, Java UTF-16) to (char) buffer.get() (1 byte, correct for OID 18).
  • TIMESTAMP binary: microseconds since 2000-01-01T00:00:00Z is the correct Postgres wire format. Pre-2000 borrow path is covered by serializeAsBinaryTimestampPre2000RoundTrip.

Issues Worth Addressing

1. CHAR (OID 18) in isNativeScalarType() - pgjdbc type-mapping mismatch

OID 18 is Postgres's internal single-byte "char" type, not SQL CHAR(n). pgjdbc maps OID 18 to java.sql.Types.OTHER (not Types.CHAR), so JDBC callers checking ResultSetMetaData.getColumnType() on a Character-typed property will get an unexpected type code. This is a silent behaviour change for any schema that has a CHAR property. Consider excluding CHAR from isNativeScalarType() until a JDBC-level test exercises the round-trip, or add a Javadoc note.

2. getTypeForValue array branch - default throws instead of falling back

default -> throw new IllegalStateException("Unexpected value: " + val);

Previously, unknown array types (e.g., Object[]) fell through to return PostgresType.VARCHAR. The new switch throws. Long[] and Integer[] are now handled by the new boxed-array cases, but Object[] (a heterogeneous schemaless list stored as Object[]) is not. Since ArcadeDB is schemaless this path could occur at runtime. A safe option: change the default to yield PostgresType.VARCHAR (restoring the old fallback) or yield PostgresType.ARRAY_TEXT.

3. Nanos cast - precedence is safe but subtle

int nanos = (int) (microseconds % 1_000_000L) * 1000;

The (int) cast applies before * 1000, so the multiply is int * int. The range is safe (-999_999 * 1000 = -999_999_000 fits in int), but a reader could misread the precedence. A clearer form:

int nanos = (int) ((microseconds % 1_000_000L) * 1000L);

4. colIndex ordering assumption in writeRowDescription / writeDataRows

Both methods iterate the same columns map with a local colIndex to look up format codes from resultFormats. This is only correct if the map maintains insertion order (LinkedHashMap). The map type is not visible in the diff. A short comment asserting the ordering invariant protects against a future refactoring that changes the map type to HashMap.


Minor

  • toLocalDateValue's String branch calls parseDateText(s) (which allocates a Date) then immediately converts back to LocalDate. Simplify to return LocalDate.parse(s);.
  • In parseTimestampText, the bare parse(iso) in the catch-fallback is ambiguous on first read - a reader needs to trace the static import to confirm it is OffsetDateTime.parse, not LocalDateTime.parse. Writing OffsetDateTime.parse(iso).toLocalDateTime() makes the intent explicit.

Test Coverage

Excellent: binary round-trips for every new scalar type, pre-2000 timestamps, sub-microsecond truncation, coercion from string/numeric mismatches (schemaless scenario), and two new integration tests covering both value-based and schema-based column-type-announcement paths. Removal of @Disabled void waitForConnectionFromExternal is good cleanup.


The two items most worth resolving before merge are #1 (CHAR OID mapping may surprise pgjdbc callers) and #2 (array default throw could break schemaless queries with Object[] values). Items #3 and #4 are cosmetic but worth a one-line fix given the protocol sensitivity of this code path.

Generated with Claude Code

@robfrank robfrank merged commit 22a87bf into main May 20, 2026
26 of 29 checks passed
@robfrank robfrank added this to the 26.6.1 milestone May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[postgres] Extend wire-layer type fidelity to Integer, Double, Boolean, Date scalar columns

1 participant