feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge (#27789) by apurva-meta · Pull Request #27789 · prestodb/presto · GitHub

apurva-meta · 2026-05-13T09:07:26Z

Summary:

Stacked on D104959593 (which added V3 typed fields to
velox::IcebergDeleteFile and the dataSequenceNumber parameter on
the production HiveIcebergSplit constructor). This diff plumbs all
of those fields through the Java side and the Prestissimo bridge so
the typed Velox structs are populated end-to-end.

Adds:

dataSequenceNumber, referencedDataFile, contentOffset,
contentSize on presto-iceberg/.../delete/DeleteFile.java
The same four fields on the C++ protocol struct
presto::protocol::iceberg::DeleteFile
All bridge wiring in
presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp:
- Forwards the four DeleteFile fields into the Velox
  IcebergDeleteFile struct (fields added in D104959593)
- Forwards protocol::iceberg::IcebergSplit::dataSequenceNumber
  into the Velox HiveIcebergSplit ctor (parameter added in
  D104959593)
- Populates infoColumns[$first_row_id] from
  icebergSplit->firstRowId alongside the existing
  $data_sequence_number entry, so the V3 _row_id system
  column finally returns non-NULL values in production
- Uses the constants from IcebergMetadataColumns.h instead of
  raw string literals (no behavior change beyond removing
  duplication)
- Uses folly::to<std::string> in place of std::to_string
  (Android-portable per Meta C++ convention)
- Routes the protocol pointer through a const-reference local
  after the VELOX_CHECK_NOT_NULL so clang-tidy
  bugprone-unchecked-pointer-access can prove null-safety

Why these are needed

Equality-delete conflict resolution (V2+ spec): without the typed
dataSequenceNumber plumbed all the way from manifest -> protocol
-> bridge -> Velox struct, shouldSkipBySequenceNumber() always
evaluates against 0 and the filter never fires -- equality deletes
can over-apply to rows from later snapshots that share the same
equality key.
V3 _row_id system column: without $first_row_id in
infoColumns, IcebergSplitReader::firstRowId_ stays nullopt
and _row_id returns NULL for every row instead of the per-row
file-position-based ID.
DV blob location: the existing lowerBounds[100] /
upperBounds[101] smuggling kept the C++ reader functional but
meant the Velox struct couldn't be reasoned about by anyone who
didn't know about the secret field IDs. Typed fields make DV
blob location a first-class concern.
referencedDataFile filter: not strictly required while the
coordinator filters per-split, but adds protection against
pre-V3-spec compliance bugs in the planner.

Pairing requirement

D104959593 MUST land first, or the Prestissimo bridge will fail to
compile when it tries to pass contentOffset / contentLength /
referencedDataFile (and the trailing dataSequenceNumber ctor
arg) into Velox structs that don't yet declare them.

Differential Revision: D104959645

sourcery-ai · 2026-05-13T09:07:34Z

Reviewer's Guide

Plumbs newly added Iceberg V3 delete file metadata (data sequence number, content offset/size, referenced data file) from the Java Iceberg DeleteFile representation through the Presto native protocol into Velox, uses these typed fields in the C++ DeletionVectorReader and IcebergSplitReader, and adds tests to cover the new code paths and migration away from the old bounds-map hack.

Sequence diagram for plumbed Iceberg V3 DeleteFile metadata

sequenceDiagram
  participant JavaDeleteFile as Java_DeleteFile
  participant ProtoDeleteFile as Proto_DeleteFile
  participant Connector as IcebergPrestoToVeloxConnector
  participant VeloxDeleteFile as Velox_IcebergDeleteFile
  participant SplitReader as IcebergSplitReader
  participant DVReader as DeletionVectorReader

  JavaDeleteFile->>JavaDeleteFile: fromIceberg(deleteFile)
  JavaDeleteFile->>ProtoDeleteFile: new DeleteFile(..., dataSequenceNumber, referencedDataFile, contentOffset, contentSize)

  ProtoDeleteFile->>ProtoDeleteFile: to_json(j, DeleteFile)
  ProtoDeleteFile->>ProtoDeleteFile: from_json(j, DeleteFile)

  ProtoDeleteFile->>Connector: toVeloxSplit(DeleteFile)
  Connector->>VeloxDeleteFile: IcebergDeleteFile(..., dataSequenceNumber, contentOffset, contentSize, referencedDataFile)
  Connector->>Connector: infoColumns[kDataSequenceNumberInfoColumn]
  Connector->>Connector: infoColumns[kFirstRowIdInfoColumn]

  VeloxDeleteFile->>SplitReader: prepareSplit()
  alt [referencedDataFile != split.dataFile]
    SplitReader-->>SplitReader: [skip deletion vector]
  end

  VeloxDeleteFile->>DVReader: [dvFile_.contentOffset, dvFile_.contentLength]
  alt [contentLength == 0]
    DVReader-->>DVReader: [fallback to bounds map]
  end

File-Level Changes

Change	Details	Files
Extend Presto Iceberg DeleteFile Java model and native protocol schema to carry V3 delete-file metadata fields.	Add dataSequenceNumber, referencedDataFile, contentOffset, and contentSize fields to the Java DeleteFile class, including JSON bindings and fromIceberg() wiring with null-safe defaults. Update the C++ DeleteFile protocol struct to include the same fields and extend to_json/from_json to serialize/deserialize them.	`presto-iceberg/src/main/java/com/facebook/presto/iceberg/delete/DeleteFile.java` `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.h` `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp`
Wire new delete-file metadata through the Presto-to-Velox Iceberg bridge into Velox IcebergDeleteFile and infoColumns.	Pass dataSequenceNumber, contentOffset, contentSize, and referencedDataFile from the protocol DeleteFile into the Velox IcebergDeleteFile constructor. Populate the $first_row_id info column (using IcebergMetadataColumns constants) alongside $data_sequence_number when building HiveIcebergSplit infoColumns, instead of using string literals.	`presto-native-execution/presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`
(Context from description, not fully shown in diff) Update DeletionVectorReader and IcebergSplitReader to prefer typed DV location fields and enforce referencedDataFile matching, and adjust tests accordingly.	Make DeletionVectorReader prefer dvFile_.contentOffset/contentLength over legacy bounds-map field IDs, falling back only when contentLength is zero. Add a referencedDataFile guard in IcebergSplitReader::prepareSplit() to skip mismatched deletion vectors, as a belt-and-suspenders safety check. Adjust DeletionVectorReader tests to construct IcebergDeleteFile using the new typed contentOffset/contentLength fields and ensure behavior parity with the old bounds-map path. Regenerate BUCK deps for the DeletionVectorReader test target after the rename/migration.	`velox/connectors/hive/iceberg/DeletionVectorReader.cpp` `velox/connectors/hive/iceberg/IcebergSplitReader.cpp` `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp` `velox/connectors/hive/iceberg/tests/BUCK`

Possibly linked issues

#native(Iceberg) V3 DVs: PR wires V3 DeleteFile metadata and DV reader behavior needed for correct Iceberg V3 deletion vector support.
#native-iceberg-v3-support: PR implements V3 DeleteFile fields and DV behavior in Prestissimo/Velox, contributing to Iceberg V3 native support

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've left some high level feedback:

In IcebergPrestoToVeloxConnector::toVeloxSplit, consider also replacing the hardcoded "$path" key with a constant from IcebergMetadataColumns (or a similar central place) for consistency with the other metadata columns.
In DeleteFile (Java), representing referencedDataFile as an empty string when absent may conflate "unset" with a legitimately empty value; consider using Optional<String> or a nullable field in the protocol to preserve the distinction between unset and empty.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `IcebergPrestoToVeloxConnector::toVeloxSplit`, consider also replacing the hardcoded `"$path"` key with a constant from `IcebergMetadataColumns` (or a similar central place) for consistency with the other metadata columns.
- In `DeleteFile` (Java), representing `referencedDataFile` as an empty string when absent may conflate "unset" with a legitimately empty value; consider using `Optional<String>` or a nullable field in the protocol to preserve the distinction between unset and empty.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

…gh bridge + readers (prestodb#27789) Summary: Pull Request resolved: prestodb#27789 Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. Differential Revision: D104959645

linux-foundation-easycla · 2026-05-13T16:22:33Z

The committers listed above are authorized under a signed CLA.

✅ login: apurva-meta / name: Apurva Kumar (533ce46, f70f618)

…gh bridge + readers (prestodb#27789) Summary: Pull Request resolved: prestodb#27789 Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. Differential Revision: D104959645

…gh bridge + readers (prestodb#27789) Summary: Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. Differential Revision: D104959645

…e-iceberg][iceberg] (prestodb#27789) Summary: Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. === NO RELEASE NOTES === Differential Revision: D104959645

prestodb#27789) Summary: Stacked on D104959593 (which added V3 typed fields to `velox::IcebergDeleteFile` and the `dataSequenceNumber` parameter on the production `HiveIcebergSplit` constructor). This diff plumbs all of those fields through the Java side and the Prestissimo bridge so the typed Velox structs are populated end-to-end. Adds: - `dataSequenceNumber`, `referencedDataFile`, `contentOffset`, `contentSize` on `presto-iceberg/.../delete/DeleteFile.java` - The same four fields on the C++ protocol struct `presto::protocol::iceberg::DeleteFile` - All bridge wiring in `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forwards the four DeleteFile fields into the Velox `IcebergDeleteFile` struct (fields added in D104959593) - Forwards `protocol::iceberg::IcebergSplit::dataSequenceNumber` into the Velox `HiveIcebergSplit` ctor (parameter added in D104959593) - Populates `infoColumns[$first_row_id]` from `icebergSplit->firstRowId` alongside the existing `$data_sequence_number` entry, so the V3 `_row_id` system column finally returns non-NULL values in production - Uses the constants from `IcebergMetadataColumns.h` instead of raw string literals (no behavior change beyond removing duplication) - Uses `folly::to<std::string>` in place of `std::to_string` (Android-portable per Meta C++ convention) - Routes the protocol pointer through a const-reference local after the VELOX_CHECK_NOT_NULL so clang-tidy bugprone-unchecked-pointer-access can prove null-safety ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` plumbed all the way from manifest -> protocol -> bridge -> Velox struct, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from later snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. ## Pairing requirement D104959593 MUST land first, or the Prestissimo bridge will fail to compile when it tries to pass `contentOffset` / `contentLength` / `referencedDataFile` (and the trailing `dataSequenceNumber` ctor arg) into Velox structs that don't yet declare them. Differential Revision: D104959645

nmahadevuni · 2026-05-15T07:12:11Z

+  // reader VELOX_CHECK_GE-asserts non-negativity if the key is present, so
+  // emitting -1 would crash every V1/V2 read. Mirrors the Java guard in
+  // `IcebergPageSourceProvider.java`.
+  if (icebergSplit.firstRowId >= 0) {


Do we need this change in this PR? Its also present in https://github.com/prestodb/presto/pull/27743/changes where row lineage related changes are being made.

nmahadevuni · 2026-05-15T07:13:23Z

Thanks @apurva-meta for this change. You mentioned changes in D104959593 must precede this PR. Can you link the PR for D104959593 here?

apurva-meta · 2026-05-15T07:43:11Z

Thanks @apurva-meta for this change. You mentioned changes in D104959593 must precede this PR. Can you link the PR for D104959593 here?

It has already been merged in velox: facebookincubator/velox#17503

aditi-pandit · 2026-05-15T16:51:35Z

@apurva-meta : Can you check the test failures in prestocpp builds.

Copilot is suggesting
The failure is a compile break in presto-native-execution/presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp during the Build engine step of .github/workflows/prestocpp-linux-build-and-unit-test.yml at 36c8c14d2e3f4d45d381b33b9b8afcb6208cf4f7.

Failing code:

presto-native-execution/presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp#L203-L215

Error:

error: no matching function for call to '...IcebergDeleteFile::IcebergDeleteFile(...)'

Root cause:

The PR is constructing velox::connector::hive::iceberg::IcebergDeleteFile with an outdated argument list.
The new Velox submodule version changed the IcebergDeleteFile constructor signature, so this call no longer matches.

Solution

Update the construction of IcebergDeleteFile in toVeloxSplit() to match the current Velox API instead of passing the old positional parameter list directly.

Current code:

velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile(
    toVeloxFileContent(deleteFile.content),
    deleteFile.path,
    toVeloxFileFormat(deleteFile.format),
    deleteFile.recordCount,
    deleteFile.fileSizeInBytes,
    std::vector(deleteFile.equalityFieldIds),
    lowerBounds,
    upperBounds,
    deleteFile.dataSequenceNumber,
    deleteFile.contentOffset,
    deleteFile.contentSize,
    deleteFile.referencedDataFile);

Recommended fix

Adjust the constructor call to the updated Velox parameter order / field set. Most likely the break is from one of these:

a removed or reordered trailing argument
changed optional types for contentOffset, contentSize, or referencedDataFile
a move from positional construction to aggregate/struct-style initialization

A robust fix is to switch to named/aggregate initialization if IcebergDeleteFile is now an aggregate in Velox. For example:

velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile{
    .content = toVeloxFileContent(deleteFile.content),
    .path = deleteFile.path,
    .format = toVeloxFileFormat(deleteFile.format),
    .recordCount = deleteFile.recordCount,
    .fileSizeInBytes = deleteFile.fileSizeInBytes,
    .equalityFieldIds = std::vector<int32_t>(
        deleteFile.equalityFieldIds.begin(),
        deleteFile.equalityFieldIds.end()),
    .lowerBounds = std::move(lowerBounds),
    .upperBounds = std::move(upperBounds),
    .dataSequenceNumber = deleteFile.dataSequenceNumber,
    .contentOffset = deleteFile.contentOffset,
    .contentSize = deleteFile.contentSize,
    .referencedDataFile = deleteFile.referencedDataFile,
};

If Velox now expects fewer fields, remove the no-longer-supported trailing ones from the initializer/call. If it expects std::optional<...> wrappers, convert explicitly, e.g.:

.contentOffset = deleteFile.contentOffset ? std::optional(*deleteFile.contentOffset) : std::nullopt,
.contentSize = deleteFile.contentSize ? std::optional(*deleteFile.contentSize) : std::nullopt,
.referencedDataFile = deleteFile.referencedDataFile,

Minimal likely patch

If the API still uses a constructor but dropped the last field, try:

velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile(
    toVeloxFileContent(deleteFile.content),
    deleteFile.path,
    toVeloxFileFormat(deleteFile.format),
    deleteFile.recordCount,
    deleteFile.fileSizeInBytes,
    std::vector<int32_t>(
        deleteFile.equalityFieldIds.begin(),
        deleteFile.equalityFieldIds.end()),
    lowerBounds,
    upperBounds,
    deleteFile.dataSequenceNumber,
    deleteFile.contentOffset,
    deleteFile.contentSize);

Why this is the right place to fix

The workflow’s failing build step is:

.github/workflows/prestocpp-linux-build-and-unit-test.yml#L81-L101

The compile failure occurs specifically in:

presto-native-execution/presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp#L194-L217

Suggested final edit

Replace the direct positional constructor with whatever matches the current Velox IcebergDeleteFile declaration, preferably aggregate initialization if supported. This will make the code resilient to constructor churn in the updated submodule and should unblock the build.

aditi-pandit · 2026-05-15T16:56:56Z

Thanks — the overall direction makes sense, but I don’t think this is ready to approve yet. The main blocker is the Prestocpp build: this PR depends on the Velox-side API from PR 17503, and the new IcebergDeleteFile / HiveIcebergSplit construction in IcebergPrestoToVeloxConnector.cpp looks fragile against any dependency skew. Can you confirm the exact Velox version/submodule state expected by Presto CI and fix the current build failure first?

Two smaller nits:

referencedDataFile currently normalizes null to "" in Java — can you confirm losing null-vs-empty distinction is intentional?
Consider replacing the raw "$path" info column key with a constant as well for consistency.

apurva-meta requested review from a team, ZacBlanco and hantangwangd as code owners May 13, 2026 09:07

meta-codesync Bot added fb-exported meta-exported labels May 13, 2026

sourcery-ai Bot reviewed May 13, 2026

View reviewed changes

apurva-meta force-pushed the export-D104959645 branch from b4c8130 to 5d1a4d2 Compare May 13, 2026 16:22

meta-codesync Bot changed the title ~~feat: [velox][hive-iceberg][iceberg] Plumb V3 DeleteFile fields through bridge + readers~~ feat: [velox][hive-iceberg][iceberg] Plumb V3 DeleteFile fields through bridge + readers (#27789) May 13, 2026

apurva-meta force-pushed the export-D104959645 branch from 5d1a4d2 to 75b540f Compare May 13, 2026 16:28

apurva-meta force-pushed the export-D104959645 branch from 75b540f to 533ce46 Compare May 13, 2026 18:24

meta-codesync Bot changed the title ~~feat: [velox][hive-iceberg][iceberg] Plumb V3 DeleteFile fields through bridge + readers (#27789)~~ feat: Plumb V3 DeleteFile fields through bridge + readers [velox][hive-iceberg][iceberg] (#27789) May 13, 2026

apurva-meta force-pushed the export-D104959645 branch from 533ce46 to cd2c96c Compare May 13, 2026 18:30

meta-codesync Bot changed the title ~~feat: Plumb V3 DeleteFile fields through bridge + readers [velox][hive-iceberg][iceberg] (#27789)~~ feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge May 13, 2026

apurva-meta force-pushed the export-D104959645 branch from cd2c96c to a164b19 Compare May 13, 2026 19:27

meta-codesync Bot changed the title ~~feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge~~ feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge (#27789) May 14, 2026

apurva-meta force-pushed the export-D104959645 branch from a164b19 to 36c8c14 Compare May 14, 2026 07:21

nmahadevuni reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge (#27789)#27789

feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge (#27789)#27789
apurva-meta wants to merge 1 commit into
prestodb:masterfrom
apurva-meta:export-D104959645

apurva-meta commented May 13, 2026 •

edited by meta-codesync Bot

Loading

Uh oh!

sourcery-ai Bot commented May 13, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

linux-foundation-easycla Bot commented May 13, 2026 •

edited

Loading

Uh oh!

nmahadevuni May 15, 2026

Uh oh!

nmahadevuni commented May 15, 2026

Uh oh!

apurva-meta commented May 15, 2026

Uh oh!

aditi-pandit commented May 15, 2026

Uh oh!

aditi-pandit commented May 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

apurva-meta commented May 13, 2026 • edited by meta-codesync Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why these are needed

Pairing requirement

Uh oh!

sourcery-ai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for plumbed Iceberg V3 DeleteFile metadata

File-Level Changes

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

linux-foundation-easycla Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nmahadevuni May 15, 2026

Choose a reason for hiding this comment

Uh oh!

nmahadevuni commented May 15, 2026

Uh oh!

apurva-meta commented May 15, 2026

Uh oh!

aditi-pandit commented May 15, 2026

Solution

Recommended fix

Minimal likely patch

Why this is the right place to fix

Suggested final edit

Uh oh!

aditi-pandit commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

apurva-meta commented May 13, 2026 •

edited by meta-codesync Bot

Loading

sourcery-ai Bot commented May 13, 2026 •

edited

Loading

linux-foundation-easycla Bot commented May 13, 2026 •

edited

Loading

aditi-pandit commented May 15, 2026 •

edited

Loading