feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge (#27789)#27789
feat: V3 DeleteFile fields end-to-end through Java + protocol + bridge (#27789)#27789apurva-meta wants to merge 1 commit into
Conversation
Reviewer's GuidePlumbs newly added Iceberg V3 delete file metadata (data sequence number, content offset/size, referenced data file) from the Java Iceberg DeleteFile representation through the Presto native protocol into Velox, uses these typed fields in the C++ DeletionVectorReader and IcebergSplitReader, and adds tests to cover the new code paths and migration away from the old bounds-map hack. Sequence diagram for plumbed Iceberg V3 DeleteFile metadatasequenceDiagram
participant JavaDeleteFile as Java_DeleteFile
participant ProtoDeleteFile as Proto_DeleteFile
participant Connector as IcebergPrestoToVeloxConnector
participant VeloxDeleteFile as Velox_IcebergDeleteFile
participant SplitReader as IcebergSplitReader
participant DVReader as DeletionVectorReader
JavaDeleteFile->>JavaDeleteFile: fromIceberg(deleteFile)
JavaDeleteFile->>ProtoDeleteFile: new DeleteFile(..., dataSequenceNumber, referencedDataFile, contentOffset, contentSize)
ProtoDeleteFile->>ProtoDeleteFile: to_json(j, DeleteFile)
ProtoDeleteFile->>ProtoDeleteFile: from_json(j, DeleteFile)
ProtoDeleteFile->>Connector: toVeloxSplit(DeleteFile)
Connector->>VeloxDeleteFile: IcebergDeleteFile(..., dataSequenceNumber, contentOffset, contentSize, referencedDataFile)
Connector->>Connector: infoColumns[kDataSequenceNumberInfoColumn]
Connector->>Connector: infoColumns[kFirstRowIdInfoColumn]
VeloxDeleteFile->>SplitReader: prepareSplit()
alt [referencedDataFile != split.dataFile]
SplitReader-->>SplitReader: [skip deletion vector]
end
VeloxDeleteFile->>DVReader: [dvFile_.contentOffset, dvFile_.contentLength]
alt [contentLength == 0]
DVReader-->>DVReader: [fallback to bounds map]
end
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
IcebergPrestoToVeloxConnector::toVeloxSplit, consider also replacing the hardcoded"$path"key with a constant fromIcebergMetadataColumns(or a similar central place) for consistency with the other metadata columns. - In
DeleteFile(Java), representingreferencedDataFileas an empty string when absent may conflate "unset" with a legitimately empty value; consider usingOptional<String>or a nullable field in the protocol to preserve the distinction between unset and empty.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `IcebergPrestoToVeloxConnector::toVeloxSplit`, consider also replacing the hardcoded `"$path"` key with a constant from `IcebergMetadataColumns` (or a similar central place) for consistency with the other metadata columns.
- In `DeleteFile` (Java), representing `referencedDataFile` as an empty string when absent may conflate "unset" with a legitimately empty value; consider using `Optional<String>` or a nullable field in the protocol to preserve the distinction between unset and empty.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
…gh bridge + readers (prestodb#27789) Summary: Pull Request resolved: prestodb#27789 Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. Differential Revision: D104959645
b4c8130 to
5d1a4d2
Compare
…gh bridge + readers (prestodb#27789) Summary: Pull Request resolved: prestodb#27789 Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. Differential Revision: D104959645
5d1a4d2 to
75b540f
Compare
…gh bridge + readers (prestodb#27789) Summary: Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. Differential Revision: D104959645
75b540f to
533ce46
Compare
…e-iceberg][iceberg] (prestodb#27789) Summary: Stacked on the previous diff (which added the V3 fields to the Java, C++ protocol, and Velox structs). This diff wires the new fields through the Prestissimo->Velox bridge into the typed Velox struct, populates the missing `$first_row_id` info column, switches the C++ DeletionVectorReader from the bounds-map smuggling hack to the typed `contentOffset` / `contentLength` fields, and adds a belt-and-suspenders `referencedDataFile` filter to IcebergSplitReader. ## What this changes - `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forward `deleteFile.dataSequenceNumber`, `.contentOffset`, `.contentSize`, and `.referencedDataFile` into the Velox `IcebergDeleteFile` constructor. Without this, the typed fields stay at their defaults (0 / "") and the V3 reader paths silently degrade. - Populate `infoColumns[kFirstRowIdInfoColumn]` from `icebergSplit->firstRowId` alongside the existing `kDataSequenceNumberInfoColumn`. Without this entry, `IcebergSplitReader::firstRowId_` is always `nullopt` in production and the V3 `_row_id` system column always returns NULL. - Use the constants from `IcebergMetadataColumns.h` instead of new string literals (no behavior change beyond removing duplication). - `velox/connectors/hive/iceberg/DeletionVectorReader.cpp`: Prefer the typed `dvFile_.contentOffset` / `dvFile_.contentLength` fields over the legacy `kDvOffsetFieldId` / `kDvLengthFieldId` bounds-map smuggling. Falls back to the bounds map only when `contentLength == 0`, giving callers a one-release graceful migration window. - `velox/connectors/hive/iceberg/IcebergSplitReader.cpp`: In the `kDeletionVector` branch of `prepareSplit()`, skip the DV when `referencedDataFile` is set and does not equal the split's data file. Belt-and-suspenders -- the coordinator already filters per-split, so this guard rarely fires in practice but protects against split shipping bugs that would otherwise mask wrong rows. - `velox/connectors/hive/iceberg/tests/DeletionVectorReaderTest.cpp`: Migrate `makeDvDeleteFile` to populate `contentOffset` / `contentLength` directly on `IcebergDeleteFile` instead of via the bounds-map smuggling. Validates that the new typed-field path produces the same output as the old bounds-map path. - `velox/connectors/hive/iceberg/tests/BUCK`: autodeps regenerated the `DeletionVectorReader_cpp_lib` -> `DeletionVectorReader` rename consequence after the migration touched the test target. ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` on the Velox `IcebergDeleteFile`, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from *later* snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. === NO RELEASE NOTES === Differential Revision: D104959645
533ce46 to
cd2c96c
Compare
cd2c96c to
a164b19
Compare
prestodb#27789) Summary: Stacked on D104959593 (which added V3 typed fields to `velox::IcebergDeleteFile` and the `dataSequenceNumber` parameter on the production `HiveIcebergSplit` constructor). This diff plumbs all of those fields through the Java side and the Prestissimo bridge so the typed Velox structs are populated end-to-end. Adds: - `dataSequenceNumber`, `referencedDataFile`, `contentOffset`, `contentSize` on `presto-iceberg/.../delete/DeleteFile.java` - The same four fields on the C++ protocol struct `presto::protocol::iceberg::DeleteFile` - All bridge wiring in `presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp`: - Forwards the four DeleteFile fields into the Velox `IcebergDeleteFile` struct (fields added in D104959593) - Forwards `protocol::iceberg::IcebergSplit::dataSequenceNumber` into the Velox `HiveIcebergSplit` ctor (parameter added in D104959593) - Populates `infoColumns[$first_row_id]` from `icebergSplit->firstRowId` alongside the existing `$data_sequence_number` entry, so the V3 `_row_id` system column finally returns non-NULL values in production - Uses the constants from `IcebergMetadataColumns.h` instead of raw string literals (no behavior change beyond removing duplication) - Uses `folly::to<std::string>` in place of `std::to_string` (Android-portable per Meta C++ convention) - Routes the protocol pointer through a const-reference local after the VELOX_CHECK_NOT_NULL so clang-tidy bugprone-unchecked-pointer-access can prove null-safety ## Why these are needed - Equality-delete conflict resolution (V2+ spec): without the typed `dataSequenceNumber` plumbed all the way from manifest -> protocol -> bridge -> Velox struct, `shouldSkipBySequenceNumber()` always evaluates against 0 and the filter never fires -- equality deletes can over-apply to rows from later snapshots that share the same equality key. - V3 `_row_id` system column: without `$first_row_id` in `infoColumns`, `IcebergSplitReader::firstRowId_` stays `nullopt` and `_row_id` returns NULL for every row instead of the per-row file-position-based ID. - DV blob location: the existing `lowerBounds[100]` / `upperBounds[101]` smuggling kept the C++ reader functional but meant the Velox struct couldn't be reasoned about by anyone who didn't know about the secret field IDs. Typed fields make DV blob location a first-class concern. - `referencedDataFile` filter: not strictly required while the coordinator filters per-split, but adds protection against pre-V3-spec compliance bugs in the planner. ## Pairing requirement D104959593 MUST land first, or the Prestissimo bridge will fail to compile when it tries to pass `contentOffset` / `contentLength` / `referencedDataFile` (and the trailing `dataSequenceNumber` ctor arg) into Velox structs that don't yet declare them. Differential Revision: D104959645
a164b19 to
36c8c14
Compare
| // reader VELOX_CHECK_GE-asserts non-negativity if the key is present, so | ||
| // emitting -1 would crash every V1/V2 read. Mirrors the Java guard in | ||
| // `IcebergPageSourceProvider.java`. | ||
| if (icebergSplit.firstRowId >= 0) { |
There was a problem hiding this comment.
Do we need this change in this PR? Its also present in https://github.com/prestodb/presto/pull/27743/changes where row lineage related changes are being made.
|
Thanks @apurva-meta for this change. You mentioned changes in D104959593 must precede this PR. Can you link the PR for D104959593 here? |
It has already been merged in velox: facebookincubator/velox#17503 |
|
@apurva-meta : Can you check the test failures in prestocpp builds. Copilot is suggesting Failing code: Error:
Root cause:
SolutionUpdate the construction of Current code: velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile(
toVeloxFileContent(deleteFile.content),
deleteFile.path,
toVeloxFileFormat(deleteFile.format),
deleteFile.recordCount,
deleteFile.fileSizeInBytes,
std::vector(deleteFile.equalityFieldIds),
lowerBounds,
upperBounds,
deleteFile.dataSequenceNumber,
deleteFile.contentOffset,
deleteFile.contentSize,
deleteFile.referencedDataFile);Recommended fixAdjust the constructor call to the updated Velox parameter order / field set. Most likely the break is from one of these:
A robust fix is to switch to named/aggregate initialization if velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile{
.content = toVeloxFileContent(deleteFile.content),
.path = deleteFile.path,
.format = toVeloxFileFormat(deleteFile.format),
.recordCount = deleteFile.recordCount,
.fileSizeInBytes = deleteFile.fileSizeInBytes,
.equalityFieldIds = std::vector<int32_t>(
deleteFile.equalityFieldIds.begin(),
deleteFile.equalityFieldIds.end()),
.lowerBounds = std::move(lowerBounds),
.upperBounds = std::move(upperBounds),
.dataSequenceNumber = deleteFile.dataSequenceNumber,
.contentOffset = deleteFile.contentOffset,
.contentSize = deleteFile.contentSize,
.referencedDataFile = deleteFile.referencedDataFile,
};If Velox now expects fewer fields, remove the no-longer-supported trailing ones from the initializer/call. If it expects .contentOffset = deleteFile.contentOffset ? std::optional(*deleteFile.contentOffset) : std::nullopt,
.contentSize = deleteFile.contentSize ? std::optional(*deleteFile.contentSize) : std::nullopt,
.referencedDataFile = deleteFile.referencedDataFile,Minimal likely patchIf the API still uses a constructor but dropped the last field, try: velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile(
toVeloxFileContent(deleteFile.content),
deleteFile.path,
toVeloxFileFormat(deleteFile.format),
deleteFile.recordCount,
deleteFile.fileSizeInBytes,
std::vector<int32_t>(
deleteFile.equalityFieldIds.begin(),
deleteFile.equalityFieldIds.end()),
lowerBounds,
upperBounds,
deleteFile.dataSequenceNumber,
deleteFile.contentOffset,
deleteFile.contentSize);Why this is the right place to fixThe workflow’s failing build step is: The compile failure occurs specifically in: Suggested final editReplace the direct positional constructor with whatever matches the current Velox |
|
Thanks — the overall direction makes sense, but I don’t think this is ready to approve yet. The main blocker is the Prestocpp build: this PR depends on the Velox-side API from PR 17503, and the new IcebergDeleteFile / HiveIcebergSplit construction in IcebergPrestoToVeloxConnector.cpp looks fragile against any dependency skew. Can you confirm the exact Velox version/submodule state expected by Presto CI and fix the current build failure first? Two smaller nits:
|
Summary:
Stacked on D104959593 (which added V3 typed fields to
velox::IcebergDeleteFileand thedataSequenceNumberparameter onthe production
HiveIcebergSplitconstructor). This diff plumbs allof those fields through the Java side and the Prestissimo bridge so
the typed Velox structs are populated end-to-end.
Adds:
dataSequenceNumber,referencedDataFile,contentOffset,contentSizeonpresto-iceberg/.../delete/DeleteFile.javapresto::protocol::iceberg::DeleteFilepresto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp:IcebergDeleteFilestruct (fields added in D104959593)protocol::iceberg::IcebergSplit::dataSequenceNumberinto the Velox
HiveIcebergSplitctor (parameter added inD104959593)
infoColumns[$first_row_id]fromicebergSplit->firstRowIdalongside the existing$data_sequence_numberentry, so the V3_row_idsystemcolumn finally returns non-NULL values in production
IcebergMetadataColumns.hinstead ofraw string literals (no behavior change beyond removing
duplication)
folly::to<std::string>in place ofstd::to_string(Android-portable per Meta C++ convention)
after the VELOX_CHECK_NOT_NULL so clang-tidy
bugprone-unchecked-pointer-access can prove null-safety
Why these are needed
Equality-delete conflict resolution (V2+ spec): without the typed
dataSequenceNumberplumbed all the way from manifest -> protocol-> bridge -> Velox struct,
shouldSkipBySequenceNumber()alwaysevaluates against 0 and the filter never fires -- equality deletes
can over-apply to rows from later snapshots that share the same
equality key.
V3
_row_idsystem column: without$first_row_idininfoColumns,IcebergSplitReader::firstRowId_staysnulloptand
_row_idreturns NULL for every row instead of the per-rowfile-position-based ID.
DV blob location: the existing
lowerBounds[100]/upperBounds[101]smuggling kept the C++ reader functional butmeant the Velox struct couldn't be reasoned about by anyone who
didn't know about the secret field IDs. Typed fields make DV
blob location a first-class concern.
referencedDataFilefilter: not strictly required while thecoordinator filters per-split, but adds protection against
pre-V3-spec compliance bugs in the planner.
Pairing requirement
D104959593 MUST land first, or the Prestissimo bridge will fail to
compile when it tries to pass
contentOffset/contentLength/referencedDataFile(and the trailingdataSequenceNumberctorarg) into Velox structs that don't yet declare them.
Differential Revision: D104959645