chore(deps): Update BigQuery Storage API SDK from v1beta1 to v1#27797
chore(deps): Update BigQuery Storage API SDK from v1beta1 to v1#27797benhxy wants to merge 1 commit into
Conversation
Reviewer's GuideMigrates the Presto BigQuery connector from the preview BigQuery Storage v1beta1 read API to the GA BigQuery Storage v1 Read API by swapping client factories and types, updating read session and read-rows request construction and retry logic, and aligning tests and dependencies with the new v1 protocol buffers. Sequence diagram for BigQuery read flow using Storage v1 Read APIsequenceDiagram
participant BigQuerySplitManager
participant ReadSessionCreator
participant BigQueryReadClientFactory
participant BigQueryReadClient
participant BigQueryResultPageSource
participant ReadRowsHelper
BigQuerySplitManager->>ReadSessionCreator: create(tableId, projectedColumnsNames, filter, actualParallelism)
ReadSessionCreator->>BigQueryReadClientFactory: createBigQueryReadClient()
BigQueryReadClientFactory-->>ReadSessionCreator: BigQueryReadClient
ReadSessionCreator->>BigQueryReadClient: createReadSession(CreateReadSessionRequest)
BigQueryReadClient-->>ReadSessionCreator: ReadSession
BigQuerySplitManager-->>BigQueryResultPageSource: BigQuerySplit
BigQueryResultPageSource->>BigQueryReadClientFactory: createBigQueryReadClient()
BigQueryReadClientFactory-->>BigQueryResultPageSource: BigQueryReadClient
BigQueryResultPageSource->>ReadRowsHelper: new(BigQueryReadClient, ReadRowsRequest.Builder, maxReadRowsRetries)
BigQueryResultPageSource->>ReadRowsHelper: readRows()
loop on retryable errors
ReadRowsHelper->>ReadRowsHelper: request.setOffset(readRowsCount)
ReadRowsHelper->>BigQueryReadClient: readRows(ReadRowsRequest)
BigQueryReadClient-->>ReadRowsHelper: ReadRowsResponse
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="presto-bigquery/src/main/java/com/facebook/presto/plugin/bigquery/ReadSessionCreator.java" line_range="91-94" />
<code_context>
+ .setDataFormat(DataFormat.AVRO)
+ .setReadOptions(readOptions);
+
+ ReadSession readSession = bigQueryReadClient.createReadSession(
+ CreateReadSessionRequest.newBuilder()
.setParent("projects/" + bigQueryClient.getProjectId())
- .setFormat(Storage.DataFormat.AVRO)
- .setRequestedStreams(parallelism)
- .setReadOptions(readOptions)
- .setTableReference(tableReference)
- // The BALANCED sharding strategy causes the server to
- // assign roughly the same number of rows to each stream.
- .setShardingStrategy(Storage.ShardingStrategy.BALANCED)
+ .setReadSession(sessionBuilder)
+ .setMaxStreamCount(parallelism)
.build());
</code_context>
<issue_to_address>
**issue (bug_risk):** Build the ReadSession before setting it on the CreateReadSessionRequest.
`CreateReadSessionRequest#setReadSession` takes a `ReadSession`, not a builder. Use `setReadSession(sessionBuilder.build())` so the request receives a fully constructed immutable `ReadSession`.
</issue_to_address>
### Comment 2
<location path="presto-bigquery/src/test/java/com/facebook/presto/plugin/bigquery/TestReadRowsHelper.java" line_range="73-82" />
<code_context>
{
Iterator<MockResponsesBatch> responses;
- MockReadRowsHelper(BigQueryStorageClient client, Storage.ReadRowsRequest.Builder request, int maxReadRowsRetries, Iterable<MockResponsesBatch> responses)
+ MockReadRowsHelper(BigQueryReadClient client, ReadRowsRequest.Builder request, int maxReadRowsRetries, Iterable<MockResponsesBatch> responses)
{
super(client, request, maxReadRowsRetries);
this.responses = responses.iterator();
}
@Override
- protected Iterator<Storage.ReadRowsResponse> fetchResponses(Storage.ReadRowsRequest.Builder readRowsRequest)
+ protected Iterator<ReadRowsResponse> fetchResponses(ReadRowsRequest.Builder readRowsRequest)
{
return responses.next();
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen tests to assert that the ReadRowsRequest offset is updated on retries
Since ReadRowsHelper now uses the top-level v1 ReadRowsRequest offset, the existing tests that only assert aggregated row counts won’t detect regressions where the offset isn’t advanced between retries. Consider enhancing MockReadRowsHelper.fetchResponses to inspect the incoming ReadRowsRequest.Builder and assert that its offset equals the expected cumulative row count before returning the next MockResponsesBatch. This will ensure the tests cover the new retry/offset semantics rather than just the final rowCount.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| Iterator<MockResponsesBatch> responses; | ||
|
|
||
| MockReadRowsHelper(BigQueryStorageClient client, Storage.ReadRowsRequest.Builder request, int maxReadRowsRetries, Iterable<MockResponsesBatch> responses) | ||
| MockReadRowsHelper(BigQueryReadClient client, ReadRowsRequest.Builder request, int maxReadRowsRetries, Iterable<MockResponsesBatch> responses) | ||
| { | ||
| super(client, request, maxReadRowsRetries); | ||
| this.responses = responses.iterator(); | ||
| } | ||
|
|
||
| @Override | ||
| protected Iterator<Storage.ReadRowsResponse> fetchResponses(Storage.ReadRowsRequest.Builder readRowsRequest) | ||
| protected Iterator<ReadRowsResponse> fetchResponses(ReadRowsRequest.Builder readRowsRequest) |
There was a problem hiding this comment.
suggestion (testing): Strengthen tests to assert that the ReadRowsRequest offset is updated on retries
Since ReadRowsHelper now uses the top-level v1 ReadRowsRequest offset, the existing tests that only assert aggregated row counts won’t detect regressions where the offset isn’t advanced between retries. Consider enhancing MockReadRowsHelper.fetchResponses to inspect the incoming ReadRowsRequest.Builder and assert that its offset equals the expected cumulative row count before returning the next MockResponsesBatch. This will ensure the tests cover the new retry/offset semantics rather than just the final rowCount.
Description
Update Google BigQuery Storage API SDK from v1beta1 to v1.
Motivation and Context
v1beta1 is a preview API, and v1 is a GA API with better performance and reliability.
Impact
There should be no visible impact as v1 API is backward compatible (no changes in API activation, IAM permission, quota, billing etc.). V1 API has been GA-ed since 2020 and most of SDK/customers have migrated to v1.
Test Plan
Updates to unit tests.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by Sourcery
Upgrade BigQuery storage integration from the v1beta1 API to the GA v1 API and align request/response handling accordingly.
Enhancements:
Build:
Summary by Sourcery
Upgrade the BigQuery connector to use the GA BigQuery Storage v1 Read API instead of the deprecated v1beta1 API for read sessions and row retrieval.
Enhancements:
Build:
Tests:
Summary by Sourcery
Upgrade the BigQuery connector to use the GA BigQuery Storage v1 Read API instead of the deprecated v1beta1 API for read sessions and row retrieval.
Enhancements:
Build:
Tests:
Summary by Sourcery
Migrate the BigQuery connector from the preview BigQuery Storage v1beta1 client to the GA BigQuery Storage v1 Read API for creating read sessions and reading rows.
Enhancements:
Build:
Tests: