@aribray aribray commented Nov 4, 2022

Current behavior:

  • For load jobs from federated formats like AVRO, PARQUET, and ORC, BigQuery uses the schema of whichever file in the source_uris list is lexicographically last.

Example:

source_uris = [
    "gs://{project}/{bucket_name}/c-file.avro", 
    "gs://{project}/{bucket_name}/b-file.avro",
    "gs://{project}/{bucket_name}/r-file.avro",
]

"gs://{project}/{bucket_name}/r-file.avro" is lexicographically last, so BigQuery uses its schema for the load job.
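
As a rough illustration of "lexicographically last", the sketch below picks that URI from the example list using Python's built-in string comparison. This only approximates the ordering: it assumes plain code-point comparison, and the exact comparison BigQuery applies is not specified in this PR.

```python
source_uris = [
    "gs://{project}/{bucket_name}/c-file.avro",
    "gs://{project}/{bucket_name}/b-file.avro",
    "gs://{project}/{bucket_name}/r-file.avro",
]

# max() on strings compares character by character (by code point),
# so it returns the URI that sorts last.
# Assumption: this mirrors the ordering BigQuery uses.
lexicographically_last = max(source_uris)
print(lexicographically_last)
```

The position of a file in the list does not matter here; only the string ordering of the URIs does.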

New behavior:

  • The reference_file_schema_uri field lets users specify the file whose schema BigQuery should use.
  • The reference_file_schema_uri does not have to be one of the files in the source_uris list.
  • To prevent data loss, the schema of the reference_file_schema_uri file should be a superset of the schemas of the files in the source_uris list.
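
The superset requirement can be sketched with plain sets of field names. Everything here is hypothetical and for illustration only: the field names, the per-file schemas, and the missing_fields helper are not part of the PR or the library, and a real schema comparison would also need to consider field types and modes.

```python
# Hypothetical field names per source file; real schemas also carry types and modes.
source_schemas = {
    "c-file.avro": {"id", "name"},
    "b-file.avro": {"id", "name", "email"},
    "r-file.avro": {"id", "name"},
}

# Hypothetical schema of the file that reference_file_schema_uri points at.
reference_schema = {"id", "name", "email"}


def missing_fields(reference, sources):
    """Return, per source file, the fields that the reference schema lacks."""
    return {
        uri: sorted(fields - reference)
        for uri, fields in sources.items()
        if not fields <= reference
    }


# An empty result means the reference schema is a superset of every source schema.
dropped = missing_fields(reference_schema, source_schemas)
print(dropped or "reference schema covers every source file")

# Using the schema of the lexicographically last file instead (the current
# behavior described above) leaves "email" from b-file.avro uncovered.
dropped_with_old_default = missing_fields(source_schemas["r-file.avro"], source_schemas)
print(dropped_with_old_default)
```

A check along these lines could run before submitting a load job, to catch a reference file that does not cover every source file.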

Googlers see 246809557

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery API. labels Nov 4, 2022
@aribray aribray marked this pull request as ready for review November 4, 2022 15:56
@aribray aribray requested a review from a team November 4, 2022 15:56
@aribray aribray requested a review from a team as a code owner November 4, 2022 15:56
@aribray aribray requested a review from Neenu1995 November 4, 2022 15:56
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Nov 4, 2022

@leahecole leahecole left a comment

Yeah with the nits, I'm honestly torn. Use your best judgment - it's nbd if it's not changed.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Nov 9, 2022
@product-auto-label product-auto-label bot removed the size: m Pull request size is medium. label Nov 10, 2022
@aribray aribray added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 10, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 10, 2022
@aribray aribray added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 10, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 10, 2022
@aribray aribray added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 10, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 10, 2022
@aribray aribray added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 11, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 11, 2022
@aribray aribray added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 13, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 13, 2022
@aribray aribray merged commit 931285f into googleapis:main Nov 14, 2022
@aribray aribray deleted the aribray--federated-formats branch November 14, 2022 22:26
abdelmegahedgoogle pushed a commit to abdelmegahedgoogle/python-bigquery that referenced this pull request Apr 17, 2023
googleapis#1399)

* feat: add 'reference_file_schema_uri' to LoadJobConfig and ExternalConfig
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: l Pull request size is large.
4 participants