Skip to content

Conversation

nehsyc
Copy link
Contributor

@nehsyc nehsyc commented Apr 9, 2021

Also fix the string representation of ShardedKeyCoder to print out the key coder.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK ULR Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status Build Status --- Build Status ---
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@nehsyc
Copy link
Contributor Author

nehsyc commented Apr 9, 2021

R: @udim

@@ -941,7 +943,8 @@ def _write_files_with_auto_sharding(
destination_files_kv_pc = (
destination_data_kv_pc
| 'ToHashableTableRef' >> beam.Map(
lambda kv: (bigquery_tools.get_hashable_destination(kv[0]), kv[1]))
lambda kv: (bigquery_tools.get_hashable_destination(kv[0]), kv[1])).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, but with the additional step of making this lambda a function (you can't add type annotations to lambdas).

@@ -1431,7 +1433,8 @@ def _restore_table_ref(sharded_table_ref_elems_kv):
bigquery_tools.AppendDestinationsFn(self.table_reference),
*self.table_side_inputs)
| 'AddInsertIds' >> beam.ParDo(_StreamToBigQuery.InsertIdPrefixFn())
| 'ToHashableTableRef' >> beam.Map(_to_hashable_table_ref))
| 'ToHashableTableRef' >> beam.Map(_to_hashable_table_ref)
).with_output_types(Tuple[str, Any])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit Any discards type information. Could you try annotating the types in _to_hashable_table_ref instead?

V = TypeVar('V')

def _to_hashable_table_ref(table_ref_elem_kv: Tuple[Union[str, TABLE_REF_TYPE], V]) -> Tuple[str, V]:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! Made the change accordingly.

@codecov
Copy link

codecov bot commented Apr 9, 2021 β€’

Codecov Report

Merging #14499 (96ce928) into master (1b86266) will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #14499      +/-   ##
==========================================
- Coverage   83.49%   83.47%   -0.02%     
==========================================
  Files         447      449       +2     
  Lines       58904    58978      +74     
==========================================
+ Hits        49183    49233      +50     
- Misses       9721     9745      +24     
Impacted Files Coverage Ξ”
...sdks/python/apache_beam/internal/metrics/metric.py
...e_beam/portability/api/beam_runner_api_pb2_urns.py
...s/python/apache_beam/testing/pipeline_verifiers.py
...rcs/sdks/python/apache_beam/typehints/typehints.py
...s/python/apache_beam/io/gcp/bigquery_file_loads.py
...38/build/srcs/sdks/python/apache_beam/io/avroio.py
...srcs/sdks/python/apache_beam/metrics/metricbase.py
...srcs/sdks/python/apache_beam/coders/avro_record.py
...nternal/clients/dataflow/dataflow_v1b3_messages.py
...uild/srcs/sdks/python/apache_beam/coders/coders.py
... and 886 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data
Powered by Codecov. Last update 1b86266...96ce928. Read the comment docs.

@@ -1417,7 +1417,12 @@ def _add_random_shard(element):
value = element[1]
return ((key, random.randint(0, DEFAULT_SHARDS_PER_DESTINATION)), value)

def _to_hashable_table_ref(table_ref_elem_kv):
V = TypeVar('V')
TABLE_REF_TYPE = TypeVar('TABLE_REF_TYPE')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was lazy when I wrote TABLE_REF_TYPE - I meant it should be replaced with bigquery.TableReference. (the Union should contain the types accepted by bigquery_tools.get_hashable_destination)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. That makes more sense! Corrected.

V = TypeVar('V')
TABLE_REF_TYPE = TypeVar('TABLE_REF_TYPE')

def _to_hashable_table_ref(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be put in bigquery_tools.py to avoid code repetition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I moved it to bigquery_tools.py

@nehsyc
Copy link
Contributor Author

nehsyc commented Apr 10, 2021

retest this please

@nehsyc nehsyc force-pushed the deterministic_coder_fix branch 2 times, most recently from f9956e5 to 62768fd Compare April 10, 2021 06:36
Move to_hashable_table_ref to bigquery_tools
@nehsyc nehsyc force-pushed the deterministic_coder_fix branch from 62768fd to 38f8390 Compare April 10, 2021 07:00
@@ -69,6 +71,11 @@
except ImportError:
pass

try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to explicitly import this otherwise many tests failed with "apache_beam.io.gcp.internal.clients.bigquery has no attribute TableReference" - I guess bigquery lib was not available in some environments. The pylint check also failed possibly because of this but I haven't figured out what's wrong. Any insights?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense - you can add # pylint: disable=ungrouped-imports, wrong-import-order, wrong-import-position before the block to get pylint to pass.

@pabloem pabloem merged commit f805f1c into apache:master Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants