Closed
Description
What happened?
When users don't explicitly set a timestamp on their records, the Python Bigtable client defaults the timestamp to -1, which Bigtable handles by attaching the system time at ingestion. The connector mishandles these rows: instead of sending the -1 timestamp over, it drops it here. When the records reach the underlying Java IO, it sees no explicit timestamp. Unlike the Python client, the Java Bigtable client defaults timestamps to 0, which Bigtable stores as epoch time.
As a result, instead of attaching the current timestamp to cells, we attach epoch time to each of them.
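The mismatch between the two defaults can be illustrated with a small, self-contained sketch. This is a toy model, not the connector's actual code: the dict-based mutation format and the names `translate_buggy`, `translate_fixed`, and `stored_timestamp` are hypothetical, and only the two default values (-1 and 0) come from the description above.

```python
SERVER_ASSIGNED = -1  # Python client default (per this report): Bigtable fills in ingestion time
JAVA_DEFAULT = 0      # Java client default (per this report): stored as epoch time


def translate_buggy(cell: dict) -> dict:
    """Hypothetical model of the bug: the -1 sentinel is dropped in transit."""
    out = {"value": cell["value"]}
    ts = cell.get("timestamp_micros", SERVER_ASSIGNED)
    if ts != SERVER_ASSIGNED:
        out["timestamp_micros"] = ts
    return out


def translate_fixed(cell: dict) -> dict:
    """Hypothetical fix: always forward the timestamp, including -1."""
    return {
        "value": cell["value"],
        "timestamp_micros": cell.get("timestamp_micros", SERVER_ASSIGNED),
    }


def stored_timestamp(mutation: dict, now_micros: int) -> int:
    """Toy model of the Java side plus Bigtable.

    A missing timestamp falls back to the Java default (0), and -1 is
    replaced by the server's current time.
    """
    ts = mutation.get("timestamp_micros", JAVA_DEFAULT)
    return now_micros if ts == SERVER_ASSIGNED else ts
```

With `cell = {"value": b"v"}`, passing the result of `translate_buggy(cell)` to `stored_timestamp` yields 0 (epoch), while the result of `translate_fixed(cell)` yields `now_micros`. Cells with an explicit timestamp come out the same on both paths.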
This can affect users in two ways:
- Users can set a garbage collection policy that cleans up old records in their table. Records with unset timestamps will appear to be very old (1970-01-01) and will be garbage collected.
- Bigtable keeps the history of a cell in a table. When users write to a cell multiple times, this bug causes the cell history to be overwritten, because the same timestamp (epoch time) is used each time.
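Both effects can be reproduced in a toy model that treats a cell's history as a mapping from timestamp to value. This is only a sketch under simplifying assumptions: `write_cell` and `gc_max_age` are hypothetical helpers, not Bigtable or Beam APIs, and the age-based rule stands in for whatever garbage collection policy a table actually uses.

```python
def write_cell(history: dict, value: bytes, timestamp_micros: int) -> None:
    """Store one version of a cell; a write at an existing timestamp replaces it."""
    history[timestamp_micros] = value


def gc_max_age(history: dict, now_micros: int, max_age_micros: int) -> dict:
    """Keep only the versions that are no older than max_age_micros."""
    return {
        ts: value
        for ts, value in history.items()
        if now_micros - ts <= max_age_micros
    }


DAY_MICROS = 24 * 60 * 60 * 1_000_000
now = 1_700_000_000_000_000  # arbitrary "current time" in microseconds

# Buggy path: every write lands at timestamp 0, so earlier versions are lost.
buggy = {}
write_cell(buggy, b"first", 0)
write_cell(buggy, b"second", 0)

# Expected path: each write gets its own ingestion time.
expected = {}
write_cell(expected, b"first", now)
write_cell(expected, b"second", now + 1)

# A 30-day max-age policy removes the epoch-stamped cell but keeps the others.
buggy_after_gc = gc_max_age(buggy, now, 30 * DAY_MICROS)
expected_after_gc = gc_max_age(expected, now, 30 * DAY_MICROS)
```

In this model `buggy` ends up holding a single version (`b"second"`), which then disappears entirely after garbage collection, while `expected` keeps both versions.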
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner