Skip to content

Deadlock in streaming_pull_manager and ResumableBidiRpc after receiving 504 deadline exceeded error #74

@sweatybridge

Description

@sweatybridge

Environment details

  • OS type and version: alpine 3.11
  • Python version: 3.7.7
  • pip version: 20.0.2
  • google-cloud-pubsub version: 1.4.2

Steps to reproduce

  1. Start subscriber client
  2. Wait for 10 minutes (server should return 504 Deadline Exceeded error)
  3. Messages no longer received and heartbeat no longer sent

Here are the debug logs just after 10 mins.

DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 7.152175 seconds.
INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed non-terminating stream error 504 Deadline Exceeded
INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed recoverable stream error 504 Deadline Exceeded
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 4.379558 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 7.720803 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 3.224825 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 0.857948 seconds.

Looking at the logs, I'm expected to see _LOGGER.debug("Call to retryable %r caused %s.", method, exc) since it should be printed on error https://github.com/googleapis/python-api-core/blob/5e5559202891f7e5b6c22c2cbc549e1ec26eb857/google/api_core/bidi.py#L508. However, the message was not printed so the thread was stuck acquiring self._operational_lock.

I've sampled the top functions running on all threads after getting into the deadlock state (attached as stacktrace below). The heartbeater's background thread seems to be frequently holding the lock as it performs the is_active check https://github.com/googleapis/python-pubsub/blob/master/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py#L425. I'm not sure why it's not being released.

Code example

import logging

from google.cloud.pubsub_v1 import SubscriberClient


def main():
    logging.basicConfig(level=logging.DEBUG)

    client = SubscriberClient()
    sub = client.subscription_path("project", "subscription")

    def callback(message):
        print(message)
        message.ack()

    future = client.subscribe(sub, callback)

    try:
        future.result()
    except KeyboardInterrupt:
        future.cancel()

if __name__ == "__main__":
    main()

Stack trace

Collecting samples from 'python main.py' (python v3.7.7)
Total Samples 2000
GIL: 0.00%, Active: 12.00%, Threads: 17

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
 10.00%  10.00%    1.94s     1.94s   _worker (concurrent/futures/thread.py:78)
  2.00%   2.00%   0.640s    0.640s   _recoverable (api_core/bidi.py:508)
  0.00%   0.00%   0.330s    0.330s   is_active (api_core/bidi.py:578)
  0.00%   0.00%   0.000s    0.330s   heartbeat (pubsub_v1/subscriber/_protocol/streaming_pull_manager.py:425)
  0.00%  12.00%   0.000s     2.91s   _bootstrap (threading.py:890)
  0.00%  12.00%   0.000s     2.91s   _bootstrap_inner (threading.py:926)
  0.00%   2.00%   0.000s    0.640s   recv (api_core/bidi.py:562)
  0.00%  12.00%   0.000s     2.91s   run (threading.py:870)
  0.00%   0.00%   0.000s    0.330s   heartbeat (pubsub_v1/subscriber/_protocol/heartbeater.py:40)
  0.00%   2.00%   0.000s    0.640s   _thread_main (api_core/bidi.py:655)

Metadata

Metadata

Assignees

Labels

api: pubsubIssues related to the googleapis/python-pubsub API.priority: p2Moderately-important priority. Fix may not be included in next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions