-
Notifications
You must be signed in to change notification settings - Fork 215
Description
Environment details
- OS type and version: alpine 3.11
- Python version: 3.7.7
- pip version: 20.0.2
google-cloud-pubsub
version: 1.4.2
Steps to reproduce
- Start subscriber client
- Wait for 10 minutes (server should return 504 Deadline Exceeded error)
- Messages no longer received and heartbeat no longer sent
Here are the debug logs just after 10 mins.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 7.152175 seconds.
INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed non-terminating stream error 504 Deadline Exceeded
INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed recoverable stream error 504 Deadline Exceeded
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 4.379558 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 7.720803 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 3.224825 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:The current deadline value is 10 seconds.
DEBUG:google.cloud.pubsub_v1.subscriber._protocol.leaser:Snoozing lease management for 0.857948 seconds.
Looking at the logs, I'm expected to see _LOGGER.debug("Call to retryable %r caused %s.", method, exc)
since it should be printed on error https://github.com/googleapis/python-api-core/blob/5e5559202891f7e5b6c22c2cbc549e1ec26eb857/google/api_core/bidi.py#L508. However, the message was not printed so the thread was stuck acquiring self._operational_lock
.
I've sampled the top functions running on all threads after getting into the deadlock state (attached as stacktrace below). The heartbeater's background thread seems to be frequently holding the lock as it performs the is_active
check https://github.com/googleapis/python-pubsub/blob/master/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py#L425. I'm not sure why it's not being released.
Code example
import logging
from google.cloud.pubsub_v1 import SubscriberClient
def main():
logging.basicConfig(level=logging.DEBUG)
client = SubscriberClient()
sub = client.subscription_path("project", "subscription")
def callback(message):
print(message)
message.ack()
future = client.subscribe(sub, callback)
try:
future.result()
except KeyboardInterrupt:
future.cancel()
if __name__ == "__main__":
main()
Stack trace
Collecting samples from 'python main.py' (python v3.7.7)
Total Samples 2000
GIL: 0.00%, Active: 12.00%, Threads: 17
%Own %Total OwnTime TotalTime Function (filename:line)
10.00% 10.00% 1.94s 1.94s _worker (concurrent/futures/thread.py:78)
2.00% 2.00% 0.640s 0.640s _recoverable (api_core/bidi.py:508)
0.00% 0.00% 0.330s 0.330s is_active (api_core/bidi.py:578)
0.00% 0.00% 0.000s 0.330s heartbeat (pubsub_v1/subscriber/_protocol/streaming_pull_manager.py:425)
0.00% 12.00% 0.000s 2.91s _bootstrap (threading.py:890)
0.00% 12.00% 0.000s 2.91s _bootstrap_inner (threading.py:926)
0.00% 2.00% 0.000s 0.640s recv (api_core/bidi.py:562)
0.00% 12.00% 0.000s 2.91s run (threading.py:870)
0.00% 0.00% 0.000s 0.330s heartbeat (pubsub_v1/subscriber/_protocol/heartbeater.py:40)
0.00% 2.00% 0.000s 0.640s _thread_main (api_core/bidi.py:655)