Skip to content

SharedResourceHolder should roughly handle exceptions during close #6002

@ejona86

Description

@ejona86

@xCASx reported a bug where gRPC would get "hung" after a point, which included the following stack trace:

Jul 22, 2019 7:37:47 PM io.grpc.internal.LogExceptionRunnable run
SEVERE: Exception while executing runnable io.grpc.internal.SharedResourceHolder$2@73cfe40e
io.grpc.netty.shaded.io.netty.channel.ChannelException: eventfd_write() failed: Bad file descriptor
at io.grpc.netty.shaded.io.netty.channel.epoll.Native.eventFdWrite(Native Method)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.wakeup(EpollEventLoop.java:167)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:603)
at io.grpc.netty.shaded.io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:163)
at io.grpc.netty.shaded.io.grpc.netty.Utils$DefaultEventLoopGroupResource.close(Utils.java:346)
at io.grpc.netty.shaded.io.grpc.netty.Utils$DefaultEventLoopGroupResource.close(Utils.java:318)
at io.grpc.internal.SharedResourceHolder$2.run(SharedResourceHolder.java:145)
at io.grpc.internal.LogExceptionRunnable.run(LogExceptionRunnable.java:43)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

If we look in SharedResourceHolder, if resource.close(instance) fails throws instances.remove(resource) is not run. While we shouldn't encourage exceptions to be thrown during close, we would like it to be able to recover eventually. In this case, any future resource fetches will get the partially-closed resource, which will immediately fail.

resource.close(instance);
instances.remove(resource);

@xCASx, a workaround would be to keep the client objects alive for as long as possible. We generally encourage that for performance, but as long as one client object is alive we won't attempt to shut down this executor.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions