Closed
Description
What version of gRPC are you using?
v1.18.0/v1.19.1
What version of Go are you using (go version
)?
1.11
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
I tried to enable MaxConnectionAge on a C++ server and call from a Go client. When the connection age was reached, the rpcs in the Go client failed with the unavailable error and the "transport is closing" message.
What did you expect to see?
No unavailable errors.
What did you see instead?
I test with both age and grace time set to one minute, and the Go client reported lots of errors. Each rpc takes a few seconds to complete, within the one minute grace period.
grpc-go v1.18.0/v1.19.1
rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
rpc error: code = Unavailable desc = the connection is draining
rpc error: code = Unavailable desc = transport is closing
Reference: grpc/grpc#12295 (comment)
Activity
menghanl commentedon Apr 12, 2019
Can you turn on logging for the client and server?
For go client:
GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info
For C++ server: try
GRPC_VERBOSITY=DEBUG GRPC_TRACE=server_channel
dfawley commentedon May 2, 2019
ping @wjywbs - could you gather the debugging info requested and do you have a minimal reproducible test case?
wjywbs commentedon May 2, 2019
Sure. I'll try to create a minimized demo.
easwars commentedon May 2, 2019
I tried with a Go server and Go client (which makes one connection to the server and repeatedly makes RPCs on it).
When the
MaxConnectionAge
andMaxConnectionAgeGrace
on the server were much greater than the time taken to complete the RPCs, I saw the following in the client logs:I0502 14:33:32.238317 58768 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc0007e5360, TRANSIENT_FAILURE
I0502 14:33:32.238395 58768 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc0007e5360, CONNECTING
I0502 14:33:32.242678 58768 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc0007e5360, READY
and the client kept running without any issues.
But, when the
MaxConnectionAge
andMaxConnectionAgeGrace
on the server was lesser than the time taken to complete the RPC, I saw the following on the client:I0502 15:24:30.825472 83849 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000475c00, TRANSIENT_FAILURE
I0502 15:24:30.825538 83849 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000475c00, CONNECTING
I0502 15:24:30.829788 83849 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000475c00, READY
F0502 15:24:30.835994 83849 fortuneteller_client.go:83] Error received while calling callTellFortune: rpc error: code = Unavailable desc = transport is closing
easwars commentedon May 2, 2019
I think this is what is happening:
Since
MaxConnectionAge
andMaxConnectionAgeGrace
have passed on the server, it force closes the connection.And in the client in
http2Client.reader()
, the call tot.framer.fr.ReadFrame
returns EOF, and since this is not of typehttp2.StreamError
we callClose()
on the transport, which closes all active streams withErrConnClosing
At this point, I don't know enough to say whether this is a bug or is WAI :-)
dfawley commentedon May 2, 2019
If
MaxConnectionAgeGrace
passes, then this would be expected. From the issue linked, I understood that the grace period was not being reached and errors were still appearing. @wjywbs - can you confirm this?wjywbs commentedon May 2, 2019
Yes, I set both age and grace to 60s and each rpc finishes within a few seconds, but the unavailable errors appear a lot of times.
dfawley commentedon May 23, 2019
@wjywbs do you have a minimal reproducible test case we can use to replicate the problem you are seeing?
hanjm commentedon May 29, 2019
hi @dfawley @wjywbs, i set MaxConnectionAge for long live grpc stream server recently, then a lots of grpc log "transport is closing", so i found this issue and write a test case to replicate the problem. you may need use -count 100 because sometimes it test pass sometimes it test fail.
hanjm commentedon Jun 2, 2019
update. if i use grpc.FailFast(false) or grpc.WithDefaultCallOptions(grpc.FailFast(false)), it's will work fine.
dfawley commentedon Jun 20, 2019
We suspect this may be fixed by #2857. Can you try with this change and let us know if it resolves your problems?
wjywbs commentedon Jul 11, 2019
I tested v1.22.0, and there were still many "rpc error: code = Unavailable desc = the connection is draining". With grpc.WaitForReady(true), there were less errors and the descriptions were "the connection is draining" and "transport is closing".
dfawley commentedon Jul 11, 2019
It seems this might be a problem with how we handle the race between choosing a transport and the transport getting the GOAWAY from the server. I'll take a look next week.
8 remaining items