Description
When using HttpClient for POST requests, connection resets by HTTP server may cause the client to lock in unexpected message type: DefaultFullHttpRequest, state: 1
state for subsequent requests with the same server. It's caused by the retry feature of HttpClient.
Expected Behavior
Subsequent HTTP requests work normally after a connection reset by the HTTP server.
Actual Behavior
Sometimes, the HttpClient is locked in unexpected message type: DefaultFullHttpRequest, state: 1
state.
Steps to Reproduce
The issue has happened in our production when the network was unstable, i.e. with lots of TCP connection resets.
- Start a HTTP server (could be any HTTP server, it doesn't matter)
- Using HttpClient (it's easier to reproduce the issue if setting maxConnections to 1) to send a POST request (with body) to the HTTP server. Debug to pause at reactor.netty.http.HttpOperations#send -
return new PostHeadersNettyOutbound
public NettyOutbound send(Publisher<? extends ByteBuf> source) {
if (!channel().isActive()) {
return then(Mono.error(AbortedException.beforeSend()));
}
if (source instanceof Mono) {
// pause here. Channel must be active in order to reach here
return new PostHeadersNettyOutbound(((Mono<ByteBuf>)source)
....
}
....
}
- Kill the HTTP server
- The HTTP request continues to do
HttpObjectEncoder#encode
, which will release the refCnt ofDefaultFullHttpRequest#content
here - Connection reset by HTTP server (at step 3) leads to reactor.netty.http.client.HttpClientConnect.HttpObserver#onUncaughtException, which leads to a retry of the HTTP request
- Start the HTTP server again
- Retrying the HTTP request will go through
HttpObjectEncoder#encode
again. However, the refCnt ofDefaultFullHttpRequest#content
was already released at step 4. Trying to encode again sometimes (not always. it depends on exact moment of the connection reset) causesjava.lang.IndexOutOfBoundsException
to be thrown at here, which leavesHttpObjectEncoder#state
inST_CONTENT_NON_CHUNK
state.io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
is thrown after at here. - Subsequent requests with the same server (assuming using the same TCP connection) will always meet with error
unexpected message type: DefaultFullHttpRequest, state: 1
because the instance ofHttpObjectEncoder
was locked inST_CONTENT_NON_CHUNK
state at step 7.
Stack trace when the above situation happens, that is io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
and then unexpected message type: DefaultFullHttpRequest, state: 1
.
2021-04-29 16:55:17.008 WARN --- [reactor-http-nio-3] r.netty.http.client.HttpClientConnect : [id: 0xe1894edd, L:/127.0.0.1:65440 - R:/127.0.0.1:11001] The connection observed an error
io.netty.handler.codec.EncoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:118)
at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
at reactor.core.publisher.Mono.subscribe(Mono.java:4090)
at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:329)
at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:58)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:430)
at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:503)
at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:571)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:74)
at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:138)
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:100)
at io.netty.handler.codec.http.DefaultFullHttpRequest.release(DefaultFullHttpRequest.java:102)
at io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88)
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:91)
... 25 common frames omitted
2021-04-29 16:55:28.282 ERROR --- [reactor-http-nio-3] r.n.http.server.HttpServerOperations : [id: 0xdd50934f, L:/0:0:0:0:0:0:0:1:9001 - R:/0:0:0:0:0:0:0:1:65450] Error starting response. Replying error status
io.netty.handler.codec.EncoderException: java.lang.IllegalStateException: unexpected message type: DefaultFullHttpRequest, state: 1
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:118)
at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
at reactor.core.publisher.Mono.subscribe(Mono.java:4090)
at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:329)
at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:58)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:430)
at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:503)
at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:571)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: unexpected message type: DefaultFullHttpRequest, state: 1
at io.netty.handler.codec.http.HttpObjectEncoder.encode(HttpObjectEncoder.java:86)
at io.netty.handler.codec.http.HttpClientCodec$Encoder.encode(HttpClientCodec.java:167)
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
... 25 common frames omitted
Possible Solution
Disable the retry feature of HttpClient.
HttpClient.create().disableRetry(true)
With current implementation, retries of any HTTP request with a body (POST, PUT, etc) in case of connection reset are not safe. This feature should probably be removed (or at least disabled by default).
Your Environment
- Reactor version(s) used: reactor-netty 0.9.18.RELEASE. reactor-core 3.3.15.RELEASE
- Other relevant libraries versions (eg.
netty
, ...): netty 4.1.60.Final - JVM version (
java -version
):java version "11" 2018-09-25 - OS and version (eg.
uname -a
):Darwin MacBook-Pro.local 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64
Activity
When "Connection reset by peer" is observed mark the connection as no…
violetagg commentedon May 6, 2021
@xiazuojie Are you able to test this PR #1625
xiazuojie commentedon May 7, 2021
@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to errors and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.
violetagg commentedon May 7, 2021
And for that purpose you can disable the retries if this is needed.
When "Connection reset by peer" is observed mark the connection as no…
xiazuojie commentedon May 7, 2021
Yes, it can be opted out. But it probably should be an opt-in.
If I am not mistaken, I've not seen other HTTP clients with retry enabled by default.
Do not retry the request when the headers/body were sent.
11 remaining items