Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClient retry may cause "unexpected message type: DefaultFullHttpRequest, state: 1" #1622

Closed
xiazuojie opened this issue May 6, 2021 · 5 comments · Fixed by #1625 or #1626
Closed
Assignees
Labels
type/bug A general bug

Comments

@xiazuojie
Copy link

xiazuojie commented May 6, 2021

When using HttpClient for POST requests, connection resets by HTTP server may cause the client to lock in unexpected message type: DefaultFullHttpRequest, state: 1 state for subsequent requests with the same server. It's caused by the retry feature of HttpClient.

Expected Behavior

Subsequent HTTP requests work normally after a connection reset by the HTTP server.

Actual Behavior

Sometimes, the HttpClient is locked in unexpected message type: DefaultFullHttpRequest, state: 1 state.

Steps to Reproduce

The issue has happened in our production when the network was unstable, i.e. with lots of TCP connection resets.

  1. Start a HTTP server (could be any HTTP server, it doesn't matter)
  2. Using HttpClient (it's easier to reproduce the issue if setting maxConnections to 1) to send a POST request (with body) to the HTTP server. Debug to pause at reactor.netty.http.HttpOperations#send - return new PostHeadersNettyOutbound
	public NettyOutbound send(Publisher<? extends ByteBuf> source) {
		if (!channel().isActive()) {
			return then(Mono.error(AbortedException.beforeSend()));
		}
		if (source instanceof Mono) {
                         // pause here. Channel must be active in order to reach here
			return new PostHeadersNettyOutbound(((Mono<ByteBuf>)source)
					....
		}
		....
	}
  1. Kill the HTTP server
  2. The HTTP request continues to do HttpObjectEncoder#encode, which will release the refCnt of DefaultFullHttpRequest#content here
  3. Connection reset by HTTP server (at step 3) leads to reactor.netty.http.client.HttpClientConnect.HttpObserver#onUncaughtException, which leads to a retry of the HTTP request
  4. Start the HTTP server again
  5. Retrying the HTTP request will go through HttpObjectEncoder#encode again. However, the refCnt of DefaultFullHttpRequest#content was already released at step 4. Trying to encode again sometimes (not always. it depends on exact moment of the connection reset) causes java.lang.IndexOutOfBoundsException to be thrown at here, which leaves HttpObjectEncoder#state in ST_CONTENT_NON_CHUNK state. io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1 is thrown after at here.
  6. Subsequent requests with the same server (assuming using the same TCP connection) will always meet with error unexpected message type: DefaultFullHttpRequest, state: 1 because the instance ofHttpObjectEncoder was locked in ST_CONTENT_NON_CHUNK state at step 7.

Stack trace when the above situation happens, that is io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1 and then unexpected message type: DefaultFullHttpRequest, state: 1.

2021-04-29 16:55:17.008  WARN   --- [reactor-http-nio-3] r.netty.http.client.HttpClientConnect    : [id: 0xe1894edd, L:/127.0.0.1:65440 - R:/127.0.0.1:11001] The connection observed an error
io.netty.handler.codec.EncoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
  at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
  at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
  at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
  at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
  at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:118)
  at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
  at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
  at reactor.core.publisher.Mono.subscribe(Mono.java:4090)
  at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:329)
  at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:58)
  at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
  at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:430)
  at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:503)
  at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:571)
  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
  at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:74)
  at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:138)
  at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:100)
  at io.netty.handler.codec.http.DefaultFullHttpRequest.release(DefaultFullHttpRequest.java:102)
  at io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88)
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:91)
  ... 25 common frames omitted
2021-04-29 16:55:28.282 ERROR   --- [reactor-http-nio-3] r.n.http.server.HttpServerOperations     : [id: 0xdd50934f, L:/0:0:0:0:0:0:0:1:9001 - R:/0:0:0:0:0:0:0:1:65450] Error starting response. Replying error status
io.netty.handler.codec.EncoderException: java.lang.IllegalStateException: unexpected message type: DefaultFullHttpRequest, state: 1
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
  at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
  at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
  at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
  at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
  at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:118)
  at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
  at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
  at reactor.core.publisher.Mono.subscribe(Mono.java:4090)
  at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:329)
  at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:58)
  at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
  at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:430)
  at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:503)
  at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:571)
  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: unexpected message type: DefaultFullHttpRequest, state: 1
  at io.netty.handler.codec.http.HttpObjectEncoder.encode(HttpObjectEncoder.java:86)
  at io.netty.handler.codec.http.HttpClientCodec$Encoder.encode(HttpClientCodec.java:167)
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
  ... 25 common frames omitted

Possible Solution

Disable the retry feature of HttpClient.

HttpClient.create().disableRetry(true)

With current implementation, retries of any HTTP request with a body (POST, PUT, etc) in case of connection reset are not safe. This feature should probably be removed (or at least disabled by default).

Your Environment

  • Reactor version(s) used: reactor-netty 0.9.18.RELEASE. reactor-core 3.3.15.RELEASE
  • Other relevant libraries versions (eg. netty, ...): netty 4.1.60.Final
  • JVM version (java -version):java version "11" 2018-09-25
  • OS and version (eg. uname -a):Darwin MacBook-Pro.local 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64
@xiazuojie xiazuojie added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels May 6, 2021
@violetagg violetagg removed the status/need-triage A new issue that still need to be evaluated as a whole label May 6, 2021
@violetagg violetagg added this to the 0.9.20.RELEASE milestone May 6, 2021
@violetagg violetagg self-assigned this May 6, 2021
violetagg added a commit that referenced this issue May 6, 2021
…n-persistent

- When "Connection reset by peer" is observed mark the connection as non-persistent,
do not wait for the channel close event to invalidate the connection.
- Do not retry the request when the headers/body were sent.

Fixes #1622
@violetagg
Copy link
Member

@xiazuojie Are you able to test this PR #1625

@xiazuojie
Copy link
Author

xiazuojie commented May 7, 2021

@xiazuojie Are you able to test this PR #1625

@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to errors and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.

@violetagg
Copy link
Member

@xiazuojie Are you able to test this PR #1625

@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to error and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.

And for that purpose you can disable the retries if this is needed.

violetagg added a commit that referenced this issue May 7, 2021
…n-persistent

- When "Connection reset by peer" is observed mark the connection as non-persistent,
do not wait for the channel close event to invalidate the connection.

Fixes #1622
@xiazuojie
Copy link
Author

@xiazuojie Are you able to test this PR #1625

@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to error and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.

And for that purpose you can disable the retries if this is needed.

Yes, it can be opted out. But it probably should be an opt-in.
If I am not mistaken, I've not seen other HTTP clients with retry enabled by default.

@violetagg
Copy link
Member

violetagg commented May 7, 2021

Yes, it can be opted out. But it probably should be an opt-in.
If I am not mistaken, I've not seen other HTTP clients with retry enabled by default.

We cannot change that behaviour for 0.9.x and 1.0.x. However with #1626 the retry will happen only if we didn't send anything with the first attempt.
If you want we can discuss in a new issue whether we want the retry functionality disabled by default for the next versions (1.1.x ...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment