Skip to content

HttpClient retry may cause "unexpected message type: DefaultFullHttpRequest, state: 1" #1622

@xiazuojie

Description

@xiazuojie

When using HttpClient for POST requests, connection resets by HTTP server may cause the client to lock in unexpected message type: DefaultFullHttpRequest, state: 1 state for subsequent requests with the same server. It's caused by the retry feature of HttpClient.

Expected Behavior

Subsequent HTTP requests work normally after a connection reset by the HTTP server.

Actual Behavior

Sometimes, the HttpClient is locked in unexpected message type: DefaultFullHttpRequest, state: 1 state.

Steps to Reproduce

The issue has happened in our production when the network was unstable, i.e. with lots of TCP connection resets.

  1. Start a HTTP server (could be any HTTP server, it doesn't matter)
  2. Using HttpClient (it's easier to reproduce the issue if setting maxConnections to 1) to send a POST request (with body) to the HTTP server. Debug to pause at reactor.netty.http.HttpOperations#send - return new PostHeadersNettyOutbound
	public NettyOutbound send(Publisher<? extends ByteBuf> source) {
		if (!channel().isActive()) {
			return then(Mono.error(AbortedException.beforeSend()));
		}
		if (source instanceof Mono) {
                         // pause here. Channel must be active in order to reach here
			return new PostHeadersNettyOutbound(((Mono<ByteBuf>)source)
					....
		}
		....
	}
  1. Kill the HTTP server
  2. The HTTP request continues to do HttpObjectEncoder#encode, which will release the refCnt of DefaultFullHttpRequest#content here
  3. Connection reset by HTTP server (at step 3) leads to reactor.netty.http.client.HttpClientConnect.HttpObserver#onUncaughtException, which leads to a retry of the HTTP request
  4. Start the HTTP server again
  5. Retrying the HTTP request will go through HttpObjectEncoder#encode again. However, the refCnt of DefaultFullHttpRequest#content was already released at step 4. Trying to encode again sometimes (not always. it depends on exact moment of the connection reset) causes java.lang.IndexOutOfBoundsException to be thrown at here, which leaves HttpObjectEncoder#state in ST_CONTENT_NON_CHUNK state. io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1 is thrown after at here.
  6. Subsequent requests with the same server (assuming using the same TCP connection) will always meet with error unexpected message type: DefaultFullHttpRequest, state: 1 because the instance ofHttpObjectEncoder was locked in ST_CONTENT_NON_CHUNK state at step 7.

Stack trace when the above situation happens, that is io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1 and then unexpected message type: DefaultFullHttpRequest, state: 1.

2021-04-29 16:55:17.008  WARN   --- [reactor-http-nio-3] r.netty.http.client.HttpClientConnect    : [id: 0xe1894edd, L:/127.0.0.1:65440 - R:/127.0.0.1:11001] The connection observed an error
io.netty.handler.codec.EncoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
  at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
  at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
  at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
  at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
  at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:118)
  at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
  at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
  at reactor.core.publisher.Mono.subscribe(Mono.java:4090)
  at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:329)
  at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:58)
  at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
  at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:430)
  at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:503)
  at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:571)
  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
  at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:74)
  at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:138)
  at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:100)
  at io.netty.handler.codec.http.DefaultFullHttpRequest.release(DefaultFullHttpRequest.java:102)
  at io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88)
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:91)
  ... 25 common frames omitted
2021-04-29 16:55:28.282 ERROR   --- [reactor-http-nio-3] r.n.http.server.HttpServerOperations     : [id: 0xdd50934f, L:/0:0:0:0:0:0:0:1:9001 - R:/0:0:0:0:0:0:0:1:65450] Error starting response. Replying error status
io.netty.handler.codec.EncoderException: java.lang.IllegalStateException: unexpected message type: DefaultFullHttpRequest, state: 1
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
  at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
  at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
  at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
  at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
  at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
  at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
  at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:118)
  at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
  at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
  at reactor.core.publisher.Mono.subscribe(Mono.java:4090)
  at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:329)
  at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:58)
  at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
  at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:430)
  at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:503)
  at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:571)
  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: unexpected message type: DefaultFullHttpRequest, state: 1
  at io.netty.handler.codec.http.HttpObjectEncoder.encode(HttpObjectEncoder.java:86)
  at io.netty.handler.codec.http.HttpClientCodec$Encoder.encode(HttpClientCodec.java:167)
  at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
  ... 25 common frames omitted

Possible Solution

Disable the retry feature of HttpClient.

HttpClient.create().disableRetry(true)

With current implementation, retries of any HTTP request with a body (POST, PUT, etc) in case of connection reset are not safe. This feature should probably be removed (or at least disabled by default).

Your Environment

  • Reactor version(s) used: reactor-netty 0.9.18.RELEASE. reactor-core 3.3.15.RELEASE
  • Other relevant libraries versions (eg. netty, ...): netty 4.1.60.Final
  • JVM version (java -version):java version "11" 2018-09-25
  • OS and version (eg. uname -a):Darwin MacBook-Pro.local 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64

Activity

added this to the 0.9.20.RELEASE milestone on May 6, 2021
self-assigned this
on May 6, 2021
added a commit that references this issue on May 6, 2021
0b1df15
violetagg

violetagg commented on May 6, 2021

@violetagg
Member

@xiazuojie Are you able to test this PR #1625

xiazuojie

xiazuojie commented on May 7, 2021

@xiazuojie
Author

@xiazuojie Are you able to test this PR #1625

@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to errors and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.

violetagg

violetagg commented on May 7, 2021

@violetagg
Member

@xiazuojie Are you able to test this PR #1625

@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to error and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.

And for that purpose you can disable the retries if this is needed.

added a commit that references this issue on May 7, 2021
ba08f38
xiazuojie

xiazuojie commented on May 7, 2021

@xiazuojie
Author

@xiazuojie Are you able to test this PR #1625

@violetagg, I've checked the PR. It should fix my specific scenario.
However, I'm not so sure that retries should be part of a library feature, especially when enabled by default. It's prone to error and side effects. In my opinion, it's better to have the decision (of whether to retry or not) in business domain.

And for that purpose you can disable the retries if this is needed.

Yes, it can be opted out. But it probably should be an opt-in.
If I am not mistaken, I've not seen other HTTP clients with retry enabled by default.

added a commit that references this issue on May 7, 2021
7372f26

11 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

    Participants

    @violetagg@xiazuojie

    Issue actions

      HttpClient retry may cause "unexpected message type: DefaultFullHttpRequest, state: 1" · Issue #1622 · reactor/reactor-netty