
ping exception problem under the mode of redisson blocking queue #3405

Closed
786991884 opened this issue Feb 7, 2021 · 5 comments
786991884 commented Feb 7, 2021

Expected behavior
When using a Redisson blocking queue with a timed blocking poll (poll(timeout, timeUnit) or pollAsync(timeout, timeUnit)), the following exceptions occur if PING is enabled:

1. PingConnectionHandler throws an exception. I have analyzed this class before; this error is logged extremely often, nearly 100,000 entries in a single night. It is raised at PingConnectionHandler line 89:
2021-02-04 18:47:53.613 [TID: N/A] ERROR 16545 --- [redisson-timer-4-1] o.r.c.h.PingConnectionHandler :89 : {} Unable to send PING command over channel: [id: 0xe7012642, L:/xxxxxxxx:50954 ! R:/xxxxxx:6379]

org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxx:6379]
at org.redisson.client.RedisConnection.lambda$async$1(RedisConnection.java:207) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

2. RedisExecutor exception; stack trace as follows:
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:205) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

3. Caused by: org.redisson.client.WriteRedisConnectionException: Channel has been closed!
org.redisson.client.WriteRedisConnectionException: Unable to write command into connection! Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@1927498671 [redisClient=[addr=redis://xxxx:6379], channel=[id: 0x5fdf22a7, L:/10.65.140.249:11565 ! R:/10.111.17.89:6379], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@23022fc(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxx:6379])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]], command: (BLPOP), params: [KEC:MESSAGE:MIGRATE_VM_TO_DEDICATED_QUEUE:cn-shanghai-fin, 14] after 3 retry attempts
at org.redisson.command.RedisExecutor.checkWriteFuture(RedisExecutor.java:275) ~[redisson-3.15.0.jar!/:3.15.0]
at org.redisson.command.RedisExecutor.access$100(RedisExecutor.java:58) ~[redisson-3.15.0.jar!/:3.15.0]
at org.redisson.command.RedisExecutor$1.operationComplete(RedisExecutor.java:156) ~[redisson-3.15.0.jar!/:3.15.0]
at org.redisson.command.RedisExecutor$1.operationComplete(RedisExecutor.java:153) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at org.redisson.client.handler.CommandsQueue.channelInactive(CommandsQueue.java:74) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at org.redisson.client.handler.ConnectionWatchdog.channelInactive(ConnectionWatchdog.java:82) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]

If I disable PING, the following exception appears instead:
2021-02-07 10:35:57.250 [TID: N/A] ERROR 32018 --- [redisson-timer-4-1] o.r.c.h.PingConnectionHandler :89 : {} Unable to send PING command over channel: [id: 0x2c682c89, L:/1xxx:16100 ! R:/10xxx9:6379]

org.redisson.client.WriteRedisConnectionException: Channel has been closed! Can't write command: (PING), params: [] to channel: [id: 0x50381f72, L:/10.111.17.89:21618 ! R:/10.111.17.89:6379]
at org.redisson.client.handler.CommandsQueue.channelInactive(CommandsQueue.java:76) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at org.redisson.client.handler.ConnectionWatchdog.channelInactive(ConnectionWatchdog.java:82) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: org.redisson.client.WriteRedisConnectionException: Channel has been closed! Can't write command: (BLPOP), params: [KEC:MESSAGE:MIGRATE_VM_TO_DEDICATED_QUEUE:cn-shanghai-fin, 14] to channel: [id: 0x5fdf22a7, L:/10.65.140.249:11565 ! R:/10.111.17.89:6379]
at org.redisson.client.handler.CommandsQueue.channelInactive(CommandsQueue.java:76) ~[redisson-3.15.0.jar!/:3.15.0]

Actual behavior
My own analysis is that this is caused by a race between disconnecting the blocking connection and sending PING.
To verify this, I made the program sleep for 1 s after each blocking fetch finished processing. The PingConnectionHandler exception then disappeared, but another exception appeared:
2021-02-07 00:25:10.759 [TID: N/A] ERROR 32018 --- [redisson-timer-4-1] o.r.c.h.PingConnectionHandler :89 : {} Unable to send PING command over channel: [id: 0xdb83a2ad, L:/xxx9:37602 ! R:/1xxx9:6379]

org.redisson.client.WriteRedisConnectionException: Channel has been closed! Can't write command: (PING), params: [] to channel: [id: 0xaafedd5b, L:/10.111.xxx.xx:38828 ! R:/10.111.17.89:6379]
at org.redisson.client.handler.CommandsQueue.channelInactive(CommandsQueue.java:76) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at org.redisson.client.handler.ConnectionWatchdog.channelInactive(ConnectionWatchdog.java:82) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

However, exception 2 still occurs.

Steps to reproduce or test case
Use several Redisson blocking queues with a blocking timeout of 10–20 s, continuously polling queue elements in a loop; the exceptions appear within a day.
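The reproduction loop above can be sketched as follows. This is a minimal illustration of the timed-poll pattern only: it uses java.util.concurrent.LinkedBlockingQueue as a stand-in for Redisson's RBlockingQueue (which the real reproduction would obtain via redissonClient.getBlockingQueue(name) against Redis), since both expose the same poll(timeout, unit) contract.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollLoopSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for RBlockingQueue<String>; the real reproduction uses
        // redissonClient.getBlockingQueue("...") backed by Redis BLPOP.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.offer("message-1");

        // The reported workload loops forever with a 10-20 s blocking timeout;
        // two iterations with a 1 s timeout are enough to show the pattern.
        for (int i = 0; i < 2; i++) {
            String msg = queue.poll(1, TimeUnit.SECONDS);
            if (msg != null) {
                System.out.println("processed " + msg);
            } else {
                System.out.println("timed out, polling again");
            }
        }
    }
}
```

With the real RBlockingQueue, each timed poll issues a BLPOP that holds the connection for the full timeout, which is the window in which the PING race described in this issue occurs.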

Redis version
2.8
Redisson version
3.15.0
Redisson configuration
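(The report leaves this section empty. For context, the single-server Redisson YAML settings most relevant to this issue look roughly like the fragment below; the values are illustrative defaults, not the reporter's actual configuration.)

```yaml
singleServerConfig:
  address: "redis://127.0.0.1:6379"
  # Interval between PING probes on connections (ms);
  # 0 disables the PingConnectionHandler discussed above.
  pingConnectionInterval: 30000
  # Command response timeout (ms); the RedisTimeoutException in this
  # report fires when the PING response exceeds this budget.
  timeout: 3000
  retryAttempts: 3
  retryInterval: 1500
```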


mrniko commented Feb 10, 2021

fixed in #3412


mrniko commented Feb 10, 2021

You can check it with the attached version:

redisson-3.15.1-SNAPSHOT.jar.zip


786991884 commented Feb 20, 2021

Regarding my change in #3302:

The +1 s margin triggers connection failures too easily; in testing, a large number of connections are dropped and reconnected. +1 s is not safe when packets are large or the network is slow, and ctx.isRemoved() (which in effect just ends processing once the current channel is closed) does not solve the disconnection problem. I suggest making this margin a configurable parameter, since network conditions vary; a large number of reconnections may also reduce throughput.

There are still exceptions in my current tests. Deciding too early that a connection needs to be reconnected leads to two problems:

1. the connection is closed.
2. the connection times out.

A premature disconnect that overlaps a new blocking call causes a ping timeout, and triggering the next ping on a connection that should not have been dropped causes a channel-closed error. So I think the ctx.isRemoved() check alone may not cover all cases, and both kinds of exceptions do indeed appear in actual testing.

In our actual network environment, a +2 s margin resolves these exceptions fairly well.


ZaNkss commented Jan 20, 2022

When this exception occurs, my program's CPU usage climbs rapidly. Is anyone seeing the same problem?


mrniko commented May 31, 2022

Fixed
