
Duplicate PING when Redis connection got reconnected #3412

Closed
empty117 opened this issue Feb 9, 2021 · 6 comments

empty117 commented Feb 9, 2021

When a blocking queue operation is executed with a timeout, it regularly triggers a reconnect of the connection it runs on. A new PING loop is started once the connection has reconnected, but the previous PING timer task is still running, so the pings stack up.

PS: the #3302 fix reduces the likelihood of reconnecting, but whenever a reconnect does happen, the PING is duplicated.
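
To make the mechanism concrete, here is a minimal standalone sketch (not Redisson code; the class name, channel labels and the removed-flag are made up for illustration) of a recurring Netty TimerTask that reschedules itself the way a ping loop does: unless the task checks whether its handler has been removed, the loop started for the old channel keeps pinging after a reconnect, alongside the loop started for the new channel.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;
import io.netty.util.TimerTask;

public class DuplicatePingSketch {

    private static final HashedWheelTimer TIMER = new HashedWheelTimer();
    // stand-in for ctx.isRemoved(): becomes true once the old handler is gone
    private static final AtomicBoolean oldHandlerRemoved = new AtomicBoolean(false);

    public static void main(String[] args) throws InterruptedException {
        schedulePing("old channel", oldHandlerRemoved);
        Thread.sleep(2500);

        // simulate a reconnect: the old handler is removed from the pipeline
        // and a fresh PING loop is started for the new channel
        oldHandlerRemoved.set(true);
        schedulePing("new channel", new AtomicBoolean(false));

        Thread.sleep(3000);
        TIMER.stop();
    }

    private static void schedulePing(String channel, AtomicBoolean removed) {
        TIMER.newTimeout(new TimerTask() {
            @Override
            public void run(Timeout timeout) {
                // without this check the "old channel" task would keep
                // rescheduling itself after the reconnect, producing two
                // PINGs per interval
                if (removed.get()) {
                    return;
                }
                System.out.println("PING via " + channel);
                schedulePing(channel, removed);
            }
        }, 1, TimeUnit.SECONDS);
    }
}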

Expected behavior

Redis PING is performed once per interval per connection.

Actual behavior

Redis PING is performed multiple times per interval per connection after the connection has been reconnected.

Steps to reproduce or test case

import java.util.concurrent.TimeUnit;

import org.redisson.Redisson;
import org.redisson.api.RBlockingQueue;
import org.redisson.api.RedissonClient;
import org.redisson.client.codec.StringCodec;
import org.redisson.config.Config;

public class BlockingOperationTest {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.setCodec(StringCodec.INSTANCE).useSingleServer().setAddress("redis://localhost:6379")
                .setPingConnectionInterval(1000)
                .setTimeout(200)
                .setConnectionMinimumIdleSize(1);
        RedissonClient redisson = Redisson.create(config);

        RBlockingQueue<String> queue = redisson.getBlockingQueue("testQueue");
        for (;;) {
            try {
                // System.out.println("start poll");
                queue.poll(3, TimeUnit.SECONDS); // triggers a reconnect every time
                // System.out.println("end poll");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

Redis version

ANY

Redisson version
3.13.6

Redisson configuration
ANY

empty117 (Author) commented Feb 9, 2021

Maybe adding a ctx.isRemoved() check in PingConnectionHandler would help:

// inside PingConnectionHandler's sendPing(ChannelHandlerContext ctx)
config.getTimer().newTimeout(new TimerTask() {
    @Override
    public void run(Timeout timeout) throws Exception {
        // proposed change: also stop the loop once this handler has been
        // removed from the pipeline (i.e. the connection was reconnected),
        // so the old timer task does not keep sending PINGs
        if (connection.isClosed() || ctx.isRemoved()) {
            return;
        }
        if (future != null
                && (future.cancel(false) || !future.isSuccess())) {
            ctx.channel().close();
            if (future.cause() != null) {
                log.error("Unable to send PING command over channel: " + ctx.channel(), future.cause());
            }
            log.debug("channel: {} closed due to PING response timeout set in {} ms", ctx.channel(), config.getPingConnectionInterval());
        } else {
            sendPing(ctx);
        }
    }
}, config.getPingConnectionInterval(), TimeUnit.MILLISECONDS);
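
With that extra check the stale timer task exits as soon as its handler has been removed from the pipeline, so after a reconnect only the PING loop scheduled for the new channel keeps running.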

mrniko added this to the 3.15.1 milestone on Feb 9, 2021
mrniko added the bug label on Feb 9, 2021

mrniko (Member) commented Feb 9, 2021

Fixed! Thanks for the suggestion.


786991884 commented Feb 20, 2021

In my submission (#3302), the +1s slack is too easy to trigger a connection failure: in testing you will see a large number of connections being dropped and reconnected, and +1s is not safe when payloads are large or the network is slow. ctx.isRemoved() (which in effect just ends the task once the current channel has been closed) does not solve the disconnection problem itself. I suggest adding a configuration parameter, since network conditions differ. Another concern is that a large number of reconnections may reduce throughput.

There are still exceptions in my current tests. Deciding too early that a connection needs to be reconnected leads to two problems:

  1. the connection is closed.
  2. the connection times out.

A premature disconnect triggers a new blocking call, which results in a PING timeout; it also hits the next PING of a connection that should not have been dropped, which results in a channel-close error. So a ctx.isRemoved() check alone may not cover every case; both kinds of exceptions do show up in actual testing.

In our actual network environment, +2s resolves these exceptions fairly well.

mrniko (Member) commented Feb 20, 2021

@786991884

  • +2s resolves these exceptions fairly well

What does it mean?

Can you provide test code to reproduce the issue?

@786991884

Rewrite line 363 of RedisExecutor.java: change popTimeout + 1 to popTimeout + 2.
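
For context, this is roughly what that change amounts to (a paraphrased sketch, not the actual RedisExecutor source; the class and method names below are made up): the response timeout applied to a blocking command such as BLPOP is the command's own pop timeout plus a fixed slack, and the proposal raises that slack from one second to two.

import java.util.concurrent.TimeUnit;

public class BlockingTimeoutSketch {

    // Paraphrase of the response-timeout calculation for blocking commands:
    // wait for the command's own pop timeout plus a fixed slack before the
    // connection is considered dead and gets reconnected.
    static long responseTimeoutMillis(long popTimeoutSeconds, long slackSeconds) {
        return TimeUnit.SECONDS.toMillis(popTimeoutSeconds + slackSeconds);
    }

    public static void main(String[] args) {
        // BLPOP key 13 with the current +1s slack vs. the proposed +2s slack
        System.out.println(responseTimeoutMillis(13, 1)); // 14000
        System.out.println(responseTimeoutMillis(13, 2)); // 15000
    }
}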

@786991884

  1. the connection is closed.

org.redisson.client.WriteRedisConnectionException: Unable to write command into connection! Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@1898325484 [redisClient=[addr=redis://10.111.17.89:6379], channel=[id: 0x129e9fa4, L:/10.65.140.249:4596 ! R:/10.111.17.89:6379], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@67a25fd1(success)], command=(BLPOP), params=[KEC:MESSAGE:MIGRATE_VM_TO_DEDICATED_QUEUE:cn-beijing-fin, 13], codec=org.redisson.codec.JsonJacksonCodec]], command: (BLPOP), params: [KEC:MESSAGE:CROSS_INSTANCE_MIGRATE_QUEUE:cn-taipei-1, 11] after 0 retry attempts
at org.redisson.command.RedisExecutor.checkWriteFuture(RedisExecutor.java:275) ~[redisson-3.15.0.jar!/:3.15.0]
at org.redisson.command.RedisExecutor.access$100(RedisExecutor.java:58) ~[redisson-3.15.0.jar!/:3.15.0]
at org.redisson.command.RedisExecutor$1.operationComplete(RedisExecutor.java:156) ~[redisson-3.15.0.jar!/:3.15.0]
at org.redisson.command.RedisExecutor$1.operationComplete(RedisExecutor.java:153) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1089) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) ~[netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
... 12 more

  2. the connection times out.

org.redisson.client.RedisTimeoutException: Command still hasn't been written into connection! Avoid to use blocking commands in Async/JavaRx/Reactive handlers. Try to increase nettyThreads setting. Payload size in bytes: 0. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@2084555758 [redisClient=[addr=redis://10.111.17.89:6379], channel=[id: 0xe60b38ab, L:/10.65.140.249:5009 - R:/10.111.17.89:6379], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@7d093eda(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://10.111.17.89:6379])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]], command: (BLPOP), params: [KEC:MESSAGE:CROSS_INSTANCE_MIGRATE_QUEUE:eu-east-1, 14] after 3 retry attempts
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:205) ~[redisson-3.15.0.jar!/:3.15.0]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
