With a large number of Dubbo providers restarting frequently, consumer servers go into frequent Full GC that never stops #376

Closed
@zhanghw89

Description

@zhanghw89

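When there is a large number of Dubbo providers and they are restarted frequently, the Dubbo consumer servers go into frequent Full GC. The Full GC keeps going and makes the service unavailable; the only way to stop the garbage collection is to restart the consumer-side service.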

Activity

huifrank

huifrank commented on Feb 22, 2017

@huifrank

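Could you describe this in more detail? How many providers does it take to trigger frequent Full GC? Does "a large number of providers" mean one application publishing many services, or many applications? Does "consumer server" mean only the servers subscribed to that service, or all servers?
We may also end up with a large number of providers that restart frequently.
For now I would like to reproduce the problem you describe.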

zhanghw89

zhanghw89 commented on Feb 22, 2017

@zhanghw89
Author

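In our production environment one application exposes roughly 300 providers, and there are about 5 instances of the provider application. When all 5 provider instances are restarted at the same time while the consumer keeps making service calls, Full GC occurs on the consumer side. Looking at the JVM heap dump, memory holds a large number of instances of the following classes: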
com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry
com.alibaba.dubbo.common.URL
com.alibaba.dubbo.registry.integration.RegistryDirectory

Details:
Problem Suspect 1

One instance of "com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28" occupies 250,412,952 (17.39%) bytes. The memory is accumulated in one instance of "com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28".

Keywords
sun.misc.Launcher$AppClassLoader @ 0x80021b28
com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry
Problem Suspect 2

88,920 instances of "com.alibaba.dubbo.common.URL", loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28" occupy 473,116,384 (32.86%) bytes.

Keywords
com.alibaba.dubbo.common.URL
sun.misc.Launcher$AppClassLoader @ 0x80021b28
Problem Suspect 3

30,458 instances of "com.alibaba.dubbo.registry.integration.RegistryDirectory", loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28" occupy 454,550,800 (31.57%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Node[]", loaded by "<system class loader>".

Keywords
java.util.concurrent.ConcurrentHashMap$Node[]
sun.misc.Launcher$AppClassLoader @ 0x80021b28
com.alibaba.dubbo.registry.integration.RegistryDirectory
Problem Suspect 4

1,638,063 instances of "java.lang.String", loaded by "<system class loader>" occupy 200,439,840 (13.92%) bytes.

Keywords
java.lang.String
Hint 1

The problem suspects 1 and 3 may be related, because the reference chains to them have a common beginning.

zhanghw89

zhanghw89 commented on Feb 22, 2017

@zhanghw89
Author

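One workaround is to split the services into the smallest practical granularity before publishing them. That avoids generating a large amount of garbage at the same moment, which is what pushes the consumer into Full GC.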

YoungHu

YoungHu commented on Mar 21, 2017

@YoungHu
Contributor

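The consumer keeps watching the service nodes in ZooKeeper. When you unregister and then re-register 300 services at once, the consumer also has to destroy the instances it built earlier and create new ones, which consumes a lot of memory.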

YoungHu

YoungHu commented on Mar 21, 2017

@YoungHu
Contributor

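I have a related problem. When Dubbo registers a service, the most specific ZooKeeper path goes only as far as the interface name. I register services under the same interface but with different group names. As a result, when my provider restarts, the consumers refreshing their service information saturate the ZooKeeper server's bandwidth, and the service is briefly unavailable.

For example, I have one interface with 100 implementations, all distinguished and published by group. With 1 machine, the list under the ZooKeeper path has 100 entries, each about 1 KB. When the provider restarts, every registration makes the consumer read the ZooKeeper node data again, 100 KB per read. Registering 100 services means 100 reads, so one restart produces about 10 MB of outbound traffic from ZooKeeper. That is for a single provider server restarting; for a cluster, the traffic for one restart is 10 MB x N (number of consumer nodes) x M (number of provider nodes). Apart from splitting the service, is there a better approach?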
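
The arithmetic in that estimate can be sketched as follows. This is only a back-of-envelope calculation using the figures from the comment; the class and method names are hypothetical, and it follows the comment's simplification that the per-read size stays at 100 entries regardless of cluster size.

```java
// Back-of-envelope estimate of ZooKeeper outbound traffic for one provider restart.
// Class and method names are hypothetical; the figures come from the comment above.
final class ZkTrafficEstimate {

    /**
     * @param groupsPerInterface number of groups published under one interface path
     * @param entrySizeKb        approximate size of one registered URL entry, in KB
     * @param consumerNodes      N, the number of consumer nodes
     * @param providerNodes      M, the number of provider nodes
     * @return estimated ZooKeeper outbound traffic in KB
     */
    static long restartTrafficKb(int groupsPerInterface, int entrySizeKb,
                                 int consumerNodes, int providerNodes) {
        // One read returns every entry under the shared interface path.
        long perReadKb = (long) groupsPerInterface * entrySizeKb;
        // Each of the registrations triggers one such read.
        long perRestartKb = perReadKb * groupsPerInterface;
        // Scale by consumers and providers, as in the comment's 10 MB x N x M.
        return perRestartKb * consumerNodes * providerNodes;
    }

    public static void main(String[] args) {
        long singleKb = restartTrafficKb(100, 1, 1, 1);
        System.out.println("1 provider, 1 consumer: " + singleKb + " KB");
        long clusterKb = restartTrafficKb(100, 1, 20, 5);
        System.out.println("5 providers, 20 consumers: " + clusterKb + " KB");
    }
}
```

The cluster figures in `main` (20 consumers, 5 providers) are illustrative values, not numbers reported in this thread.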

qct

qct commented on Mar 21, 2017

@qct

This is the same problem as #306

YoungHu

YoungHu commented on May 28, 2017

@YoungHu
Contributor

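I did solve this in the end, and it has been running in production without problems. My fix was to add group and version nodes under the interface node in ZooKeeper. That way, when a service is updated, Dubbo can notify precisely based on interface + group + version, which reduces the bandwidth spent synchronizing duplicate data.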
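
To illustrate the idea, here is a minimal sketch comparing the default registry path with a finer-grained one. The default layout `/dubbo/<interface>/providers` is Dubbo's documented ZooKeeper layout; the grouped layout, its node order, and all class and method names below are assumptions made for illustration, not the actual patch described in this comment.

```java
// Sketch of ZooKeeper path layouts. Only the default layout is Dubbo's documented one;
// the grouped layout and every name in this class are hypothetical.
final class ZkPathSketch {

    static final String ROOT = "/dubbo";

    // Default layout: all groups and versions of an interface share one children list,
    // so any registration notifies every watcher of that interface.
    static String defaultProvidersPath(String iface) {
        return ROOT + "/" + iface + "/providers";
    }

    // Assumed finer layout: one children list per interface + group + version,
    // so a registration only notifies watchers of that exact combination.
    static String groupedProvidersPath(String iface, String group, String version) {
        return ROOT + "/" + iface + "/" + group + "/" + version + "/providers";
    }

    public static void main(String[] args) {
        String iface = "com.foo.BarService"; // hypothetical interface name
        System.out.println(defaultProvidersPath(iface));
        System.out.println(groupedProvidersPath(iface, "groupA", "1.0.0"));
        System.out.println(groupedProvidersPath(iface, "groupB", "1.0.0"));
    }
}
```

With the grouped layout, a consumer of `groupA` watches a path that registrations in `groupB` never touch, which is the effect the comment describes.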

foreveryang321

foreveryang321 commented on Jul 6, 2017

@foreveryang321
Contributor

@YoungHu May I ask how "adding group and version nodes under the interface in zk" is configured? I did not understand what that sentence means.

Tong-c

Tong-c commented on Aug 5, 2017

@Tong-c

@foreveryang321, see http://dubbo.io/user-guide/reference-registry/zookeeper.html
Further down, that page mentions group. A service interface exposed to callers can set a version and can also set a group:
http://dubbo.io/user-guide/reference-xmlconf/dubbo-service.html

taige

taige commented on Aug 8, 2017

@taige

Proxy.java has a memory-leak bug.
Try this fix: taige@f869f0f

chickenlj

chickenlj commented on Aug 22, 2017

@chickenlj
Contributor

@taige Could you explain how this leak happens?

diecui1202

diecui1202 commented on Jul 31, 2018

@diecui1202

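In our production environment one application exposes roughly 300 providers, and there are about 5 instances of the provider application. When all 5 provider instances are restarted at the same time while the consumer keeps making service calls, Full GC occurs on the consumer side
////
You need to restart the app in groups. For example, split the 5 instances into 3 groups of 2, 2, and 1.

Make sure some providers are online at all times.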
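
A minimal sketch of that batching, assuming the restart tooling takes a list of host names. The class name, method name, and host names are hypothetical, and the actual restart step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Splits provider instances into restart batches so that the instances outside the
// current batch stay online. All names here are hypothetical.
final class RollingRestartPlan {

    static List<List<String>> batches(List<String> instances, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> plan = new ArrayList<>();
        for (int i = 0; i < instances.size(); i += batchSize) {
            int end = Math.min(i + batchSize, instances.size());
            // Copy the slice so each batch is independent of the input list.
            plan.add(new ArrayList<>(instances.subList(i, end)));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
                "provider-1", "provider-2", "provider-3", "provider-4", "provider-5");
        // Restart one batch at a time and wait for it to re-register before moving on.
        for (List<String> batch : batches(hosts, 2)) {
            System.out.println("restart together: " + batch);
        }
    }
}
```

Choosing a batch size smaller than the number of instances keeps at least one provider registered during every step.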

diecui1202

diecui1202 commented on Jul 31, 2018

@diecui1202

Feel free to reopen it. &READY-TO-CLOSE&

With a large number of Dubbo providers restarting frequently, consumer servers go into frequent Full GC that never stops · Issue #376 · apache/dubbo