Expected behavior
No java.lang.OutOfMemoryError
Actual behavior
[2017-06-02T11:00:00,021][WARN ][c.f.s.h.SearchGuardHttpServerTransport] [GtfsfZB] caught exception while handling client http traffic, closing connection [id: 0x23038e55, L:/172.16.0.1:9200 - R:/172.16.0.1:40464]
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693) ~[?:1.8.0_131]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_131]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_131]
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764) ~[netty-buffer-4.1.11.Final.jar:4.1.11.Final]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740) ~[netty-buffer-4.1.11.Final.jar:4.1.11.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[netty-buffer-4.1.11.Final.jar:4.1.11.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[netty-buffer-4.1.11.Final.jar:4.1.11.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[netty-buffer-4.1.11.Final.jar:4.1.11.Final]
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[netty-buffer-4.1.11.Final.jar:4.1.11.Final]
Steps to reproduce
Minimal yet complete reproducer code (or URL to code)
Netty version
4.1.11 with tcnative 2.0.1.Final
JVM version (e.g. java -version)
1.8.0_131
OS version (e.g. uname -a)
Ubuntu
floragunn commented on Jun 4, 2017
maybe related to #6789
rkapsi commented on Jun 4, 2017
@floragunncom does your Netty app create and destroy a lot of SslContext instances? The leak in #6789 is a bit of native memory that gets allocated when an SslContext gets created and is then not freed when the SslContext is garbage collected by the JVM.
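As an aside, for apps that do churn OpenSSL-backed contexts, a minimal sketch of releasing an SslContext deterministically instead of relying on GC (the class and method names below are illustrative):

import io.netty.handler.ssl.SslContext;
import io.netty.util.ReferenceCountUtil;

public final class SslContextLifecycle {
    // Release a context deterministically when done with it. With the
    // OpenSSL provider the context wraps native memory and implements
    // ReferenceCounted; with the JDK provider this call is a no-op.
    public static void closeContext(SslContext ctx) {
        ReferenceCountUtil.release(ctx);
    }
}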
floragunn commented on Jun 4, 2017
No, it's not creating a lot of SslContext instances, so this seems unrelated to #6789.
Scottmitch commented on Jun 5, 2017
Are you setting io.netty.maxDirectMemory or -XX:MaxDirectMemorySize, and if so to what value(s)?
normanmaurer commented on Jun 6, 2017
Also did you "stop" writing when the Channel become non writable ?
floragunn commented on Jun 26, 2017
The error is still there with 4.1.12 and tcnative 2.0.3 - I will now start investigating this a bit more deeply. It is definitely related to the amount of data being transferred. For small datasets it works well, but as the amount of data increases it fails.
floragunn commented on Jun 28, 2017
Seems -XX:+DisableExplicitGC is causing the trouble (https://stackoverflow.com/questions/32912702/impact-of-setting-xxdisableexplicitgc-when-nio-direct-buffers-are-used) ... will report back soon.
floragunn commented on Jul 2, 2017
After removing -XX:+DisableExplicitGC from the command line flags, the java.lang.OutOfMemoryError: Direct buffer memory seems to be gone. But unfortunately this is not really an option for us, because in production we have no control over the JVM flags, so we have to deal with -XX:+DisableExplicitGC being present.
io.netty.maxDirectMemory is not set explicitly, and -XX:MaxDirectMemorySize is as much as Xmx (in our case normally somewhere between 4 and 64 GB). It looks like the problem was introduced in netty 4.1.8 or 4.1.9, because 4.1.7 was reported stable. Unfortunately I am not able to create a minimal reproducer, but I will assemble something that demonstrates the problem.
Running netty without openssl (using Java SSL) works well for all versions and circumstances.
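To make the failure mode concrete: a hypothetical standalone snippet (not the reproducer mentioned above) that typically triggers the same error when run with -XX:MaxDirectMemorySize=64m -XX:+DisableExplicitGC, and typically survives once the latter flag is removed:

import java.nio.ByteBuffer;

// Hypothetical illustration only. Run with:
//   java -XX:MaxDirectMemorySize=64m -XX:+DisableExplicitGC DirectOom
public class DirectOom {
    public static void main(String[] args) {
        for (int i = 0; ; i++) {
            // Each buffer becomes unreachable immediately, but its native
            // memory is only freed after the GC runs the buffer's Cleaner.
            // With explicit GC disabled, Bits.reserveMemory cannot force
            // that, so this usually dies with the same OutOfMemoryError.
            ByteBuffer.allocateDirect(8 * 1024 * 1024);
            if (i % 100 == 0) {
                System.out.println("allocated " + i + " buffers so far");
            }
        }
    }
}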
floragunn commented on Jul 2, 2017
Please download https://bintray.com/floragunncom/files/download_file?file_path=netty%2Fnetty-6813-1.tar.gz and extract it.
If you are on OS X, just run bin/elasticsearch, wait a few seconds, then run ./load_sampledata.sh in another terminal until you see a java.lang.OutOfMemoryError: Direct buffer memory in your first console (normally after one or two runs). If you are on Linux, look in plugin/search-guard-5 and replace the tcnative jar with the one for Linux.
Without the tcnative jar we fall back to Java SSL and everything runs well. Remove -XX:+DisableExplicitGC from config/jvm.options and the OutOfMemoryError should not show up. If you increase -Xmx or -XX:MaxDirectMemorySize, run ./load_sampledata.sh just a few more times and you will see the error and the JVM dies.
The issue was originally reported here: https://github.com/floragunncom/search-guard/issues/343
floragunn commented on Jul 10, 2017
@Scottmitch @normanmaurer ping
floragunn commented on Jul 10, 2017
Looks like java/nio/Bits.java itself calls System.gc() (OpenJDK source), which, when the -XX:+DisableExplicitGC flag is present, just does nothing, so it appears that the direct buffers do not get garbage collected fast enough. The curious thing is that I never hit this before tcnative 2.0.0. Unfortunately just removing -XX:+DisableExplicitGC is not an option, and I am running out of ideas.
Relates to JDK-8142537
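For readers following along, a simplified paraphrase of that OpenJDK 8 logic (a hedged sketch, not the verbatim source; tryReserveMemory below stands in for the real accounting):

// Simplified paraphrase of OpenJDK 8's java.nio.Bits.reserveMemory; not
// the verbatim source (see the linked OpenJDK code for the real thing).
final class BitsSketch {
    static boolean tryReserveMemory(long size) {
        // The real code compares the running total of reserved direct
        // memory against the -XX:MaxDirectMemorySize limit.
        return false; // pretend the limit is already exhausted
    }

    static void reserveMemory(long size) {
        if (tryReserveMemory(size)) {
            return; // enough headroom, no GC needed
        }
        // Hint the GC so unreachable DirectByteBuffers get collected and
        // their Cleaners free native memory. With -XX:+DisableExplicitGC
        // this call is a no-op, so nothing is reclaimed...
        System.gc();
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!tryReserveMemory(size)) {
            // ...and the allocation fails exactly as in the logs above.
            throw new OutOfMemoryError("Direct buffer memory");
        }
    }
}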
normanmaurer commented on Jul 11, 2017
floragunn commented on Jul 11, 2017
The leak detector does not report any leaks (I also tried paranoid mode). I tried both unpooled and pooled allocators as well; no difference.
Scottmitch commented on Jul 11, 2017
The OpenJDK invokes System.gc() during direct ByteBuffer allocation to provide a hint and hope for timely reclamation of direct memory by the GC (because these allocations are off heap). Using a properly tuned pooled allocator should reduce/remove the need to call ByteBuffer for allocations, and we even try to bypass the ByteBuffer allocation path for performance reasons by default (controlled via io.netty.maxDirectMemory [1]). It sounds like this occurs during a burst of activity ... are you tuning the pooled allocator bucket sizes for your application, and are you sure you have allowed enough memory to be allocated for the increased load of traffic (how did you calculate the max)?
Note that there were some memory leaks in netty-tcnative 2.0.0 and 2.0.1. However the known leaks have been fixed in 2.0.3.
[1] https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L149
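To illustrate the tuning knobs mentioned here, a hedged sketch of sizing the pooled allocator and capping Netty's direct-memory accounting (the numbers are made up; the system property and constructor are Netty 4.1 public API):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;

public class AllocatorTuning {
    public static void main(String[] args) {
        // Cap the amount of direct memory Netty will account for itself.
        // Must be set before any Netty class reads it; 512 MB is purely
        // illustrative, size it for your real traffic bursts.
        System.setProperty("io.netty.maxDirectMemory",
                String.valueOf(512L * 1024 * 1024));

        // Explicitly sized pooled allocator: preferDirect, nHeapArena,
        // nDirectArena, pageSize, maxOrder (chunkSize = pageSize << maxOrder).
        PooledByteBufAllocator alloc =
                new PooledByteBufAllocator(true, 2, 2, 8192, 9);

        ServerBootstrap b = new ServerBootstrap()
                .option(ChannelOption.ALLOCATOR, alloc)
                .childOption(ChannelOption.ALLOCATOR, alloc);
        // ... configure group/channel/handlers and bind as usual
    }
}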
(19 intermediate comments not shown)
brandond commented on Aug 1, 2017
Result:
jvm.options for es:
floragunn commented on Aug 1, 2017
With ES 5.5.1?
What are your Java version and vendor?
Do you make big bulk requests? (I just read that it happens even before the cluster is green.)
And please remove your extra MaxDirectMemorySize settings.
brandond commented on Aug 1, 2017
Here's the startup log, with ES and Java versions.
With netty-tcnative installed, I've never been able to get the cluster to stay up long enough to actually start making any queries. It OOMs while assigning shards.
brandond commented on Aug 1, 2017
Just for the heck of it, I tried adding -XX:MaxDirectMemorySize=512m to jvm.options. If I do this, it does not immediately get OOM-killed. It does, however, chew through a ton of VM and eventually ends up stalled in GC hell before I have to shut it down manually.
floragunn commented on Aug 1, 2017
OK, so with Java SSL it basically works?
Have you set "bootstrap.memory_lock: true" in elasticsearch.yml?
Have you disabled hostname verification?
I am currently testing on AWS with the following parameters and it works without any hassle:
floragunn commented on Aug 1, 2017
@Scottmitch @normanmaurer any ideas?
brandond commented on Aug 1, 2017
memory lock and hostname verification are both off. You're running with 30GB heap with 61GB of RAM; I'm running with 20 of 31. I'll try dropping it down to 15GB with no MaxDirectMemorySize and memory_lock on and see if it makes any difference.
Edit: memory_lock was on. Trying again with smaller heap size.
brandond commented on Aug 1, 2017
15GB heap and no MaxDirectMemorySize: OOM killed.
brandond commented on Aug 1, 2017
10GB heap: no OOM kill. Seems to work?
For the record, here's what it looks like without netty-tcnative and a 20GB heap:
Appears to essentially double the memory utilization?
Scottmitch commented on Aug 2, 2017
@floragunncom - no ideas and limited cycles at the moment.
@brandond - can you provide a reproducer similar to #6813 (comment)?
jansimak commented on Dec 8, 2017
Hi.
I can confirm it's still happening. During heavy bulk indexing or searching the heap goes up and never drops back.
Environment: heap size 30GB, jvm without -XX:+DisableExplicitGC, openjdk-8-jre-headless=8u141-b15-1~deb9u1, debian stretch, elasticsearch=5.6.0, netty-tcnative-openssl-1.0.2m-static-2.0.7
Is there anything I can do to help solve the problem? The issue forces us to use Java SSL, but openssl is preferable.
Thanks,
Honza
fbasar commented on May 6, 2019
io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:332)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
It gives this error every day.
Java 11.0.2
Netty 4.1.35.Final
TCNative 2.0.25.Final
-Xms2048m -Xmx2048m -server -verbosegc -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=1024m -Dio.netty.tryReflectionSetAccessible=false -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -server
mrkk commented on May 29, 2019
Why was the issue closed?
brandond commented on May 29, 2019
Because it's hard to reproduce under controlled circumstances and easy to work around by just not using the tcnative openssl bindings.
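For anyone landing here later, a minimal sketch of that workaround using Netty's builder API, forcing the JDK provider so the tcnative/OpenSSL code path is never used (the self-signed certificate is for demonstration only):

import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;
import io.netty.handler.ssl.util.SelfSignedCertificate;

public class JdkSslWorkaround {
    public static void main(String[] args) throws Exception {
        // Self-signed cert purely for demonstration; use real key material.
        SelfSignedCertificate cert = new SelfSignedCertificate();

        // SslProvider.JDK bypasses netty-tcnative/OpenSSL entirely, even
        // if the tcnative jar happens to be on the classpath.
        SslContext ctx = SslContextBuilder
                .forServer(cert.certificate(), cert.privateKey())
                .sslProvider(SslProvider.JDK)
                .build();

        System.out.println("Built context: " + ctx.getClass().getSimpleName());
    }
}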