Description
I'm on Go 1.4 and linux/amd64. I've noticed this since at least Go 1.2.
Large maps incur large GC pause times. This happens even when the map key and value types do not contain pointers, and when the map isn't changing between GC runs. I assume it comes from the collector traversing internal pointers in the map implementation.
Roughly speaking, with a single map containing millions of entries, one will typically see GC pause times of hundreds of ms.
Here's an example program that shows a few different approaches: http://play.golang.org/p/AeHSLyLz_c. In particular, it compares
- Large map with a pointer-typed value
- Large map with a pointer-free value
- Sharded map
- Big slice
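(The playground program itself isn't reproduced here; what follows is a minimal sketch of the kind of comparison it makes. The entry count and the timeGC helper are illustrative, not taken from the original.)

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

const n = 5000000 // illustrative entry count

// timeGC forces a collection and reports how long it took.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	// Map whose key and value contain no pointers: the GC still scans the
	// bucket/overflow structure, but not the entries themselves.
	flat := make(map[int64]int64, n)
	for i := int64(0); i < n; i++ {
		flat[i] = i
	}
	fmt.Println("pointer-free value:", timeGC())
	runtime.KeepAlive(flat)
	flat = nil // drop it so the next measurement isn't dominated by it

	// Map whose value is a pointer: every entry gives the GC a pointer to chase.
	ptrs := make(map[int64]*int64, n)
	for i := int64(0); i < n; i++ {
		v := i
		ptrs[i] = &v
	}
	fmt.Println("pointer value:", timeGC())
	runtime.KeepAlive(ptrs)
}
```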
In the real program where this issue caused me problems, I have a server that used a very large map (>10M entries) as an index into off-heap data. Even after getting rid of pointers in the map value and sharding the map, the pause times were >100ms which was too large for my use case. So I ended up writing a very simple hashmap implementation using only slices/indexes and that brought the pause times to under 1ms.
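That slices/indexes-only hashmap isn't shown in this issue; as a rough illustration of the technique (not the actual code), an open-addressing table stored in a single slice of pointer-free entries leaves the GC exactly one backing array to look at, no matter how many entries it holds. All names below are invented, key 0 is reserved as the empty marker, and deletion and growth are omitted for brevity.

```go
// Package flatmap sketches a pointer-free hash index: open addressing over a
// single slice of fixed-size entries, so millions of entries add no GC work
// beyond the one backing array.
package flatmap

type entry struct {
	key   uint64 // 0 means "empty slot" in this sketch
	value uint64 // e.g. an offset into off-heap data
}

type Map struct {
	entries []entry // length is a power of two
	mask    uint64
}

// New sizes the table so the load factor stays under 50%; this sketch never
// grows, so callers must not insert more than the requested capacity.
func New(capacity int) *Map {
	size := 1
	for size < capacity*2 {
		size *= 2
	}
	return &Map{entries: make([]entry, size), mask: uint64(size - 1)}
}

func (m *Map) Put(key, value uint64) {
	for i := hash(key) & m.mask; ; i = (i + 1) & m.mask {
		e := &m.entries[i]
		if e.key == 0 || e.key == key { // empty or matching slot
			e.key, e.value = key, value
			return
		}
	}
}

func (m *Map) Get(key uint64) (uint64, bool) {
	for i := hash(key) & m.mask; ; i = (i + 1) & m.mask {
		e := m.entries[i]
		if e.key == key {
			return e.value, true
		}
		if e.key == 0 {
			return 0, false
		}
	}
}

// hash is a simple 64-bit mixer; any decent integer hash works here.
func hash(k uint64) uint64 {
	k ^= k >> 33
	k *= 0xff51afd7ed558ccd
	k ^= k >> 33
	return k
}
```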
I wonder if the map implementation and GC can be made to work together better to mitigate this.
(It's possible that this is expected/unfortunate, or that it will get better with improved GC along with everything else, but I wanted to have a record of it to center the discussion.)
Activity
bradfitz commented on Dec 30, 2014
/cc @randall77 @dvyukov @rsc @RLH
joliver commented on Dec 31, 2014
The reference conversation for this support issue can be found here:
https://groups.google.com/forum/#!topic/golang-nuts/baU4PZFyBQQ
My findings at the end of the thread are here:
https://groups.google.com/forum/#!msg/golang-nuts/baU4PZFyBQQ/fCzQelfmbNYJ
randall77 commented on Dec 31, 2014
This may be fixed in 1.5 with the concurrent gc. However, the work of scanning the hash tables will not go away; it will just be paid for in smaller chunks. Hash tables will still have overflow pointers, so they will still need to be scanned, and there are no plans to fix that. I'm open to ideas if anyone has them.
mish15 commented on Jan 1, 2015
👍. I added some example cases with preallocated maps for comparison.
http://play.golang.org/p/E7z9npFXm-
@joliver agree with the slice issue too. We use arrays for very large blocks where possible, but it's annoying.
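(For reference, the preallocation in those examples presumably amounts to passing a size hint to make, as sketched below; this avoids incremental growth and rehashing, though the GC still has to scan the final bucket and overflow structure.)

```go
// newPreallocated sizes the map up front so it never has to grow or rehash
// incrementally; the hint is a capacity hint, not a hard limit.
func newPreallocated(n int) map[int64]int64 {
	return make(map[int64]int64, n)
}
```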
josharian commented on Jan 2, 2015
@randall77 on pain of going into a rat hole, I ran @cespare's tests with a few different values of loadFactor. It's possible that loadFactor will be worth revisiting once the concurrent gc has stabilized.
RLH commented on Jan 5, 2015
Is the problem about how long a call to runtime.GC() takes, or is it really about GC latency: how much CPU time the application code (mutator) is allotted over the course of some time delta, and how much HW needs to be provisioned to achieve the goal? The 1.5 GC addresses the latter. There are no plans to address how long a call to runtime.GC() takes.
joliver commented on Jan 5, 2015
It actually is a problem with the garbage collection itself. The examples thus far call runtime.GC() to more easily demonstrate and expose the issue. In production I have a number of large maps, and using the default implementation we have observed garbage collection pauses measuring hundreds of milliseconds. It was bad enough that we dug in to pinpoint exactly what was causing it.
The biggest question being raised in this issue is whether optimizations could be made for certain kinds of structures, such as maps where the contained type is known to not contain pointers (e.g. map[int]bool).
RLH commented on Jan 5, 2015
If Go reduced the observed garbage collection pauses from hundreds of milliseconds to 10 milliseconds out of every 50 milliseconds, would this solve your problem?
rhysh commented on Jan 5, 2015
I run an application that uses a lot of maps and sees similar issues (~600ms pause time with a 1.6GB heap). Decreasing the pause to 10ms at a time would be a big help to this app. However, I wonder if the overall cost of GC could be decreased separately.
@randall77, I read through hashmap.go recently and it looks like the use of overflow buckets may be restricted enough that they could be allocated from a map-local arena instead of on the global heap. It may not have to lean on the general-purpose GC just to keep track of its overflow buckets.
It looks like overflow buckets aren't ever freed except during an evacuation, and that pointers to them are only accessible within small parts of the runtime. The memory for the overflow buckets could be backed by an array similar to hmap.buckets (with a bump-the-pointer allocator), and could be referenced by their offset into the array instead of a real pointer (which would be chased by the GC).
Is this approach possible?
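(To make the index-instead-of-pointer idea concrete, here is a sketch in ordinary Go; the real change would live inside the runtime's hashmap code, and all names here are invented. Overflow nodes come from a single arena slice via a bump allocator, and chains are linked by indices, so the GC has only the arena's backing array to scan.)

```go
package arena

const nilIndex = -1

type node struct {
	key, value uint64
	next       int32 // index of the next overflow node, or nilIndex
}

type Arena struct {
	nodes []node
}

// alloc hands out the next free node and returns its index (bump allocation).
func (a *Arena) alloc() int32 {
	a.nodes = append(a.nodes, node{next: nilIndex})
	return int32(len(a.nodes) - 1)
}

// push prepends a new node to the chain starting at head and returns the new
// head index.
func (a *Arena) push(head int32, key, value uint64) int32 {
	i := a.alloc()
	a.nodes[i].key, a.nodes[i].value = key, value
	a.nodes[i].next = head
	return i
}

// find walks a chain by index rather than by pointer.
func (a *Arena) find(head int32, key uint64) (uint64, bool) {
	for i := head; i != nilIndex; i = a.nodes[i].next {
		if a.nodes[i].key == key {
			return a.nodes[i].value, true
		}
	}
	return 0, false
}
```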
siritinga commented on Jan 5, 2015
I suppose the total GC time would be the same, or maybe longer if the starts/stops take some time. But if the GC runs 20% of the time, then instead of a 600 ms pause it would be about 3 seconds at 80% your code, 20% the GC. Maybe that is a solution to avoid long pauses in interactive programs, but the loss of performance is there in any case.
I wonder if it would be possible to produce garbage faster than it is collected...
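(Spelling out the arithmetic behind that estimate: 600 ms of collection work at a 20% duty cycle occupies roughly 600 ms / 0.20 = 3000 ms of wall-clock time, during which the mutator still gets the remaining 80%, about 2.4 s.)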
randall77 commented on Jan 5, 2015
@rhysh, it is an interesting idea to allocate overflow buckets as a single array (perhaps contiguous with the main buckets). Then we could use bucket ids instead of pointers to reference them.
The main problem is that there's no a priori maximum number of overflow buckets needed. It depends on how the hashing works out. It could be as bad as half the size of the main bucket array. You wouldn't want to allocate this worst case ahead of time. Maybe there would be a way to trigger growth on # of used overflow buckets instead of on just the number of entries as we do now. That way, we could have a fixed maximum, say 10%, of overflow buckets allocated at the start and we grow if we run out of them.
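(A rough sketch of that trigger, purely illustrative; the names and the 10% figure here are assumptions, not the runtime's code:)

```go
// needsGrowth sketches the idea: grow the table when it is too full *or*
// when the preallocated overflow pool is exhausted.
const (
	maxLoadFactor    = 6.5  // average entries per bucket before growing
	overflowFraction = 0.10 // hypothetical share of overflow buckets preallocated up front
)

func needsGrowth(entries, buckets, overflowUsed int) bool {
	tooFull := float64(entries) > maxLoadFactor*float64(buckets)
	outOfOverflow := overflowUsed >= int(overflowFraction*float64(buckets))
	return tooFull || outOfOverflow
}
```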
RLH commented on Jan 5, 2015
A goroutine that is allocating, or creating other GC work such as writing pointers, at a rate faster than the GC can deal with will need to be throttled. This will allow the GC to complete in a timely fashion. What you don't want is for a goroutine that is not creating additional work for the GC to also be throttled.
rmdamiao commented on Jan 5, 2015
We would not have this kind of problem if Go provided an "unmanaged" library which could implement manual memory management for regions of the heap to be ignored by the GC. Neither a pointer to an address in the "unmanaged" heap nor any pointer value stored inside the "unmanaged" heap would be considered by the GC. This could be a very simple solution which would solve, once and for all, the problems that Go has with long-lived pointer values and which will probably never be solved by the GC.
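(There is no such library in the standard distribution, but as a sketch of how this is commonly approximated today on Linux: memory obtained directly from mmap is never scanned by the GC, and callers keep plain integer offsets into it rather than pointers, much like the index-into-off-heap-data approach described at the top of this issue. All names below are invented for illustration.)

```go
package offheap

import "syscall"

// Region is a block of memory the garbage collector never scans, because it
// is mapped outside the Go heap.
type Region struct {
	buf  []byte
	next int // bump-allocation offset
}

func NewRegion(size int) (*Region, error) {
	buf, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	return &Region{buf: buf}, nil
}

// Alloc reserves n bytes and returns their offset; callers store the offset
// (a plain int) rather than a pointer, so the GC has nothing to chase.
func (r *Region) Alloc(n int) (int, bool) {
	if r.next+n > len(r.buf) {
		return 0, false
	}
	off := r.next
	r.next += n
	return off, true
}

// Bytes returns the n bytes starting at offset off.
func (r *Region) Bytes(off, n int) []byte {
	return r.buf[off : off+n]
}

// Free releases the whole region back to the OS.
func (r *Region) Free() error {
	return syscall.Munmap(r.buf)
}
```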
rhysh commented on Jan 5, 2015
Right @randall77, there's no limit on the number of overflow buckets. If a map stays about the same size but has a lot of churn, it seems like each of the primary buckets could grow to have a considerable number of overflow buckets chained to them - the random distribution of elements could bump each bucket over 8 or 16 elements, without ever increasing the average beyond the load factor. Since the overflow buckets aren't freed when they're emptied via runtime·mapdelete, the expansion would be permanent.
There'd probably need to be an "overflow bucket evacuation" process that would operate like the current evacuation to allow resizing of the new array. Added complexity for sure, but it may be worth it.
Do you happen to have statistics on how many overflow buckets end up used for long-lived maps with churn? This could maybe be collected at runtime by chasing the overflow map pointers, or offline by inspecting debug.WriteHeapDump output?