Closed

Labels: FrozenDueToAge, NeedsDecision, Performance, Suggested

Description
Consider the following benchmark:
```go
package pool_test

import (
	"compress/flate"
	"runtime"
	"sync"
	"testing"
)

func Benchmark(b *testing.B) {
	b.ReportAllocs()
	var a, z [1000]*flate.Writer
	p := sync.Pool{New: func() interface{} { return &flate.Writer{} }}
	for i := 0; i < b.N; i++ {
		for j := 0; j < len(a); j++ {
			a[j] = p.Get().(*flate.Writer)
		}
		for j := 0; j < len(a); j++ {
			p.Put(a[j])
		}
		a = z
		runtime.GC()
	}
}
```
This currently reports:

```
Benchmark-8   10   263962095 ns/op   663587733 B/op   1017 allocs/op
```

where I find it surprising that there are around 1000 allocations again every cycle. This seems to indicate that the Pool is clearing its entire contents upon every GC. A peek at the implementation seems to indicate that this is so.

If the workload is cycling through a high number of items, then it makes sense not to clear everything. A simple heuristic to accomplish this is to keep track of the number of calls to Pool.Get since the last attempt at clearing the pool, and to ensure that we retain up to that number of items.
sync: avoid clearing the full Pool on every GC

ianlancetaylor commented on Dec 1, 2017
When proposing any changes here please consider the discussion at https://groups.google.com/forum/#!msg/golang-dev/kJ_R6vYVYHU/LjoGriFTYxMJ .
OneOfOne commented on Dec 1, 2017

I'd like to see that too. A lot of times I use a ghetto `chan xxx` as a memory pool just to get around that.

cespare commented on Dec 1, 2017
I've also found the sync.Pool clearing behavior to be less than satisfactory. Here's a concrete example taken from a real server.
We have a performance-critical HTTP server. Reducing allocations has been a big part of getting throughput to be consistently high and latency to be consistently low. There are a handful of objects that we cache and reuse between requests as part of the allocation strategy. This server tends to have a ~5 GB heap and runs a GC every ~4 seconds.
One kind of reused object is a certain buffer that we use at the rate of ~10000 / sec. For this use case, a sync.Pool works really well; we get a 99.9% cache hit rate since we reuse buffers tens of thousands of times between GCs.
Another kind of reused object is a gzip.Writer that we use at the rate of ~500 / sec. For this use case, a sync.Pool did not work well for us: the combination of GC frequency and relatively lower usage rate meant that the hit rate when we used a sync.Pool was as low as 50%. Fresh gzip.Writers are extremely expensive to create, so we needed to do better. Now our code uses a buffered channel of *gzip.Writers that it can reuse, but it still uses a sync.Pool as a backup once the channel is drained so that if we suddenly need more writers than the channel size we don't have to keep allocating all of the extras.
Ideally a sync.Pool would be suitable for both of these scenarios.
Finally, the tight coupling between GC frequency and pool efficiency also has weird implications for reasoning about server performance. For instance, if totally unrelated code in our server changes the GC frequency (say, by introducing more allocations or, more perversely, decreasing the overall heap size), the pool hit rates get worse and thus the cost of allocations for these objects increases, even though nothing about the usage or cache-ability of those objects changed.
dsnet commented on Dec 1, 2017
@ianlancetaylor. Much of the discussion seems to be around the question of "when to drain the pool?" and not much around "how much of the pool to drain?". In the thread, Ingo Oeser asked "Will it be fully or partially drained?", but that question does not seem to have been addressed.
Using the GC seems like a great time to trigger drainages, but the scenario that @cespare describes was the failure mode I imagined could happen. I think there's a worthwhile discussion to be had about "how much to drain".
rasky commented on Dec 1, 2017

I'll note my real-world experience as well. We use Go on embedded systems, and we often have soft real-time constraints (that is, low-latency processing). In a specific piece of software we were working on, we tried using sync.Pool to reuse allocations for handling packets coming from a high-frequency UDP server. Unfortunately, whenever the GC was triggered, the full pool was emptied, and this caused a spike of high latency (due to the allocations) that went beyond our soft real-time constraints. We had to discard sync.Pool and use a hand-rolled memory pool implementation.
ianlancetaylor commented on Dec 1, 2017

@dsnet It sounds like your proposal is:

- Add a `count` field to `sync.Pool`.
- In the `Get` method, atomically increment the `count` field.
- In `poolCleanup`, retain up to `count` elements distributed across the p's in `allPools`, and reset `count` to `0`.

Does that sound right? That seems reasonable to me. It basically delays the release of pool elements for one GC cycle.
It won't help @cespare's case. It may help @rasky's case. Of course, `sync.Pool` can't be a general mechanism that addresses all memory cache needs.

dsnet commented on Dec 1, 2017
The algorithm to determine how much to drain has some parallels with cache-eviction algorithms, most of which rely on the principle that the past is a good predictor of the future. In the same way, there are several pieces of information that can be used as a heuristic to develop a policy for how much of the pool to drain. Ultimately, it comes down to a trade-off between CPU time and memory space. There are two extremes:

- Never drain the pool. This maximizes the hit rate, but the pool effectively retains memory forever. A pool of this type can trivially be implemented as a slice acting as a free-list.
- Fully drain the pool on every GC (the current behavior). This retains no memory across collections, but the hit rate suffers whenever items could have been reused across a GC.

My assertion is that neither of these extremes is ideal or well-suited for most use-cases.
The algorithm I proposed (which you correctly summarized) is a simple one that tries to find a better middle ground between those two extremes. It is certainly not a perfect algorithm, as it completely ignores Pool.Get statistics from farther back than one GC ago. Thus, it suffers when the workload on the server is highly non-uniform. An extension to the algorithm could perform a weighted average over N GC cycles. As long as it can be proven that the pool is eventually drained given no Pool.Get traffic, we know it doesn't leak memory.
More complicated policies could take into account information from the GC, like what proportion of the heap the pool is occupying. However, those algorithms can get complicated quickly and are much harder to prove correct.
In terms of whether it helps @cespare's or @rasky's use-cases, I believe my simple algorithm helps both. How much it helps really depends on the exact workload.

For @cespare's example with the gzip.Writer pool, there were ~2000 calls to Pool.Get between each GC (assuming ~4s between GCs and ~500 Pool.Get calls per second), which means that when the GC occurs, the drain will preserve up to 2000 items in the pool.

Of course, there are probably far fewer than 2000 items in the pool, since some are in active use, and the count of 2000 Pool.Get calls doesn't track how many intervening Pool.Put calls were made. The important property is that it is a useful heuristic of the demand for allocations from that Pool, and it should (hopefully) increase the hit rate from 50% to something much higher.
jimmyfrasche commented on Dec 2, 2017
A simpler heuristic would be to only drain some fraction of the pool at each GC.
Doing both would be good, too. It would be less likely to take too much or leave too little.
cespare commented on Dec 2, 2017
I believe that @dsnet's suggestion would work for my use case. The gzip.Writer pool that I described never gets very large (I don't have an exact number, but it's always <100 elements) so it seems like @dsnet's proposal would result in the pool never being cleared.
I don't quite see why using the number of Gets as the limit (without regard to Puts) is a good heuristic, though. For instance, in the scenario I described, if the pool grew abnormally large during one GC cycle, it would be trimmed down to 2000 elements and then stay at 2000, even though there are normally only <100 live elements.
ISTM that a better heuristic would be to use the max live set size during the cycle as the limit. This is somewhat more bookkeeping, though. (The straightforward way I can think of doing it would be atomic increment/decrement to track the size plus a CAS loop to track the max size.)
dsnet commented on Dec 4, 2017

I was hoping that using Gets alone would be a good approximation of the max live set, but your example provides a case where that approximation falls short.

This could probably be simplified to just an atomic store. It's okay if it is slightly racy.
gopherbot commented on Mar 11, 2019
Change https://golang.org/cl/166958 mentions this issue:
sync: internal dynamically sized lock-free queue for sync.Pool
gopherbot commented on Mar 11, 2019
Change https://golang.org/cl/166961 mentions this issue:
sync: smooth out Pool behavior over GC with a victim cache
gopherbot commented on Mar 11, 2019
Change https://golang.org/cl/166959 mentions this issue:
sync: add Pool benchmarks to stress STW and reuse
gopherbot commented on Mar 11, 2019
Change https://golang.org/cl/166960 mentions this issue:
sync: use lock-free structure for Pool stealing