sync: avoid clearing the full Pool on every GC #22950

Description

dsnet (Member)

Consider the following benchmark:

package pool_test

import (
	"compress/flate"
	"runtime"
	"sync"
	"testing"
)

func Benchmark(b *testing.B) {
	b.ReportAllocs()
	var a, z [1000]*flate.Writer
	p := sync.Pool{New: func() interface{} { return &flate.Writer{} }}
	for i := 0; i < b.N; i++ {
		for j := 0; j < len(a); j++ {
			a[j] = p.Get().(*flate.Writer)
		}
		for j := 0; j < len(a); j++ {
			p.Put(a[j])
		}
		a = z // drop references so the GC can collect the pooled writers
		runtime.GC()
	}
}

This currently reports:

Benchmark-8   10   263962095 ns/op   663587733 B/op   1017 allocs/op

I find it surprising that there are around 1000 allocations again on every cycle. This seems to indicate that the Pool is clearing its entire contents upon every GC. A peek at the implementation suggests that this is indeed the case.

If the workload is such that it is cycling through a high number of items, then it makes sense not to clear everything. A simple heuristic to accomplish this is to keep track of the number of calls to Pool.Get since the last attempt to clear the pool, and to retain up to that number of items.

/cc @LMMilewski @aclements

Activity

Changed the title from "sync: avoid clearing" to "sync: avoid clearing the full Pool on every GC" on Dec 1, 2017
Added the Suggested label (issues that may be good for new contributors looking for work to do) on Dec 1, 2017
ianlancetaylor (Contributor) commented on Dec 1, 2017

When proposing any changes here please consider the discussion at https://groups.google.com/forum/#!msg/golang-dev/kJ_R6vYVYHU/LjoGriFTYxMJ .

OneOfOne (Contributor) commented on Dec 1, 2017

I'd like to see that too; a lot of the time I use a ghetto chan xxx as a memory pool just to get around that.

cespare (Contributor) commented on Dec 1, 2017

I've also found the sync.Pool clearing behavior to be less than satisfactory. Here's a concrete example taken from a real server.

We have a performance-critical HTTP server. Reducing allocations has been a big part of getting throughput to be consistently high and latency to be consistently low. There are a handful of objects that we cache and reuse between requests as part of the allocation strategy. This server tends to have a ~5 GB heap and runs a GC every ~4 seconds.

One kind of reused object is a certain buffer that we use at the rate of ~10000 / sec. For this use case, a sync.Pool works really well; we get a 99.9% cache hit rate since we reuse buffers tens of thousands of times between GCs.

Another kind of reused object is a gzip.Writer that we use at the rate of ~500 / sec. For this use case, a sync.Pool did not work well for us: the combination of GC frequency and relatively lower usage rate meant that the hit rate when we used a sync.Pool was as low as 50%. Fresh gzip.Writers are extremely expensive to create, so we needed to do better. Now our code uses a buffered channel of *gzip.Writers that it can reuse, but it still uses a sync.Pool as a backup once the channel is drained so that if we suddenly need more writers than the channel size we don't have to keep allocating all of the extras.

Ideally a sync.Pool would be suitable for both of these scenarios.

Finally, the tight coupling between GC frequency and pool efficiency also has weird implications for reasoning about server performance. For instance, if totally unrelated code in our server changes the GC frequency (say, by introducing more allocations or, more perversely, decreasing the overall heap size), the pool hit rates get worse and thus the cost of allocations for these objects increases, even though nothing about the usage or cache-ability of those objects changed.

dsnet (Member, Author) commented on Dec 1, 2017

@ianlancetaylor: much of the discussion there is around the question of "when to drain the pool?" and not much around "how much of the pool to drain?". In the thread, Ingo Oeser asked "Will it be fully or partially drained?", but that question does not seem to have been addressed.

Using the GC seems like a great time to trigger drainages, but the scenario that @cespare describes was the failure mode I imagined could happen. I think there's a worthwhile discussion to be had about "how much to drain".

rasky (Member) commented on Dec 1, 2017

I'll note my real-world experience as well. We use Go on embedded systems, and we often have soft real-time constraints (i.e., low-latency processing). In one piece of software we were working on, we tried using sync.Pool to reuse allocations when handling packets coming from a high-frequency UDP server. Unfortunately, whenever the GC was triggered, the full pool was emptied, and this caused a latency spike (due to the re-allocations) that went beyond the soft real-time constraints we had. We had to discard sync.Pool and use a hand-rolled memory pool implementation.

ianlancetaylor (Contributor) commented on Dec 1, 2017

@dsnet It sounds like your proposal is:

  1. Add a new field count to sync.Pool
  2. At each call to the Get method, atomically increment the count field.
  3. In poolCleanup
    • keep count elements distributed across the p's
    • if some elements were kept then keep the pool in allPools
    • set count to 0

Does that sound right? That seems reasonable to me. It basically delays the release of pool elements for one GC cycle.

It won't help @cespare's case. It may help @rasky's case. Of course, sync.Pool can't be a general mechanism that addresses all memory-cache needs.

dsnet (Member, Author) commented on Dec 1, 2017

The algorithm to determine how much to drain has some parallels with cache-eviction algorithms, most of which rely on the principle that the past is a good predictor of the future. In the same way, there are several pieces of information that can be used as heuristics to develop a policy for how much of the pool to drain. Ultimately, it comes down to a trade-off between CPU time and memory space.

There are two extremes:

  • We could drain nothing, in which case we save CPU time by never needing to allocate, but at an obvious cost in memory.
    A pool of this type can trivially be implemented as a slice acting as a free-list.
  • We could drain everything, in which case we prioritize saving memory, but may pay in CPU time since the cleared items may need to be allocated again shortly. This is the current policy of sync.Pool.

My assertion is that neither of these extremes is ideal or well-suited for most use-cases.

The algorithm I proposed (which you correctly summarize) is a simple one that tries to find a better middle ground between those two extremes. It is certainly not a perfect algorithm, as it completely ignores Pool.Get statistics from further back than one GC ago; thus, it suffers when the workload on the server is highly non-uniform. An extension of the algorithm could perform a weighted average over the last N GC cycles. As long as it can be proven that the pool is eventually drained given no Pool.Get traffic, we know it doesn't leak memory.

More complicated policies could take into account information from the GC, such as what proportion of the heap the pool is occupying. However, such algorithms can get very complicated quickly and are much harder to prove correct.

In terms of whether it helps @cespare's or @rasky's use-cases, I believe my simple algorithm helps both. How much it helps really depends on the exact workload.

For @cespare's example with the gzip.Writer pool, there were ~2000 calls to Pool.Get between GCs (assuming ~4s between GCs and ~500 Pool.Get calls per second), which means that when the GC occurs, up to 2000 items will be preserved in the Pool when it is drained. Of course, there are probably far fewer than 2000 items in the pool, since some are in active use and the count of 2000 Pool.Get calls doesn't track how many intervening Pool.Put calls were made. The important property is that it is a useful heuristic for the demand for allocations from that Pool, and it should (hopefully) increase the hit rate from 50% to something much higher.

Added the NeedsDecision label (feedback is required from experts, contributors, and/or the community before a change can be made) on Dec 1, 2017
Added this to the Go1.11 milestone on Dec 1, 2017
jimmyfrasche (Member) commented on Dec 2, 2017

A simpler heuristic would be to only drain some fraction of the pool at each GC.

Doing both would be good, too. It would be less likely to take too much or leave too little.

cespare (Contributor) commented on Dec 2, 2017

I believe that @dsnet's suggestion would work for my use case. The gzip.Writer pool that I described never gets very large (I don't have an exact number, but it's always <100 elements) so it seems like @dsnet's proposal would result in the pool never being cleared.

I don't quite see why using the number of Gets as the limit (without regard to Puts) is a good heuristic, though. For instance, in the scenario I described, if the pool grew abnormally large during one GC cycle, it would be trimmed down to 2000 elements and then stay at 2000, even though there are normally only <100 live elements.

ISTM that a better heuristic would be to use the max live set size during the cycle as the limit. This is somewhat more bookkeeping, though. (The straightforward way I can think of doing it would be atomic increment/decrement to track the size plus a CAS loop to track the max size.)

dsnet (Member, Author) commented on Dec 4, 2017

I was hoping that using Gets alone would be a good approximation of the "max live set", but your example provides a case where that approximation falls short.

ISTM that a better heuristic would be to use the max live set size during the cycle as the limit. This is somewhat more bookkeeping, though. (The straightforward way I can think of doing it would be atomic increment/decrement to track the size plus a CAS loop to track the max size.)

This could probably be simplified to just an atomic store. It's okay if it is slightly racy.

[38 additional comments not shown]

gopherbot (Contributor) commented on Mar 11, 2019

Change https://golang.org/cl/166958 mentions this issue: sync: internal dynamically sized lock-free queue for sync.Pool

gopherbot (Contributor) commented on Mar 11, 2019

Change https://golang.org/cl/166961 mentions this issue: sync: smooth out Pool behavior over GC with a victim cache

gopherbot (Contributor) commented on Mar 11, 2019

Change https://golang.org/cl/166959 mentions this issue: sync: add Pool benchmarks to stress STW and reuse

gopherbot (Contributor) commented on Mar 11, 2019

Change https://golang.org/cl/166960 mentions this issue: sync: use lock-free structure for Pool stealing

locked and limited conversation to collaborators on Apr 4, 2020

Metadata

Assignees: none
Labels: FrozenDueToAge; NeedsDecision (feedback is required from experts, contributors, and/or the community before a change can be made); Performance; Suggested (issues that may be good for new contributors looking for work to do)
Projects: none
Development: no branches or pull requests
Participants: @bradfitz, @cespare, @jimmyfrasche, @CAFxX, @erikdubbelboer