-
Notifications
You must be signed in to change notification settings - Fork 18.3k
Description
Currently goroutines are only preemptible at function call points. Hence, it's possible to write a tight loop (e.g., a numerical kernel or a spin on an atomic) with no calls or allocation that arbitrarily delays preemption. This can result in arbitrarily long pause times as the GC waits for all goroutines to stop.
In unusual situations, this can even lead to deadlock when trying to stop the world. For example, the runtime's TestGoroutineParallelism tries to prevent GC during the test to avoid deadlock. It runs several goroutines in tight loops that communicate through a shared atomic variable. If the coordinator that starts these is paused part way through, it will deadlock.
One possible fix is to insert preemption points on control flow loops that otherwise have no preemption points. Depending on the cost of this check, it may also require unrolling such loops to amortize the overhead of the check.
This has been a longstanding issue, so I don't think it's necessary to fix it for 1.5. It can cause the 1.5 GC to miss its 10ms STW goal, but code like numerical kernels is probably not latency-sensitive. And as far as I can tell, causing a deadlock like the runtime test can do requires looking for trouble.
Activity
minux commentedon May 26, 2015
randall77 commentedon May 26, 2015
The problem with interrupts is that the goroutine can be at an arbitrary address. The interrupted address is probably at a place where we have no data about where the pointers are. We'd either have to record a lot more frame layout/type/liveness information (including registers), or we'd have to simulate the goroutine forward to a safepoint. Both are challenging.
minux commentedon May 26, 2015
ianlancetaylor commentedon May 26, 2015
It seems to me that we can preempt at a write barrier. So the only loops we are talking about are those that make no writes to the heap and make no function calls. If we think in terms of adding
then the normal path through the loop will have two extra instructions (add/beq) where the add may be to either a register or a memory location, depending on overall register pressure. This will be measurable in tight loops but for most cases shouldn't be too bad.
randall77 commentedon May 26, 2015
No pointer writes to the heap.
But maybe this is enough? If we can preempt at the first write barrier, then even if we let the mutator run it can't modify anything that the GC cares about. So maybe we just set the write barrier enabled bit and let goroutines run. Once a goroutine sees the write barrier bit (how do we force that?) it can't modify any memory the GC cares about. So it is safe to start the GC without waiting for every goroutine to stop.
aclements commentedon May 27, 2015
True. That's why I suggested unrolling loops, which AFAIK is a standard solution to this problem.
However, I think the check can be done in just two instructions, even without adding Ian's loop counter. Just CMP the preempt flag and branch if it's set. That branch will almost never be hit, so it will be highly predictable, and the preempt flag should be in the L1, so the check may in fact be extremely cheap.
This certainly reduces the set of programs that have this problem, but I don't think it actually helps with either of the examples I gave, since numerical kernels probably won't have heap pointer writes, and the runtime test that can deadlock the GC certainly has no pointer writes.
RLH commentedon May 27, 2015
This is the reference for doing a GC safepoint at every instruction. It is
not something we want to do for x86 much less the various other
architectures we need to support.
http://dl.acm.org/citation.cfm?id=301652
Most systems I know of do what Austin suggests, unroll the loop and insert
a check. No numbers but 8 unrolls seems to be what I recall. Not only is
the branch highly predictable but the check does not introduce any
dependencies making it close to free on an out of order machine. There have
been other approaches such as code plugging and predicated branches but HW
has moved on. I had not seen Ian's suggestion.
On Tue, May 26, 2015 at 9:24 PM, Austin Clements notifications@github.com
wrote:
kingluo commentedon Sep 9, 2015
@aclements,
To be precise, the preemption check seems only at newstack()?
For example,
The test() is not inline, and the infinite loop calls test(), but it would not preempt at the calls, because the morestack() --> newstack() not involved, so the "hello world" would never be printed.
randall77 commentedon Sep 9, 2015
test() is not inlined, but since it is a leaf it is promoted to nosplit, so it doesn't have the preemption check any more. Fixing that sounds much easier than the rest of this bug. Maybe we could forbid nosplit promotion for functions called from loops?
griesemer commentedon Sep 10, 2015
For the reference, the same problem appeared in the HotSpot JVM. I remember two approaches:
This mechanism was ingenious but also insanely complicated. It was abandoned eventually for:
My vote would be for 2) to start in loops where @ianlancetaylor's suggestion doesn't work. I suspect that for all but the smallest long-running inner loops (where unrolling might make sense independently), the performance penalty is acceptable.
If this is not good enough, and we don't want to pay the code size cost of unrolling a loop multiple times, here's another idea based on 1): Instead of the backward branch check as in 2) plus unrolling, keep the original loop, and generate (at compile time) a 2nd version of the loop body that ends in a runtime call to suspend itself instead of the backward branch. The code size cost is about the cost of having the loop unrolled twice. When the goroutine needs to run to a safe point, use a signal to temporarily suspend the goroutine, switch the pc to the corresponding pc in the copy of the loop body, continue with execution there and have the code suspend itself at a safe point. There's no dynamic code generation involved, generating the extra loop body is trivial at compile-time, the extra amount of code is less than with loop unrolling, and there's only a little bit of runtime work to modify the pc. The regular code would run at full speed if no garbage collection is needed.
271 remaining items
aclements commentedon Jan 2, 2020
Go 1.14 will handle loop preemption far better than previous versions, but there are still some loose ends to tie up in 1.15, so I'm bumping this issue to 1.15.
aclements commentedon Jan 2, 2020
Actually, rather than bump this to 1.15, since tight loops are preemptible now, I've created a new issue to keep track of loose ends (#36365) and will go ahead and close this issue as fixed.
gopherbot commentedon Jan 8, 2020
Change https://golang.org/cl/213837 mentions this issue:
runtime: protect against external code calling ExitProcess
runtime: protect against external code calling ExitProcess
Remove runtime.Gosched loops, source issue golang/go#10958 is resolved.