Deadlock in PLEG relist() for health check and Kubelet syncLoop() #72482
/sig node
Sure, thank you.
We had an internal fix to break the deadlock, but I am wondering whether there is a good general fix for this kind of issue.
On Tue, Jan 8, 2019 at 3:01 PM, Penghao Cen wrote:
> @denverdino Thanks for your report, we have the same problem in our cluster, and will file a PR to fix this :) @answer1991 FYI.
When PLEG does relist(), can we invoke
I submitted PR #72709.
Thanks for finding the issue! The issue description makes sense to me. PLEG is a core component, so we should try our best to keep it unblocked. The event channel becoming full seems more like an issue on the event consumer side, which shouldn't block PLEG.

Option 1: Queue the events in PLEG and send them later. That doesn't fundamentally solve the problem: if we don't set a limit on the number of events we queue, it looks like a potential memory leak; if we do set a limit, we may eventually hit this problem again.

Option 2: Add a retry for sending the event in PLEG, and set a bit in PLEG between retries which tells that PLEG is not blocked, it is just trying to send out the event; the health check can then consult that bit.

Option 3: Even if we fail to send the event and discard it, since we have already updated the pod cache, the periodic pod resync should eventually pick it up. This may delay pod sync for up to the resync period (1 minute).
We can:

Of the 3 options above, I'm fine with either 2 or 3. Option 2 avoids the 1min pod sync delay but is a bit ad hoc; option 3 is more generic, but may delay pod sync. There might be other options. /cc @yujuhong Any thoughts?
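For illustration, here is a minimal sketch of the non-blocking send that option 3 implies: if the channel is full, the event is discarded and the periodic pod resync is relied on to reconcile state later. The type name, channel size, and logging are simplified stand-ins, not the exact change made in #72709.

```go
package main

import "fmt"

// PodLifecycleEvent is a simplified stand-in for the kubelet's PLEG event type.
type PodLifecycleEvent struct {
	PodID string
	Type  string
}

func main() {
	// A small buffer stands in for the PLEG event channel so it fills quickly.
	eventChannel := make(chan PodLifecycleEvent, 2)

	send := func(ev PodLifecycleEvent) {
		// Non-blocking send: if the consumer is stuck, drop the event instead
		// of blocking relist(). The pod cache is assumed to be updated already,
		// so the periodic pod resync can reconcile the missed event later.
		select {
		case eventChannel <- ev:
		default:
			fmt.Printf("event channel is full, discarding event %+v\n", ev)
		}
	}

	// Simulate a stuck consumer: nothing reads from eventChannel,
	// yet the producer never blocks.
	for i := 0; i < 5; i++ {
		send(PodLifecycleEvent{PodID: fmt.Sprintf("pod-%d", i), Type: "ContainerStarted"})
	}
}
```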
If PLEG is already blocked by a full event channel, this doesn't help, right?
/cc @jiayingz This looks like the production issue you mentioned to me a while back.
@Random-Liu Of the options enumerated, option 3 is my preference.
Thanks for reporting the issue!
@denverdino, @resouer what was the load on the node that triggered this condition? If the node wasn't functioning at all and the events slowly built up over time, bumping the size of the channel won't help.
This should work and doesn't complicate the PLEG logic further, so it's my preferred option too.
Agreed, choosing a proper size for the event channel is tricky, as the scale also grows over time. We will ship the fix as option 3 (#72709) for 1.14. Thanks for summarizing all of this at the SIG meeting! /assign
@resouer If possible, could you share how large the stress test was? It would help us understand the issue and kubelet's handling of stress better. :)
@Random-Liu @yujuhong Sure, our product team (@denverdino) will post the stress test data in follow-up comments after removing sensitive information.
@resouer @Random-Liu @yujuhong Thanks for the reply.
@BSWANG Yeah, I can see how frequent container state changes can cause this issue. The numbers are very useful, thanks a lot!
@denverdino could you please share how to dump the kubelet goroutine stack trace like the one you pasted in your comment? Thanks a lot.
This issue is fixed by #72709 |
Using pprof is the answer to my question. @denverdino, thank you anyway.
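For anyone else with the same question: the goroutine dump comes from Go's standard net/http/pprof handlers, which the kubelet can expose through its debug endpoints when they are enabled. A minimal standalone sketch of the mechanism (the listen address here is illustrative, not kubelet's configuration):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// With the handlers registered, a full goroutine stack dump is served at
	// http://localhost:6060/debug/pprof/goroutine?debug=2
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```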
What happened:
In our stress testing, we found a deadlock between PLEG relist(), which is used for health checking, and the Kubelet syncLoop().
The goroutine stack trace is as follows:
We did some investigation and found the following.
The main event-processing loop in the Kubelet checks whether there are runtime errors: if yes, it waits until the runtime is back to normal; if no, it calls syncLoopIteration to read and process events.
The runtime health check depends on GenericPLEG.relist(). But the relist() implementation sends pod lifecycle events to eventChannel, which can deadlock when eventChannel is full.
Because eventChannel is not consumed while the runtime is reported unhealthy, the channel can fill up; relist() then hangs while sending a new pod lifecycle event, and the runtime status is never able to recover.
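To make the failure mode concrete, the following self-contained sketch models the cycle: a producer (standing in for relist()) blocks on a full channel, while the consumer (standing in for syncLoop) skips draining the channel whenever the health check, which itself depends on the producer making progress, fails. The names, channel size, and timing constants are illustrative only, not kubelet's actual values.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// A tiny buffer stands in for the PLEG event channel so it fills quickly.
	eventCh := make(chan string, 2)

	var mu sync.Mutex
	lastRelist := time.Now()

	// Health check modeled on PLEG's health check: it fails if relist()
	// has not completed recently.
	plegHealthy := func() bool {
		mu.Lock()
		defer mu.Unlock()
		return time.Since(lastRelist) < 100*time.Millisecond
	}

	// The "relist" producer: a blocking send means the timestamp stops
	// advancing as soon as the channel is full.
	go func() {
		for i := 0; ; i++ {
			eventCh <- fmt.Sprintf("event-%d", i) // blocks forever once the buffer is full
			mu.Lock()
			lastRelist = time.Now()
			mu.Unlock()
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// The "syncLoop" consumer: like kubelet's main loop, it skips event
	// processing while the runtime/PLEG looks unhealthy.
	start := time.Now()
	for time.Since(start) < time.Second {
		if time.Since(start) < 300*time.Millisecond {
			// Simulate a transient runtime problem: events are not consumed,
			// so the channel fills up and the producer blocks.
			time.Sleep(50 * time.Millisecond)
			continue
		}
		if !plegHealthy() {
			// The transient problem is over, but PLEG never reports healthy
			// again because relist() is still stuck on the full channel.
			fmt.Println("PLEG unhealthy, skipping syncLoopIteration")
			time.Sleep(50 * time.Millisecond)
			continue
		}
		fmt.Println("processed", <-eventCh)
	}
}
```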
It is related to issue #45419.
What you expected to happen:
No such issue
How to reproduce it (as minimally and precisely as possible):
It is not easy to reproduce; it happens after a few days of stress testing (scaling the deployment repeatedly under high load).
Anything else we need to know?:
Environment:
- Kubernetes version (`kubectl version`): 1.11.5
- Kernel (`uname -a`): Linux bjc-qtt-k8s-node-010 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
/kind bug