Description
Hi!
We are running containerd on GKE with pretty much all defaults: a dozen nodes and a few hundred pods. Plenty of memory and disk free.
We started seeing many pods fail with a failed to reserve container name
error in the last week or so. I do not recall any specific changes to the cluster or to the containers themselves.
Any help will be greatly appreciated!
Steps to reproduce the issue:
I have no clue how to specifically reproduce this issue.
The cluster has nothing special and the deployment is straightforward. The only thing that could be relevant is that our images are quite large, around 3 GB.
I got a few more details here: https://serverfault.com/questions/1036683/gke-context-deadline-exceeded-createcontainererror-and-failed-to-reserve-contai
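For reference, a couple of quick checks that can be run on a node to confirm the free-disk and image-size numbers (a minimal sketch; it assumes SSH access to the node and that crictl is installed, as it is on GKE containerd node images):

```
# free space on the volume backing containerd's data directory
df -h /var/lib/containerd

# size of the cached application image as containerd sees it (image name is illustrative)
crictl images | grep appImage
```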
Describe the results you received:
2020-10-07T08:01:45Z Successfully assigned default/apps-abcd-6b6cb5876b-nn9md to gke-bap-mtl-1-preemptible-e2-s4-e6a8ddb4-ng3v I
2020-10-07T08:01:50Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:16:45Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:18:45Z Error: context deadline exceeded W
2020-10-07T08:18:45Z Container image "redis:4.0-alpine" already present on machine I
2020-10-07T08:18:53Z Created container redis I
2020-10-07T08:18:53Z Started container redis I
2020-10-07T08:18:53Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:02Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:02Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:03Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:20Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:20Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:21Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:34Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:34Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:35Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:44Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:19:44Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:19:54Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:08Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:08Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:20:18Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:30Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:20:30Z Error: failed to reserve container name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0": name "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0" is reserved for "8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f" W
2020-10-07T08:21:19Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:26:35Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:31:36Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:36:26Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:41:18Z Pulling image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
2020-10-07T08:46:41Z Successfully pulled image "gcr.io/my/appImage:223c133ff631c41e1bc21a8b7d7554036da4fb4e" I
Describe the results you expected:
Live a happy life, error free :)
Output of containerd --version:
containerd github.com/containerd/containerd 1.3.2 ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
Any other relevant information:
Activity
windniw commented on Nov 20, 2020
It looks like there is a container with name web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0 and id 8b21a9870e3ecc09bbb92da2036bd3c9b35f5829873d80cfbd14dc1e1827923f in containerd. When kubelet tries to create a new container with that same name, the CRI plugin fails on it.
Could you show the output of docker ps -a or ctr c list?
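For anyone else debugging this, a minimal sketch of equivalent checks on a containerd-based node (assuming crictl and ctr are available there, and using the name/ID from the error above):

```
# list all containers known to the CRI plugin, including ones stuck in Created/Unknown state
crictl ps -a | grep apps-abcd-6b6cb5876b-nn9md

# list containers directly in containerd's k8s.io namespace and look for the reserved ID
ctr -n k8s.io containers list | grep 8b21a9870e3ecc09
```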
pfuhrmann commented on Dec 27, 2020
Did you manage to resolve this issue @sadortun? We are experiencing the same. Also on GKE with containerd runtime.
We are deploying the same image to multiple deployments (30-40 pods) at the same time. We had no such issues with the docker runtime.
Eventually, kubelet is able to resolve this without manual intervention; however, it significantly slows the deployment of new images during a release (an extra 2-3 minutes to resolve the name conflicts).
sadortun commented on Dec 28, 2020
Hi @pfuhrmann
We investigated this quite deeply with the GKE dev team and were not able to reproduce it.
That said, we are pretty convinced the issue comes from one of the two following causes:
Unfortunately, after a month of back and forth with the GKE devs, we were not able to find a solution.
The good news is that, for us, refactoring our application allowed us to reduce the number of simultaneously starting pods from about 20 down to 5. Since then, we have had no issues.
You might also want to increase the node boot disk size. It seems to help too.
kmarji commented on Apr 25, 2021
Any update on this? Did anybody manage to solve this? We are facing the same issue.
chrisroat commented on May 22, 2021
We are also seeing the same issue, GKE with containerd. It does seem to be correlated with starting many pods at once.
Switching from cos_containerd back to cos (docker based) seems to have resolved the situation, at least in the short term.
kmarji commented on May 22, 2021
Same for us: once we switched back to cos with docker, everything worked.
sadortun commented on May 22, 2021
In the end we still had occasional issues, and we also had to switch back to cos.
mikebrow commented on May 22, 2021
jotting down some notes here, apologies if it's lengthy:
Let me try to explain/figure out the reason you got "failed to reserve container name" ..
Kubelet tried to create a container that it had already asked containerd to create at least once. When containerd tried the first time, it received a variable in the container-create metadata named attempt, and that variable held the default value 0. Containerd then reserved the unique name for attempt 0 that you see in your log (see the _0 at the end of the name): "web_apps-abcd-6b6cb5876b-nn9md_default_3dc00fd6-0c5d-42be-bec8-e4f6cad616da_0".
Then something happened causing a context timeout between kubelet and containerd. The kubelet context timeout value is configurable: "--runtime-request-timeout duration Default: 2m0s". A 2-minute timeout could happen for any number of reasons: an unusually long garbage collection, a file system hiccup, locked files, deadlocks while waiting, some very expensive init operation occurring on the node for one of your other containers... who knows? That's why we have/need recovery procedures.
What should have happened is that kubelet should've incremented the attempt number (or at least that's how I see it from this side, the containerd side, of the CRI API). But kubelet did not increment the attempt number, and furthermore containerd was still trying to create the container from the first request. Or the create on the containerd side may even have finished at this point; it is possible the timeout only happened on the kubelet side and containerd continued finishing the create, possibly even attempting to return the success result. If containerd actually failed, it would have deleted the reservation for that container id, as the immediate thing after we reserve the id in containerd is to defer its removal on any error in the create: https://github.com/containerd/containerd/blob/master/pkg/cri/server/container_create.go#L65-L84
So ok, skimming over the kubelet code, I believe this is the code that decides what attempt number we are on: https://github.com/kubernetes/kubernetes/blame/master/pkg/kubelet/kuberuntime/kuberuntime_container.go#L173-L292
In my skim, I think I see a window where kubelet will try attempt 0 a second time after the first create attempt fails with a context timeout. But I may be reading the code wrong? @dims @feiskyer @Random-Liu
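For reference, a quick way to check which runtime request timeout kubelet is actually running with (a sketch; the config file paths are assumptions that vary by distro, and on GKE nodes the kubelet config typically lives under /home/kubernetes/):

```
# look for the flag on the kubelet command line, if it is set there
ps aux | grep -o 'runtime-request-timeout=[^ ]*'

# or look for the equivalent field in the kubelet config file (paths are guesses for GKE/vanilla nodes)
grep -i runtimeRequestTimeout /home/kubernetes/kubelet-config.yaml /var/lib/kubelet/config.yaml 2>/dev/null
```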
containers with unready status: [main]|failed to reserve container name flyteorg/flyte#1234
CyberHippo commented on Jul 20, 2021
Bumped into this issue as well. Switching back to cos with docker.
jsoref commented on Aug 26, 2021
Fwiw, we're hitting this this week.
k8s 1.20.8-gke.900; containerd://1.4.3
In my case, the pod is owned by a (batch/v1) Job, and the Job by a (batch/v1beta1) CronJob.
The "reserved for" ID only appears in the error; nothing else seems to know about it.
Using Google cloud logging, I can search:
w/ a search range of 2021-08-22 01:58:00.000 AM EDT..2021-08-22 02:03:00.000 AM EDT
This is the first hit:
And this is the second hit:
There are additional hits, but they aren't exciting.
For reference, this search (with the same time params) yields nothing:
This search
yields two entries:
(There are additional hits if I extend the time window forward, but as they appear to be identical other than the timestamp, I don't see any value in repeating them.)
Relevant log events
The best query I've found is:
(The former is to limit which part of GCloud to search, and the latter is the search.)
matti commented on Sep 17, 2021
same, switching back to docker
sadortun commented on Jan 25, 2022
@fuweid
Thanks for your time on this issue.
Unfortunately, I stopped using COS back in 2020 after we could not find a solution.
I'm 97% sure we were using overlayfs, and as for the rest I have no way to find this historical data. Sorry about that.
oci: use readonly mount to read user/group info
fuweid commented on Jan 28, 2022
@sadortun I filed a PR to enhance this: #6478 (comment)
Not sure what is different between docker and containerd in GKE, sorry about that.
derekperkins commented on Jan 31, 2022
We're on GKE 1.21.6-gke1500 and we've been seeing this problem for the last 1-2 months
qiutongs commented on Feb 1, 2022
I got some good results showing this patch improves the latency of CreateContainer.
- Disk load: stress-ng --io 1 -d 1 --timeout 7200 --hdd-bytes 8M
- Workload: nginx with 25 replicas
- Result: CreateContainer requests complete within 2 mins
Please note this is based on a couple of experiments, not an ample data set. stress-ng doesn't produce stable IOPS, so the disk state cannot be exactly the same in the two cases.
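A reproduction sketch based on the experiment above: the stress-ng command is the one from the comment, while the kubectl lines are assumptions about how to recreate the 25-replica nginx workload and observe container creation (in a real test the deployment would also need to be pinned to the stressed node):

```
# on the node under test: generate sustained disk IO for two hours
stress-ng --io 1 -d 1 --timeout 7200 --hdd-bytes 8M &

# from a workstation: create the nginx workload with 25 replicas
kubectl create deployment nginx --image=nginx --replicas=25

# watch container-creation events to see how long CreateContainer takes to succeed
kubectl get events --field-selector reason=Created --watch
```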
qiutongs commented on Feb 1, 2022
Summary (2022/02)
The "failed to reserve container name" error is returned by the containerd CRI plugin when there is an in-flight CreateContainer request that has already reserved the same container name, like below:
T1: 1st CreateContainer(XYZ) request is sent. (Timeout on the kubelet side)
T2: 2nd CreateContainer(XYZ) request is sent (kubelet retry)
T3: 2nd CreateContainer request returns the "failed to reserve container name XYZ" error
T4: 1st CreateContainer request is still in-flight…
Don't panic. Given sufficient time, the container and pod will be created successfully, as long as you are using restartPolicy: Always or restartPolicy: OnFailure in the PodSpec.
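To confirm that a stuck pod really does recover on its own, its events can be watched until the retries stop (a sketch; the pod name is taken from the logs earlier in this issue and is only illustrative):

```
# show the name-reservation errors recorded against the pod
kubectl describe pod apps-abcd-6b6cb5876b-nn9md | grep -A1 "failed to reserve container name"

# follow the pod's events until the Created/Started events finally appear
kubectl get events --field-selector involvedObject.name=apps-abcd-6b6cb5876b-nn9md --watch
```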
Root Cause and Fix
Slow disk operations (e.g. disk throttling on GKE) are the culprit. The heavy disk IO can come from a number of sources: the user's disk-heavy workloads, large image pulls, and the containerd CRI implementation itself.
An unnecessary sync-fs operation was found in the CreateContainer stack; it is where CreateContainer gets stuck. That sync-fs call was removed in #6478. Not only does this make CreateContainer return faster, it also reduces the disk IO generated by containerd. Please note there are perhaps other undiscovered reasons contributing to this problem.
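To check whether a node is actually hitting the kind of disk throttling described above, something like this can be run on the node (a sketch; it assumes iostat from the sysstat package is available, which may not be the case on a stock COS image):

```
# extended per-device stats every 5 seconds; sustained high %util and long await times
# on the boot/containerd disk point at throttled or saturated IO
iostat -dxz 5
```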
Mitigation
Use restartPolicy: Always or restartPolicy: OnFailure in the PodSpec.
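To verify that an affected workload can ride out the retries, its restart policy can be checked (a sketch; the pod name is illustrative):

```
# should print Always or OnFailure for pods that are expected to recover on their own
kubectl get pod apps-abcd-6b6cb5876b-nn9md -o jsonpath='{.spec.restartPolicy}'
```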
oci: use readonly mount to read user/group info