Pod is stuck in terminating with "MountVolume.SetUp failed for volume ... not registered" error #113289
Comments
/sig node
/assign @rphillips This sounds pretty close to another bug we looked at earlier, where cleanup was failing and causing many pod GC loops to keep running. @rphillips to take a look if it's the same issue.
/triage accepted
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
We have the same problem from time to time with two different k8s versions. Kubelet logs:
I fixed this error by installing nfs-common on my worker node. Hope you don't have such a silly mistake.
I also hit this issue: the secret and configmap volumes block pod termination when the namespace is deleted.
/sig storage
@sharad740 that is a different mount error from the "not registered" issue. @SergeyKanzhelev we're hitting this issue somewhat frequently in large clusters, and it seems to be one of the few issues of this kind where restarting kubelet does not resolve the problem and we have to force delete the pod.
This is on Kubernetes v1.27.13. For us it seems to often happen for pods in the Terminating state, and it impacts all of the secret and configmap volumes, but no others.
^ For that case in particular, we see pods that have been running for a while and successfully mounting volumes start to fail to mount those volumes while terminating. I can see kubelet stop watching the secret it needs after eviction starts but before termination completes. Something peculiar is that the logs show kubelet treating this as a new pod just starting up while it is terminating, which then causes a problem because it cannot mount the volume for the secret it stopped watching.
Several questions arise here:
Update here: it looks like kubelet had restarted, so that explains the ADD.
I got to the bottom of what was happening in our case, and it's possible it's the same thing that was happening in the original report: in this pod worker loop, running pods get the SyncPod function, but terminating pods get the SyncTerminatingPod function. SyncPod has the pod registration flow, but SyncTerminatingPod does not, even though this is needed for kubelet to know which secrets/configmaps back mounted volumes. I added in:
in
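For illustration, here is a minimal, hypothetical sketch of that dispatch. The type and method names are made up, not the real kubelet symbols; the point is that only the running path registers the pod with the secret/configmap managers, so a pod that is already terminating never gets registered even though its secret and configmap volumes may still need to be mounted.

```go
package main

import "fmt"

type pod struct {
	UID         string
	Terminating bool
}

// objectManager stands in for the secret/configmap managers, which only
// watch objects for pods that have been registered with them.
type objectManager struct {
	registered map[string]bool
}

func (m *objectManager) RegisterPod(p *pod) { m.registered[p.UID] = true }

type podWorker struct {
	secrets    *objectManager
	configMaps *objectManager
}

func (w *podWorker) syncPod(p *pod) {
	// Running path: register the pod so the managers start watching the
	// secrets/configmaps that back its volumes, then reconcile the pod.
	w.secrets.RegisterPod(p)
	w.configMaps.RegisterPod(p)
	fmt.Println("SyncPod:", p.UID)
}

func (w *podWorker) syncTerminatingPod(p *pod) {
	// Terminating path: no registration happens here, yet secret/configmap
	// volume setup can still be attempted for this pod.
	fmt.Println("SyncTerminatingPod:", p.UID)
}

func (w *podWorker) dispatch(p *pod) {
	if p.Terminating {
		w.syncTerminatingPod(p)
		return
	}
	w.syncPod(p)
}

func main() {
	w := &podWorker{
		secrets:    &objectManager{registered: map[string]bool{}},
		configMaps: &objectManager{registered: map[string]bool{}},
	}
	// A pod that is already terminating when the worker first sees it (for
	// example after a kubelet restart) is never registered with either manager.
	w.dispatch(&pod{UID: "pod-a", Terminating: true})
	fmt.Println("pod-a registered with secret manager:", w.secrets.registered["pod-a"])
}
```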
What happened?
The pod is stuck in the Terminating state when we rapidly create and delete it, and the kubelet reports the volume setup error:
It shows the same error as #105204, but the k8s 1.23.13 we use already includes the fix from PR #108756.
It appears that the PR hasn't totally fixed the issue. The cause may be that if the pod is deleted while the volume manager is setting up the pod's volumes, the syncPod loop in kubelet won't add the reference count because the pod is in the terminating state. So the volume manager will still try to set up the pod's volumes but fails with the "not registered" error.
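As a rough illustration of that race, with invented names rather than the actual volume manager or secret manager code, and assuming registration is simply skipped once the pod is terminating:

```go
package main

import (
	"errors"
	"fmt"
)

// secretManager stands in for the kubelet-side cache that only serves pods
// that have been registered by the sync loop.
type secretManager struct {
	registered map[string]bool // pod UID -> registered
}

func (m *secretManager) RegisterPod(uid string) { m.registered[uid] = true }

func (m *secretManager) GetSecret(uid, name string) (string, error) {
	if !m.registered[uid] {
		return "", errors.New("object \"" + name + "\" not registered")
	}
	return "secret-data", nil
}

// syncPod mirrors the behaviour described above: registration is skipped once
// the pod is already in the terminating state.
func syncPod(m *secretManager, uid string, terminating bool) {
	if terminating {
		return
	}
	m.RegisterPod(uid)
}

func main() {
	m := &secretManager{registered: map[string]bool{}}

	// The pod is deleted while the volume manager is still setting up its
	// volumes, so by the time syncPod runs the pod is already terminating
	// and never gets registered.
	syncPod(m, "pod-a", true)

	// Volume setup then fails with a "not registered" style error.
	if _, err := m.GetSecret("pod-a", "my-secret"); err != nil {
		fmt.Println("MountVolume.SetUp failed:", err)
	}
}
```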
What did you expect to happen?
The pod should be deleted without the volume setup error.
How can we reproduce it (as minimally and precisely as possible)?
These are the possible steps to reproduce:
The steps may need to be run several times. A rough sketch of the create-and-delete loop is included below.
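A hedged reproduction sketch using client-go, based on the "rapidly create and delete the pod" description above. The namespace ("default"), secret name ("repro-secret"), image, iteration count, and timings are illustrative assumptions rather than values from the report, and the secret is assumed to already exist.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()

	for i := 0; i < 20; i++ {
		name := fmt.Sprintf("repro-%d", i)
		pod := &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: "default"},
			Spec: corev1.PodSpec{
				Containers: []corev1.Container{{
					Name:    "app",
					Image:   "busybox",
					Command: []string{"sleep", "3600"},
					VolumeMounts: []corev1.VolumeMount{{
						Name:      "creds",
						MountPath: "/creds",
					}},
				}},
				Volumes: []corev1.Volume{{
					Name: "creds",
					VolumeSource: corev1.VolumeSource{
						// Assumes a secret named "repro-secret" already exists.
						Secret: &corev1.SecretVolumeSource{SecretName: "repro-secret"},
					},
				}},
			},
		}

		if _, err := client.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
			fmt.Println("create:", err)
			continue
		}
		// Delete almost immediately so that termination can race with the
		// kubelet's volume setup for the secret volume.
		time.Sleep(500 * time.Millisecond)
		if err := client.CoreV1().Pods("default").Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
			fmt.Println("delete:", err)
		}
	}
}
```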
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)