Can't run: Unable to set Type=notify in systemd service file #7

Closed
KarlCyan opened this issue Nov 20, 2019 · 14 comments

@KarlCyan

Describe the bug

Container logs:

I1120 02:49:13.645198  262971 volume.go:152] Volume manager is running
E1120 02:49:13.645282  262971 server.go:132] Unable to set Type=notify in systemd service file?
I1120 02:49:14.019519  262971 app.go:87] Wait for internal server ready

It then waits for the internal server to become ready, times out after 10 seconds, and exits.
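
For context, the log suggests a startup loop that polls an internal server and gives up after roughly 10 seconds. A hypothetical Go sketch of that pattern (checkReady and the timeout value are placeholders, not gpu-manager's actual code):

package main

import (
	"log"
	"os"
	"time"
)

// checkReady stands in for whatever readiness probe the real server uses.
func checkReady() bool { return false }

func main() {
	deadline := time.Now().Add(10 * time.Second) // the log suggests a ~10s budget
	for time.Now().Before(deadline) {
		if checkReady() {
			log.Println("internal server is ready")
			return
		}
		log.Println("Wait for internal server ready")
		time.Sleep(time.Second)
	}
	log.Println("Wait too long for server ready, restarting")
	os.Exit(1) // the container exits and kubelet restarts it
}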

Environment

OS: CentOS (kernel 3.10.0-957.27.2.el7.x86_64)
Kubernetes: 1.13.2

@KarlCyan
Author

I found that the SdNotify function uses the NOTIFY_SOCKET environment variable to establish its socket connection, but that variable is not set in the container environment.
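
SdNotify here presumably comes from go-systemd, which silently does nothing when NOTIFY_SOCKET is unset. A minimal sketch of that behavior (using github.com/coreos/go-systemd/daemon; the log wording below is an assumption, not gpu-manager's exact code):

package main

import (
	"log"

	"github.com/coreos/go-systemd/daemon"
)

func main() {
	// SdNotify reads the NOTIFY_SOCKET environment variable. Inside a plain
	// container it is usually unset, so sent == false and err == nil.
	sent, err := daemon.SdNotify(false, daemon.SdNotifyReady)
	if err != nil {
		log.Fatalf("failed to notify systemd: %v", err)
	}
	if !sent {
		// This is the situation behind the "Unable to set Type=notify in
		// systemd service file?" message: a warning, not a fatal error.
		log.Println("NOTIFY_SOCKET not set; running without systemd supervision")
	}
}

So the message itself is harmless; it only tells you the process is not running under a systemd Type=notify unit.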

@mYmNeo
Contributor

mYmNeo commented Nov 21, 2019

Unable to set Type=notify in systemd service file?
This error is not the reason why the server is down.

How did you run gpu-manager? Please provide the details.

@KarlCyan
Author

Unable to set Type=notify in systemd service file?
This error is not the reason why the server is down.

How did you run gpu-manager? Please provide the details.

Image name: gpu-manager:v1.0.0
I start it with kubectl create -f gpu-manager.yaml, changing only the namespace and the image name.

@mYmNeo
Contributor

mYmNeo commented Nov 21, 2019

gpu-manager writes its log to the /etc/gpu-manager/log directory on each node. Could you find it and paste it here?

@KarlCyan
Author

gpu-manager writes its log to the /etc/gpu-manager/log directory on each node. Could you find it and paste it here?

I use the --logtostderr parameter and run start.sh:

# ... copy file

# ... mirror file

I1125 06:36:36.331062  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libnvcuvid.so.430.40 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.334182  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libcuda.so.430.40 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.346570  477934 volume.go:167] Driver version: 430.40
I1125 06:36:36.346592  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libOpenGL.so.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.347056  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLdispatch.so.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.347922  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLX_nvidia.so.430.40 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.349222  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLX.so.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.349545  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLESv2_nvidia.so.430.40 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.350007  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLESv2.so.2.1.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.350437  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLESv1_CM_nvidia.so.430.40 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.350856  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGLESv1_CM.so.1.2.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.351242  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libGL.so.1.7.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.352192  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libEGL_nvidia.so.430.40 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.353594  477934 volume.go:158] Mirror /usr/local/nvidia/lib64/libEGL.so.1.1.0 to /etc/gpu-manager/vdriver/origin/lib64
I1125 06:36:36.354065  477934 volume.go:158] Mirror /usr/local/nvidia/bin/nvidia-cuda-mps-control to /etc/gpu-manager/vdriver/origin/bin
I1125 06:36:36.354429  477934 volume.go:158] Mirror /usr/local/nvidia/bin/nvidia-cuda-mps-server to /etc/gpu-manager/vdriver/origin/bin
I1125 06:36:36.354740  477934 volume.go:158] Mirror /usr/local/nvidia/bin/nvidia-debugdump to /etc/gpu-manager/vdriver/origin/bin
I1125 06:36:36.355194  477934 volume.go:158] Mirror /usr/local/nvidia/bin/nvidia-persistenced to /etc/gpu-manager/vdriver/origin/bin
I1125 06:36:36.355511  477934 volume.go:158] Mirror /usr/local/nvidia/bin/nvidia-smi to /etc/gpu-manager/vdriver/origin/bin
I1125 06:36:36.356749  477934 volume.go:189] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib/libcuda.so.1
I1125 06:36:36.357310  477934 volume.go:200] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib/libcuda.so
I1125 06:36:36.357841  477934 volume.go:189] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib64/libcuda.so.1
I1125 06:36:36.358382  477934 volume.go:200] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib64/libcuda.so
I1125 06:36:36.358888  477934 volume.go:215] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib/libnvidia-ml.so.1
I1125 06:36:36.359418  477934 volume.go:226] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib/libnvidia-ml.so
I1125 06:36:36.359932  477934 volume.go:215] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib64/libnvidia-ml.so.1
I1125 06:36:36.360469  477934 volume.go:226] Vcuda /usr/lib64/libcuda-control.so to /etc/gpu-manager/vdriver/nvidia/lib64/libnvidia-ml.so
I1125 06:36:36.360490  477934 volume.go:135] Volume manager is running
E1125 06:36:36.360530  477934 server.go:114] Unable to set Type=notify in systemd service file?
I1125 06:36:36.759829  477934 app.go:68] Wait for internal server ready
I1125 06:36:37.760107  477934 app.go:68] Wait for internal server ready
I1125 06:36:38.760520  477934 app.go:68] Wait for internal server ready
I1125 06:36:39.760786  477934 app.go:68] Wait for internal server ready
I1125 06:36:40.762543  477934 app.go:68] Wait for internal server ready
I1125 06:36:41.763197  477934 app.go:68] Wait for internal server ready
I1125 06:36:42.763444  477934 app.go:68] Wait for internal server ready
I1125 06:36:43.763795  477934 app.go:68] Wait for internal server ready
I1125 06:36:44.764208  477934 app.go:68] Wait for internal server ready
W1125 06:36:45.764519  477934 app.go:74] Wait too long for server ready, restarting

@mYmNeo
Contributor

mYmNeo commented Nov 25, 2019

Do you have Unix socket files such as vcore.sock and vmemory.sock in /var/lib/kubelet/device-plugins/? Your log has no line matching the pattern Server %s is ready at %s, which means the plugin servers were not started.
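
A quick way to verify this from the node is to dial those sockets. A small Go sketch (the paths are the defaults mentioned above and may differ on your setup):

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// The device-plugin sockets gpu-manager is expected to create.
	sockets := []string{
		"/var/lib/kubelet/device-plugins/vcore.sock",
		"/var/lib/kubelet/device-plugins/vmemory.sock",
	}
	for _, s := range sockets {
		conn, err := net.DialTimeout("unix", s, time.Second)
		if err != nil {
			fmt.Printf("%s: not reachable (%v)\n", s, err)
			continue
		}
		conn.Close()
		fmt.Printf("%s: listening\n", s)
	}
}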

@KarlCyan
Author

Do you have Unix socket files such as vcore.sock and vmemory.sock in /var/lib/kubelet/device-plugins/? Your log has no line matching the pattern Server %s is ready at %s, which means the plugin servers were not started.

I only have kubelet.sock in the container path /var/lib/kubelet/device-plugins/.

@mYmNeo
Contributor

mYmNeo commented Nov 25, 2019

Your log indicated that gpu-manager was stuck at https://github.com/tkestack/gpu-manager/blob/master/pkg/server/server.go#L155

@KarlCyan
Author

KarlCyan commented Nov 26, 2019

Your log indicated that gpu-manager was stuck at https://github.com/tkestack/gpu-manager/blob/master/pkg/server/server.go#L155

I have solved the problem and gpu-manager is now working properly.
gpu-manager was stuck at https://github.com/tkestack/gpu-manager/blob/master/pkg/server/server.go#L141, so I copied the kubeconfig from the master node's /root/.kube/ directory to the container path /root/.kube/.

I think we should add a note about this and update gpu-manager.yaml.
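
For anyone hitting the same hang: it went away once a kubeconfig was available at /root/.kube/ inside the container. A generic client-go sketch of the usual pattern, preferring a mounted kubeconfig and falling back to in-cluster credentials (the common idiom, not gpu-manager's exact code):

package main

import (
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// buildClient prefers an explicit kubeconfig (e.g. one copied or mounted to
// /root/.kube/config in the container) and falls back to the pod's in-cluster
// service-account credentials when that file is missing.
func buildClient(kubeconfig string) (*kubernetes.Clientset, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		cfg, err = rest.InClusterConfig()
		if err != nil {
			return nil, err
		}
	}
	return kubernetes.NewForConfig(cfg)
}

func main() {
	if _, err := buildClient("/root/.kube/config"); err != nil {
		log.Fatalf("cannot create kubernetes client: %v", err)
	}
	log.Println("kubernetes client created")
}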

@KarlCyan
Author

The problem has been solved. I will close this issue.

@mYmNeo
Contributor

mYmNeo commented Nov 26, 2019

Thanks for reporting this. I'll update the README about this ASAP

@chenjie222

Hi KarlCyan, I have the same problem as you (see the attached screenshot); can you help me?
My Docker cgroup driver is systemd. I changed it to cgroupfs, but the problem is the same.

@DeepDarkOdyssey

Same issue as @chenjie222 here; any luck finding a solution? I followed this blog, https://cloud.tencent.com/developer/article/1685122, and everything works fine with all pods running, but the gpu-manager daemon pod log shows it is stuck at server.go:132 (see the attached screenshot), just like what @chenjie222 ran into. I tried several things, such as changing the cgroup driver from systemd to cgroupfs, but it didn't work; the pod keeps running with no response.
Meanwhile, I found the extra flags that need to be set in https://github.com/tkestack/gpu-manager/blob/master/docs/faq.md, but when I configured gpu-manager with them (see the attached screenshot), the pod couldn't start. It seems the extra flags are passed directly to gpu-manager as command-line options, and it has no option named "cgroup-driver". What am I missing?

@pandaoknight

After I installed nvidia-container-toolkit, the "Unable to set Type=notify in systemd service file" problem disappeared.
