
Dockershim streaming server conflicts with NodePort #85418


Closed
yvespp opened this issue Nov 18, 2019 · 16 comments

Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

yvespp (Contributor) commented Nov 18, 2019

What happened:
kubectl exec fails or times out:

Connection refused:

kubectl exec -it -n my-ns my-pod sh
error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:37751: connect: connection refused

Timeout:

kubectl -n my-ns exec -it my-pod -v 99 bash
...
I1118 10:11:32.945560    9992 round_trippers.go:419] curl -k -v -XPOST  -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "
X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl.exe/v1.15.3 (windows/amd64) kubernetes/2d3c76f" -H "Authorization: Bearer kubeconfig-u-ob5wqxfcaq:fc5gvmsxt2j5z8s227gk5t9v7f5rc9hlc7fqpxv8tnm56g8lbjnws2" 'http
s://api-sever.mycorp.com/k8s/clusters/c-9skrw/api/v1/namespaces/my-ns/pods/my-pod/exec?command=bash&container=minio&stdin=true&stdout=true&tty=true'
I1118 10:13:43.699847    9992 round_trippers.go:438] POST https://api-sever.mycorp.com/k8s/clusters/c-9skrw/api/v1/namespaces/my-ns/pods/my-pod/exec?command=bash&container=minio&stdin=true&stdout
=true&tty=true 500 Internal Server Error in 130752 milliseconds
I1118 10:13:43.712034    9992 round_trippers.go:444] Response Headers:
I1118 10:13:43.712034    9992 round_trippers.go:447]     Server: openresty/1.15.8.1
I1118 10:13:43.712034    9992 round_trippers.go:447]     Date: Mon, 18 Nov 2019 09:13:43 GMT
I1118 10:13:43.713032    9992 round_trippers.go:447]     Content-Type: text/plain; charset=utf-8
I1118 10:13:43.713032    9992 round_trippers.go:447]     Content-Length: 79
I1118 10:13:43.713032    9992 round_trippers.go:447]     Connection: keep-alive
I1118 10:13:43.714030    9992 round_trippers.go:447]     X-Content-Type-Options: nosniff
I1118 10:13:43.714030    9992 round_trippers.go:447]     Strict-Transport-Security: max-age=15724800; includeSubDomains
F1118 10:13:43.717022    9992 helpers.go:114] error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:32935: connect: connection timed out

What you expected to happen: kubectl exec works...

How to reproduce it (as minimally and precisely as possible):

  • kube-proxy runs in ipvs mode
  • api-server config: service-node-port-range: 30000-39999
  • kubelet starts the docker shim streaming server in the NodePort range (here 127.0.0.1:32935):
root@kubedev-worker-8b005e396435:~# netstat -anp | grep kubelet
tcp        0      0 127.0.0.1:32935         0.0.0.0:*               LISTEN      2419/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      2419/kubelet
tcp        0      0 0.0.0.0:10250           0.0.0.0:*               LISTEN      2419/kubelet
  • Create a Service with the same NodePort the streaming server uses (here 32935); a sketch follows this list.
  • Wait a few seconds so kube-proxy syncs, then try to run kubectl exec against a pod on that node (kubedev-worker-8b005e396435).
  • Error: error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:32935: connect: connection refused
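
A minimal sketch of the last three steps, with a hypothetical Service name, port mapping, namespace and pod; only the --node-port value matters:

# pin a Service to the nodePort the streaming server happens to hold (here 32935)
kubectl create service nodeport stream-clash --tcp=8080:8080 --node-port=32935
# after kube-proxy has synced, exec against any pod on that node fails
kubectl exec -it -n my-ns my-pod sh
error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:32935: connect: connection refused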

Anything else we need to know?:
I'm not sure how to reproduce the kubectl exec connection timed out problem, but I observed that the kubelet streaming server was using an existing NodePort in this case as well. Maybe this happens after a reboot, when the kubelet starts before kube-proxy and picks a NodePort that is already in use...
Seems to me that the streaming server uses a random port that doesn't take the NodePort range into account:

config.Addr = net.JoinHostPort("localhost", "0")

Maybe an option to specify the streaming server port would fix it?
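
For context, binding to port "0" lets the kernel pick from net.ipv4.ip_local_port_range, and the stock Linux default of that range overlaps a widened NodePort range such as 30000-39999:

# kernel ephemeral port range (typical default shown)
cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999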

Environment:

  • Kubernetes version (use kubectl version): v1.15.5 kubelet and api-server
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.3 LTS
  • Kernel (e.g. uname -a): 5.0.0-31-generic
  • Install tools: kubeadm, kubectl
  • Network plugin and version (if this is a network-related bug): canal with calico v3.10
  • Others:
@yvespp yvespp added the kind/bug Categorizes issue or PR as related to a bug. label Nov 18, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 18, 2019
yvespp (Contributor, Author) commented Nov 18, 2019

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 18, 2019
yvespp (Contributor, Author) commented Nov 19, 2019

Fixed by setting sysctl net.ipv4.ip_local_port_range = 40000 60999

Maybe this should be documented somewhere?
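
A sketch of that workaround made persistent across reboots; the drop-in file name is arbitrary:

# move the kernel's ephemeral port range above the 30000-39999 NodePort range
sysctl -w net.ipv4.ip_local_port_range="40000 60999"
echo "net.ipv4.ip_local_port_range = 40000 60999" > /etc/sysctl.d/90-ephemeral-ports.conf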

gongguan (Contributor) commented Nov 19, 2019

I wonder why you could use 32935 or 37751 as nodePort.
There is a DefaultServiceNodePortRange: 30000-32767
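
On a kubeadm-installed cluster you can confirm the range actually in effect from the apiserver's flags (label and namespace as kubeadm sets them):

kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep service-node-port-range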

yvespp (Contributor, Author) commented Nov 19, 2019

You can set a custom node port range on the API server, which we did:
service-node-port-range: 30000-39999
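
For reference, that corresponds to the kube-apiserver flag (shown on its own here; in practice it is set wherever your installer configures the apiserver):

kube-apiserver ... --service-node-port-range=30000-39999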

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 17, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 18, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.


@k8s-ci-robot (Contributor)

@wktmeow: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen


fl-max commented Jan 19, 2021

For anyone else stumbling upon this issue with the error error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:<port>, the problem for me was that the loopback device had never been brought up (check with ifconfig lo). Simply running ifup lo fixed it for me.
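
A quick check along those lines, using ip as an alternative when ifconfig/ifup are not installed:

ip addr show lo     # flags should include UP, e.g. <LOOPBACK,UP,LOWER_UP>
ifup lo             # or: ip link set lo up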

JasonRD commented Aug 6, 2021

Had the same problem. In a cluster from a cloud vendor, where the apiserver was set with --service-node-port-range=30000-50000, the streaming server started up on port 32859 and conflicted with the nodePort of one Service.

AFAIK, the redirect-container-streaming option would disable the streaming server, but it was removed in v1.20.
So in this case, with service-node-port-range set to 30000-50000, the probability of a conflict increases as the number of nodePorts grows. @gongguan

If I read it right, issue #100643 is discussing this.
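
One way to check a cluster for such a collision; the port number is just this comment's example, substitute whatever kubelet's streaming server is actually listening on per netstat/ss:

# list every allocated nodePort and look for the kubelet streaming port
kubectl get svc -A -o jsonpath='{.items[*].spec.ports[*].nodePort}' | tr ' ' '\n' | grep -x 32859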

JasonRD commented Aug 15, 2021

/reopen

@k8s-ci-robot (Contributor)

@JasonRD: You can't reopen an issue/PR unless you authored it or you are a collaborator.


@amine250

/reopen

@k8s-ci-robot (Contributor)

@amine250: You can't reopen an issue/PR unless you authored it or you are a collaborator.


aojea (Member) commented Sep 5, 2021

dockershim has been deprecated, so this issue is not likely to be reopened. If you have another issue, please open a new one with all the details.
