
Dockershim streaming server conflicts with NodePort #85418

Closed
@yvespp

Description

@yvespp (Contributor)

What happened:
kubectl exec fails or times out:

Connection refused:

kubectl exec -it -n my-ns my-pod sh
error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:37751: connect: connection refused

Timeout:

kubectl -n my-ns exec -it my-pod -v 99 bash
...
I1118 10:11:32.945560    9992 round_trippers.go:419] curl -k -v -XPOST  -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "
X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl.exe/v1.15.3 (windows/amd64) kubernetes/2d3c76f" -H "Authorization: Bearer kubeconfig-u-ob5wqxfcaq:fc5gvmsxt2j5z8s227gk5t9v7f5rc9hlc7fqpxv8tnm56g8lbjnws2" 'http
s://api-sever.mycorp.com/k8s/clusters/c-9skrw/api/v1/namespaces/my-ns/pods/my-pod/exec?command=bash&container=minio&stdin=true&stdout=true&tty=true'
I1118 10:13:43.699847    9992 round_trippers.go:438] POST https://api-sever.mycorp.com/k8s/clusters/c-9skrw/api/v1/namespaces/my-ns/pods/my-pod/exec?command=bash&container=minio&stdin=true&stdout
=true&tty=true 500 Internal Server Error in 130752 milliseconds
I1118 10:13:43.712034    9992 round_trippers.go:444] Response Headers:
I1118 10:13:43.712034    9992 round_trippers.go:447]     Server: openresty/1.15.8.1
I1118 10:13:43.712034    9992 round_trippers.go:447]     Date: Mon, 18 Nov 2019 09:13:43 GMT
I1118 10:13:43.713032    9992 round_trippers.go:447]     Content-Type: text/plain; charset=utf-8
I1118 10:13:43.713032    9992 round_trippers.go:447]     Content-Length: 79
I1118 10:13:43.713032    9992 round_trippers.go:447]     Connection: keep-alive
I1118 10:13:43.714030    9992 round_trippers.go:447]     X-Content-Type-Options: nosniff
I1118 10:13:43.714030    9992 round_trippers.go:447]     Strict-Transport-Security: max-age=15724800; includeSubDomains
F1118 10:13:43.717022    9992 helpers.go:114] error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:32935: connect: connection timed out

What you expected to happen: kubectl exec works...

How to reproduce it (as minimally and precisely as possible):

  • kube-proxy runs in ipvs mode
  • api-server config: service-node-port-range: 30000-39999
  • kubelet starts the dockershim streaming server on a port in the NodePort range (here 127.0.0.1:32935):
root@kubedev-worker-8b005e396435:~# netstat -anp | grep kubelet
tcp        0      0 127.0.0.1:32935         0.0.0.0:*               LISTEN      2419/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      2419/kubelet
tcp        0      0 0.0.0.0:10250           0.0.0.0:*               LISTEN      2419/kubelet
  • Create a Service with the same NodePort the streaming server uses (here 32935).
  • Wait a few seconds so kube-proxy syncs, then try to run kubectl exec against a pod on that node (kubedev-worker-8b005e396435).
  • Error: error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:32935: connect: connection refused
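
As a sketch, a Service manifest like the following reproduces the conflict (the name and selector are illustrative; nodePort must match whatever port the streaming server happened to bind, 32935 in the netstat output above):

```yaml
# Hypothetical Service pinning its NodePort to the port the kubelet
# streaming server bound (32935 here).
apiVersion: v1
kind: Service
metadata:
  name: conflict-svc        # name is illustrative
  namespace: my-ns
spec:
  type: NodePort
  selector:
    app: my-app             # selector is illustrative
  ports:
    - port: 80
      targetPort: 80
      nodePort: 32935       # same port the streaming server is listening on
```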

Anything else we need to know?:
I'm not sure how to reproduce the kubectl exec connection timed out problem, but I observed that the kubelet streaming server was using an existing NodePort in that case as well. Maybe this happens after a reboot, when the kubelet starts before kube-proxy and picks a NodePort that is already in use...
It seems to me that the streaming server uses a random port that doesn't take the NodePort range into account:

config.Addr = net.JoinHostPort("localhost", "0")

Maybe an option to specify the streaming server port would fix it?

Environment:

  • Kubernetes version (use kubectl version): v1.15.5 kubelet and api-server
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.3 LTS
  • Kernel (e.g. uname -a): 5.0.0-31-generic
  • Install tools: kubeadm, kubectl
  • Network plugin and version (if this is a network-related bug): canal with calico v3.10
  • Others:

Activity

Label added: kind/bug (Categorizes issue or PR as related to a bug.) on Nov 18, 2019
Label added: needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one.) on Nov 18, 2019
yvespp (Contributor, Author) commented on Nov 18, 2019

/sig node

Label added: sig/node (Categorizes an issue or PR as relevant to SIG Node.); label removed: needs-sig, on Nov 18, 2019
yvespp (Contributor, Author) commented on Nov 19, 2019

Fixed by setting sysctl net.ipv4.ip_local_port_range = 40000 60999

Maybe this should be documented somewhere?
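
For reference, the persistent form of this workaround is a sysctl drop-in file (the file name is illustrative; the 40000 60999 range assumes a service-node-port-range of 30000-39999 as in this cluster):

```
# /etc/sysctl.d/90-ephemeral-ports.conf  (name is illustrative)
# Keep kernel-assigned ephemeral ports above the NodePort range (30000-39999),
# so the kubelet streaming server can never bind a port kube-proxy may claim.
net.ipv4.ip_local_port_range = 40000 60999
```

Apply it with `sysctl --system` or a reboot.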

gongguan (Contributor) commented on Nov 19, 2019

I wonder why you could use 32935 or 37751 as a nodePort.
There is a DefaultServiceNodePortRange of 30000-32767.

yvespp (Contributor, Author) commented on Nov 19, 2019

You can set a custom node-port-range on the api-server, which we did:
service-node-port-range: 30000-39999

fejta-bot commented on Feb 17, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Label added: lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) on Feb 17, 2020
fejta-bot commented on Mar 18, 2020

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Label added: lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.); label removed: lifecycle/stale, on Mar 18, 2020
fejta-bot commented on Apr 17, 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot (Contributor) commented on Apr 17, 2020

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot (Contributor) commented on Jun 17, 2020

@wktmeow: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

fl-max commented on Jan 19, 2021

For anyone else stumbling upon this issue with the error error: unable to upgrade connection: error dialing backend: dial tcp 127.0.0.1:<port>: the problem for me was that the loopback interface was never brought up (check with ifconfig lo). Simply running ifup lo fixed it for me.

JasonRD commented on Aug 6, 2021

Had the same problem. In a cluster from a cloud vendor, the apiserver was set with --service-node-port-range=30000-50000; the streaming server started up on port 32859, which conflicted with the NodePort of one Service.

AFAIK, the redirect-container-streaming option would disable the streaming server, but it was removed in v1.20.
So in this case, with service-node-port-range set to 30000-50000, the probability of a conflict increases as the number of NodePorts in use increases. @gongguan

If I read it right, issue #100643 is discussing this.
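
The collision risk above can be made concrete with a quick back-of-the-envelope calculation (a sketch; 32768-60999 is the common Linux default for net.ipv4.ip_local_port_range, and 30000-50000 is the widened NodePort range from this comment):

```go
package main

import "fmt"

func main() {
	// Common Linux default ephemeral range (net.ipv4.ip_local_port_range).
	ephLow, ephHigh := 32768, 60999
	// Widened NodePort range from the comment above.
	npLow, npHigh := 30000, 50000

	// Intersection: ports that are both ephemeral candidates and valid NodePorts.
	lo, hi := ephLow, ephHigh
	if npLow > lo {
		lo = npLow
	}
	if npHigh < hi {
		hi = npHigh
	}
	overlap := 0
	if hi >= lo {
		overlap = hi - lo + 1
	}
	total := ephHigh - ephLow + 1
	fmt.Printf("%d of %d ephemeral ports (%.0f%%) are also NodePorts\n",
		overlap, total, 100*float64(overlap)/float64(total))
}
```

With these numbers, roughly 61% of kernel-assigned ephemeral ports are also valid NodePorts, so each Service that pins a NodePort in the overlap adds another chance of colliding with the streaming server.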

JasonRD commented on Aug 15, 2021

/reopen

k8s-ci-robot (Contributor) commented on Aug 15, 2021

@JasonRD: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

amine250 commented on Aug 30, 2021

/reopen

k8s-ci-robot (Contributor) commented on Aug 30, 2021

@amine250: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aojea (Member) commented on Sep 5, 2021

dockershim has been deprecated, so this issue is not likely to be reopened. If you have another issue, please open a new issue with all the details.
