CLOSE_WAIT connections on master node when ELBs point there #43212

Closed

dkapanidis opened this issue Mar 16, 2017 · 6 comments

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
No.

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

CLOSE_WAIT


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-08T02:48:58Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 8 (jessie)
  • Kernel (e.g. uname -a): Linux ip-172-31-64-14 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux
  • Install tools: kops 1.5.3
  • Others:

What happened:

The problem is triggered when a Service of type=LoadBalancer is left without ready Pods. This reproducibly causes a wave of CLOSE_WAIT connections on the master node(s).

What you expected to happen:

There should not be any flooding of CLOSE_WAIT connections.

How to reproduce it (as minimally and precisely as possible):

  • Start a cluster with kops v1.5.3 (Kubernetes v1.5.2) on AWS
  • Create a Service of type=LoadBalancer (with no attached Pods)

This should trigger the CLOSE_WAIT connections on the master. A sketch of such a Service follows.
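
For illustration (not part of the original report), a minimal client-go sketch of such a Service: the selector matches no Pods, so the Service never gets endpoints while the cloud provider still provisions an ELB for it. This uses the current client-go API rather than the 1.5-era client, and the Service name, namespace, and selector label are made up.

package main

import (
    "context"
    "log"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load credentials from the default kubeconfig location.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        log.Fatal(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }

    svc := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: "close-wait-repro"}, // hypothetical name
        Spec: corev1.ServiceSpec{
            Type:     corev1.ServiceTypeLoadBalancer,
            Selector: map[string]string{"app": "does-not-exist"}, // matches no Pods
            Ports: []corev1.ServicePort{
                {Port: 80, TargetPort: intstr.FromInt(80)},
            },
        },
    }

    // The cloud provider creates an ELB for this Service even though it has
    // no ready endpoints, which is the condition that triggers the issue.
    if _, err := clientset.CoreV1().Services("default").Create(
        context.TODO(), svc, metav1.CreateOptions{}); err != nil {
        log.Fatal(err)
    }
    log.Println("created LoadBalancer Service with no matching Pods")
}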

Note that because kops v1.5.3 uses taints instead of SchedulingDisabled (kubernetes/kops#639), the master nodes are also registered under the ELB on AWS.

Anything else we need to know:

  • As a workaround the master can be marked unschedulable. This causes the ELB to exclude the master, and the CLOSE_WAIT count stops rising:
kubectl patch node MASTER_NAME -p "{\"spec\":{\"unschedulable\":true}}"
  • If Pods are added to the LoadBalancer Service, the CLOSE_WAIT count stops rising once they become ready.

  • The CLOSE_WAIT connections start accumulating on the master only once the ELB marks the master node as "InService", not before.

  • Once too many CLOSE_WAIT connections have accumulated, the following error appears, the master is marked NotReady, and SSH becomes unresponsive. Logs were gathered from "AWS > Instance Settings > Get System Log":

TCP: out of memory… consider tuning tcp_mem
  • The issue has been reproduced in a different cluster and a different AWS account.

Reported together with @mikim83.

justinsb commented Mar 18, 2017

Thank you for the excellent report!

So my theory is this:

  1. CLOSE_WAIT happens when the other end closes the connection (we receive its FIN), but we never close our side of the socket.
  2. For normal kube-proxy traffic using iptables, this won't happen: it is iptables packet manipulation, not a "real" local connection.
  3. But when there are no pods in a service, the nodeport iptables rules are simply not there; connections to the nodeport are not rewritten.
  4. So during this time period, connections go to kube-proxy on the NodePort. kube-proxy is listening on the NodePort (because it wants to make sure the port is not in use; see https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L1349), but it never actually does anything with the socket: technically we don't even accept the connections, and we certainly don't actively close them (see the sketch after this list).
  5. Connections that hit the kube-proxy nodeport socket will be in CLOSE_WAIT until kube-proxy is restarted; effectively forever on a healthy node.
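
To make points 4 and 5 concrete, here is a rough stand-in (an illustration, not kube-proxy's actual code) for a process that opens a listening socket purely to reserve the port and never accepts:

package main

import (
    "log"
    "net"
)

func main() {
    // 31445 stands in for the NodePort from the report; any free port works.
    ln, err := net.Listen("tcp", ":31445")
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("holding %s open without ever calling Accept()", ln.Addr())

    // Never accept. Incoming connections complete their handshake in the
    // kernel's accept queue; when the peer closes its side, the queued
    // connection moves to CLOSE_WAIT and stays there, because no user-space
    // code ever accepts and closes it. Observable with netstat, as in the
    // output further down.
    select {}
}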

For 3: I confirmed that during the time the service had no pods, the KUBE-SVC-UV4XIKEQLMZEPCEV chain (in my case) was removed entirely, along with these KUBE-NODEPORTS rules that reference it:

-A KUBE-NODEPORTS -p tcp -m comment --comment "default/ingress-nginx:http" -m tcp --dport 31445 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/ingress-nginx:http" -m tcp --dport 31445 -j KUBE-SVC-UV4XIKEQLMZEPCEV

And kube-proxy was listening on 31445 (the NodePort):

tcp6       0      0 :::31445                :::*                    LISTEN      1333/kube-proxy 
tcp6      53      0 172.20.122.50:31445     172.20.108.40:38369     CLOSE_WAIT  -               
tcp6      54      0 172.20.122.50:31445     172.20.104.161:2100     CLOSE_WAIT  -               
tcp6      54      0 172.20.122.50:31445     172.20.104.161:2092     CLOSE_WAIT  -               
tcp6      55      0 172.20.122.50:31445     172.20.104.161:44177    CLOSE_WAIT  -               
tcp6      53      0 172.20.122.50:31445     172.20.108.40:16199     CLOSE_WAIT  -               

Also, the number of CLOSE_WAIT connections went up during the time period when I restarted the pod in my service.

I confirmed that 172.20.104.161 and 172.20.108.40 are the IP addresses of my ELBs. They are doing TCP health checks every 5s (IIRC). It is also possible that the health check is an unusual TCP pattern (because it is not an HTTP health check; it merely opens and closes the connection).
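
To make that pattern concrete, here is a minimal sketch (an illustration, not the ELB's actual implementation) of a probe that simply opens a TCP connection and closes it; the target address is taken from the netstat output above:

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    target := "172.20.122.50:31445" // node IP and NodePort from the report
    for {
        conn, err := net.DialTimeout("tcp", target, 2*time.Second)
        if err != nil {
            log.Printf("health check failed: %v", err)
        } else {
            conn.Close() // no payload, just open and close
        }
        // Against a port whose owner never accepts (as in the sketch above),
        // each successful probe leaves one more CLOSE_WAIT socket behind on
        // the target host.
        time.Sleep(5 * time.Second) // the ELB checks roughly every 5s
    }
}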

For this particular issue, which was about the master, my suspicion is that the same happens on the nodes, since the kube-proxy configuration should be the same. If we actually know that this does not happen on the nodes, that would be interesting information.

Two possible fixes spring to mind:

A) When the service behind a NodePort has no Pods, add an iptables rule that rejects connections to that NodePort. Efficient, but iptables is never easy. I also don't know whether this would cause health checks to fail, which isn't wrong but would slow down recovery.
B) Have kube-proxy accept connections and immediately close them (see the sketch below). This feels correct, though I think it will consume a goroutine per nodeport. But goroutines are cheap.
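
For illustration, a rough sketch of what option B could look like (a sketch only, not actual kube-proxy code; the merged change referenced later in this thread instead took the iptables REJECT route of option A):

package main

import (
    "log"
    "net"
)

// drainPort holds a port open and actively drains it: every accepted
// connection is closed immediately, so the kernel finishes the TCP teardown
// and nothing lingers in CLOSE_WAIT. One goroutine per held port.
func drainPort(addr string) error {
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    go func() {
        for {
            conn, err := ln.Accept()
            if err != nil {
                return // listener closed
            }
            conn.Close() // accept and close straight away
        }
    }()
    return nil
}

func main() {
    // Hypothetical NodePort used for illustration.
    if err := drainPort(":31445"); err != nil {
        log.Fatal(err)
    }
    select {}
}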

cc @felipejfc as this looks similar to what you are reporting in #41640

cc @thockin for kube-proxy guru-ness and advice on which option to pursue

felipejfc commented Mar 18, 2017

I actually had 2 namespaces with 3 services each (ELB type) that had no pods associated, because someone forgot to delete them after deleting the pods, and they still had health checks configured. After deleting those services today we saw a massive networking performance boost (thousands of sockets in the CLOSE_WAIT state were closed).

I'll look for other services with no pods associated, delete them, and keep an eye on the cluster.

Thanks for helping, @justinsb!

k8s-github-robot pushed a commit that referenced this issue Mar 21, 2017
Automatic merge from submit-queue

Install a REJECT rule for nodeport with no backend

Rather than actually accepting the connection, REJECT.  This will avoid
CLOSE_WAIT.

Fixes #43212


thockin commented May 11, 2017

@justinsb do you recall if we ported this back to 1.6?

justinsb commented May 11, 2017

@thockin looks like we got the first one into 1.6, but not the second one :-(

These are the two commits (for some reason github only shows the branches in this view, not the PR view...)

2ec8799

9a423b6

I did reopen the cherry-pick of the first one to 1.5 this morning (I've been getting pings on this issue): #43858

Looks like we should get #43858 in, and then cherry-pick 9a423b6 to 1.5 and 1.6.

I do recommend that people hitting this in the real world remove services without endpoints; it is almost always just an error or oversight. I don't think it's a huge problem to leak a few connections on a restart if you happen to end up with no pods for a minute or so. Also, removing them typically saves the cost of an extra ELB.

exarkun commented Jul 17, 2017

In which version of Kubernetes is this issue expected to be resolved? The problem still manifests on my Kubernetes 1.6.3 deployment.

0xMadao commented Sep 28, 2017

Still seeing this on Kubernetes 1.6.4 on AWS: the Services of type LoadBalancer in our cluster all have Pods associated, but we still have thousands of CLOSE_WAIT connections.
