Why docker overlay network is so poor? #37855

Open
@kevinsuo

Description

Hi,

Recently I have been doing some Docker network research, but I cannot figure out why the Docker overlay network performance is so bad. I built two containers running on two VMs and used Docker's VXLAN overlay for the multi-host connection. For the bandwidth measurements I used iperf3.

For VM-to-VM throughput, the bandwidth could reach 20+ Gbits/sec:
[ 5] 0.00-1.00 sec 1.93 GBytes 16.6 Gbits/sec
[ 5] 1.00-2.00 sec 2.91 GBytes 25.0 Gbits/sec
[ 5] 2.00-3.00 sec 2.07 GBytes 21.8 Gbits/sec

However, the container-to-container throughput across hosts is just 2-3 Gbits/sec, and retransmissions (Retr) happen a lot:
[ 4] 0.00-1.00 sec 291 MBytes 2.44 Gbits/sec 218 714 KBytes
[ 4] 1.00-2.00 sec 414 MBytes 3.47 Gbits/sec 663 942 KBytes
[ 4] 2.00-3.00 sec 384 MBytes 3.22 Gbits/sec 1182 846 KBytes

I also tested container-to-container on the same VM; the throughput could also reach 20+ Gbits/sec, which means docker0 should not be the bottleneck for the multi-host connection:
[ 4] 0.00-1.00 sec 2.63 GBytes 22.6 Gbits/sec 328 657 KBytes
[ 4] 1.00-2.00 sec 3.14 GBytes 26.9 Gbits/sec 0 856 KBytes
[ 4] 2.00-3.00 sec 3.86 GBytes 33.2 Gbits/sec 0 856 KBytes

I also measured the CPU utilization on both the client and the server:

[screenshots: CPU utilization on the client and on the server]

Neither the client nor the server CPU is fully used, even though VXLAN consumes extra CPU for encapsulation and decapsulation.

If the throughput cannot go any higher, some resource must be the limiting factor. Could anyone give any hints on why the throughput of the Docker overlay network is so poor?
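
For reference, a minimal sketch of the kind of measurement described above, using an attachable overlay network; the network, container, and image names here are illustrative assumptions, not the exact setup used in this report:

# on a manager node: create an attachable overlay network (requires swarm mode)
docker network create -d overlay --attachable perf_net

# on host 1: start an iperf3 server attached to the overlay network
docker run -d --name iperf_server --net perf_net networkstatic/iperf3 -s

# on host 2: run the client; Docker's embedded DNS resolves the server container by name
docker run --rm --net perf_net networkstatic/iperf3 -c iperf_server -t 10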

Activity

zq-david-wang commented on Sep 19, 2018

Contributor

The CPU usage signature on the server side indicates that the VXLAN UDP traffic can only be processed by a single processor. I guess you are running a fairly old kernel.
Maybe you could try upgrading the kernel, or figure out how to balance the CPU usage across cores.
(I had a similar issue with CentOS 7.0, kernel 3.10.x, and a 10 Gbit/s NIC: VXLAN bandwidth could only reach about 2 Gbit/s. After upgrading the kernel to 4.x, the bandwidth improved significantly.)
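
A quick way to confirm that hypothesis is to watch the per-CPU softirq load while the test runs and to check the NIC's receive-queue / RPS configuration. A rough sketch, where ens3 is only a placeholder interface name:

# per-CPU utilization during the iperf run; one core pegged in %soft suggests single-queue RX processing
mpstat -P ALL 1

# NET_RX softirq counters per CPU; only one column growing tells the same story
watch -n 1 'grep -E "CPU|NET_RX" /proc/softirqs'

# RX queues exposed by the NIC, and the CPU mask that RPS may steer packets to
ls /sys/class/net/ens3/queues/
cat /sys/class/net/ens3/queues/rx-0/rps_cpus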

kevinsuo commented on Sep 19, 2018

Author

@zq-david-wang thanks for your reply.
I am already using a fairly new kernel, 4.9. However, compared to the network without the container overlay, the VXLAN container network is still very bad, as shown above.

HosseinAgha commented on Nov 25, 2019

@kevinsuo I'm experiencing the same issue, except that I see about a 99% drop in throughput and a very large number of TCP retransmissions.
I'm using the latest Docker Community Edition 19.03.5 on Ubuntu 18.04.3 with Linux kernel 4.15.0-70-generic.

Here is the result of running iperf between the hosts:

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   903 MBytes   757 Mbits/sec  328             sender
[  4]   0.00-10.00  sec   899 MBytes   754 Mbits/sec                  receiver

When running iperf3 between 2 swarm services connected through an overlay network (on the same 2 hosts):

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.83 MBytes  4.89 Mbits/sec  922             sender
[  4]   0.00-10.00  sec  5.78 MBytes  4.85 Mbits/sec                  receiver

We found this issue when we encountered very slow performance in record propagation between our database replicas.

I've already checked #35082 and #33133, but I don't think they apply here, as I'm not using an encrypted overlay network and iperf is not making a lot of parallel requests. #30768 may also be related.

I think this is a major performance issue.
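
For anyone who wants to reproduce the swarm-service measurement above, here is a rough sketch of one way to set it up; the service names, image, and placement constraints are illustrative assumptions, not taken from the original test:

# overlay network for the test
docker network create -d overlay perf_net

# iperf3 server pinned to the first node
docker service create --name iperf_srv --network perf_net \
  --constraint node.hostname==node1 networkstatic/iperf3 -s

# iperf3 client pinned to the second node, so traffic has to cross the overlay between hosts
docker service create --name iperf_cli --network perf_net \
  --constraint node.hostname==node2 --restart-condition none \
  networkstatic/iperf3 -c iperf_srv -t 10

# read the client's results
docker service logs iperf_cli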

HosseinAgha commented on Nov 25, 2019

I performed the same test on a similar setup (Docker Community Edition 19.03.5, Ubuntu 18.04.3 with Linux kernel 4.15.0-1054-aws) on an AWS instance.

Here is the result of running iperf between the 2 hosts:

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.79 GBytes  4.98 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  5.79 GBytes  4.97 Gbits/sec                  receiver

When running iperf3 between 2 swarm services connected through an overlay network (on the same 2 hosts):

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.30 GBytes  4.55 Gbits/sec  39527             sender
[  4]   0.00-10.00  sec  5.30 GBytes  4.55 Gbits/sec                  receiver

I don't get as much of a drop in throughput, but the retransmission rate is still very high.

@thaJeztah I think there is a major issue in the latest Swarm overlay networking.
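
Not a diagnosis of this particular setup, but two things that are sometimes worth checking when bandwidth is close to expectations yet retransmissions are high: the TCP-level view from inside the sending container, and the offload settings of the underlay NIC on the hosts. A sketch with placeholder names, assuming the container image ships ss and netstat:

# per-connection retransmission details and kernel-wide retransmit counters, from inside the client container
docker exec <client-container> sh -c 'ss -ti; netstat -s | grep -i retrans'

# checksum/segmentation offload settings of the host's underlay interface;
# offloads interacting badly with VXLAN encapsulation have caused drops in some environments
ethtool -k eth0 | grep -E 'checksum|segmentation|gso|gro'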

thaJeztah commented on Nov 25, 2019

Member

> I think there is a major issue in the latest Swarm overlay networking.

In your situation, did this problem not occur with older versions of Docker in the same setup?

HosseinAgha commented on Nov 25, 2019

No, I don't think so. We used Docker Swarm for our production servers two years ago and did not have any issues.
I think there may be something wrong with the network/instance configuration of our current cloud provider (which uses OpenStack), as the problem is less severe on AWS.
But the issue remains in any setting:
using Swarm's overlay network, we see a drop in bandwidth plus a very high TCP packet retransmission rate.

thaJeztah commented on Nov 25, 2019

Member

@arkodg any ideas?

arkodg commented on Dec 4, 2019

Contributor

The default overlay network created by Docker has an MTU of 1500, which might limit bandwidth if the host's outgoing interface supports a higher MTU. Increasing the MTU of the overlay network is one knob that can be used to improve/tune network bandwidth performance.

I have a Swarm cluster with 2 nodes

Node 1

The host's primary interface has an MTU of 9001:

ifconfig ens3
ens3      Link encap:Ethernet  HWaddr 0a:10:10:c4:94:5c  
          inet addr:172.31.10.181  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::810:10ff:fec4:945c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:1883430 errors:0 dropped:0 overruns:0 frame:0
          TX packets:439535 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5959127850 (5.9 GB)  TX bytes:47602234 (47.6 MB)

Created 3 iperf servers

  1. Attached to the host network
docker run --name iperf_host -d -ti --net host mustafaakin/alpine-iperf iperf -s
  2. Attached to an overlay network with the default MTU (1500)
docker network create -d overlay --attachable iperf_overlay_no_mtu
docker run --name iperf_overlay_no_mtu -d -ti --net iperf_overlay_no_mtu mustafaakin/alpine-iperf iperf -s
  3. Attached to an overlay network with MTU = 8000 (comparable to the MTU of the host interface, which is 9001)
docker network create -d overlay --opt com.docker.network.driver.mtu=8000 --attachable iperf_overlay
docker run --name iperf_overlay -d -ti --net iperf_overlay mustafaakin/alpine-iperf iperf -s

Node 2

Ran 3 iperf client containers for each type of network

  1. Host Network
docker run --net host -ti mustafaakin/alpine-iperf iperf -c 172.31.10.181 -m
Unable to find image 'mustafaakin/alpine-iperf:latest' locally
latest: Pulling from mustafaakin/alpine-iperf
Image docker.io/mustafaakin/alpine-iperf:latest uses outdated schema1 manifest format. Please upgrade to a schema2 image for better future compatibility. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/
12b41071e6ce: Pull complete 
4d55717007e4: Pull complete 
Digest: sha256:5724f79034d0f0e1843efe0d477fac55a22ad2b73f6967da49a683f3595727c0
Status: Downloaded newer image for mustafaakin/alpine-iperf:latest
------------------------------------------------------------
Client connecting to 172.31.10.181, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 172.31.11.137 port 36500 connected with 172.31.10.181 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.78 GBytes  1.53 Gbits/sec
[  3] MSS size 8949 bytes (MTU 8989 bytes, unknown interface)
  2. Overlay network with 1500 MTU
docker run --net iperf_overlay_no_mtu -ti mustafaakin/alpine-iperf iperf -c 10.0.2.2 -m
------------------------------------------------------------
Client connecting to 10.0.2.2, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.2.4 port 34060 connected with 10.0.2.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1002 MBytes   840 Mbits/sec
[  3] MSS size 1398 bytes (MTU 1438 bytes, unknown interface)
  3. Overlay network with 8000 MTU (results much better than 2 and comparable to 1)
docker run --net iperf_overlay -ti mustafaakin/alpine-iperf iperf -c 10.0.1.2 -m
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.4 port 53094 connected with 10.0.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.76 GBytes  1.51 Gbits/sec
[  3] MSS size 7898 bytes (MTU 7938 bytes, unknown interface)
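
A quick sanity check on the numbers: the overlay MTU has to stay below the underlay MTU minus the VXLAN encapsulation overhead (roughly 50 bytes for VXLAN over IPv4), which is why 8000 fits comfortably under the 9001-byte host interface. A sketch for verifying the effective values, reusing the interface and container names from the test above (the container interface name eth0 is an assumption):

# MTU of the host uplink and of the container's interface on the overlay network
ip link show ens3
docker exec iperf_overlay ip link show eth0

# largest ICMP payload that crosses the underlay unfragmented:
# 9001 - 20 (IP header) - 8 (ICMP header) = 8973
ping -M do -s 8973 172.31.10.181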

HosseinAgha commented on Dec 4, 2019

Thank you @arkodg for the extensive test. I think it would be awesome if you mentioned the need to tune the Docker overlay network MTU in the documentation: https://docs.docker.com/network/overlay
I was completely clueless until now.
