ETCD with TLS showing warning "transport: authentication handshake failed: remote error: tls: bad certificate" #9785

Closed
@JinsYin

I followed these two guides:

https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md
https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md

Initialize a certificate authority

$ cat ca-config.json
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "server": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth"
        ]
      },
      "client": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "peer": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}

$ cat ca-csr.json
{
  "CN": "My own CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "US",
      "L": "CA",
      "O": "My Company Name",
      "ST": "San Francisco",
      "OU": "Org Unit 1",
      "OU": "Org Unit 2"
    }
  ]
}

$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -

Generate server certificate

# cfssl print-defaults csr > server.json
$ cat server.json
{
  "CN": "etcd1",
  "hosts": [
    "192.168.1.221"
  ],
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
        "C": "US",
        "L": "CA",
        "ST": "San Francisco"
    }
  ]
}

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server

Etcd Server

etcd --name infra0 --data-dir infra0 \
  --client-cert-auth --trusted-ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem \
  --advertise-client-urls https://127.0.0.1:2379 --listen-client-urls https://127.0.0.1:2379
2018-05-29 11:17:10.374455 I | etcdmain: etcd Version: 3.3.5
2018-05-29 11:17:10.374527 I | etcdmain: Git SHA: 70c872620
2018-05-29 11:17:10.374534 I | etcdmain: Go Version: go1.9.6
2018-05-29 11:17:10.374540 I | etcdmain: Go OS/Arch: linux/amd64
2018-05-29 11:17:10.374546 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-05-29 11:17:10.374859 I | embed: listening for peers on http://localhost:2380
2018-05-29 11:17:10.374899 I | embed: listening for client requests on 127.0.0.1:2379
2018-05-29 11:17:10.377043 I | etcdserver: name = infra0
2018-05-29 11:17:10.377067 I | etcdserver: data dir = infra0
2018-05-29 11:17:10.377074 I | etcdserver: member dir = infra0/member
2018-05-29 11:17:10.377079 I | etcdserver: heartbeat = 100ms
2018-05-29 11:17:10.377087 I | etcdserver: election = 1000ms
2018-05-29 11:17:10.377092 I | etcdserver: snapshot count = 100000
2018-05-29 11:17:10.377125 I | etcdserver: advertise client URLs = https://127.0.0.1:2379
2018-05-29 11:17:10.377133 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-05-29 11:17:10.377143 I | etcdserver: initial cluster = infra0=http://localhost:2380
2018-05-29 11:17:10.379279 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2018-05-29 11:17:10.379320 I | raft: 8e9e05c52164694d became follower at term 0
2018-05-29 11:17:10.379337 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-05-29 11:17:10.379344 I | raft: 8e9e05c52164694d became follower at term 1
2018-05-29 11:17:10.385248 W | auth: simple token is not cryptographically signed
2018-05-29 11:17:10.388175 I | etcdserver: starting server... [version: 3.3.5, cluster version: to_be_decided]
2018-05-29 11:17:10.388842 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
2018-05-29 11:17:10.389395 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-05-29 11:17:10.392890 I | embed: ClientTLS: cert = server.pem, key = server-key.pem, ca = , trusted-ca = ca.pem, client-cert-auth = true, crl-file = 
2018-05-29 11:17:10.479773 I | raft: 8e9e05c52164694d is starting a new election at term 1
2018-05-29 11:17:10.479819 I | raft: 8e9e05c52164694d became candidate at term 2
2018-05-29 11:17:10.479887 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2018-05-29 11:17:10.479906 I | raft: 8e9e05c52164694d became leader at term 2
2018-05-29 11:17:10.479915 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2018-05-29 11:17:10.480540 I | etcdserver: published {Name:infra0 ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-05-29 11:17:10.480670 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-05-29 11:17:10.480694 I | embed: ready to serve client requests
2018-05-29 11:17:10.480718 I | etcdserver: setting up the initial cluster version to 3.3
2018-05-29 11:17:10.481430 N | etcdserver/membership: set the initial cluster version to 3.3
2018-05-29 11:17:10.481638 I | etcdserver/api: enabled capabilities for version 3.3
2018-05-29 11:17:10.532133 I | embed: serving client requests on 127.0.0.1:2379
2018-05-29 11:17:10.539294 I | embed: rejected connection from "127.0.0.1:39794" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2018/05/29 11:17:10 Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

JinsYin (Author) commented on May 29, 2018

When I regenerated the server certificate using the peer profile, the warning went away. Why?

# -profile=peer
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer server.json | cfssljson -bare server
$ etcd --name infra0 --data-dir infra0 \
  --client-cert-auth --trusted-ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem \
  --advertise-client-urls https://127.0.0.1:2379 --listen-client-urls https://127.0.0.1:2379
2018-05-29 11:21:09.053070 I | etcdmain: etcd Version: 3.3.5
2018-05-29 11:21:09.053133 I | etcdmain: Git SHA: 70c872620
2018-05-29 11:21:09.053141 I | etcdmain: Go Version: go1.9.6
2018-05-29 11:21:09.053146 I | etcdmain: Go OS/Arch: linux/amd64
2018-05-29 11:21:09.053152 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-05-29 11:21:09.053557 I | embed: listening for peers on http://localhost:2380
2018-05-29 11:21:09.053597 I | embed: listening for client requests on 127.0.0.1:2379
2018-05-29 11:21:09.055180 I | etcdserver: name = infra0
2018-05-29 11:21:09.055195 I | etcdserver: data dir = infra0
2018-05-29 11:21:09.055202 I | etcdserver: member dir = infra0/member
2018-05-29 11:21:09.055207 I | etcdserver: heartbeat = 100ms
2018-05-29 11:21:09.055212 I | etcdserver: election = 1000ms
2018-05-29 11:21:09.055220 I | etcdserver: snapshot count = 100000
2018-05-29 11:21:09.055230 I | etcdserver: advertise client URLs = https://127.0.0.1:2379
2018-05-29 11:21:09.055237 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-05-29 11:21:09.055246 I | etcdserver: initial cluster = infra0=http://localhost:2380
2018-05-29 11:21:09.056700 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2018-05-29 11:21:09.056732 I | raft: 8e9e05c52164694d became follower at term 0
2018-05-29 11:21:09.056747 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-05-29 11:21:09.056753 I | raft: 8e9e05c52164694d became follower at term 1
2018-05-29 11:21:09.059841 W | auth: simple token is not cryptographically signed
2018-05-29 11:21:09.061318 I | etcdserver: starting server... [version: 3.3.5, cluster version: to_be_decided]
2018-05-29 11:21:09.061669 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
2018-05-29 11:21:09.062072 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-05-29 11:21:09.063469 I | embed: ClientTLS: cert = server.pem, key = server-key.pem, ca = , trusted-ca = ca.pem, client-cert-auth = true, crl-file = 
2018-05-29 11:21:09.657081 I | raft: 8e9e05c52164694d is starting a new election at term 1
2018-05-29 11:21:09.657149 I | raft: 8e9e05c52164694d became candidate at term 2
2018-05-29 11:21:09.657179 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2018-05-29 11:21:09.657203 I | raft: 8e9e05c52164694d became leader at term 2
2018-05-29 11:21:09.657215 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2018-05-29 11:21:09.657608 I | etcdserver: setting up the initial cluster version to 3.3
2018-05-29 11:21:09.658381 N | etcdserver/membership: set the initial cluster version to 3.3
2018-05-29 11:21:09.658457 I | etcdserver/api: enabled capabilities for version 3.3
2018-05-29 11:21:09.658520 I | etcdserver: published {Name:infra0 ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-05-29 11:21:09.658536 I | embed: ready to serve client requests
2018-05-29 11:21:09.658751 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-05-29 11:21:09.712055 I | embed: serving client requests on 127.0.0.1:2379

Title changed from 'WARNING: Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.' to 'ETCD with TLS showing warning "transport: authentication handshake failed: remote error: tls: bad certificate"' on May 29, 2018

hexfusion (Contributor) commented on May 29, 2018

@JinsYin your config defines the server profile with server auth only, while the peer profile has both the server auth and client auth usages. I can see how this is confusing, since the example uses "server" in the file name.

embed: rejected connection from "127.0.0.1:39794" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2018/05/29 11:17:10 Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

So it seems that as soon as client auth is attempted, it fails because the server profile does not produce certificates that can be used for client auth. That is how I read it, at least.

ref https://github.com/cloudflare/cfssl/blob/master/doc/cmd/cfssl.txt
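
The reading above can be checked outside etcd. The following is a minimal Go sketch, not etcd's actual code. It assumes the server verifies client certificates the way Go's crypto/tls does, by asking x509 verification for the client auth extended key usage. The throwaway CA and the helper names (sign, verifyAsClient, demo) are made up for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// sign creates a certificate from tmpl. If parent is nil the certificate is
// self-signed; otherwise it is signed by parent using parentKey.
func sign(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	tmpl.NotBefore = time.Now().Add(-time.Hour)
	tmpl.NotAfter = time.Now().Add(24 * time.Hour)
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// leafTemplate mimics a cfssl profile: fixed key usages plus the given EKUs.
func leafTemplate(serial int64, ekus ...x509.ExtKeyUsage) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: "etcd1"},
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  ekus,
	}
}

// verifyAsClient verifies leaf against ca while requiring the client auth
// usage, which is what a TLS server does with a client certificate.
func verifyAsClient(ca, leaf *x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

// demo returns the verification result for a "server"-style leaf and for a
// "peer"-style leaf, plus any setup error.
func demo() (serverOnly, both, setup error) {
	ca, caKey, err := sign(&x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "My own CA"},
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
		IsCA:                  true,
	}, nil, nil)
	if err != nil {
		return nil, nil, err
	}
	srv, _, err := sign(leafTemplate(2, x509.ExtKeyUsageServerAuth), ca, caKey)
	if err != nil {
		return nil, nil, err
	}
	peer, _, err := sign(leafTemplate(3, x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth), ca, caKey)
	if err != nil {
		return nil, nil, err
	}
	return verifyAsClient(ca, srv), verifyAsClient(ca, peer), nil
}

func main() {
	serverOnly, both, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("server-style leaf used as client:", serverOnly)
	fmt.Println("peer-style leaf used as client:  ", both)
}
```

Under that assumption, the leaf with only server auth is rejected with an x509 key usage error like the one in the etcd log, while the leaf carrying both usages verifies cleanly.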

JinsYin (Author) commented on May 30, 2018

@hexfusion I agree. What confuses me is why the etcd server needs client auth.

JinsYin (Author) commented on May 30, 2018

When I set the --client-cert-auth parameter to false, the warning went away. So I guess the etcd process performs a health check against itself as a client.

# server auth & --client-cert-auth=false
$ etcd --name infra0 --data-dir infra0 \
  --client-cert-auth=false --trusted-ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem \
  --advertise-client-urls https://127.0.0.1:2379 --listen-client-urls https://127.0.0.1:2379
2018-05-30 11:43:23.150450 I | etcdmain: etcd Version: 3.3.5
2018-05-30 11:43:23.150561 I | etcdmain: Git SHA: 70c872620
2018-05-30 11:43:23.150577 I | etcdmain: Go Version: go1.9.6
2018-05-30 11:43:23.150590 I | etcdmain: Go OS/Arch: linux/amd64
2018-05-30 11:43:23.150602 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-05-30 11:43:23.150699 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-05-30 11:43:23.151409 I | embed: listening for peers on http://localhost:2380
2018-05-30 11:43:23.151494 I | embed: listening for client requests on 127.0.0.1:2379
2018-05-30 11:43:23.152450 I | etcdserver: name = infra0
2018-05-30 11:43:23.152471 I | etcdserver: data dir = infra0
2018-05-30 11:43:23.152484 I | etcdserver: member dir = infra0/member
2018-05-30 11:43:23.152496 I | etcdserver: heartbeat = 100ms
2018-05-30 11:43:23.152516 I | etcdserver: election = 1000ms
2018-05-30 11:43:23.152529 I | etcdserver: snapshot count = 100000
2018-05-30 11:43:23.152550 I | etcdserver: advertise client URLs = https://127.0.0.1:2379
2018-05-30 11:43:23.153964 I | etcdserver: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 14
2018-05-30 11:43:23.154047 I | raft: 8e9e05c52164694d became follower at term 7
2018-05-30 11:43:23.154074 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 7, commit: 14, applied: 0, lastindex: 14, lastterm: 7]
2018-05-30 11:43:23.158976 W | auth: simple token is not cryptographically signed
2018-05-30 11:43:23.161144 I | etcdserver: starting server... [version: 3.3.5, cluster version: to_be_decided]
2018-05-30 11:43:23.162710 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-05-30 11:43:23.163138 N | etcdserver/membership: set the initial cluster version to 3.3
2018-05-30 11:43:23.163261 I | etcdserver/api: enabled capabilities for version 3.3
2018-05-30 11:43:23.165712 I | embed: ClientTLS: cert = server.pem, key = server-key.pem, ca = , trusted-ca = ca.pem, client-cert-auth = false, crl-file = 
2018-05-30 11:43:25.054746 I | raft: 8e9e05c52164694d is starting a new election at term 7
2018-05-30 11:43:25.054839 I | raft: 8e9e05c52164694d became candidate at term 8
2018-05-30 11:43:25.054875 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 8
2018-05-30 11:43:25.054908 I | raft: 8e9e05c52164694d became leader at term 8
2018-05-30 11:43:25.054930 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 8
2018-05-30 11:43:25.056827 I | etcdserver: published {Name:infra0 ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-05-30 11:43:25.056909 I | embed: ready to serve client requests
2018-05-30 11:43:25.057110 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-05-30 11:43:25.113424 I | embed: serving client requests on 127.0.0.1:2379

detiber commented on Jun 12, 2018

I found this issue while troubleshooting problems that arose during an etcd upgrade from 3.1.x to 3.2.x using kubeadm. After some debugging, I determined that the new (as of etcd 3.2.x) requirement for the client auth usage on the serving certificate exists because the server certificate is also used as a client certificate for the gRPC gateway.

This requirement doesn't appear to be documented in any of the places I would expect, such as:
https://coreos.com/os/docs/latest/generate-self-signed-certificates.html
https://coreos.com/etcd/docs/latest/op-guide/security.html
https://coreos.com/etcd/docs/latest/dev-guide/api_grpc_gateway.html
https://coreos.com/etcd/docs/latest/op-guide/configuration.html
https://coreos.com/etcd/docs/latest/upgrades/upgrade_3_2.html

Ideally, I would expect there to be a configuration option to specify a separate client cert for the grpc gateway (and tangentially also be able to specify separate client/server certs for the peer certificates as well).
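
Given the requirement described above, one way to catch the problem before an upgrade is to inspect the serving certificate's extended key usages. This is a minimal Go sketch under stated assumptions: allowsClientAuth and selfSignedPEM are hypothetical helpers, and the certificate is generated in memory as a stand-in for a real server.pem so that the example runs without files on disk.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// allowsClientAuth reports whether a PEM-encoded certificate may be used for
// TLS client authentication, judged only by its extended key usages.
func allowsClientAuth(pemBytes []byte) (bool, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return false, errors.New("no CERTIFICATE block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return false, err
	}
	// A certificate without any extended key usage is not restricted.
	if len(cert.ExtKeyUsage) == 0 && len(cert.UnknownExtKeyUsage) == 0 {
		return true, nil
	}
	for _, u := range cert.ExtKeyUsage {
		if u == x509.ExtKeyUsageClientAuth || u == x509.ExtKeyUsageAny {
			return true, nil
		}
	}
	return false, nil
}

// selfSignedPEM builds a throwaway self-signed certificate with the given
// extended key usages (a stand-in for reading server.pem from disk).
func selfSignedPEM(ekus ...x509.ExtKeyUsage) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "etcd1"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  ekus,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// demo checks one server-auth-only certificate and one with both usages.
func demo() (serverOnly, both bool, err error) {
	a, err := selfSignedPEM(x509.ExtKeyUsageServerAuth)
	if err != nil {
		return false, false, err
	}
	b, err := selfSignedPEM(x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth)
	if err != nil {
		return false, false, err
	}
	if serverOnly, err = allowsClientAuth(a); err != nil {
		return false, false, err
	}
	both, err = allowsClientAuth(b)
	return serverOnly, both, err
}

func main() {
	serverOnly, both, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("server auth only allows client auth:", serverOnly)
	fmt.Println("server + client auth allows client auth:", both)
}
```

In practice the PEM bytes would come from the file passed to --cert-file. The check looks only at the extended key usage extension and does not validate the chain or expiry.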

KIVagant commented on Oct 23, 2018

TL;DR: How to fix the issue:

ca-config.json: add "client auth" to the usages of the "server" profile

{
    "signing": {
        "default": {
            "expiry": "1000000h"
        },
        "profiles": {
            "server": {
                "expiry": "1000000h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            },
            "client": {
                "expiry": "1000000h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "client auth"
                ]
            },
            "peer": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}
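
As a rough way to spot this kind of profile mismatch, the following Go sketch parses a cfssl-style signing config and lists profiles that grant "server auth" but not "client auth". The function name profilesMissingClientAuth and the two embedded configs are illustrative assumptions. The first config mirrors the usages from the original report and the second mirrors the fixed version above, both trimmed to the usages fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// caConfig models only the parts of ca-config.json that this check needs.
type caConfig struct {
	Signing struct {
		Profiles map[string]struct {
			Usages []string `json:"usages"`
		} `json:"profiles"`
	} `json:"signing"`
}

// profilesMissingClientAuth returns, sorted by name, the profiles that list
// "server auth" but not "client auth" among their usages.
func profilesMissingClientAuth(raw []byte) ([]string, error) {
	var cfg caConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	var missing []string
	for name, p := range cfg.Signing.Profiles {
		server, client := false, false
		for _, u := range p.Usages {
			switch u {
			case "server auth":
				server = true
			case "client auth":
				client = true
			}
		}
		if server && !client {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

// originalConfig mirrors the usages from the config in the original report.
const originalConfig = `{"signing": {"profiles": {
  "server": {"usages": ["signing", "key encipherment", "server auth"]},
  "client": {"usages": ["signing", "key encipherment", "client auth"]},
  "peer":   {"usages": ["signing", "key encipherment", "server auth", "client auth"]}
}}}`

// fixedConfig mirrors the usages after adding "client auth" to "server".
const fixedConfig = `{"signing": {"profiles": {
  "server": {"usages": ["signing", "key encipherment", "server auth", "client auth"]},
  "client": {"usages": ["signing", "key encipherment", "client auth"]},
  "peer":   {"usages": ["signing", "key encipherment", "server auth", "client auth"]}
}}}`

func main() {
	for label, raw := range map[string]string{"original": originalConfig, "fixed": fixedConfig} {
		missing, err := profilesMissingClientAuth([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s config, profiles lacking client auth: %v\n", label, missing)
	}
}
```

This only inspects the config. The certificate still has to be regenerated after the profile changes, as in the next step.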

Regenerate the cert

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server

Check the server certificate (I copied it to /etc/etcd/server.pem):

$ openssl x509 -in /etc/etcd/server.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
...
    Signature Algorithm: sha256WithRSAEncryption
...
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE

Environment vars:

ETCD_CLIENT_CERT_AUTH=true
ETCD_KEY_FILE=/etc/etcd/server-key.pem
ETCD_CERT_FILE=/etc/etcd/server.pem
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.pem
...

Run etcd

sudo etcd --peer-auto-tls=true
...

KIVagant commented on Oct 23, 2018

Btw, even after the issue was fixed, I still see a lot of messages like this in the log:

embed: rejected connection from "35.111.222.111:41886" (error "EOF", ServerName "")

I feel like it could be related to health checks from a Network Load Balancer.
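
The guess above is at least consistent with how Go's TLS stack reacts to a bare TCP probe. The sketch below does not reproduce etcd or any load balancer. It assumes the health check simply opens a connection and closes it without sending a ClientHello, and it uses an in-memory net.Pipe in place of a real socket. The function name handshakeAfterBareClose is made up for illustration.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"io"
	"net"
)

// handshakeAfterBareClose runs a TLS server handshake against a peer that
// connects and then hangs up without sending any TLS bytes.
func handshakeAfterBareClose() error {
	serverSide, clientSide := net.Pipe()
	defer serverSide.Close()

	// The simulated health check: the connection exists, then it is closed.
	clientSide.Close()

	// An empty config is enough here because the handshake fails while
	// waiting for the ClientHello, before any certificate is needed.
	return tls.Server(serverSide, &tls.Config{}).Handshake()
}

// sawEOF reports whether the handshake failed with io.EOF.
func sawEOF() bool {
	return errors.Is(handshakeAfterBareClose(), io.EOF)
}

func main() {
	err := handshakeAfterBareClose()
	fmt.Printf("handshake error: %v (is EOF: %t)\n", err, errors.Is(err, io.EOF))
}
```

This shows only that a connection closed before the handshake can surface as EOF on the server side. It does not identify which client in a given deployment is responsible.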

wenjiaswe (Contributor) commented on Oct 23, 2018

@JinsYin Regarding your confusion about server and client auth: the up-to-date documentation on etcd TLS setup covers both cases. Example 1 covers the setup without "client-cert-auth", and example 2 covers "client-cert-auth" set to true. Thanks to @KIVagant for the detailed demo!

@KIVagant regarding your 'embed: rejected connection from "35.111.222.111:41886" (error "EOF", ServerName "")' comment: may I ask whether you are running etcd in k8s? There is a bug in k8s that would lead to that. If you are, I will add more details; if not, never mind.
