
Install OpenShift 3.11 get error: Could not find csr for nodes #11365

Closed
@scao0920

Description

Installed OpenShift 3.11 on Red Hat 7.6 and got the error: Could not find csr for nodes

On a multi master install, if the first master goes down we can no longer scale up the cluster with new nodes or masters: N/A (only 1 master)

Version


  • Your ansible version per ansible --version
    ansible 2.6.14
    config file = /usr/share/ansible/openshift-ansible/ansible.cfg
    configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
    ansible python module location = /usr/lib/python2.7/site-packages/ansible
    executable location = /usr/bin/ansible
    python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

If you're running from playbooks installed via RPM

  • The output of rpm -q openshift-ansible
    ansible-2.6.14-1.el7ae.noarch

Steps To Reproduce

Step 1: Prepare VMs:
I have 4 Red Hat 7.6 VMs.
Followed the doc https://docs.openshift.com/container-platform/3.11/install/index.html
to set up the hosts. Inventory file (/etc/ansible/hosts):

# Create an OSEv3 group that contains the masters, nodes, and etcd groups

[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts

[OSEv3:vars]
os_firewall_use_firewalld=True

# SSH user, this user should allow ssh based auth without requiring a password

ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_become must be set to true

ansible_become=false
openshift_master_default_subdomain=apps.fyre.ibm.com
openshift_deployment_type=openshift-enterprise
oreg_url=registry.redhat.io/openshift3/ose-${component}:${version}
oreg_auth_user=
oreg_auth_password=xxxxxxxxxxxxxxxxxxxxxx

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider

#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# host group for masters

[masters]
scaorh-master.fyre.ibm.com

# host group for etcd

[etcd]
scaorh-master.fyre.ibm.com

# host group for nodes, includes region info

[nodes]
scaorh-master.fyre.ibm.com openshift_node_group_name='node-config-master'
scaorh-worker1.fyre.ibm.com openshift_node_group_name='node-config-compute'
scaorh1-worker2.fyre.ibm.com openshift_node_group_name='node-config-compute'
scaorh2-infranode.fyre.ibm.com openshift_node_group_name='node-config-infra'

Step 2: deploy:
cd /usr/share/ansible/openshift-ansible
Run:
ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml

ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml
Got error:
TASK [Approve node certificates when bootstrapping] ***********************************************************
Sunday 17 March 2019 12:36:15 -0700 (0:00:00.137) 0:30:15.928 **********
FAILED - RETRYING: Approve node certificates when bootstrapping (30 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (28 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (27 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (26 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (25 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (24 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (23 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (22 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (21 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (20 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (19 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (18 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (17 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (16 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (15 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (14 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (13 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (12 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (11 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (10 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (9 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (8 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (7 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (6 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (5 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (4 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (3 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (2 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).
fatal: [scaorh-master.fyre.ibm.com]: FAILED! => {"all_subjects_found": ["subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh1-worker2.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-worker1.fyre.ibm.com\n"], "attempts": 30, "changed": false, "client_approve_results": [], "client_csrs": {"node-csr-8e-uSNcl4xSbMe02CoIcaelY5mjC1eqCIXaXEu4Vjco": "scaorh1-worker2.fyre.ibm.com", "node-csr-J-1_iIVS5-hgaQz5xGifBwWTf5l4CcXgvOzvKs7yufU": "scaorh-worker1.fyre.ibm.com"}, "msg": "Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com", "oc_get_nodes": {"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "Node", "metadata": {"annotations": {"node.openshift.io/md5sum": "6ada87691866d0068b8c8cfe0df773b2", "volumes.kubernetes.io/controller-managed-attach-detach": "true"}, "creationTimestamp": "2019-03-17T19:26:30Z", "labels": {"beta.kubernetes.io/arch": "amd64", "beta.kubernetes.io/os": "linux", "kubernetes.io/hostname": "scaorh-master.fyre.ibm.com", "node-role.kubernetes.io/master": "true"}, "name": "scaorh-master.fyre.ibm.com", "namespace": "", "resourceVersion": "2860", "selfLink": "/api/v1/nodes/scaorh-master.fyre.ibm.com", "uid": "90c98d93-48ea-11e9-bf0d-00163e01f117"}, "spec": {}, "status": {"addresses": [{"address": "172.16.241.23", "type": "InternalIP"}, {"address": "scaorh-master.fyre.ibm.com", "type": "Hostname"}], "allocatable": {"cpu": "16", "hugepages-1Gi": "0", "hugepages-2Mi": "0", "memory": "32676344Ki", "pods": "250"}, "capacity": {"cpu": "16", "hugepages-1Gi": "0", "hugepages-2Mi": "0", "memory": "32778744Ki", "pods": "250"}, "conditions": [{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient disk space available", "reason": "KubeletHasSufficientDisk", "status": "False", "type": "OutOfDisk"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient memory available", "reason": "KubeletHasSufficientMemory", "status": "False", "type": "MemoryPressure"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has no disk pressure", "reason": "KubeletHasNoDiskPressure", "status": "False", "type": "DiskPressure"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient PID available", "reason": "KubeletHasSufficientPID", "status": "False", "type": "PIDPressure"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized", "reason": "KubeletNotReady", "status": "False", "type": "Ready"}], "daemonEndpoints": {"kubeletEndpoint": {"Port": 10250}}, "images": [{"names": ["registry.redhat.io/openshift3/ose-node@sha256:8d28f961c74f033b3df9ed0d7a2a1bfb5e6ebb0611cb6b018f7e623961f7ea52", "registry.redhat.io/openshift3/ose-node:v3.11"], "sizeBytes": 1171108452}, {"names": ["registry.redhat.io/openshift3/ose-control-plane@sha256:200a14df0fdf3c467588f5067ab015cd316e49856114ba7602d4ca9e5f42b0f3", "registry.redhat.io/openshift3/ose-control-plane:v3.11"], 
"sizeBytes": 808610884}, {"names": ["registry.redhat.io/rhel7/etcd@sha256:be1c3e3f002ac41c35f2994f1c0cb3bd28a8ff59674941ca1a6223a8b72c2758", "registry.redhat.io/rhel7/etcd:3.2.22"], "sizeBytes": 259048769}, {"names": ["registry.redhat.io/openshift3/ose-pod@sha256:f27c68d225803ca3a97149083b5211ccc3def3230f8147fd017eef5b11d866d5", "registry.redhat.io/openshift3/ose-pod:v3.11", "registry.redhat.io/openshift3/ose-pod:v3.11.88"], "sizeBytes": 238366131}], "nodeInfo": {"architecture": "amd64", "bootID": "bdeaf185-56b0-4cff-b344-2fe95351d324", "containerRuntimeVersion": "docker://1.13.1", "kernelVersion": "3.10.0-957.5.1.el7.x86_64", "kubeProxyVersion": "v1.11.0+d4cacc0", "kubeletVersion": "v1.11.0+d4cacc0", "machineID": "cbb00030e5204543a0474ffff17ec26f", "operatingSystem": "linux", "osImage": "OpenShift Enterprise", "systemUUID": "E21E048B-6EB8-4685-A3EA-57F5CF1F2BF3"}}}], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}, "raw_failures": [], "rc": 0, "server_approve_results": [], "server_csrs": null, "state": "unknown", "unwanted_csrs": [{"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2019-03-17T19:36:13Z", "generateName": "csr-", "name": "csr-58dj9", "namespace": "", "resourceVersion": "2555", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-58dj9", "uid": "ecbad18b-48eb-11e9-bf0d-00163e01f117"}, "spec": {"groups": ["system:nodes", "system:authenticated"], "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQlR6Q0I5Z0lCQURCSU1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14THpBdEJnTlZCQU1USm5ONQpjM1JsYlRwdWIyUmxPbk5qWVc5eWFDMXRZWE4wWlhJdVpubHlaUzVwWW0wdVkyOXRNRmt3RXdZSEtvWkl6ajBDCkFRWUlLb1pJemowREFRY0RRZ0FFS1VZbGZFai9WUlFQL09ETFpORDFMYXh4VnNGc0RaSllTeDBkOGdEUityWVcKaC9rUUhFL0QvVHE4SHIwOENRT2pQaGlkbHFGWkZjcExkQlpMSVdQcWdLQk1NRW9HQ1NxR1NJYjNEUUVKRGpFOQpNRHN3T1FZRFZSMFJCREl3TUlJYWMyTmhiM0pvTFcxaGMzUmxjaTVtZVhKbExtbGliUzVqYjIyQ0FJY0VyQkR4CkY0Y0VDUjdDbzRjRXJCRUFBVEFLQmdncWhrak9QUVFEQWdOSUFEQkZBaUJMRmVrbmRjVm4zSGlYNGVwN0ZOMi8KTi9WYm5VbXlINmhTb1VOUFowTWE1Z0loQU5zdGU4QUNSR1BnWGNIS3YzT0g3cnNEWk92N1FuVm5XOFNOUWZUTwpzMm9rCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=", "usages": ["digital signature", "key encipherment", "server auth"], "username": "system:node:scaorh-master.fyre.ibm.com"}, "status": {}}, {"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2019-03-17T19:26:52Z", "generateName": "csr-", "name": "csr-lzvjj", "namespace": "", "resourceVersion": "949", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-lzvjj", "uid": "9e264342-48ea-11e9-bf0d-00163e01f117"}, "spec": {"groups": ["system:masters", "system:cluster-admins", "system:authenticated"], "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkJEQ0JxZ0lCQURCSU1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14THpBdEJnTlZCQU1USm5ONQpjM1JsYlRwdWIyUmxPbk5qWVc5eWFDMXRZWE4wWlhJdVpubHlaUzVwWW0wdVkyOXRNRmt3RXdZSEtvWkl6ajBDCkFRWUlLb1pJemowREFRY0RRZ0FFdm1CRmppdm9qMlBkWDJyRmM0eE5rVERSYjROclVWSGRCRDFNRk50OHV2L1AKdTZ3aUdVbTZpdTRqOVdrb2Y1TS9LOUE2eGRBdVRlUzU2WkRRaEdNSllxQUFNQW9HQ0NxR1NNNDlCQU1DQTBrQQpNRVlDSVFDS3o4dVBqcSt0ZzJwNkNxdC9NZks0OGQ2cjFFWUNEeHRhcmFjMlRpN3I1QUloQU4yeUY2QVlUcU5LCmhNVlJKSTJIMzIxVWN0R08zRi9wbTltL1IreDhYMTFuCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=", "usages": ["digital signature", "key encipherment", "client auth"], "username": "system:admin"}, "status": {"certificate": 
"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNoVENDQVcyZ0F3SUJBZ0lVSEFMc0FQQXNlYXFmUUhpUytMM2hIWHFEWDZzd0RRWUpLb1pJaHZjTkFRRUwKQlFBd
PLAY RECAP ****************************************************************************************************
localhost : ok=11 changed=0 unreachable=0 failed=0
scaorh-master.fyre.ibm.com : ok=487 changed=238 unreachable=0 failed=1
scaorh-worker1.fyre.ibm.com : ok=109 changed=66 unreachable=0 failed=0
scaorh1-worker2.fyre.ibm.com : ok=109 changed=66 unreachable=0 failed=0
scaorh2-infranode.fyre.ibm.com : ok=101 changed=19 unreachable=0 failed=0

INSTALLER STATUS **********************************************************************************************
Initialization : Complete (0:00:25)
Health Check : Complete (0:00:55)
Node Bootstrap Preparation : Complete (0:13:04)
etcd Install : Complete (0:02:25)
Master Install : Complete (0:07:07)
Master Additional Install : Complete (0:06:11)
Node Join : In Progress (0:03:06)
This phase can be restarted by running: playbooks/openshift-node/join.yml
Sunday 17 March 2019 12:39:16 -0700 (0:03:01.349) 0:33:17.277 **********

cockpit : Install cockpit-ws ------------------------------------------------------------------------- 316.13s
openshift_node : install needed rpm(s) --------------------------------------------------------------- 237.61s
Approve node certificates when bootstrapping --------------------------------------------------------- 181.35s
openshift_node : Install iSCSI storage plugin dependencies ------------------------------------------- 120.08s
openshift_node : Install node, clients, and conntrack packages --------------------------------------- 103.55s
etcd : Install etcd ----------------------------------------------------------------------------------- 83.24s
openshift_control_plane : Wait for all control plane pods to become ready ----------------------------- 70.09s
Run health checks (install) - EL ---------------------------------------------------------------------- 54.79s
openshift_control_plane : Wait for control plane pods to appear --------------------------------------- 54.14s
openshift_node : Install Ceph storage plugin dependencies --------------------------------------------- 47.59s
openshift_node : Install dnsmasq ---------------------------------------------------------------------- 46.75s
openshift_ca : Install the base package for admin tooling --------------------------------------------- 45.79s
openshift_node : Install GlusterFS storage plugin dependencies ---------------------------------------- 43.07s
openshift_excluder : Install openshift excluder - yum ------------------------------------------------- 39.41s
openshift_excluder : Install docker excluder - yum ---------------------------------------------------- 24.91s
openshift_cli : Install clients ----------------------------------------------------------------------- 24.76s
openshift_node_group : Wait for the sync daemonset to become ready and available ---------------------- 11.54s
openshift_manageiq : Configure role/user permissions -------------------------------------------------- 10.10s
nickhammond.logrotate : nickhammond.logrotate | Install logrotate -------------------------------------- 9.12s
openshift_node : Install NFS storage plugin dependencies ----------------------------------------------- 8.84s

Failure summary:

  1. Hosts: scaorh-master.fyre.ibm.com
    Play: Approve any pending CSR requests from inventory nodes
    Task: Approve node certificates when bootstrapping
    Message: Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com
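
For reference, a quick way to inspect which CSRs actually exist on the master while that task is retrying (a rough sketch; the kubeconfig path is the default 3.11 one, and <csr-name> is whatever oc get csr reports):

# list the CSRs the master currently knows about (run on the master)
oc --config=/etc/origin/master/admin.kubeconfig get csr
# approve a specific pending CSR by name
oc --config=/etc/origin/master/admin.kubeconfig adm certificate approve <csr-name>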

Additional Information

Operating system: Red Hat Enterprise Linux Server release 7.6 (Maipo)

Activity

rafaelvico commented on Mar 25, 2019

Same here.

ficofer commented on Apr 17, 2019

Same issue for me with a similar inventory; no idea why it's happening.

Deepika-Kamalla commented on Apr 18, 2019

Same issue for me.

INSTALLER STATUS ******************************************************************************************************************************************************************************
Initialization : Complete (0:00:48)
Health Check : Complete (0:01:01)
Node Bootstrap Preparation : Complete (0:05:06)
etcd Install : Complete (0:01:55)
Master Install : Complete (0:09:24)
Master Additional Install : Complete (0:01:26)
Node Join : In Progress (0:03:37)
This phase can be restarted by running: playbooks/openshift-node/join.yml

Failure summary:

  1. Hosts: okd-Master.xyz.com
    Play: Approve any pending CSR requests from inventory nodes
    Task: Approve node certificates when bootstrapping
    Message: Could not find csr for nodes: okd-worker-node2.xyz.com

SaqibHussain44 commented on Apr 22, 2019

Same issue: Could not find csr for nodes.

Initialization              : Complete (0:00:24)
Health Check                : Complete (0:00:15)
Node Bootstrap Preparation  : Complete (0:01:43)
etcd Install                : Complete (0:00:24)
Master Install              : Complete (0:02:39)
Master Additional Install   : Complete (0:00:27)
Node Join                   : In Progress (0:02:57)

This phase can be restarted by running: playbooks/openshift-node/join.yml

scao0920 (Author) commented on Apr 22, 2019

I opened the issue. I finally have OpenShift installed.
The way I got past the error was to combine the master node and the infra node into one.
Here is what I changed in the inventory file: the master and infra roles now share one node.

# host group for nodes, includes region info

[nodes]
scrh-master.fyre.ibm.com openshift_node_group_name='node-config-master-infra'
scrh-worker1.fyre.ibm.com openshift_node_group_name='node-config-compute'
scrh-worker2.fyre.ibm.com openshift_node_group_name='node-config-compute'
scrh1-worker3.fyre.ibm.com openshift_node_group_name='node-config-compute'

ztanaka1971 commented on Apr 25, 2019

I have the same issue. My cluster consists of two nodes, one master and one compute node. As scao0920 mentioned, I modified "node-config-master" to "node-config-master-infra" in /etc/ansible/hosts and re-ran the playbook, but the same symptom still persists.
BTW, I realized that port 53 on the master node was blocked by iptables in my case, so I opened it for DNS. But it looks like that has nothing to do with this issue.

rahnarsson commented on May 2, 2019

I had the same issue running on OpenStack, but fixed it by making sure that my configured hostnames exactly matched the inventory file.

Before the change the DNS was pointing correctly to node1.example.com but the hostname was something like node1.novalocal. I fixed the hostnames, rebooted the nodes, and the playbook went through OK.
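
A minimal sketch of that check/fix on each node (node1.example.com is just the example name from above; use the name from your inventory):

# the FQDN the node reports should match the name used in the inventory
hostname -f
# if it reports something like node1.novalocal, set the inventory name and reboot
hostnamectl set-hostname node1.example.com
reboot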

harryliu123 commented on May 24, 2019

I had the same issue running openshift-ansible 3.11.
I tried running:
ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry
My cluster is OK, but it doesn't include the router or the web console.

#install router at master node
oc adm policy add-scc-to-user hostnetwork -z router
oc adm router router --replicas=2 --service-account=router
oc adm router --dry-run --service-account=router

# install the web console via the ansible playbook

ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/openshift-web-console/config.yml

Good luck !!

danielkucera commented on Aug 8, 2019

I've set the following because I need docker to access the internet via a proxy:

openshift_http_proxy=http://webproxy:8080/
openshift_https_proxy=http://webproxy:8080/
openshift_no_proxy=".rdev.mydomain"

but the result is that in the task:

TASK [Approve node certificates when bootstrapping] ************************************************************************************************************************************************************** 
task path: openshift-ansible/playbooks/openshift-node/private/join.yml:43  

I'm getting:

FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).

I traced it to the following HTTP request, which tries to use the configured proxy when it should not:

[root@en-ose-master1 ~]# oc --v=8 --config=/etc/origin/master/admin.kubeconfig get --raw /api/v1/nodes/en-ose-master1/proxy/healthz
I0808 15:45:03.089015   34129 loader.go:359] Config loaded from file /etc/origin/master/admin.kubeconfig
I0808 15:45:03.089844   34129 round_trippers.go:383] GET https://en-ose-lb1.rdev.<domain>:8443/api/v1/nodes/en-ose-master1/proxy/healthz
I0808 15:45:03.089882   34129 round_trippers.go:390] Request Headers:
I0808 15:45:03.089896   34129 round_trippers.go:393]     User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0
I0808 15:45:03.089906   34129 round_trippers.go:393]     Accept: application/json, */*
I0808 15:45:03.119559   34129 round_trippers.go:408] Response Status: 503 Service Unavailable in 29 milliseconds
I0808 15:45:03.119603   34129 round_trippers.go:411] Response Headers:
I0808 15:45:03.119617   34129 round_trippers.go:414]     Cache-Control: no-store
I0808 15:45:03.119626   34129 round_trippers.go:414]     Content-Type: text/plain; charset=utf-8
I0808 15:45:03.119635   34129 round_trippers.go:414]     Content-Length: 84
I0808 15:45:03.119644   34129 round_trippers.go:414]     Date: Thu, 08 Aug 2019 13:45:03 GMT
I0808 15:45:03.119688   34129 request.go:897] Response Body: Error: 'Service Unavailable'
Trying to reach: 'https://en-ose-master1:10250/healthz'
I0808 15:45:03.119785   34129 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request",
  "reason": "ServiceUnavailable",
  "details": {
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "Error: 'Service Unavailable'\nTrying to reach: 'https://en-ose-master1:10250/healthz'"
      }
    ]
  },
  "code": 503
}]
F0808 15:45:03.119873   34129 helpers.go:119] Error from server (ServiceUnavailable): the server is currently unable to handle the request

How do I set it to not use the proxy? Is this behavior expected/useful somewhere?

vrutkovs (Member) commented on Aug 8, 2019

See https://docs.openshift.com/container-platform/3.11/install_config/http_proxies.html#configuring-hosts-for-proxies-using-ansible

Your node is reporting its hostname as en-ose-master1, but no_proxy is set to .rdev.mydomain. It seems you need to ensure it reports en-ose-master1.rdev.mydomain when hostname -f is called on the node.
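
A rough sketch of that check on the node (en-ose-master1.rdev.mydomain is just the FQDN implied by the comment above):

# hostname -f should return the fully qualified name so that no_proxy (.rdev.mydomain) can match it
hostname -f
# if it only returns the short name, set the FQDN
hostnamectl set-hostname en-ose-master1.rdev.mydomain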

danielkucera commented on Aug 9, 2019

I have switched to the

docker_http_proxy
docker_https_proxy
docker_no_proxy

variables, and now it works.
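
For reference, in the inventory that would look roughly like this (the proxy URL is just the one from the earlier comment):

[OSEv3:vars]
docker_http_proxy=http://webproxy:8080/
docker_https_proxy=http://webproxy:8080/
docker_no_proxy=".rdev.mydomain"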

imranrazakhan commented on Sep 21, 2019

In my case I had two entries for the same server in authorized_keys; I removed the old one and it works fine.
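
If anyone wants to check for the same thing, a quick sketch (assuming root's authorized_keys, since the inventory above uses ansible_ssh_user=root):

# print any duplicated lines in the file
sort /root/.ssh/authorized_keys | uniq -d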

dlewis7444 commented on Nov 5, 2019

This happens to us if there is a failed install later in the deploy_cluster.yml playbook, due to some other issue. The CSRs are approved initially and, if we re-run the deploy quickly enough, it's fine. But if we wait too long the approved CSRs disappear, and the deploy won't get past "Approve node certificates when bootstrapping".

WORKAROUND: edit whichever playbook is running this task (in my case, it was openshift-ansible/playbooks/openshift-node/private/join.yml) and add "tags: csr" to the "Approve node..." task. Then re-run the deploy with --skip-tags=csr, as sketched below.
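
A minimal sketch of that workaround (the exact task body in join.yml is elided here; only the added tags line and the re-run command matter):

# openshift-ansible/playbooks/openshift-node/private/join.yml (excerpt)
- name: Approve node certificates when bootstrapping
  # ... existing module call, retries, etc. stay as they are ...
  tags: csr

# then re-run the deploy, skipping that task
ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml --skip-tags=csr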

I'm thinking a redeploy of the certificates might also be a workaround.

melanco commented on Sep 23, 2020

I had the exact same issue, and my problem was that I forgot to add "new_nodes" to the OSEv3:children group of the inventory file.

[OSEv3:children]
masters
nodes
etcd
new_nodes

The result was that none of the "openshift_*" variables declared in the inventory file were actually loaded.

You can use this small playbook to check whether the variables are loaded correctly:


- hosts: all
  become: yes
  gather_facts: no
  tasks:

    - name: "Ansible | List all known variables and facts"
      debug:
        var: hostvars["$NAME_OF_A_NEW_NODE"]

And then you simply call the playbook using your Openshift Inventory:

ansible-playbook -i $YOUR_OPENSHIFT_INVENTORY $NAME_OF_THE_PLAYBOOK_ABOVE -vvv

Hope this helps someone :)

myasas commented on Nov 4, 2020

I had the same issue running on OpenStack, but fixed by making sure that my configured hostnames matched exactly the inventory file.

Before the change the DNS was pointing correctly to node1.example.com but hostname was something like node1.novalocal. Fixed the hostnames and rebooted the nodes and playbook went through ok.

This answer was the fix for the issue. In short, make sure to validate the following points to avoid this problem:

  1. Check the hostnames in your OKD inventory.
  2. Check the hostnames on your VMs (the ones on which you install OKD), e.g. with hostname -A.
  3. The hostnames from steps 1 and 2 should match; if not, you will get this error.
    Note: all hostnames reported by hostname -A should match the hostnames defined in the inventory; a quick check is sketched below.
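
A minimal sketch of that comparison (the inventory path is the one used earlier in this thread; adjust as needed):

# on each VM: the names the OS reports
hostname -A
# on the ansible host: the node names expected by the inventory
grep -A 10 '^\[nodes\]' /etc/ansible/hosts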

Hope it helps ;)

kavana-14 commented on Jun 26, 2024

While running
ansible-playbook -i inventory.ini openshift-ansible/playbooks/deploy_cluster.yml
I'm getting the error below.
I'm using Fedora:
ansible core version = 2.16.6
python version = 3.12.3
jinja version = 3.1.4

TASK [openshift_node : Install node, clients, and conntrack packages] ******************************************************************
FAILED - RETRYING: [node1.kavana.io]: Install node, clients, and conntrack packages (3 retries left).
FAILED - RETRYING: [node1.kavana.io]: Install node, clients, and conntrack packages (2 retries left).
FAILED - RETRYING: [node1.kavana.io]: Install node, clients, and conntrack packages (1 retries left).
fatal: [node1.kavana.io]: FAILED! => {"attempts": 3, "changed": false, "failures": ["No package origin-3.11 available.", "No package origin-hyperkube-3.11 available.", "No package origin-node-3.11 available.", "No package origin-clients-3.11 available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}

PLAY RECAP *****************************************************************************************************************************
console.kavana.io          : ok=48   changed=2    unreachable=0    failed=0    skipped=61   rescued=0    ignored=0
localhost                  : ok=12   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0
node1.kavana.io            : ok=51   changed=5    unreachable=0    failed=1    skipped=77   rescued=0    ignored=0


INSTALLER STATUS ***********************************************************************************************************************
Initialization              : Complete (0:01:23)
Health Check                : Complete (0:00:02)
Node Bootstrap Preparation  : In Progress (0:01:48)
        This phase can be restarted by running: playbooks/openshift-node/bootstrap.yml


Failure summary:


  1. node1.kavana.io
     Configure nodes
     Install node, clients, and conntrack packages
     Failed to install some of the specified packages

