diff --git a/modules/nodes-add-new-etcd-member.adoc b/modules/nodes-add-new-etcd-member.adoc index 6ef75736a2c0..f5e3632abb94 100644 --- a/modules/nodes-add-new-etcd-member.adoc +++ b/modules/nodes-add-new-etcd-member.adoc @@ -6,6 +6,7 @@ [id="add-new-etcd-member_{context}"] = Adding the new etcd member +[role="_abstract"] Finish adding the new control plane node by adding the new etcd member to the cluster. .Procedure diff --git a/modules/nodes-create-new-control-plane-node.adoc b/modules/nodes-create-new-control-plane-node.adoc index 83f036277dc9..e192b434d26c 100644 --- a/modules/nodes-create-new-control-plane-node.adoc +++ b/modules/nodes-create-new-control-plane-node.adoc @@ -6,6 +6,7 @@ [id="create-new-machine_{context}"] = Creating the new control plane node +[role="_abstract"] Begin creating the new control plane node by creating a `BareMetalHost` object and node. .Procedure @@ -136,5 +137,6 @@ $ coreos-installer iso customize rhcos-live.86_64.iso \ Replace `` with the path to the target device on which the ISO will be generated. . Boot the new control plane node with the customized {op-system} live ISO. +The node will automatically reboot twice before the pending Certificate Signing Requests (CSRs) appear. -. Approve the Certificate Signing Requests (CSR) to join the new node to the cluster. +. Approve the pending CSRs to join the new node to the cluster. diff --git a/modules/nodes-delete-machine-unhealthy-etcd.adoc b/modules/nodes-delete-machine-unhealthy-etcd.adoc index 9426ee9306f5..fdadcfee5ced 100644 --- a/modules/nodes-delete-machine-unhealthy-etcd.adoc +++ b/modules/nodes-delete-machine-unhealthy-etcd.adoc @@ -6,6 +6,7 @@ [id="deleting-machine_{context}"] = Deleting the machine of the unhealthy etcd member +[role="_abstract"] Finish removing the failed control plane node by deleting the machine of the unhealthy etcd member. .Procedure @@ -62,9 +63,7 @@ $ oc get machines -n openshift-machine-api -o wide NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE examplecluster-control-plane-0 Running 3h11m openshift-control-plane-0 baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e externally provisioned examplecluster-control-plane-1 Running 3h11m openshift-control-plane-1 baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1 externally provisioned -examplecluster-control-plane-2 Running 3h11m openshift-control-plane-2 baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135 externally provisioned -examplecluster-compute-0 Running 165m openshift-compute-0 baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f provisioned -examplecluster-compute-1 Running 165m openshift-compute-1 baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9 provisioned +examplecluster-control-plane-2 Failed 3h11m openshift-control-plane-2 baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135 externally provisioned ---- . Delete the machine of the unhealthy member by running the following command: diff --git a/modules/nodes-link-node-machine-bmh.adoc b/modules/nodes-link-node-machine-bmh.adoc index 2cb971dbff0c..e493211b9e0f 100644 --- a/modules/nodes-link-node-machine-bmh.adoc +++ b/modules/nodes-link-node-machine-bmh.adoc @@ -6,11 +6,12 @@ [id="linking-node-machine-bmh_{context}"] = Linking the node, bare metal host, and machine together +[role="_abstract"] Continue creating the new control plane node by creating a machine and then linking it with the new `BareMetalHost` object and node. .Procedure -. Get the `providerID` for control plane nodes by running the following command: +. Get the `providerID` for the replaced node by running the following command: + [source,terminal] ---- @@ -25,7 +26,7 @@ baremetalhost:///openshift-machine-api/master-01/58fb60bd-b2a6-4ff3-a88d-208c33a baremetalhost:///openshift-machine-api/master-02/dc5a94f3-625b-43f6-ab5a-7cc4fc79f105 ---- -. Get cluster information for labels by running the following command: +. Get the `cluster-api-cluster` label by running the following command: + [source,terminal] ---- @@ -40,10 +41,11 @@ $ oc get machine -n openshift-machine-api \ NAME PHASE TYPE REGION ZONE AGE CLUSTER-API-CLUSTER ci-op-jcp3s7wx-ng5sd-master-0 Running 10h ci-op-jcp3s7wx-ng5sd ci-op-jcp3s7wx-ng5sd-master-1 Running 10h ci-op-jcp3s7wx-ng5sd -ci-op-jcp3s7wx-ng5sd-master-2 Running 10h ci-op-jcp3s7wx-ng5sd ---- -. Create a `Machine` object for the new control plane node by creating a yaml file similar to the following: +. Create a `Machine` object for the new control plane node: + +.. Create a YAML file named `new-machine.yaml` similar to the following: + [source,yaml] ---- @@ -75,13 +77,18 @@ spec: name: master-user-data-managed ---- + --- where: ``:: Specifies the name of the new machine, which can be the same as the previously deleted machine name. ``:: Specifies the `CLUSTER-API-CLUSTER` value for the other control plane machines, shown in the output of the previous step. ``:: Specifies the `providerID` value of the new bare metal host, shown in the output of an earlier step. --- + +.. Apply the YAML file by running the following command: ++ +[source,terminal] +---- +$ oc apply -f new-machine.yaml +---- + The following warning is expected: + @@ -100,6 +107,17 @@ $ NEW_NODE_NAME= ---- + Replace `` with the name of the new control plane node. ++ +[NOTE] +==== +The name of the new node might be different from the name of the node you are replacing. +You can check the name of the new node by running the following command: + +[source,terminal] +---- +$ oc get nodes +---- +==== .. Define the `NEW_MACHINE_NAME` variable by running the following command: + diff --git a/modules/nodes-remove-unhealthy-etcd-member.adoc b/modules/nodes-remove-unhealthy-etcd-member.adoc index 8330a09c9f75..b9450bbc16aa 100644 --- a/modules/nodes-remove-unhealthy-etcd-member.adoc +++ b/modules/nodes-remove-unhealthy-etcd-member.adoc @@ -6,6 +6,7 @@ [id="removing-etcd-member_{context}"] = Removing the unhealthy etcd member +[role="_abstract"] Begin removing the failed control plane node by first removing the unhealthy etcd member. .Procedure @@ -20,7 +21,7 @@ $ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide .Example output [source,terminal] ---- -etcd-openshift-control-plane-0 5/5 Running 11 3h56m 192.168.10.9 openshift-control-plane-0 +etcd-openshift-control-plane-0 5/5 Running 11 3h56m 192.168.10.9 openshift-control-plane-0 etcd-openshift-control-plane-1 5/5 Running 0 3h54m 192.168.10.10 openshift-control-plane-1 etcd-openshift-control-plane-2 5/5 Running 0 3h58m 192.168.10.11 openshift-control-plane-2 ---- diff --git a/modules/nodes-replace-control-plane-prereqs.adoc b/modules/nodes-replace-control-plane-prereqs.adoc index 2d2e264e40b2..8be5833b752f 100644 --- a/modules/nodes-replace-control-plane-prereqs.adoc +++ b/modules/nodes-replace-control-plane-prereqs.adoc @@ -6,6 +6,10 @@ [id="prerequisites_{context}"] = Prerequisites +[role="_abstract"] +You must meet the following prerequisites to replace a failed bare-metal control plane node using this method. + + * You have identified the unhealthy bare metal etcd member. * You have verified that either the machine is not running or the node is not ready. * You have access to the cluster as a user with the `cluster-admin` role. diff --git a/modules/nodes-verify-failed-node-deleted.adoc b/modules/nodes-verify-failed-node-deleted.adoc index 6bb564bbb12e..896297cac572 100644 --- a/modules/nodes-verify-failed-node-deleted.adoc +++ b/modules/nodes-verify-failed-node-deleted.adoc @@ -6,6 +6,7 @@ [id="verify-machine-deleted_{context}"] = Verifying that the failed node was deleted +[role="_abstract"] Before proceeding to create a replacement control plane node, verify that the failed node was successfully deleted. .Procedure @@ -23,8 +24,6 @@ $ oc get machines -n openshift-machine-api -o wide NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE examplecluster-control-plane-0 Running 3h11m openshift-control-plane-0 baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e externally provisioned examplecluster-control-plane-1 Running 3h11m openshift-control-plane-1 baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1 externally provisioned -examplecluster-compute-0 Running 165m openshift-compute-0 baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f provisioned -examplecluster-compute-1 Running 165m openshift-compute-1 baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9 provisioned ---- . Verify that the node has been deleted by running the following command: @@ -37,11 +36,9 @@ $ oc get nodes .Example output [source,terminal] ---- -NAME STATUS ROLES AGE VERSION -openshift-control-plane-0 Ready master 3h24m v1.34.2 -openshift-control-plane-1 Ready master 3h24m v1.34.2 -openshift-compute-0 Ready worker 176m v1.34.2 -openshift-compute-1 Ready worker 176m v1.34.2 +NAME STATUS ROLES AGE VERSION +openshift-control-plane-0 Ready master 3h24m v1.34.2 +openshift-control-plane-1 Ready master 3h24m v1.34.2 ---- . Wait for all of the cluster Operators to complete rolling out changes. diff --git a/nodes/nodes/nodes-nodes-replace-control-plane.adoc b/nodes/nodes/nodes-nodes-replace-control-plane.adoc index 8e73c0d76b98..a0f5426a6720 100644 --- a/nodes/nodes/nodes-nodes-replace-control-plane.adoc +++ b/nodes/nodes/nodes-nodes-replace-control-plane.adoc @@ -6,6 +6,7 @@ include::_attributes/common-attributes.adoc[] toc::[] +[role="_abstract"] If a control plane node on your bare-metal cluster has failed and cannot be recovered, but you installed your cluster without providing baseboard management controller (BMC) credentials, you must take extra steps in order to replace the failed node with a new one. // Prerequisites