1 change: 1 addition & 0 deletions modules/nodes-add-new-etcd-member.adoc
@@ -6,6 +6,7 @@
[id="add-new-etcd-member_{context}"]
= Adding the new etcd member

[role="_abstract"]
Finish adding the new control plane node by adding the new etcd member to the cluster.

.Procedure
4 changes: 3 additions & 1 deletion modules/nodes-create-new-control-plane-node.adoc
@@ -6,6 +6,7 @@
[id="create-new-machine_{context}"]
= Creating the new control plane node

[role="_abstract"]
Begin creating the new control plane node by creating a `BareMetalHost` object and node.

.Procedure
@@ -136,5 +137,6 @@ $ coreos-installer iso customize rhcos-live.86_64.iso \
Replace `<device_path>` with the path to the target device on which the ISO will be generated.

. Boot the new control plane node with the customized {op-system} live ISO.
The node will automatically reboot twice before the pending Certificate Signing Requests (CSRs) appear.

. Approve the Certificate Signing Requests (CSR) to join the new node to the cluster.
. Approve the pending CSRs to join the new node to the cluster.
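
As a rough sketch of the approval step, pending requests can be picked out of the `oc get csr` table before each one is approved with `oc adm certificate approve`. The `pending_csrs` helper is hypothetical and not part of this PR; it assumes that `CONDITION` is the last column of the table output.

```shell
# Hypothetical helper (not part of this PR): read `oc get csr` table output
# on stdin and print the names of CSRs whose last column (assumed to be
# CONDITION) is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Possible usage on a live cluster, as a user with the cluster-admin role:
#   oc get csr | pending_csrs
#   oc adm certificate approve <csr_name>
```

Because the node reboots before the CSRs appear, the list might be empty at first; rerunning the listing until pending entries show up is one way to handle that.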
5 changes: 2 additions & 3 deletions modules/nodes-delete-machine-unhealthy-etcd.adoc
@@ -6,6 +6,7 @@
[id="deleting-machine_{context}"]
= Deleting the machine of the unhealthy etcd member

[role="_abstract"]
Finish removing the failed control plane node by deleting the machine of the unhealthy etcd member.

.Procedure
@@ -62,9 +63,7 @@ $ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
examplecluster-control-plane-0 Running 3h11m openshift-control-plane-0 baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e externally provisioned
examplecluster-control-plane-1 Running 3h11m openshift-control-plane-1 baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1 externally provisioned
examplecluster-control-plane-2 Running 3h11m openshift-control-plane-2 baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135 externally provisioned
examplecluster-compute-0 Running 165m openshift-compute-0 baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f provisioned
examplecluster-compute-1 Running 165m openshift-compute-1 baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9 provisioned
examplecluster-control-plane-2 Failed 3h11m openshift-control-plane-2 baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135 externally provisioned
----

. Delete the machine of the unhealthy member by running the following command:
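
The machine listing in this module can be filtered to surface the machine to delete. The following is a minimal sketch, assuming `PHASE` is the second column of the `oc get machines` table as in the example output; the `non_running_machines` helper is hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of this PR): read `oc get machines` table
# output on stdin and print the names of machines whose PHASE (column 2)
# is anything other than "Running".
non_running_machines() {
  awk 'NR > 1 && $2 != "Running" { print $1 }'
}

# Possible usage on a live cluster:
#   oc get machines -n openshift-machine-api -o wide | non_running_machines
```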
30 changes: 24 additions & 6 deletions modules/nodes-link-node-machine-bmh.adoc
@@ -6,11 +6,12 @@
[id="linking-node-machine-bmh_{context}"]
= Linking the node, bare metal host, and machine together

[role="_abstract"]
Continue creating the new control plane node by creating a machine and then linking it with the new `BareMetalHost` object and node.

.Procedure

. Get the `providerID` for control plane nodes by running the following command:
. Get the `providerID` for the replaced node by running the following command:
+
[source,terminal]
----
@@ -25,7 +26,7 @@ baremetalhost:///openshift-machine-api/master-01/58fb60bd-b2a6-4ff3-a88d-208c33a
baremetalhost:///openshift-machine-api/master-02/dc5a94f3-625b-43f6-ab5a-7cc4fc79f105
----

. Get cluster information for labels by running the following command:
. Get the `cluster-api-cluster` label by running the following command:
+
[source,terminal]
----
@@ -40,10 +41,11 @@ $ oc get machine -n openshift-machine-api \
NAME PHASE TYPE REGION ZONE AGE CLUSTER-API-CLUSTER
ci-op-jcp3s7wx-ng5sd-master-0 Running 10h ci-op-jcp3s7wx-ng5sd
ci-op-jcp3s7wx-ng5sd-master-1 Running 10h ci-op-jcp3s7wx-ng5sd
ci-op-jcp3s7wx-ng5sd-master-2 Running 10h ci-op-jcp3s7wx-ng5sd
----

. Create a `Machine` object for the new control plane node by creating a yaml file similar to the following:
. Create a `Machine` object for the new control plane node:

.. Create a YAML file named `new-machine.yaml` similar to the following:
+
[source,yaml]
----
@@ -75,13 +77,18 @@ spec:
name: master-user-data-managed
----
+
--
where:

`<new_control_plane_machine>`:: Specifies the name of the new machine, which can be the same as the previously deleted machine name.
`<cluster_api_cluster>`:: Specifies the `CLUSTER-API-CLUSTER` value for the other control plane machines, shown in the output of the previous step.
`<provider_id>`:: Specifies the `providerID` value of the new bare metal host, shown in the output of an earlier step.
--

.. Apply the YAML file by running the following command:
+
[source,terminal]
----
$ oc apply -f new-machine.yaml
----
+
The following warning is expected:
+
@@ -100,6 +107,17 @@ $ NEW_NODE_NAME=<new_node_name>
----
+
Replace `<new_node_name>` with the name of the new control plane node.
+
[NOTE]
====
The name of the new node might be different from the name of the node you are replacing.
You can check the name of the new node by running the following command:

[source,terminal]
----
$ oc get nodes
----
====

.. Define the `NEW_MACHINE_NAME` variable by running the following command:
+
3 changes: 2 additions & 1 deletion modules/nodes-remove-unhealthy-etcd-member.adoc
@@ -6,6 +6,7 @@
[id="removing-etcd-member_{context}"]
= Removing the unhealthy etcd member

[role="_abstract"]
Begin removing the failed control plane node by first removing the unhealthy etcd member.

.Procedure
@@ -20,7 +21,7 @@ $ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide
.Example output
[source,terminal]
----
etcd-openshift-control-plane-0 5/5 Running 11 3h56m 192.168.10.9 openshift-control-plane-0 <none> <none>
etcd-openshift-control-plane-0 5/5 Running 11 3h56m 192.168.10.9 openshift-control-plane-0 <none> <none>
etcd-openshift-control-plane-1 5/5 Running 0 3h54m 192.168.10.10 openshift-control-plane-1 <none> <none>
etcd-openshift-control-plane-2 5/5 Running 0 3h58m 192.168.10.11 openshift-control-plane-2 <none> <none>
----
4 changes: 4 additions & 0 deletions modules/nodes-replace-control-plane-prereqs.adoc
@@ -6,6 +6,10 @@
[id="prerequisites_{context}"]
= Prerequisites

[role="_abstract"]
You must meet the following prerequisites to replace a failed bare-metal control plane node using this method.


* You have identified the unhealthy bare metal etcd member.
* You have verified that either the machine is not running or the node is not ready.
* You have access to the cluster as a user with the `cluster-admin` role.
11 changes: 4 additions & 7 deletions modules/nodes-verify-failed-node-deleted.adoc
@@ -6,6 +6,7 @@
[id="verify-machine-deleted_{context}"]
= Verifying that the failed node was deleted

[role="_abstract"]
Before proceeding to create a replacement control plane node, verify that the failed node was successfully deleted.

.Procedure
@@ -23,8 +24,6 @@ $ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
examplecluster-control-plane-0 Running 3h11m openshift-control-plane-0 baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e externally provisioned
examplecluster-control-plane-1 Running 3h11m openshift-control-plane-1 baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1 externally provisioned
examplecluster-compute-0 Running 165m openshift-compute-0 baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f provisioned
examplecluster-compute-1 Running 165m openshift-compute-1 baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9 provisioned
----

. Verify that the node has been deleted by running the following command:
@@ -37,11 +36,9 @@ $ oc get nodes
.Example output
[source,terminal]
----
NAME STATUS ROLES AGE VERSION
openshift-control-plane-0 Ready master 3h24m v1.34.2
openshift-control-plane-1 Ready master 3h24m v1.34.2
openshift-compute-0 Ready worker 176m v1.34.2
openshift-compute-1 Ready worker 176m v1.34.2
NAME STATUS ROLES AGE VERSION
openshift-control-plane-0 Ready master 3h24m v1.34.2
openshift-control-plane-1 Ready master 3h24m v1.34.2
----

. Wait for all of the cluster Operators to complete rolling out changes.
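
One way to check whether the cluster Operators are still rolling out changes is to scan the `oc get clusteroperators` table for entries that report `True` under `PROGRESSING`. The sketch below assumes that `PROGRESSING` is the fourth column (after `NAME`, `VERSION`, and `AVAILABLE`); the `progressing_operators` helper is hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of this PR): read `oc get clusteroperators`
# table output on stdin and print the names of Operators whose PROGRESSING
# value (assumed to be column 4) is "True".
progressing_operators() {
  awk 'NR > 1 && $4 == "True" { print $1 }'
}

# Possible usage on a live cluster; empty output suggests that no Operator
# is still progressing:
#   oc get clusteroperators | progressing_operators
```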
1 change: 1 addition & 0 deletions nodes/nodes/nodes-nodes-replace-control-plane.adoc
@@ -6,6 +6,7 @@ include::_attributes/common-attributes.adoc[]

toc::[]

[role="_abstract"]
If a control plane node on your bare-metal cluster has failed and cannot be recovered, but you installed your cluster without providing baseboard management controller (BMC) credentials, you must take extra steps in order to replace the failed node with a new one.

// Prerequisites