Add sled could not complete #9713

@askfongjojo

This is on rack2, after a mupdate (to get past a failed reconfigurator-driven update) and blueprint archival. Two new sleds, in cubbies 2 and 3, were updated to the same version:

root@oxz_switch0:~# pilot host ls
CUBBY IP                        SERIAL      IMAGE
2     fe80::aa40:25ff:fe04:c96  BRM22250001 ci 5eb1337/4e5b80e 2026-01-21 22:22
3     fe80::aa40:25ff:fe04:412  BRM13250012 ci 5eb1337/4e5b80e 2026-01-21 22:22
7     fe80::aa40:25ff:fe04:6d6  BRM27230045 ci 5eb1337/4e5b80e 2026-01-21 22:22
8     fe80::aa40:25ff:fe04:3d5  BRM44220011 ci 5eb1337/4e5b80e 2026-01-21 22:22
9     fe80::aa40:25ff:fe04:357  BRM44220005 ci 5eb1337/4e5b80e 2026-01-21 22:22
10    fe80::aa40:25ff:fe04:3d4  BRM42220009 ci 5eb1337/4e5b80e 2026-01-21 22:22
11    fe80::aa40:25ff:fe04:191  BRM42220006 ci 5eb1337/4e5b80e 2026-01-21 22:22
12    fe80::aa40:25ff:fe04:393  BRM42220057 ci 5eb1337/4e5b80e 2026-01-21 22:22
13    fe80::aa40:25ff:fe04:1d1  BRM42220018 ci 063828b/10bf4ba 2025-10-09 02:10
14    fe80::aa40:25ff:fe04:195  BRM42220051 ci 5eb1337/4e5b80e 2026-01-21 22:22
16    fe80::aa40:25ff:fe04:352  BRM42220014 ci 5eb1337/4e5b80e 2026-01-21 22:22
17    fe80::aa40:25ff:fe04:192  BRM42220017 ci 5eb1337/4e5b80e 2026-01-21 22:22
21    fe80::aa40:25ff:fe04:353  BRM42220031 ci 5eb1337/4e5b80e 2026-01-21 22:22
23    fe80::aa40:25ff:fe04:395  BRM42220016 ci 5eb1337/4e5b80e 2026-01-21 22:22
25    fe80::aa40:25ff:fe04:354  BRM44220010 ci 5eb1337/4e5b80e 2026-01-21 22:22
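As a quick cross-check of the listing above, the following is a minimal sketch (not part of pilot or omdb) that parses the pasted text and flags any host whose image differs from the most common one. The helper names `parse_hosts` and `image_outliers` are hypothetical, and the column layout is assumed to match the output shown.

```python
from collections import Counter

# Text copied from the `pilot host ls` output above.
PILOT_HOST_LS = """\
CUBBY IP                        SERIAL      IMAGE
2     fe80::aa40:25ff:fe04:c96  BRM22250001 ci 5eb1337/4e5b80e 2026-01-21 22:22
3     fe80::aa40:25ff:fe04:412  BRM13250012 ci 5eb1337/4e5b80e 2026-01-21 22:22
7     fe80::aa40:25ff:fe04:6d6  BRM27230045 ci 5eb1337/4e5b80e 2026-01-21 22:22
8     fe80::aa40:25ff:fe04:3d5  BRM44220011 ci 5eb1337/4e5b80e 2026-01-21 22:22
9     fe80::aa40:25ff:fe04:357  BRM44220005 ci 5eb1337/4e5b80e 2026-01-21 22:22
10    fe80::aa40:25ff:fe04:3d4  BRM42220009 ci 5eb1337/4e5b80e 2026-01-21 22:22
11    fe80::aa40:25ff:fe04:191  BRM42220006 ci 5eb1337/4e5b80e 2026-01-21 22:22
12    fe80::aa40:25ff:fe04:393  BRM42220057 ci 5eb1337/4e5b80e 2026-01-21 22:22
13    fe80::aa40:25ff:fe04:1d1  BRM42220018 ci 063828b/10bf4ba 2025-10-09 02:10
14    fe80::aa40:25ff:fe04:195  BRM42220051 ci 5eb1337/4e5b80e 2026-01-21 22:22
16    fe80::aa40:25ff:fe04:352  BRM42220014 ci 5eb1337/4e5b80e 2026-01-21 22:22
17    fe80::aa40:25ff:fe04:192  BRM42220017 ci 5eb1337/4e5b80e 2026-01-21 22:22
21    fe80::aa40:25ff:fe04:353  BRM42220031 ci 5eb1337/4e5b80e 2026-01-21 22:22
23    fe80::aa40:25ff:fe04:395  BRM42220016 ci 5eb1337/4e5b80e 2026-01-21 22:22
25    fe80::aa40:25ff:fe04:354  BRM44220010 ci 5eb1337/4e5b80e 2026-01-21 22:22
"""


def parse_hosts(text):
    """Return (cubby, serial, image) tuples from `pilot host ls` style text."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # The IMAGE column contains spaces, so split only the first three fields.
        cubby, _ip, serial, image = line.split(None, 3)
        rows.append((int(cubby), serial, image.strip()))
    return rows


def image_outliers(rows):
    """Return rows whose image differs from the most common image."""
    majority, _count = Counter(image for _, _, image in rows).most_common(1)[0]
    return [row for row in rows if row[2] != majority]


hosts = parse_hosts(PILOT_HOST_LS)
outliers = image_outliers(hosts)
print(outliers)
# → [(13, 'BRM42220018', 'ci 063828b/10bf4ba 2025-10-09 02:10')]
```

On this listing the only outlier is cubby 13, which is also one of the sleds reported as uninitialized below; cubbies 2 and 3 match the majority image.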

Component versions before adding sled 2:

root@oxz_switch0:~# omdb nexus update-status
Count of each component type by system version:

                  |18.0.0-0.ci+git5eb13372380 
------------------+---------------------------
RoT bootloader    |15                         
RoT               |15                         
SP                |15                         
Host OS (phase 1) |12                         
Host OS (phase 2) |12                         
Zone              |149   
root@oxz_switch0:~# omdb nexus sleds list-uninitialized
RACK_ID                              CUBBY SERIAL      PART        REVISION 
de608e01-b8e4-4d93-b972-a7dbed36dd22 2     BRM22250001 913-0000023 1        
de608e01-b8e4-4d93-b972-a7dbed36dd22 3     BRM13250012 913-0000023 1        
de608e01-b8e4-4d93-b972-a7dbed36dd22 13    BRM42220018 913-0000019 6 

An attempt to add sled 2 ended with an error:

root@oxz_switch0:~# omdb nexus sleds add BRM22250001 913-0000023 -w
Error: adding sled

Caused by:
    0: Communication Error: error sending request for url (http://[fd00:1122:3344:104::56]:12232/sleds/add)
    1: error sending request for url (http://[fd00:1122:3344:104::56]:12232/sleds/add)
    2: operation timed out

The sled was still added to the cluster (it is listed in omdb db sleds and is gone from list-uninitialized), and its components are now counted in update-status:

root@oxz_switch0:~# omdb nexus update-status
Count of each component type by system version:

                  |18.0.0-0.ci+git5eb13372380 
------------------+---------------------------
RoT bootloader    |16                         
RoT               |16                         
SP                |16                         
Host OS (phase 1) |13                         
Host OS (phase 2) |13                         
Zone              |149
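To make the before/after comparison explicit, here is a small sketch that diffs the two update-status tables quoted in this report. It only parses the pasted text; `parse_update_status` is a hypothetical helper, not an omdb feature.

```python
def parse_update_status(text):
    """Map component name -> count from `omdb nexus update-status` style rows."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.partition("|")
        value = value.strip()
        # Only keep rows of the form "<component> |<integer>".
        if sep and value.isdigit():
            counts[name.strip()] = int(value)
    return counts


# Rows copied from the output before adding sled 2.
BEFORE = """\
RoT bootloader    |15
RoT               |15
SP                |15
Host OS (phase 1) |12
Host OS (phase 2) |12
Zone              |149
"""

# Rows copied from the output after the failed `sleds add`.
AFTER = """\
RoT bootloader    |16
RoT               |16
SP                |16
Host OS (phase 1) |13
Host OS (phase 2) |13
Zone              |149
"""

before = parse_update_status(BEFORE)
after = parse_update_status(AFTER)
delta = {name: after[name] - before[name] for name in before}
print(delta)
```

Every hardware and host OS count goes up by one while the zone count stays at 149, which is consistent with the sled having been added without any new zones placed on it.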

The current blueprint and planner status is as follows:

root@oxz_switch0:~# omdb nexus blueprints list 2>/dev/null | tail -5
      2d05edea-be41-47e3-a99f-4604c6d61a9f 06767a6b-f1c3-45ca-bb4f-0317487e7c16 2026-01-22T20:05:23.387Z 
      162afbd6-5a20-446f-abf5-fcc83a5b6b96 2d05edea-be41-47e3-a99f-4604c6d61a9f 2026-01-23T05:58:53.302Z 
      d5d2b2cd-12e8-4854-9faf-391a0db1ae9e 162afbd6-5a20-446f-abf5-fcc83a5b6b96 2026-01-23T05:59:36.847Z 
      0b924bd4-d860-4767-a99d-0703e669a6f0 d5d2b2cd-12e8-4854-9faf-391a0db1ae9e 2026-01-23T05:59:38.121Z 
* yes 94192783-0bf5-4c23-90c1-ba223c040d49 0b924bd4-d860-4767-a99d-0703e669a6f0 2026-01-23T05:59:42.778Z 

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_planner
task: "blueprint_planner"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 1774, triggered by a dependent task completing
    started at 2026-01-23T06:14:27.977Z (23s ago) and ran for 777ms
    plan unchanged from parent 94192783-0bf5-4c23-90c1-ba223c040d49
    note: 249/5000 blueprints in database
planning report:
* zone adds waiting on blockers
* zone adds and updates are blocked:
  - current target release generation (35) is lower than minimum required by blueprint (36)
* zone updates waiting on zone add blockers
* will ensure cockroachdb setting: "22.1"
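The blocker in the planning report reads as a simple generation comparison. The sketch below illustrates that reading only; the function and parameter names are hypothetical, and it is an assumption that zone adds and updates become unblocked once the current generation reaches the blueprint's minimum. It is not the planner's actual code.

```python
def zone_changes_blocked(current_target_release_generation,
                         blueprint_minimum_generation):
    """Assumed gate: block zone adds/updates while the current target release
    generation is lower than the minimum the blueprint requires."""
    return current_target_release_generation < blueprint_minimum_generation


# Values taken from the planning report above.
print(zone_changes_blocked(35, 36))  # → True
# Hypothetical: the state after the target release generation reaches 36.
print(zone_changes_blocked(36, 36))  # → False
```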

The sled only has a global zone at this point and I've marked it non-provisionable for now.
