NS8 → new my migration: end-to-end verification (migration + clean install) #8078

@edospadoni

NS8 → new my migration — end‑to‑end verification checklist

Target repo: NethServer/dev
Parent issue: NethServer/my#84
Cutover PRs: ns8‑core NethServer/ns8-core#1148 feat/backup-my-cutover · ns8‑metrics NethServer/ns8-metrics#75 feat/alerts-my-cutover

Goal

Verify, on a single controlled Nethesis‑Enterprise NS8 cluster, the full migration flow from the old
my to the new my (my.nethesis.it): credential rotation, native telemetry/backup/alerts on the new
my, and confirmation that the NS8 software repo keeps working (it is NOT gated by my /auth).

Pre‑flip verification. These PRs point the functional endpoints (register, collect_url,
/proxy/credentials, alerts) at my-proxy-prod.onrender.com, and this checklist is run before
the DNS flip. Commands use $MY:

MY=https://my-proxy-prod.onrender.com

(The onrender → my.nethesis.it swap happens later, coupled with the DNS flip. NS8 has no /auth
feed dependency — see Step 5 — so there is no fixed‑hostname exception here.)


Scenario A — migration of an existing (already‑subscribed) cluster

(For a clean install that subscribes fresh on the new my, see Scenario B near the end — Steps 3–6 are shared.)

Step 0 — Pre‑check: the cluster already exists on the new my

  1. Note the current (old‑my) subscription id on the leader:
    api-cli run get-subscription | jq .        # system_id, plan, active/expired
  2. Log in to the new my UI ($MY) → Systems, find this cluster.
    • Present, with correct plan, company/customer, subscription state and System ID.

Step 1 — Install the cutover PRs

Image tags follow the branch name with "/" replaced by "-" (feat/x → feat-x): core:feat-backup-my-cutover,
metrics:feat-alerts-my-cutover.

1a. ns8‑core NethServer/ns8-core#1148 — update core (run on the leader and every worker node)

curl -fsSL "https://raw.githubusercontent.com/NethServer/ns8-core/feat/backup-my-cutover/core/install.sh" \
  | bash -s -- ghcr.io/nethserver/core:feat-backup-my-cutover
  • podman images | grep core shows ghcr.io/nethserver/core:feat-backup-my-cutover on each node.
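Since `podman images` prints repository and tag in separate columns, a plain `grep core` can also match an older tag. A stricter per-node check might look like the sketch below. It assumes the image reference from Step 1a; the helper name `has_image` is hypothetical and not part of ns8-core.

```shell
#!/usr/bin/env bash
# has_image REF: read "repository:tag" lines on stdin and succeed only if
# one of them is exactly REF. (Hypothetical helper for this checklist.)
has_image() {
    grep -Fxq -- "$1"
}

# Usage on each node (commented out so the sketch can be sourced anywhere):
# podman images --format '{{.Repository}}:{{.Tag}}' \
#   | has_image ghcr.io/nethserver/core:feat-backup-my-cutover \
#   && echo "core image OK" || echo "core image MISSING"
```

The exact-line match (`-Fx`) avoids false positives from tags that merely share a prefix.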

1b. ns8‑metrics NethServer/ns8-metrics#75 — update the metrics module (on the leader)

# confirm the instance id (default: metrics1)
api-cli run list-installed-modules | jq -r '.[]?|select(.name=="metrics")|.instances[]?.id'

api-cli run update-module --data '{
  "module_url":"ghcr.io/nethserver/metrics:feat-alerts-my-cutover",
  "instances":["metrics1"],
  "force":true
}'
  • metrics module now runs the feat-alerts-my-cutover image.
  • write-alert-proxy-envfile re‑ran (module configure); if not, force it via the
    subscription-changed event / restart the alert-proxy service.

⚠️ Confirm the exact update-module invocation against the installed cluster before publishing.
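If the instance id turns out not to be metrics1, the payload can be built from the discovered id instead of being edited by hand. This is only a sketch: `update_payload` is a hypothetical helper, and the JSON shape is copied from the invocation above, which still needs confirming.

```shell
#!/usr/bin/env bash
# update_payload IMAGE INSTANCE: print the update-module JSON body used in
# Step 1b. (Hypothetical helper; values must not contain double quotes.)
update_payload() {
    printf '{"module_url":"%s","instances":["%s"],"force":true}\n' "$1" "$2"
}

# Usage on the leader (commented out; needs api-cli and jq):
# instance=$(api-cli run list-installed-modules \
#   | jq -r '.[]?|select(.name=="metrics")|.instances[]?.id' | head -n 1)
# api-cli run update-module \
#   --data "$(update_payload ghcr.io/nethserver/metrics:feat-alerts-my-cutover "$instance")"
```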


Step 2 — Verify credential rotation (new creds set, legacy preserved)

On the leader, cluster/bin/migrate-to-my rotates the credentials via GET $MY/proxy/credentials, preserves
the legacy pair, and sets migrated=1 and collect_url. The telemetry helpers call it up front, so
triggering a heartbeat cycle also runs the migration.

# trigger a cycle (or wait for the timer)
runagent -m cluster send-heartbeat 2>/dev/null || cluster/bin/send-heartbeat

# inspect the subscription hash on the leader
redis-cli hgetall cluster/subscription        # via the cluster Redis (use runagent if auth is needed)

Expected in cluster/subscription:

  • system_id = NETH-…, auth_token/secret = my_…
  • migrated = 1, collect_url = https://my-proxy-prod.onrender.com/collect/api/systems
  • legacy_system_id / legacy_auth_token = the old pair (preserved for rollback)
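The expectations listed above can be checked mechanically rather than by eye. The sketch below assumes the field names shown in this step (with the secret stored under `auth_token`) and the alternating field/value lines that `redis-cli hgetall` prints when its output is piped. `check_migrated` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_migrated: read alternating field/value lines on stdin and verify the
# Step 2 expectations. Prints "OK" or one "FAIL <field>" line per problem.
check_migrated() {
    awk '
        NR % 2 == 1 { key = $0; next }   # odd lines are field names
        { h[key] = $0 }                  # even lines are their values
        END {
            bad = 0
            if (h["system_id"] !~ /^NETH-/)   { print "FAIL system_id"; bad = 1 }
            if (h["auth_token"] !~ /^my_/)    { print "FAIL auth_token"; bad = 1 }
            if (h["migrated"] != "1")         { print "FAIL migrated"; bad = 1 }
            if (h["collect_url"] == "")       { print "FAIL collect_url"; bad = 1 }
            if (h["legacy_system_id"] == "")  { print "FAIL legacy_system_id"; bad = 1 }
            if (h["legacy_auth_token"] == "") { print "FAIL legacy_auth_token"; bad = 1 }
            if (!bad) print "OK"
            exit bad
        }'
}

# Usage on the leader (commented out; needs access to the cluster Redis):
# redis-cli hgetall cluster/subscription | check_migrated
```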

Idempotency: re‑run migrate‑to‑my → exits at the marker, no re‑rotation.

  • Second run is a no‑op.
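One way to confirm the no-op is to snapshot the hash before and after the second run and compare the two. This is a minimal sketch: `hash_pairs` is a hypothetical helper, and the sort makes the comparison independent of field order.

```shell
#!/usr/bin/env bash
# hash_pairs: turn alternating field/value lines (piped `redis-cli hgetall`
# output) into sorted "field=value" lines that are easy to compare.
hash_pairs() {
    paste -d= - - | sort
}

# Usage on the leader (commented out; needs the cluster Redis):
# before=$(redis-cli hgetall cluster/subscription | hash_pairs)
# cluster/bin/migrate-to-my
# after=$(redis-cli hgetall cluster/subscription | hash_pairs)
# [ "$before" = "$after" ] && echo "second run was a no-op" \
#   || diff <(echo "$before") <(echo "$after")
```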

Known bug already fixed in NethServer/ns8-core#1148: migrate-to-my used to export the read‑only redis default
user (→ hset 401); it now inherits the privileged cluster user. Confirm no 401 on write.


Step 3 — Verify heartbeat / inventory / backup write to the new my

runagent -m cluster send-heartbeat ; runagent -m cluster send-inventory
# backup:
cluster/bin/send-cluster-backup 2>&1 | tail -20

On the new my for this cluster:

  • Heartbeat / last‑seen updates.
  • Inventory present and current.
  • Backup stored on the new my. Verify with GET $MY/collect/api/systems/backups (authenticated with the new credentials).
  • facts.migration.from_legacy_system_id set to the old system id (authoritative migration signal).
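The backup check above could be scripted roughly as follows. Note the assumption: the sketch uses HTTP Basic auth with the rotated system_id/auth_token pair, which this checklist does not confirm. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# backups_url COLLECT_URL: append /backups to the collect_url stored in
# cluster/subscription (tolerates one trailing slash).
backups_url() {
    printf '%s/backups\n' "${1%/}"
}

# list_backups COLLECT_URL SYSTEM_ID AUTH_TOKEN
# ASSUMPTION: the collect API accepts HTTP Basic auth with the rotated pair.
list_backups() {
    curl -fsS -u "$2:$3" "$(backups_url "$1")"
}

# Usage (commented out; substitute the values from cluster/subscription):
# list_backups "$MY/collect/api/systems" "<system_id>" "<auth_token>"
```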

Step 4 — Verify native alerts (new‑my Mimir alertmanager)

write-alert-proxy-envfile sets MIMIR_URL to the native collect Mimir endpoint (derived from
collect_url) once migrated=1; alert-proxy POSTs there verbatim.

runagent -m metrics1 cat alert-proxy.env | grep -E 'MIMIR_URL|MIMIR_AUTH_USER'
  • MIMIR_URL = …/collect/api/services/mimir/alertmanager/api/v2/alerts (native), not /proxy/alerts.
  • MIMIR_AUTH_USER = the rotated NETH-… id.
  • Fire a test alert and confirm it lands in the new‑my alerting view for this cluster (no gap).
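To compare the env file against what the derivation should produce, the expected MIMIR_URL can be computed from collect_url. The sketch below assumes the derivation is "drop the trailing /systems, append the Mimir alertmanager path", inferred from the two URLs quoted in this checklist. The real logic in write-alert-proxy-envfile may differ, and both helpers are hypothetical.

```shell
#!/usr/bin/env bash
# expected_mimir_url COLLECT_URL: derive the native alertmanager endpoint.
# ASSUMPTION: .../collect/api/systems -> .../collect/api/services/mimir/...
expected_mimir_url() {
    local base="${1%/}"        # drop a trailing slash, if any
    base="${base%/systems}"    # .../collect/api/systems -> .../collect/api
    printf '%s/services/mimir/alertmanager/api/v2/alerts\n' "$base"
}

# env_value NAME: print the value of NAME from KEY=VALUE lines on stdin.
env_value() {
    sed -n "s/^$1=//p" | head -n 1
}

# Usage on the leader (commented out; needs runagent):
# actual=$(runagent -m metrics1 cat alert-proxy.env | env_value MIMIR_URL)
# [ "$actual" = "$(expected_mimir_url "$MY/collect/api/systems")" ] \
#   && echo "MIMIR_URL is native" || echo "MIMIR_URL mismatch: $actual"
```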

Step 5 — Software repo is NOT my‑gated (confirm it keeps working)

NS8's enterprise repo https://subscription.nethserver.com/distfeed/ is served by NethServer/ns8-porthos
(snapshot.php), which only checks that credentials are present (no secret check, no call to my; its Traefik
route has no forwardAuth). Rotated creds therefore keep working, and NS8 has no /auth broker dependency
(unlike NethSecurity). The my dependency for NS8 is the subscription (collect/info),
not the software repo.

# packages still resolve after rotation
podman exec -it <any-module> apk update 2>/dev/null || true   # or the node's dnf/repo check
api-cli run get-subscription | jq '.active, .expiration'
  • Package/update channel still resolves after rotation.
  • Subscription shows active via collect/info.

Step 6 — Subscription UX

  • Subscription view shows plan / company / expiration (no crash on empty expire_date — fixed).
  • Re‑register the same key → 409 system_already_registered (one‑shot by design).
  • Cancel / remove shows the irreversibility warning; the outer flow clears cluster/subscription.

Scenario B — clean install (fresh subscription, never subscribed)

A cluster subscribed directly on the new my — no legacy creds, no rotation. set-subscription
(subscribe_nsent in ns8‑core NethServer/ns8-core#1148) POSTs the pasted token to $MY/backend/api/systems/register,
stores the returned native system_key as system_id, and writes cluster/subscription with
migrated="1" from the start (so migrate-to-my never rotates). Steady‑state path for new enterprise
clusters.

Prereq: a valid registration token for a system created on the new my.

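B1 — Install PRs + subscribe

Install the cutover PRs as in Step 1, then subscribe the cluster with the registration token (set-subscription).
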
B2 — Verify native subscription (no rotation, no legacy)

redis-cli hgetall cluster/subscription        # via the cluster Redis
  • provider=nsent, system_id = NETH-…, auth_token = the pasted my_… token
  • migrated = 1 (set by set-subscription, not by migrate-to-my)
  • collect_url = https://my-proxy-prod.onrender.com/collect/api/systems
  • No legacy_system_id / legacy_auth_token (nothing to preserve — correct)
  • migrate-to-my is a no‑op
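The clean-install expectations differ from Scenario A mainly in that no legacy_* field may exist. A minimal sketch of that check, assuming the field names listed above and piped `redis-cli hgetall` output (`check_clean_install` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# check_clean_install: read alternating field/value lines on stdin and verify
# the B2 expectations. Prints "OK" or one "FAIL ..." line per problem.
check_clean_install() {
    awk '
        NR % 2 == 1 { key = $0; next }   # odd lines are field names
        { h[key] = $0 }                  # even lines are their values
        END {
            bad = 0
            if (h["provider"] != "nsent")   { print "FAIL provider"; bad = 1 }
            if (h["system_id"] !~ /^NETH-/) { print "FAIL system_id"; bad = 1 }
            if (h["migrated"] != "1")       { print "FAIL migrated"; bad = 1 }
            if (h["collect_url"] == "")     { print "FAIL collect_url"; bad = 1 }
            for (k in h)
                if (k ~ /^legacy_/) { print "FAIL unexpected " k; bad = 1 }
            if (!bad) print "OK"
            exit bad
        }'
}

# Usage on the leader (commented out; needs the cluster Redis):
# redis-cli hgetall cluster/subscription | check_clean_install
```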

B3–B6 — same as Scenario A

Heartbeat/inventory/backup (Step 3), native alerts (Step 4), software repo + subscription active (Step 5),
subscription UX (Step 6) — verify exactly as in Scenario A. Only difference:
facts.migration.from_legacy_system_id is NOT set on the new my (no legacy system) — expected.


Rollback (single‑cluster blast radius)

Restore the preserved legacy pair in cluster/subscription and clear the flag:

# on the leader, via the cluster Redis:
redis-cli <<'EOF'
HSET cluster/subscription system_id "<legacy_system_id>" auth_token "<legacy_auth_token>" migrated "0"
HDEL cluster/subscription collect_url
EOF
# re-run the metrics configure + restart alert-proxy

The cluster returns to legacy heartbeat + /proxy/* dual‑send (translated by the my‑ent proxy). No data loss.
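Instead of pasting the legacy pair by hand, the rollback commands can be generated from the preserved fields, so a typo cannot end up in the restored credentials. This is a sketch under the same assumptions as the commands above. `rollback_commands` is a hypothetical helper, and it assumes the stored values contain no double quotes.

```shell
#!/usr/bin/env bash
# rollback_commands: read alternating field/value lines (piped
# `redis-cli hgetall cluster/subscription` output) and print the Redis
# commands that restore the legacy pair. Fails if no legacy pair is stored.
rollback_commands() {
    awk '
        NR % 2 == 1 { key = $0; next }
        { h[key] = $0 }
        END {
            if (h["legacy_system_id"] == "" || h["legacy_auth_token"] == "") {
                print "no legacy pair preserved; refusing to roll back" > "/dev/stderr"
                exit 1
            }
            printf "HSET cluster/subscription system_id \"%s\" auth_token \"%s\" migrated \"0\"\n", \
                h["legacy_system_id"], h["legacy_auth_token"]
            print "HDEL cluster/subscription collect_url"
        }'
}

# Usage on the leader (commented out): review the file before applying it.
# redis-cli hgetall cluster/subscription | rollback_commands > /tmp/rollback.redis
# cat /tmp/rollback.redis && redis-cli < /tmp/rollback.redis
```

Writing the commands to a file first leaves a reviewable record of exactly what was restored.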


Summary checklist

  • Cluster present on the new my (Step 0)
  • core NethServer/ns8-core#1148 (feat(migration): move to new my) + metrics NethServer/ns8-metrics#75 (feat: switch alerts to native my collect after cutover) installed (Step 1)
  • Creds rotated NETH-…/my_…, migrated=1, legacy preserved, idempotent, no redis‑401 (Step 2)
  • Heartbeat + inventory + backup native on new my; from_legacy_system_id set (Step 3)
  • Native alerts on new‑my Mimir, no gap (Step 4)
  • Software repo keeps working (creds‑present, not my‑gated); subscription active (Step 5)
  • Subscription UX incl. one‑shot re‑register (Step 6)
  • Rollback path validated (optional)
