sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support#3515

Open
ben-malbeclabs wants to merge 23 commits into main from bc/rfc18-pr5

ben-malbeclabs (Contributor) commented Apr 10, 2026

RFC-18 flex-algo · PR 5 of 5 · see rfcs/rfc-0018-flex-algo.md
Depends on: #3497 (PR 1) — does not require PRs 2–4; can merge after PR 1
Series: #3497 · #3512 · #3513 · #3514 · #3515

Summary of Changes

  • Adds TopologyInfo account type to the Go SDK (state.go, deserialize.go, client.go), including TopologyType, TopologyConstraint, LinkTopologies/LinkFlags on Link, and IncludeTopologies on Tenant
  • Adds FeaturesConfig YAML support (--features-config flag on controller) gating flex-algo IS-IS and BGP color community stamping; disabled by default
  • When flex-algo is enabled, the controller populates topology data into the state cache, resolves tenant color communities via resolveTenantColors, and emits IS-IS flex-algo node segment and link configuration into the Arista EOS template
  • Updates e2e/compatibility_test.go to set the mandatory upgrade boundary for 20 RFC-18 interface/link operations to before: "0.18.0" (v0.17.0 was released before RFC-18 merged)

Diff Breakdown

| Category    | Files | Lines (+/-) | Net  |
|-------------|-------|-------------|------|
| Core logic  | 6     | +321 / -6   | +315 |
| Scaffolding | 2     | +75 / -14   | +61  |
| Tests       | 4     | +261 / -1   | +260 |
| Fixtures    | 2     | +415 / -0   | +415 |

Fixtures are two new EOS golden configs; most of the test weight is in render tests and the features config test. Core logic is the controller server, template, and Go SDK deserialization.

Note on diff size: The full diff vs main is 109 files / +7,101 lines because this branch is stacked on PR 1 (#3497), which has not yet merged. The table above reflects only PR 5's contribution. After PR 1 merges and this branch is rebased, the diff shrinks to the ~14 files shown here.

Key files
  • controlplane/controller/internal/controller/server.go — populates Topologies map in state cache; resolves LinkTopologies pubkeys to topology names and UnicastDrained from LinkFlags; computes TenantTopologyColors string; warns when flex-algo enabled but no topologies found
  • controlplane/controller/internal/controller/templates/tunnel.tmpl — adds IS-IS flex-algo node segment config and BGP color community stamping blocks, gated by FlexAlgoEnabled() and non-empty TenantTopologyColors; guards cleanup commands behind Config != nil to avoid EOS session rollback in e2e
  • controlplane/controller/internal/controller/features_config.go — new: FeaturesConfig struct with flex_algo.enabled, parsed from YAML; loaded via --features-config flag
  • controlplane/controller/internal/controller/models.go — adds LinkTopologies, UnicastDrained, FlexAlgoNodeSegments, TenantTopologyColors to controller model types
  • smartcontract/sdk/go/serviceability/state.go — adds TopologyType, TopologyConstraint, TopologyInfo, LinkTopologies/LinkFlags on Link, IncludeTopologies on Tenant
  • smartcontract/sdk/go/serviceability/client.go — adds ListTopologies to the Go SDK client
  • controlplane/controller/internal/controller/render_test.go — new render tests for flex-algo enabled and disabled paths
  • controlplane/controller/internal/controller/fixtures/base.config.flex-algo.txt — expected EOS config with IS-IS flex-algo segments and BGP color communities
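
The pubkey-to-name resolution described for server.go could look roughly like the sketch below. It is illustrative only: `PubKey`, `TopologyInfo`, and `resolveLinkTopologies` are hypothetical stand-ins, not the SDK's or controller's actual types, and the topology name is made up for the example.

```go
package main

import "fmt"

// PubKey and TopologyInfo are hypothetical stand-ins for the SDK types.
type PubKey [32]byte

type TopologyInfo struct{ Name string }

// resolveLinkTopologies maps each link topology pubkey to its topology name
// using the cached Topologies map. Keys with no cached entry are counted
// rather than silently dropped, so the caller can log a warning.
func resolveLinkTopologies(keys []PubKey, topologies map[PubKey]TopologyInfo) (names []string, missing int) {
	for _, k := range keys {
		info, ok := topologies[k]
		if !ok {
			missing++
			continue
		}
		names = append(names, info.Name)
	}
	return names, missing
}

func main() {
	known, unknown := PubKey{1}, PubKey{2}
	cache := map[PubKey]TopologyInfo{known: {Name: "unicast-default"}}

	names, missing := resolveLinkTopologies([]PubKey{known, unknown}, cache)
	fmt.Println(names, missing)
}
```

Returning the count of unresolved keys is one possible design; the real controller may log or skip them differently.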

Testing Verification

  • go test ./smartcontract/sdk/go/... ./controlplane/controller/... — all pass including render_flex_algo_enabled_successfully and render_flex_algo_disabled_cleanup_successfully
  • TestE2E_IBRL, TestE2E_IBRL_WithAllocatedAddr, TestE2E_Multicast — all pass; flex-algo cleanup commands are guarded so they don't fire in e2e (where controller runs without --features-config)
  • Backward-compatibility test passes with the updated before: "0.18.0" boundary for RFC-18 interface/link operations

…pport

- Add flex_algo_node_segments field to Interface::V2; add
  deserialize_legacy_v2_interface for reading pre-RFC-18 accounts
- Add MigrateDeviceInterfaces processor: rewrites pre-RFC-18 device
  accounts (no flex_algo bytes) to new V2 layout; idempotent so the
  activator startup sweep can call it unconditionally; accepts
  foundation, device owner, or activator authority as signer
- Wire MigrateDeviceInterfaces into entrypoint and instructions
- Update ActivateLink to pass unicast_default_topology_pda
  unconditionally; processor tags link only when account is initialized
- Update topology backfill/create processors
- Remove V3 discriminant from smartcontract Go SDK; DeserializeInterfaceV2
  now reads flex_algo_node_segments unconditionally (requires migration)
- Register topology commands in Rust SDK
- Add integration tests for MigrateDeviceInterfaces: idempotency,
  authorization (unauthorized/activator/non-signer), legacy account migration
…mmands

FlexAlgo teardown commands (no traffic-engineering, no next-hop resolution
ribs, no router traffic-engineering) were emitted unconditionally whenever
FlexAlgoEnabled=false. In e2e, no --features-config is passed so Config is
nil. Issuing these no-ops on fresh cEOS caused configure session errors which
rolled back the entire session, including the no neighbor command used to
remove disconnected users.

Fix: guard all FlexAlgo cleanup branches with else-if $.Config so they only
fire when a features config was loaded (FlexAlgo was deployed or is being
disabled). When Config is nil, the controller was never configured for
FlexAlgo, so there is nothing to clean up.
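
A minimal sketch of the three-way guard described above, using Go's text/template. The type names and the `router traffic-engineering` line on the enabled branch are assumptions for illustration; only the cleanup line and the else-if on Config are taken from this commit message, and the real tunnel.tmpl is considerably larger.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// FlexAlgoConfig and FeaturesConfig are hypothetical stand-ins for the
// controller's features config types.
type FlexAlgoConfig struct{ Enabled bool }

type FeaturesConfig struct{ FlexAlgo FlexAlgoConfig }

// templateData is a hypothetical, trimmed-down template context.
type templateData struct {
	Config *FeaturesConfig // nil when no --features-config was passed
}

// FlexAlgoEnabled is nil-safe so the template can call it unconditionally.
func (d templateData) FlexAlgoEnabled() bool {
	return d.Config != nil && d.Config.FlexAlgo.Enabled
}

// The enabled branch is a placeholder line. The else-if branch fires only
// when a features config was loaded, so a nil Config renders nothing.
const guardTmpl = `{{- if .FlexAlgoEnabled -}}
router traffic-engineering
{{- else if .Config -}}
no router traffic-engineering
{{- end -}}`

func render(d templateData) (string, error) {
	t, err := template.New("guard").Parse(guardTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	cases := []templateData{
		{Config: &FeaturesConfig{FlexAlgo: FlexAlgoConfig{Enabled: true}}},
		{Config: &FeaturesConfig{}}, // config loaded, flex-algo disabled
		{Config: nil},               // e2e case: no features config at all
	}
	for _, c := range cases {
		out, err := render(c)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", out)
	}
}
```

The guard relies on text/template treating a nil pointer as false in `if`, so the cleanup branch is skipped whenever Config was never set.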

Update all render fixtures and the flex-algo-disabled test case (now uses a
non-nil Config with FlexAlgo.Enabled=false) to match.
Config non-nil causes tunnel.tmpl to emit 'no traffic-engineering' cleanup
lines in all non-flex-algo tests, breaking IBRL fixture comparisons.
…llback

When flex-algo is disabled after being enabled, the controller was not
emitting 'no node-segment ipv4 index X flex-algo TOPOLOGY' cleanup lines
on Loopback255. Root cause: FlexAlgoNodeSegments was only populated when
flex-algo was enabled, so the template had nothing to iterate over during
rollback.

Fix: populate FlexAlgoNodeSegments whenever a features config is loaded
(not just when enabled), so the template can emit cleanup lines even
when flex-algo is being disabled.
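
The fix can be illustrated with a small sketch, assuming simplified types. `FeaturesConfig`, `NodeSegment`, `segmentsForTemplate`, and `loopbackLines` are hypothetical names, and the positive form of the command is inferred by dropping `no` from the cleanup line quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// FeaturesConfig and NodeSegment are hypothetical, simplified stand-ins.
type FeaturesConfig struct{ FlexAlgoEnabled bool }

type NodeSegment struct {
	Index    int
	Topology string
}

// segmentsForTemplate exposes node segments whenever a features config is
// loaded, whether or not flex-algo is enabled, so cleanup has data to iterate.
func segmentsForTemplate(cfg *FeaturesConfig, all []NodeSegment) []NodeSegment {
	if cfg == nil {
		return nil
	}
	return all
}

// loopbackLines emits one line per segment: the enable form when flex-algo
// is on, the "no ..." cleanup form when a config is loaded but disabled.
func loopbackLines(cfg *FeaturesConfig, all []NodeSegment) []string {
	prefix := "no "
	if cfg != nil && cfg.FlexAlgoEnabled {
		prefix = ""
	}
	var lines []string
	for _, s := range segmentsForTemplate(cfg, all) {
		lines = append(lines, fmt.Sprintf("%snode-segment ipv4 index %d flex-algo %s",
			prefix, s.Index, strings.ToUpper(s.Topology)))
	}
	return lines
}

func main() {
	segs := []NodeSegment{{Index: 101, Topology: "unicast-default"}}
	fmt.Println(loopbackLines(&FeaturesConfig{FlexAlgoEnabled: true}, segs))
	fmt.Println(loopbackLines(&FeaturesConfig{FlexAlgoEnabled: false}, segs))
	fmt.Println(loopbackLines(nil, segs))
}
```

Before the fix, the equivalent of `segmentsForTemplate` would have returned nothing in the disabled case, which is why no rollback lines appeared.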
…enant deserialization

- add missing IncludeTopologies deserialization in DeserializeTenant (root
  cause of community stamping not working)
- fix resolveTenantColors fallback to use case-insensitive comparison
- uppercase all topology name references in tunnel.tmpl that were missing
  ToUpper (node-segment flex-algo, isis flex-algo, traffic-engineering flex-algo)
- update fixtures to match new uppercase output
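
As a rough illustration of the two name-handling fixes, the sketch below matches topology names case-insensitively with `strings.EqualFold` and uppercases them for rendering with `strings.ToUpper`. `findTopology` and `flexAlgoName` are hypothetical helpers, not the actual body of resolveTenantColors or the template's ToUpper function.

```go
package main

import (
	"fmt"
	"strings"
)

// findTopology returns the stored topology name that matches want,
// ignoring case, so "Unicast-Default" and "unicast-default" are equal.
func findTopology(names []string, want string) (string, bool) {
	for _, n := range names {
		if strings.EqualFold(n, want) {
			return n, true
		}
	}
	return "", false
}

// flexAlgoName normalizes a topology name to the uppercase form that the
// rendered config lines use.
func flexAlgoName(name string) string {
	return strings.ToUpper(name)
}

func main() {
	names := []string{"Unicast-Default", "low-latency"}
	if n, ok := findTopology(names, "unicast-default"); ok {
		fmt.Println(flexAlgoName(n))
	}
}
```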
@ben-malbeclabs ben-malbeclabs marked this pull request as ready for review April 15, 2026 04:00