Is this a new feature, an enhancement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
High
Please provide a clear description of the problem this feature solves
An upcoming deployment introduces a target requirement of 4,500+ endpoints (compute trays) managed under a single NICo controller domain. To ensure seamless production operations at this density, we need to optimize NICo to handle the increased concurrency in three key areas:
- Ingestion throughput (bulk endpoint discovery and registration)
- Core provisioning pipelines (including DHCP/PXE allocation, firmware updates, and OS installation)
- Control plane efficiency, so that the controller keeps up with 4,500+ active endpoints under peak load
Feature Description
- Use Case 1: Bulk Endpoint Ingestion & Registration
Scenario: A site operator initiates a full-site bring-up or power cycle, causing 4,500+ nodes (compute trays and ancillary devices) to hit the network and check in concurrently.
Success Criteria: 100% of endpoints are successfully discovered, validated, and registered in the database without dropped connections, discovery timeouts, or system deadlocks. The operator does not need to artificially stagger or pace out rack power sequences.
- Use Case 2: Coordinated Fleet-Wide Provisioning Workflows
Scenario: The site operator triggers a synchronized lifecycle operation—such as an OS provisioning cycle, firmware flash, or tenant sanitization—across all 4,500+ managed hosts.
Success Criteria: The parallel provisioning pipelines execute reliably across the entire high-density footprint, completing deterministically within the site's scheduled maintenance window without getting stuck in intermediate lifecycle states or requiring manual retries.
- Use Case 3: API Responsiveness Under Peak Telemetry Load
Scenario: The data center is running at full capacity, with 4,500+ active nodes continuously streaming high-frequency health metrics, heartbeats, and sensor data back to the controller domain.
Success Criteria: The northbound API and CLI remain fully responsive. External automation and orchestration tools can query the inventory or run audits without command lag, API timeouts, or database write-lock contention.
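
As a rough illustration of how the Use Case 1 success criteria could be exercised in a test harness, the sketch below simulates 4,500 endpoints checking in at once and confirms that every one ends up registered. It is not NICo code: `register_endpoint`, the `tray-NNNN` identifiers, the in-memory registry, and the `MAX_IN_FLIGHT` value are all hypothetical stand-ins. The concurrency cap is applied on the controller side, so the simulated operator never has to stagger power-on.

```python
import asyncio

ENDPOINT_COUNT = 4500   # target endpoint count from this request
MAX_IN_FLIGHT = 256     # assumed controller-side concurrency cap (hypothetical)


async def register_endpoint(endpoint_id: str, registry: dict) -> None:
    """Hypothetical stand-in for the controller's registration handler."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound handler would
    registry[endpoint_id] = "registered"


async def simulate_bring_up(count: int = ENDPOINT_COUNT,
                            max_in_flight: int = MAX_IN_FLIGHT):
    """Fire all check-ins at once; the semaphore bounds concurrent work."""
    registry: dict = {}
    limiter = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def check_in(index: int) -> None:
        nonlocal in_flight, peak
        async with limiter:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await register_endpoint(f"tray-{index:04d}", registry)
            finally:
                in_flight -= 1

    # Every endpoint checks in concurrently, with no client-side pacing.
    await asyncio.gather(*(check_in(i) for i in range(count)))
    return registry, peak


def run_simulation(count: int = ENDPOINT_COUNT,
                   max_in_flight: int = MAX_IN_FLIGHT):
    """Synchronous wrapper returning (registry, peak concurrent registrations)."""
    return asyncio.run(simulate_bring_up(count, max_in_flight))


if __name__ == "__main__":
    registry, peak = run_simulation()
    print(f"registered={len(registry)} peak_in_flight={peak}")
```

A real harness would replace the stand-in handler with actual discovery traffic and also record timeouts and dropped connections. The sketch only shows the shape of the check: all endpoints arrive together, and the pass condition is that 100% of them are registered.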
Describe your ideal solution
No response
Describe any alternatives you have considered
No response
Additional context
No response
Code of Conduct