- Overview
- Best Practices
- Runbook Components
- Prepare the Environment
- Quick Start Workflow
- Repository Structure Reference
- Key Configuration Files
- Support and Escalation
- Quick Commands Reference
- Updates and Maintenance
This directory contains comprehensive documentation for the Cisco AI Pods infrastructure. The runbook is organized into focused guides for each component and operational procedure. This is a living document.
- Keep runbooks updated with any customizations
- Document all deviations from standard procedures
- Maintain change logs for configuration modifications
- Review and validate procedures regularly
- Rotate API tokens and credentials regularly
- Use encrypted connections for all management access
- Implement proper access controls
- Perform regular security audits and apply updates
- Run regular health checks and monitoring
- Schedule maintenance windows
- Automate backup procedures
- Monitor and optimize performance
Network Deployment
Network infrastructure deployment guide
- Cisco Nexus switch configuration
- VLAN and routing setup
- Integration with compute and storage
- Monitoring and maintenance
C800 Configuration Automation
Automation for C845A/C880A/C885A GPU server deployment
- Cisco C8XX M8 GPU server deployment
- Redfish API configuration procedures
- GPU-optimized BIOS settings
- Integration with main AI Pods infrastructure
Automation for Cisco Intersight deployment
- Essential setup steps
- Key configuration files
- Common troubleshooting
- Verification procedures
Complete deployment guide covering all components
- Prerequisites and planning
- End-to-end deployment procedures
- Integration steps
- Post-deployment validation
Automation for Pure Storage deployment
- FlashArray/FlashBlade configuration
- Ansible automation procedures
- Host integration steps
- Performance optimization
Comprehensive troubleshooting reference
- Common issues and resolutions
- Performance troubleshooting
- Recovery procedures
- Escalation processes
Follow the Steps to Prepare the Environment
- Planning Phase:
  - Review Main Runbook - Pre-Deployment Planning
  - Review Deployment Execution Order
  - Complete network and IP planning
  - Gather all required credentials
- Phase 1 - Foundation Setup (FIRST):
  - Follow Network Configuration - Phase 1 for management network
  - Establish basic network connectivity before any automation
  - Checkpoint: Verify management network connectivity
- Phase 2 - Infrastructure Deployment:
  - Use Intersight Automation for compute infrastructure
  - Checkpoint: Validate Intersight deployment before proceeding
- Phase 3 - C845A/C880A/C885A M8 GPU Servers:
  - Follow C845A/C880A/C885A Configuration Guide for additional GPU infrastructure
  - Configure using Redfish API for GPU-specific features
  - Checkpoint: Validate GPU functionality and DateTime sync
- Phase 4 - Storage Configuration:
  - Follow Pure Storage Configuration for storage
  - Checkpoint: Verify storage connectivity
- Phase 5 - OpenShift Deployment:
  - Complete OpenShift Deployment - Phase 5
  - Checkpoint: End-to-end connectivity validation
- Phase 6 - Integration & Validation:
  - Run verification procedures from each guide
  - Perform end-to-end testing
  - Document any customizations
- Phase 7 - Application Platform:
  - Deploy OpenShift/Kubernetes
  - Configure container orchestration
- Identify Component:
  - C845/C880/C885 issues → Troubleshooting C885 issues
  - Compute issues → Troubleshooting Quick Fixes
  - Network issues → Troubleshooting Common Issues
  - Pure Storage issues → Troubleshooting
  - OpenShift issues → See Individual READMEs
  - OpenShift Installation → Troubleshooting
  - OpenShift - Base Operators → Troubleshooting
- Follow Procedures:
  - Use Troubleshooting Guide for comprehensive procedures
  - Check component-specific guides for detailed steps
  - Escalate following documented procedures
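The Phase 1 checkpoint (verify management network connectivity) lends itself to a small script. The following is a minimal sketch and not part of the repository: the function name `check_hosts` and the `PING_CMD` override are assumptions made for illustration, and the addresses in the usage note come from the 192.0.2.0/24 documentation range.

```shell
#!/bin/sh
# Hypothetical helper: ping each management address once and report the result.
# PING_CMD can be overridden; "ping -c 1" sends a single echo request.
PING_CMD="${PING_CMD:-ping -c 1}"

check_hosts() {
  failed=0
  for host in "$@"; do
    # $PING_CMD is intentionally unquoted so its options split into words.
    if $PING_CMD "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  # Non-zero exit status if any host was unreachable.
  return "$failed"
}
```

A possible invocation, with your own management IPs substituted, is `check_hosts 192.0.2.10 192.0.2.11 || echo "Fix the management network before Phase 2"`. The non-zero exit status lets the check gate later phases in a wrapper script.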
Cisco-AI-Pods
├── c885/                                   # Cisco C885A automation
│   └── main.fsai.yaml                      # C885 configuration data model
├── intersight/                             # Cisco Intersight automation
│   ├── global_settings.ezi.yaml            # Global Parameters
│   ├── main.tf                             # Main Terraform module
│   ├── organizations/                      # Organization data model
│   ├── policies/                           # Policy data model
│   ├── pools/                              # Pool data model
│   ├── provider.tf                         # Provider Attributes
│   ├── templates/                          # Templates data model
│   └── variables.tf                        # Terraform sensitive variables
├── network/                                # Network Device Configurations
│   └── *.txt                               # Switch configuration templates
├── openshift/                              # Cisco Intersight automation
│   ├── global_settings.ezi.yaml            # Global Parameters
│   ├── main.tf                             # Main Terraform module
│   ├── organizations/                      # Organization data model
│   └── policies/                           # Policy data model
└── pure_storage/                           # Pure Storage automation
    ├── tasks/                              # Ansible playbooks
    ├── vars/                               # Ansible vars
    ├── configure_pure_storage_arrays.yaml  # Top-level Ansible Playbook
    └── requirements.yaml                   # Ansible Requirements
- Location: Cisco-AI-Pods/intersight/global_settings.ezi.yaml
- Purpose: Central configuration for Intersight deployment
- Key Settings: Intersight FQDN, tags, global parameters

- Location: Cisco-AI-Pods/pure_storage/vars/main.fsai.yaml
- Purpose: Ansible inventory for storage automation
- Content: Storage array IPs, credentials, connection details

- Location: Cisco-AI-Pods/network/*.txt
- Purpose: Switch configuration templates
- Usage: Customize and apply to network devices
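Before starting a deployment it can help to confirm that these files are actually present in the checkout. The sketch below is an illustration under stated assumptions, not a repository script: `check_config_files` is a hypothetical name, and the paths use the lowercase directory names shown in the repository structure.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm the key configuration files exist
# under the repository root passed as $1 (defaults to the current directory).
check_config_files() {
  root="${1:-.}"
  missing=0
  for path in \
    "$root/intersight/global_settings.ezi.yaml" \
    "$root/pure_storage/vars/main.fsai.yaml"; do
    if [ -f "$path" ]; then
      echo "found   $path"
    else
      echo "missing $path"
      missing=1
    fi
  done
  # At least one switch template (network/*.txt) is expected.
  # If the glob matches nothing it stays literal, so the -f test fails.
  set -- "$root"/network/*.txt
  if [ -f "$1" ]; then
    echo "found   $# switch template(s) in $root/network"
  else
    echo "missing $root/network/*.txt"
    missing=1
  fi
  return "$missing"
}
```

Running `check_config_files /path/to/Cisco-AI-Pods` prints one line per item and returns a non-zero status if anything is missing. It only checks for presence and does not validate file contents.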
- Level 1: Infrastructure team → Component-specific guides
- Level 2: Senior engineers → Troubleshooting Guide
- Level 3: Vendor support → Escalation procedures
- Cisco TAC: Intersight, UCS, and network issues
- Pure Storage: Storage array and performance issues
- DevNet Community: Terraform and Ansible community support
terraform init # Initialize working directory
terraform validate # Validate configuration
terraform plan # Preview changes
terraform apply # Apply changes
terraform destroy # Destroy infrastructure
ansible-galaxy collection install -r requirements.yaml
ansible-playbook main.yml # Run Pure Storage setup

For full environment and dependency setup, see Prepare the Environment.
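The Terraform commands above are usually run in a fixed order, and saving the plan to a file ensures that `terraform apply` executes exactly what was reviewed. The wrapper below is a sketch under assumptions rather than a script shipped with the repository: the function names `run` and `terraform_deploy`, the `DRY_RUN` switch, and the plan file name `tfplan` are hypothetical. The `-chdir` global option requires Terraform 0.14 or later.

```shell
#!/bin/sh
# Hypothetical wrapper around the Terraform commands listed above.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = directory holding main.tf (assumed to be "intersight" by default).
# The plan is saved to "tfplan" inside that directory and then applied,
# so apply cannot pick up changes made after the plan was reviewed.
terraform_deploy() {
  dir="${1:-intersight}"
  run terraform -chdir="$dir" init &&
  run terraform -chdir="$dir" validate &&
  run terraform -chdir="$dir" plan -out=tfplan &&
  run terraform -chdir="$dir" apply tfplan
}
```

Setting `DRY_RUN=1` before calling `terraform_deploy intersight` prints the four commands without touching any infrastructure, which is a cheap way to check the sequence. In a real run you would normally pause between the plan and apply steps to review the plan output.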
copy running-config startup-config # Save configuration
ping # Basic connectivity tests
show bgp ipv4 unicast # Show the BGP IPv4 unicast table
show interface brief # Interface status
show ip route # IP Routing Table
show vlan brief # VLAN configuration
show version # Software version
traceroute # Path validation

- Review quarterly for accuracy
- Update after major infrastructure changes
- Validate procedures after software updates
- Incorporate lessons learned from incidents
- Follow change management procedures
- Test in non-production first
- Document all changes
- Update runbooks accordingly
Document Information
- Created: June 13, 2025
- Version: 1.2
- Last Updated: July 12, 2025
- Maintained By: Infrastructure Automation Team
For questions or updates to this runbook, please contact the Infrastructure team or submit an issue in the repository.