nsfcac/ai-inference-energy

AI Inference Energy

A comprehensive framework for studying energy-efficient GPU frequency selection for AI inference workloads. This framework provides complete command-line interfaces, triple GPU architecture support (A100/V100/H100), intelligent tool fallback, comprehensive profiling tools, and multiple AI model support for conducting systematic DVFS (Dynamic Voltage and Frequency Scaling) research on modern AI workloads.

🎯 Project Overview

As AI workloads grow in complexity and energy demand, static frequency settings on GPUs often result in sub-optimal trade-offs between performance and power consumption. This framework provides tools for conducting comprehensive energy profiling experiments on NVIDIA A100, V100, and H100 GPUs across various AI inference tasks.

✨ Key Features

  • 🔧 Complete CLI Interface: Configure all experiments via command-line arguments with --help support
  • 🎯 Triple GPU Support: Native A100, V100, and H100 configurations
  • 🛠️ Multiple Profiling Tools: Support for both DCGMI and nvidia-smi profiling with automatic fallback
  • 📊 Flexible Experiment Modes: DVFS (full frequency sweep) or baseline (single frequency) modes
  • 🚀 HPC Integration: Ready-to-use SLURM submission scripts for cluster environments
  • ⚡ Intelligent Fallback: Automatic tool selection when DCGMI is unavailable
  • 📈 Comprehensive Logging: Error handling and progress tracking
  • 🔄 Professional Architecture: Modular, maintainable, and extensible codebase
  • 🐍 Python 3.8+ Compatible: Works with modern cluster environments
  • 🎨 Data Visualizations: Publication-quality scatter plots and analysis charts using actual experimental data
  • 📊 Advanced Analysis Suite: EDP optimization, optimal frequency selection, and production deployment tools with comprehensive visualization framework
  • 🔍 Data Collection: Systematic energy and performance data collection across AI workloads with DCGMI integration
  • 🎨 Modernized AI Models: Latest Stable Diffusion variants (SDXL, Turbo, Lightning) with comprehensive benchmarking and visualization
  • 🤖 ML Frequency Prediction: Machine learning system for predicting optimal GPU frequencies from short profiling runs

🎉 Latest Updates (v2.2.0 - August 2025)

  • 🎨 Enhanced Visualization System: Publication-quality scatter plots with outlier detection and statistical filtering
  • 📊 Advanced Data Processing: Warm run averaging (excluding cold runs) with comprehensive outlier filtering using IQR methods
  • 🔍 Experimental Data Integration: Direct loading of DCGMI profiling data with intelligent data quality improvements
  • 🚀 EDP Optimization Suite: Complete Energy-Delay Product optimization tools with visual validation
  • 🛠️ Fixed GitHub Actions: Resolved test suite issues with correct module imports and directory structures
  • ✅ Configuration Consolidation: Unified DCGMI monitoring with 25 comprehensive fields
  • ✅ Enhanced Compatibility: Improved PyTorch/torchvision compatibility in AI model environments
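
The warm-run averaging with IQR outlier filtering described above boils down to a few lines. A minimal sketch with synthetic timings (the framework's actual data layout and parameters may differ):

```python
import statistics

def average_warm_runs(samples, iqr_k=1.5):
    """Drop the first (cold) run, filter IQR outliers, average the rest."""
    warm = samples[1:]  # exclude the cold first run
    q1, _, q3 = statistics.quantiles(warm, n=4, method="inclusive")
    lo, hi = q1 - iqr_k * (q3 - q1), q3 + iqr_k * (q3 - q1)
    kept = [x for x in warm if lo <= x <= hi]
    return statistics.mean(kept)

# the cold 9.9 s run is dropped, the 50.0 s outlier is filtered out
print(average_warm_runs([9.9, 5.0, 5.2, 5.1, 50.0, 4.9]))  # ≈ 5.05
```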

🔋 Profiling Infrastructure Foundation

This framework provides a robust foundation for GPU energy profiling and comprehensive data collection:

Core Infrastructure (Available)

  • Profiling Data Collection: Comprehensive GPU profiling with DCGMI and nvidia-smi across V100, A100, and H100
  • Application Integration: Support for LLaMA, Stable Diffusion, Whisper, Vision Transformer, and LSTM workloads
  • Job Automation: Complete SLURM integration with automated frequency sweeping
  • Data Export: Structured CSV output for analysis and visualization
  • Comprehensive Testing: Full test coverage for profiling infrastructure and AI applications
  • EDP Optimization: Energy-Delay Product optimization with multi-criteria frequency selection and data visualization
  • Production Tools: Ready-to-use deployment interface for optimal frequency settings with visual validation
  • Data Visualization: Publication-quality plots using actual DCGMI profiling data for accurate analysis

Advanced Analysis Suite (Available)

  • Optimal Frequency Selection: Comprehensive algorithms for frequency optimization with visual analysis
  • EDP Analysis Tools: Energy-Delay Product optimization and performance evaluation with data plotting
  • Measured Data Analysis: Hybrid timing extraction and validation frameworks with experimental data visualization
  • Production Deployment: Interface for implementing optimal settings in production with visual confirmation
  • Multi-GPU Comparison: Cross-architecture performance and efficiency analysis with publication-quality charts
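
At its core, the EDP/ED²P selection above ranks each candidate frequency by energy × delay (or energy × delay²) and keeps the minimum. A minimal sketch with made-up measurements; the real tools in tools/analysis/ add statistical filtering and reporting on top:

```python
def edp(energy_j, time_s, weight=1):
    """Energy-Delay Product: weight=1 gives EDP, weight=2 gives ED2P."""
    return energy_j * time_s ** weight

# (core MHz, energy in J, runtime in s) -- illustrative numbers only
measurements = [(1410, 120.0, 1.00), (1200, 100.0, 1.10), (990, 95.0, 1.40)]

best_freq, _, _ = min(measurements, key=lambda m: edp(m[1], m[2]))
print(best_freq)  # → 1200 (EDP 110.0 beats 120.0 and 133.0)
```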

Planned Extensions (Future Work)

  • Real-time Optimization: Dynamic frequency adjustment during inference
  • ML-based Prediction: Advanced power prediction models with enhanced accuracy
  • Multi-node Scaling: Distributed profiling across multiple GPU nodes
  • Interactive Dashboard: Web-based visualization and control interface
# Quick profiling example - Available now!
cd sample-collection-scripts
./launch_v2.sh --app-name "StableDiffusion" --profiling-mode baseline
# Results saved to structured CSV files for analysis

# ML frequency prediction - Available now!
cd tools/ml_prediction
python -m tools.ml_prediction.train_baseline \
  --dataset datasets/all_freq.csv \
  --model-out models/rf_predictor.joblib
# Train ML model to predict optimal frequencies from short profiling runs
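
Internally the model predicts a continuous clock value that is snapped to the nearest supported frequency (the role random_forest_predictor.py plays). The snapping step alone looks roughly like this; the function name and interface here are illustrative, not the framework's actual API:

```python
A100_CORE_FREQS = list(range(1410, 509, -15))  # 61 supported core clocks (MHz)

def snap_to_supported(predicted_mhz, supported=A100_CORE_FREQS):
    """Map a continuous ML prediction onto the nearest supported clock."""
    return min(supported, key=lambda f: abs(f - predicted_mhz))

print(snap_to_supported(1203.7))  # → 1200
```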

Supported AI Models & Applications

  • LLaMA: Text generation via transformer-based large language models
  • Stable Diffusion: Modernized latent diffusion model with latest variants (SD v1.x, v2.x, SDXL, Turbo, Lightning) for high-quality image generation
  • Whisper: OpenAI Whisper automatic speech recognition for audio processing energy profiling
  • Vision Transformer (ViT): Transformer-based image classification for computer vision energy profiling
  • LSTM Sentiment Analysis: Binary classification benchmark for consistent profiling
  • Custom Applications: Framework supports any Python-based AI inference workload

Research & Experimental Capabilities

  • 📊 Comprehensive Profiling: GPU power consumption, utilization, temperature, and performance metrics
  • 🔄 Frequency Scaling: Support for 61 A100 frequencies (1410-510 MHz), 117 V100 frequencies (1380-510 MHz), and 86 H100 frequencies (1785-510 MHz)
  • ⚡ Energy Analysis: Detailed power vs performance trade-off analysis across frequency ranges
  • 📈 Statistical Rigor: Multiple runs per frequency with configurable parameters for statistical significance
  • 📝 Reproducible Research: Standardized output formats and comprehensive experiment documentation

πŸ—οΈ Repository Structure

ai-inference-energy/
├── README.md                            # Project documentation
├── requirements.txt                     # Python dependencies
├── setup.py                             # Package installation
├── config.py                            # Centralized configuration (Python 3.8+ compatible)
├── utils.py                             # Utility functions and helpers
│
├── app-llama/                           # LLaMA inference applications
│   ├── README.md                        # LLaMA application documentation
│   └── LlamaViaHF.py                    # LLaMA text generation via Hugging Face
│
├── app-stable-diffusion/                # 🎨 Modernized Stable Diffusion applications
│   ├── README.md                        # Comprehensive Stable Diffusion documentation
│   ├── StableDiffusionViaHF.py          # **Modernized** image generation with latest models
│   ├── scripts/                         # Setup and utility scripts
│   │   └── setup_stable_diffusion.sh    # Complete setup and validation script
│   ├── test_stable_diffusion_*.py       # Comprehensive test suites
│   └── validate_stable_diffusion.py     # Quick validation script
│
├── app-whisper/                         # 🎤 Whisper speech recognition applications
│   ├── README.md                        # Comprehensive Whisper documentation
│   ├── WhisperViaHF.py                  # OpenAI Whisper speech-to-text via Hugging Face
│   ├── __init__.py                      # Python package initialization
│   ├── setup/                           # Environment setup and configuration
│   │   ├── setup_whisper_env.sh         # Automated conda environment setup
│   │   ├── requirements.txt             # Python dependencies
│   │   └── whisper-repacss.yml          # REPACSS cluster environment
│   └── tests/                           # Test suite for Whisper implementation
│       └── test_whisper.py              # Comprehensive test suite
│
├── app-vision-transformer/              # 🖼️ Vision Transformer applications
│   ├── README.md                        # Comprehensive ViT documentation
│   ├── ViTViaHF.py                      # Vision Transformer image classification via Hugging Face
│   ├── __init__.py                      # Python package initialization
│   └── setup/                           # Environment setup and configuration
│       ├── setup.sh                     # Automated conda environment setup
│       ├── requirements.txt             # Python dependencies
│       ├── vit-env-repacss.yml          # REPACSS cluster environment
│       └── vit-env-hpcc.yml             # HPCC cluster environment
│
├── app-lstm/                            # LSTM benchmark application
│   ├── README.md                        # LSTM benchmark documentation
│   ├── lstm.py                          # Sentiment analysis benchmark
│   └── setup/                           # Environment configuration
│       ├── lstm-env-hpcc.yml            # HPCC cluster environment
│       ├── lstm-env-repacss.yml         # REPACSS cluster environment
│       ├── requirements_lstm_repacss.txt           # Python dependencies
│       └── requirements_lstm_repacss_minimal.txt   # Minimal dependencies
│
├── (planned) examples/                  # Usage examples (not in this release)
│
├── tests/                               # 🧪 Comprehensive test suite
│   ├── README.md                        # Test documentation and coverage
│   ├── test_integration.py              # Integration and system tests
│   ├── test_configuration.py            # Configuration and compatibility tests
│   ├── test_hardware_module.py          # Hardware detection tests
│   ├── test_utils.py                    # Utility function tests
│   └── test_python_compatibility.sh     # Python compatibility test
│
├── documentation/                       # 📚 Essential documentation (streamlined)
│   ├── README.md                        # Documentation index and quick reference
│   ├── GPU_USAGE_GUIDE.md               # Complete GPU support guide (A100/V100/H100)
│   ├── USAGE_EXAMPLES.md                # CLI usage examples and automation
│   └── SUBMIT_JOBS_README.md            # SLURM usage and HPC deployment
│
├── tools/                                             # 🛠️ Advanced analysis and optimization tools
│   ├── README.md                                      # Tools documentation and usage guide
│   ├── analysis/                                      # EDP optimization and performance analysis
│   │   ├── edp_optimizer.py                           # Energy-Delay Product optimization engine
│   │   ├── edp_summary_tables.py                      # EDP results summarization and reporting
│   │   ├── results/                                   # Analysis outputs and JSON data
│   │   │   ├── edp_optimization_results.json          # Primary optimization results
│   │   │   └── *.csv                                  # Detailed analysis tables
│   │   ├── visualization/                             # Data visualization and plotting tools
│   │   │   ├── visualize_edp_results.py               # 🎨 Experimental data visualization (scatter plots)
│   │   │   ├── visualize_edp_summary.py               # 📊 Comprehensive summary analysis charts
│   │   │   ├── README.md                              # Complete visualization system documentation
│   │   │   └── edp-plots/                             # 🎨 Generated visualization files (16 total)
│   │   │       ├── *_energy_performance_scatter.png   # Individual GPU-workload plots
│   │   │       ├── energy_savings_comparison.png      # EDP vs ED²P comparison
│   │   │       ├── frequency_optimization_comparison.png  # Frequency analysis
│   │   │       ├── performance_impact_analysis.png    # Performance trade-offs
│   │   │       └── comprehensive_summary.png          # 4-panel overview
│   │   └── archived/                                  # Historical analysis tools and reports
│   ├── ml_prediction/                                 # 🤖 Machine Learning Frequency Prediction System
│   │   ├── README.md                                  # Comprehensive ML tools documentation
│   │   ├── build_labels.py                            # Generate EDP/ED²P optimal labels
│   │   ├── build_dataset.py                           # Build training datasets with probe policies
│   │   ├── train_baseline.py                          # Baseline RandomForest training
│   │   ├── evaluate.py                                # Cross-validation with EDP gap analysis
│   │   ├── feature_extractor.py                       # Statistical features and trend analysis
│   │   ├── profile_reader.py                          # DCGMI profile parsing and aggregation
│   │   ├── datasets/                                  # Generated training datasets
│   │   │   ├── all_freq.csv                           # Full frequency sweep dataset
│   │   │   └── max_only.csv                           # Max frequency baseline dataset
│   │   ├── models/                                    # Trained ML models
│   │   │   ├── random_forest_predictor.py             # RF implementation with frequency snapping
│   │   │   ├── rf_all_freq.joblib                     # Trained model (all frequencies)
│   │   │   └── rf_max_only.joblib                     # Trained model (max frequency only)
│   │   ├── results/                                   # Feature importance analysis results
│   │   │   ├── fi_baseline/                           # Baseline training feature importance
│   │   │   └── fi_eval_gpu_h100/                      # Cross-GPU evaluation results
│   │   └── labels.json                                # Generated optimal frequency labels
│   ├── (planned) optimal-frequency/                   # Planned frequency optimization tools (not in this release)
│   ├── (planned) deployment/                          # Planned deployment interfaces (not in this release)
│   ├── (planned) testing/                             # Planned extra testing tools (not in this release)
│   └── (planned) utilities/                           # Planned general utilities (not in this release)
│
└── sample-collection-scripts/           # 🚀 Enhanced profiling framework
    ├── README.md                        # Profiling framework documentation
    ├── launch_v2.sh                     # 🎯 Main experiment orchestration (enhanced CLI)
    ├── profile.py                       # DCGMI-based GPU profiler
    ├── profile_smi.py                   # nvidia-smi alternative profiler
    ├── control.sh                       # DCGMI frequency control
    ├── control_smi.sh                   # nvidia-smi frequency control
    ├── clean.sh                         # Enhanced workspace cleanup
    ├── lstm.py                          # LSTM benchmark application
    │
    ├── interactive_gpu.sh               # 🎯 Unified interactive GPU session helper (V100/A100/H100)
    │
    ├── submit_job_v100.sh               # 🎯 Unified V100 submission (16 configurations)
    ├── submit_job_a100.sh               # 🎯 Unified A100 submission (16 configurations)
    ├── submit_job_h100.sh               # 🎯 Unified H100 submission (16 configurations)
    │
    ├── submit_job_v100_baseline.sh      # Legacy V100 baseline (redirects to unified)
    ├── submit_job_v100_comprehensive.sh # Legacy V100 comprehensive (redirects to unified)
    ├── submit_job_v100_custom_app.sh    # Legacy V100 custom app (redirects to unified)
    ├── submit_job_a100_baseline.sh      # Legacy A100 baseline (redirects to unified)
    ├── submit_job_a100_comprehensive.sh # Legacy A100 comprehensive (redirects to unified)
    ├── submit_job_a100_custom_app.sh    # Legacy A100 custom app (redirects to unified)
    ├── submit_job_h100_baseline.sh      # Legacy H100 baseline (redirects to unified)
    ├── submit_job_h100_comprehensive.sh # Legacy H100 comprehensive (redirects to unified)
    ├── submit_job_h100_custom_app.sh    # Legacy H100 custom app (redirects to unified)
    └── submit_job*.sh                   # Additional legacy scripts

🚀 Quick Start

Prerequisites

Hardware Requirements

  • NVIDIA GPU with DCGMI support (A100/H100 recommended)
  • Sufficient GPU memory for AI models (8GB+ recommended)
  • CUDA-compatible driver

Software Requirements

  • Python 3.8+ (tested on Python 3.8-3.11)
  • CUDA Toolkit 11.0+
  • NVIDIA DCGMI tools (automatically falls back to nvidia-smi if unavailable)
  • Hugging Face account with model access

Framework Note: Profiling experiments are driven by a single entry point:

  • launch_v2.sh - Enhanced framework with modular architecture (recommended)

HPC Environment (Optional)

  • SLURM workload manager
  • Environment modules (GCC, CUDA, cuDNN)
  • Conda/Miniconda

Installation

  1. Clone the repository

    git clone <repository-url>
    cd ai-inference-energy
  2. Install Python dependencies

    pip install -r requirements.txt
  3. Set up Hugging Face authentication

    huggingface-cli login
    # Follow prompts to enter your HF token
  4. Verify GPU and profiling tool setup

    nvidia-smi                    # Check GPU status
    dcgmi discovery --list        # Verify DCGMI access (optional - will fallback to nvidia-smi)
    
    # Use unified interactive helper for quick setup validation
    cd sample-collection-scripts
    ./interactive_gpu.sh          # Auto-detects GPU type and provides setup guidance
  5. Make scripts executable

    chmod +x sample-collection-scripts/*.sh
    chmod +x sample-collection-scripts/profile.py
    chmod +x app-stable-diffusion/scripts/setup_stable_diffusion.sh

Basic Usage

1. Individual Model Testing

Run LLaMA inference:

cd app-llama
python LlamaViaHF.py

Run Stable Diffusion inference:

cd app-stable-diffusion
python StableDiffusionViaHF.py

Run Whisper speech recognition:

cd app-whisper
python WhisperViaHF.py --benchmark --num-samples 3

2. Power Profiling

Profile a single application:

cd sample-collection-scripts
./profile.py "python ../app-llama/LlamaViaHF.py"

Set specific GPU frequencies:

# Set memory=1215MHz, core=1200MHz
./control.sh 1215 1200

3. Enhanced Full Experiment Suite

Complete CLI-driven experiments:

cd sample-collection-scripts

# Show all available options
./launch_v2.sh --help

# Default A100 DVFS experiment with DCGMI
./launch_v2.sh

# V100 baseline experiment with nvidia-smi fallback
./launch_v2.sh --gpu-type V100 --profiling-mode baseline --profiling-tool nvidia-smi

# Custom application profiling
./launch_v2.sh \
  --app-name "StableDiffusion" \
  --app-executable "../app-stable-diffusion/StableDiffusionViaHF.py" \
  --app-params "--prompt 'A beautiful landscape' --steps 20"

# Quick test configuration
./launch_v2.sh --num-runs 1 --sleep-interval 0

4. HPC Cluster Deployment

Multiple SLURM submission options:

# A100 unified submission (toreador partition) - Edit script first to uncomment desired config
sbatch submit_job_a100.sh

# V100 unified submission (matador partition) - Edit script first to uncomment desired config
sbatch submit_job_v100.sh

# H100 unified submission (h100 partition) - Edit script first to uncomment desired config
sbatch submit_job_h100.sh

Unified Script Features:

  • 📋 16+ pre-configured options in each GPU-specific script
  • 🎯 Easy selection: Just uncomment one configuration
  • ⏱️ Timing guidance: Built-in recommendations for SLURM --time parameter
  • 🔧 GPU-optimized: Configurations tailored for each GPU architecture
  • 📖 Full guide: See sample-collection-scripts/JOB_SCRIPT_GUIDE_V100.md

Legacy Scripts (Deprecated):

# A100 legacy scripts (redirect to unified)
sbatch submit_job_a100_baseline.sh      # → use submit_job_a100.sh config #1
sbatch submit_job_a100_comprehensive.sh # → use submit_job_a100.sh config #8
sbatch submit_job_a100_custom_app.sh    # → use submit_job_a100.sh config #5

# V100 legacy scripts (redirect to unified)
sbatch submit_job_v100_baseline.sh      # → use submit_job_v100.sh config #1
sbatch submit_job_v100_comprehensive.sh # → use submit_job_v100.sh config #8
sbatch submit_job_v100_custom_app.sh    # → use submit_job_v100.sh config #7

# H100 legacy scripts (redirect to unified)
sbatch submit_job_h100_baseline.sh      # → use submit_job_h100.sh config #1
sbatch submit_job_h100_comprehensive.sh # → use submit_job_h100.sh config #4
sbatch submit_job_h100_custom_app.sh    # → use submit_job_h100.sh config #5

# Custom application profiling
sbatch submit_job_custom_app.sh
sbatch submit_job_h100_custom_app.sh

# Comprehensive DVFS study (all frequencies)
sbatch submit_job_comprehensive.sh
sbatch submit_job_v100_comprehensive.sh
sbatch submit_job_h100_comprehensive.sh

5. Advanced Analysis and Optimization

Run EDP (Energy-Delay Product) optimization:

cd tools/analysis

# Optimize frequencies for specific GPU and workload
python edp_optimizer.py --gpu A100 --workload llama

# Generate comprehensive summary tables
python edp_summary_tables.py --input edp_optimization_results.json

# View optimization results
cat edp_optimization_results_summary.csv

Create comprehensive visualizations:

cd tools/analysis/visualization

# Generate scatter plots with experimental data, outlier detection, and warm run averaging
python visualize_edp_results.py

# Features include:
# - One point per frequency (averaged from warm runs, excluding cold runs)
# - Statistical outlier detection using IQR methods
# - Direct loading of DCGMI profiling data
# - Publication-quality plots for 12 GPU-workload combinations

# View generated plots (12 individual scatter plots)
ls edp-plots/*_energy_performance_scatter.png

Note: Optimal-frequency selection and deployment tooling are planned and not included in this release.

6. Profiling Data Analysis

Analyze profiling results:

cd sample-collection-scripts

# Basic analysis with built-in tools
./launch_v2.sh --help  # See analysis options

# View profiling results
ls -la results_*/
head results_*/profiling_*.csv

# Use visualization tools
cd visualization
python plot_metric_vs_time.py --gpu V100 --app LLAMA --metric POWER

📚 For detailed examples, see documentation/USAGE_EXAMPLES.md and documentation/SUBMIT_JOBS_README.md

🔧 Configuration

GPU Frequency Settings

The framework supports comprehensive frequency scaling for all three GPU architectures:

A100 GPU (Toreador Partition)

  • Memory Frequency: 1215 MHz (A100 default)
  • Core Frequencies: 61 different settings from 1410 MHz down to 510 MHz
  • Frequency Control: Via DCGMI interface with nvidia-smi fallback

V100 GPU (Matador Partition)

  • Memory Frequency: 877 MHz (V100 default)
  • Core Frequencies: 117 different settings from 1380 MHz down to 510 MHz
  • Frequency Control: Via nvidia-smi interface

H100 GPU (REPACSS)

  • Memory Frequency: 2619 MHz (H100 maximum)
  • Core Frequencies: 86 different settings from 1785 MHz down to 510 MHz in 15MHz steps
  • Frequency Control: Via DCGMI interface with nvidia-smi fallback
  • Cluster: REPACSS at Texas Tech University (node: rpg-93-9)
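
The A100 and H100 frequency counts follow directly from their 15 MHz step size and can be sanity-checked in a couple of lines (V100's 117 settings do not follow a single uniform step and are enumerated separately):

```python
# Core-clock sweeps in 15 MHz steps, from the maximum down to 510 MHz
A100_CORE_FREQS = list(range(1410, 509, -15))
H100_CORE_FREQS = list(range(1785, 509, -15))

print(len(A100_CORE_FREQS), A100_CORE_FREQS[-1])  # → 61 510
print(len(H100_CORE_FREQS), H100_CORE_FREQS[-1])  # → 86 510
```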

Command-Line Interface

The launch_v2.sh script accepts comprehensive command-line arguments for flexible experiment configuration:

./launch_v2.sh [OPTIONS]

Options:
  --gpu-type TYPE        GPU type: A100, V100, or H100 (default: A100)
  --profiling-tool TOOL  Profiling tool: dcgmi or nvidia-smi (default: dcgmi)
  --profiling-mode MODE  Mode: dvfs or baseline (default: dvfs)
  --num-runs NUM         Number of runs per frequency (default: 2)
  --sleep-interval SEC   Sleep between runs in seconds (default: 1)
  --app-name NAME        Application display name (default: LSTM)
  --app-executable PATH  Application executable path (default: lstm)
  --app-params "PARAMS"  Application parameters (default: "")
  -h, --help             Show help and examples


Experiment Parameters

Key configuration options in config.py (Python 3.8+ compatible):

# Profiling settings
DEFAULT_NUM_RUNS = 2              # Runs per frequency
DEFAULT_INTERVAL_MS = 50          # Sampling interval
DCGMI_FIELDS = [52, 50, 155, 160, ...]  # Comprehensive GPU metrics to collect (25 fields)
                                  # ✅ v2.1.0: Consolidated comprehensive field set
                                  # ✅ Includes: device info, power, temps, clocks, utilization, activity metrics

# Model settings
LLAMA_MODEL_NAME = "huggyllama/llama-7b"
STABLE_DIFFUSION_MODEL_NAME = "CompVis/stable-diffusion-v1-4"

# A100 GPU settings (Toreador partition)
A100_MEMORY_FREQ = 1215           # MHz
A100_DEFAULT_CORE_FREQ = 1410     # MHz

# V100 GPU settings (Matador partition)
V100_MEMORY_FREQ = 877            # MHz
V100_DEFAULT_CORE_FREQ = 1380     # MHz

📊 Output and Results

Data Collection

The framework collects comprehensive GPU metrics during inference:

  • Power consumption (watts)
  • GPU utilization (%)
  • Memory utilization (%)
  • Temperature (°C)
  • Clock frequencies (MHz)
  • Execution time (seconds)

Output Files

Results are saved in the results/ directory with structured naming:

results/
├── GA100-dvfs-LSTM-1410-0        # Architecture-Mode-App-Freq-Iteration
├── GA100-dvfs-LSTM-1410-1
├── GA100-dvfs-LSTM-1395-0
└── GA100-dvfs-lstm-perf.csv      # Performance summary
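
The Architecture-Mode-App-Freq-Iteration naming makes result directories easy to parse programmatically. A hypothetical helper (not shipped with the framework):

```python
import re

NAME_RE = re.compile(
    r"(?P<arch>[A-Z0-9]+)-(?P<mode>[a-z]+)-(?P<app>\w+)-(?P<freq>\d+)-(?P<run>\d+)$"
)

def parse_result_name(name):
    """Split a name like 'GA100-dvfs-LSTM-1410-0' into labeled parts."""
    parts = NAME_RE.match(name).groupdict()
    parts["freq"], parts["run"] = int(parts["freq"]), int(parts["run"])
    return parts

print(parse_result_name("GA100-dvfs-LSTM-1410-0"))
```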

Analysis Scripts

The collected data can be analyzed using standard data science tools:

import pandas as pd
import matplotlib.pyplot as plt

# Load performance data
perf_data = pd.read_csv('results/GA100-dvfs-lstm-perf.csv')

# Plot frequency vs execution time
plt.plot(perf_data['frequency'], perf_data['execution_time'])
plt.xlabel('GPU Frequency (MHz)')
plt.ylabel('Execution Time (s)')
plt.title('LSTM Inference: Frequency vs Performance')
plt.show()
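
Per-run energy can be estimated from the sampled power trace by integrating power over time; with the 50 ms default sampling interval, a trapezoidal sum is a reasonable approximation (sample values below are illustrative):

```python
def energy_joules(power_w, interval_s=0.05):
    """Trapezoidal integration of evenly spaced power samples (watts)."""
    return sum((a + b) / 2 * interval_s for a, b in zip(power_w, power_w[1:]))

# four samples, 50 ms apart: ~0.15 s at roughly 210 W
print(energy_joules([200.0, 220.0, 210.0, 205.0]))  # ≈ 31.6 J
```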

πŸ› οΈ Advanced Usage

Custom Applications

To add new AI applications to the framework:

  1. Create application script following the pattern:

    # my_app.py
    import sys, os
    sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    from utils import setup_logging
    
    def main():
        logger = setup_logging()
        # Your AI inference code here
        logger.info("Inference completed")
    
    if __name__ == "__main__":
        main()
  2. Run with launch script:

    ./launch_v2.sh \
      --app-name "MyApp" \
      --app-executable "my_app" \
      --app-params "--model bert-base --batch-size 32"
    
    # For Stable Diffusion (modernized)
    ./launch_v2.sh \
      --app-name "StableDiffusion" \
      --app-executable "../app-stable-diffusion/StableDiffusionViaHF.py" \
      --app-params "--model-variant sdxl --steps 30"

GPU-Specific Configurations

A100 Configuration (Toreador)

./launch_v2.sh \
  --gpu-type A100 \
  --profiling-tool dcgmi \
  --profiling-mode dvfs

V100 Configuration (Matador)

./launch_v2.sh \
  --gpu-type V100 \
  --profiling-tool nvidia-smi \
  --profiling-mode baseline

H100 Configuration (REPACSS)

./launch_v2.sh \
  --gpu-type H100 \
  --profiling-tool dcgmi \
  --profiling-mode dvfs

Profiling Tool Selection & Fallback

The framework supports intelligent profiling tool selection:

# Prefer DCGMI (will fallback to nvidia-smi if unavailable)
./launch_v2.sh --profiling-tool dcgmi

# Force nvidia-smi usage
./launch_v2.sh --profiling-tool nvidia-smi

# Test profiling tool availability
dcgmi discovery --list  # Check DCGMI
nvidia-smi              # Check nvidia-smi

Experiment Automation

Batch Testing Multiple Configurations

#!/bin/bash
# Test script for multiple GPU types and applications

for gpu in A100 V100; do
  for app in "LSTM" "StableDiffusion"; do
    ./launch_v2.sh \
      --gpu-type $gpu \
      --app-name $app \
      --profiling-mode baseline \
      --num-runs 1
  done
done

πŸ” Troubleshooting

Common Issues

GPU Access & Tool Problems

# Check GPU visibility and type
nvidia-smi

# Check DCGMI availability (optional)
dcgmi discovery --list

# Test profiling tool fallback
./launch_v2.sh --profiling-tool dcgmi  # Will auto-fallback to nvidia-smi if needed

# Reset GPU if needed
sudo nvidia-smi --gpu-reset

Python & Environment Issues

# Check Python version (3.8+ required)
python --version

# Test config module compatibility
python -c "import config; print('Config loaded successfully')"

# Check HuggingFace authentication
huggingface-cli whoami

Visualization & Analysis Issues

# Test visualization module imports
cd tools/analysis/visualization
python -c "import visualize_edp_results; print('✅ Visualization modules working')"

# Check if experimental data exists
ls ../../sample-collection-scripts/results_*/

# Test matplotlib backend (for headless environments)
python -c "import matplotlib; matplotlib.use('Agg'); import matplotlib.pyplot as plt; print('✅ Matplotlib working')"

SLURM & Partition Issues

# Check available partitions
sinfo

# Check A100 nodes (toreador)
sinfo -p toreador

# Check V100 nodes (matador)
sinfo -p matador

# Test SLURM job submission
sbatch --test-only submit_job.sh

📚 For detailed troubleshooting, see the troubleshooting sections in documentation/GPU_USAGE_GUIDE.md

Performance Optimization

For Better Profiling Accuracy

  • Ensure stable GPU temperature before experiments
  • Run experiments during low system load
  • Use dedicated GPU nodes when possible
  • Increase sampling interval for longer workloads
  • Use --profiling-mode baseline for quick testing

For Faster Experiments

  • Use --num-runs 1 for quick tests
  • Set --sleep-interval 0 to reduce delays
  • Use --profiling-mode baseline (single frequency)
  • Test with smaller model variants first
  • Use V100 nodes with --gpu-type V100 for availability

📚 Documentation

The framework includes streamlined documentation focused on practical usage:

🎯 Essential Guides

See the documentation/ directory: GPU_USAGE_GUIDE.md (complete A100/V100/H100 support guide), USAGE_EXAMPLES.md (CLI usage examples and automation), and SUBMIT_JOBS_README.md (SLURM usage and HPC deployment).

📋 Additional Module Documentation

Each application directory (app-llama/, app-stable-diffusion/, app-whisper/, app-vision-transformer/, app-lstm/) as well as tools/ and sample-collection-scripts/ ships its own README.md. All documentation follows consistent patterns with practical examples and comprehensive troubleshooting sections.

πŸ“ Citation

If you use this framework in your research, please cite:

@misc{Side:2025:AIEnergy:GitHub,
  title={AI Inference Energy Profiling Framework},
  author={Side, Mert},
  year={2025},
  url={https://github.com/mertside/ai-inference-energy}
}

Happy profiling! ⚡🔬

About

Exploring how to improve energy efficiency in AI inference by tuning GPU frequency settings. We profile and optimize modern workloads on modern GPUs using DVFS techniques.
