A comprehensive framework for studying energy-efficient GPU frequency selection for AI inference workloads. This framework provides complete command-line interfaces, triple GPU architecture support (A100/V100/H100), intelligent tool fallback, comprehensive profiling tools, and multiple AI model support for conducting systematic DVFS (Dynamic Voltage and Frequency Scaling) research on modern AI workloads.
As AI workloads grow in complexity and energy demand, static frequency settings on GPUs often result in sub-optimal trade-offs between performance and power consumption. This framework provides tools for conducting comprehensive energy profiling experiments on NVIDIA A100, V100, and H100 GPUs across various AI inference tasks.
- Complete CLI Interface: Configure all experiments via command-line arguments with `--help` support
- Triple GPU Support: Native A100, V100, and H100 configurations
- Multiple Profiling Tools: Support for both DCGMI and nvidia-smi profiling with automatic fallback
- Flexible Experiment Modes: DVFS (full frequency sweep) or baseline (single frequency) modes
- HPC Integration: Ready-to-use SLURM submission scripts for cluster environments
- Intelligent Fallback: Automatic tool selection when DCGMI is unavailable
- Comprehensive Logging: Error handling and progress tracking
- Professional Architecture: Modular, maintainable, and extensible codebase
- Python 3.8+ Compatible: Works with modern cluster environments
- Data Visualizations: Publication-quality scatter plots and analysis charts using actual experimental data
- Advanced Analysis Suite: EDP optimization, optimal frequency selection, and production deployment tools with a comprehensive visualization framework
- Data Collection: Systematic energy and performance data collection across AI workloads with DCGMI integration
- Modernized AI Models: Latest Stable Diffusion variants (SDXL, Turbo, Lightning) with comprehensive benchmarking and visualization
- ML Frequency Prediction: Machine learning system for predicting optimal GPU frequencies from short profiling runs
- Enhanced Visualization System: Publication-quality scatter plots with outlier detection and statistical filtering
- Advanced Data Processing: Warm-run averaging (excluding cold runs) with comprehensive outlier filtering using IQR methods
- Experimental Data Integration: Direct loading of DCGMI profiling data with intelligent data quality improvements
- EDP Optimization Suite: Complete Energy-Delay Product optimization tools with visual validation
- Fixed GitHub Actions: Resolved test suite issues with correct module imports and directory structures
- Configuration Consolidation: Unified DCGMI monitoring with 25 comprehensive fields
- Enhanced Compatibility: Improved PyTorch/torchvision compatibility in AI model environments
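The warm-run averaging and IQR outlier filtering listed above can be sketched as follows. This is an illustrative sketch only; the function and column names (`run_index`, `energy_j`, `time_s`) are assumptions, not the framework's actual API:

```python
import pandas as pd

def average_warm_runs(df: pd.DataFrame, iqr_k: float = 1.5) -> pd.DataFrame:
    """Drop the cold (first) run per frequency, filter energy outliers with
    the IQR rule, then average the surviving warm runs per frequency.
    Column names here are hypothetical."""
    # Exclude the first (cold) run at each frequency
    warm = df[df["run_index"] > 0]

    def iqr_filter(group: pd.DataFrame) -> pd.DataFrame:
        q1 = group["energy_j"].quantile(0.25)
        q3 = group["energy_j"].quantile(0.75)
        iqr = q3 - q1
        lo, hi = q1 - iqr_k * iqr, q3 + iqr_k * iqr
        return group[(group["energy_j"] >= lo) & (group["energy_j"] <= hi)]

    filtered = warm.groupby("frequency_mhz", group_keys=False).apply(iqr_filter)
    # One point per frequency: mean of the surviving warm runs
    return filtered.groupby("frequency_mhz", as_index=False)[["energy_j", "time_s"]].mean()
```

With five runs at one frequency (one cold run plus one wild outlier), this collapses to a single averaged point for that frequency.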
This framework provides a robust foundation for GPU energy profiling and comprehensive data collection:
- Profiling Data Collection: Comprehensive GPU profiling with DCGMI and nvidia-smi across V100, A100, and H100
- Application Integration: Support for LLaMA, Stable Diffusion, Whisper, Vision Transformer, and LSTM workloads
- Job Automation: Complete SLURM integration with automated frequency sweeping
- Data Export: Structured CSV output for analysis and visualization
- Comprehensive Testing: Full test coverage for profiling infrastructure and AI applications
- EDP Optimization: Energy-Delay Product optimization with multi-criteria frequency selection and data visualization
- Production Tools: Ready-to-use deployment interface for optimal frequency settings with visual validation
- Data Visualization: Publication-quality plots using actual DCGMI profiling data for accurate analysis
- Optimal Frequency Selection: Comprehensive algorithms for frequency optimization with visual analysis
- EDP Analysis Tools: Energy-Delay Product optimization and performance evaluation with data plotting
- Measured Data Analysis: Hybrid timing extraction and validation frameworks with experimental data visualization
- Production Deployment: Interface for implementing optimal settings in production with visual confirmation
- Multi-GPU Comparison: Cross-architecture performance and efficiency analysis with publication-quality charts
- Real-time Optimization: Dynamic frequency adjustment during inference
- ML-based Prediction: Advanced power prediction models with enhanced accuracy
- Multi-node Scaling: Distributed profiling across multiple GPU nodes
- Interactive Dashboard: Web-based visualization and control interface
```bash
# Quick profiling example - Available now!
cd sample-collection-scripts
./launch_v2.sh --app-name "StableDiffusion" --profiling-mode baseline
# Results saved to structured CSV files for analysis
```

```bash
# ML frequency prediction - Available now!
cd tools/ml_prediction
python -m tools.ml_prediction.train_baseline \
    --dataset datasets/all_freq.csv \
    --model-out models/rf_predictor.joblib
# Train ML model to predict optimal frequencies from short profiling runs
```

- LLaMA: Text generation via transformer-based large language models
- Stable Diffusion: Modernized latent diffusion model with latest variants (SD v1.x, v2.x, SDXL, Turbo, Lightning) for high-quality image generation
- Whisper: OpenAI Whisper automatic speech recognition for audio processing energy profiling
- Vision Transformer (ViT): Transformer-based image classification for computer vision energy profiling
- LSTM Sentiment Analysis: Binary classification benchmark for consistent profiling
- Custom Applications: Framework supports any Python-based AI inference workload
- Comprehensive Profiling: GPU power consumption, utilization, temperature, and performance metrics
- Frequency Scaling: Support for 61 A100 frequencies (1410-510 MHz) and 117 V100 frequencies (1380-510 MHz)
- Energy Analysis: Detailed power vs. performance trade-off analysis across frequency ranges
- Statistical Rigor: Multiple runs per frequency with configurable parameters for statistical significance
- Reproducible Research: Standardized output formats and comprehensive experiment documentation
```
ai-inference-energy/
├── README.md                          # Project documentation
├── requirements.txt                   # Python dependencies
├── setup.py                           # Package installation
├── config.py                          # Centralized configuration (Python 3.8+ compatible)
├── utils.py                           # Utility functions and helpers
│
├── app-llama/                         # LLaMA inference applications
│   ├── README.md                      # LLaMA application documentation
│   └── LlamaViaHF.py                  # LLaMA text generation via Hugging Face
│
├── app-stable-diffusion/              # Modernized Stable Diffusion applications
│   ├── README.md                      # Comprehensive Stable Diffusion documentation
│   ├── StableDiffusionViaHF.py        # Modernized image generation with latest models
│   ├── scripts/                       # Setup and utility scripts
│   │   └── setup_stable_diffusion.sh  # Complete setup and validation script
│   ├── test_stable_diffusion_*.py     # Comprehensive test suites
│   └── validate_stable_diffusion.py   # Quick validation script
│
├── app-whisper/                       # Whisper speech recognition applications
│   ├── README.md                      # Comprehensive Whisper documentation
│   ├── WhisperViaHF.py                # OpenAI Whisper speech-to-text via Hugging Face
│   ├── __init__.py                    # Python package initialization
│   ├── setup/                         # Environment setup and configuration
│   │   ├── setup_whisper_env.sh       # Automated conda environment setup
│   │   ├── requirements.txt           # Python dependencies
│   │   └── whisper-repacss.yml        # REPACSS cluster environment
│   └── tests/                         # Test suite for Whisper implementation
│       └── test_whisper.py            # Comprehensive test suite
│
├── app-vision-transformer/            # Vision Transformer applications
│   ├── README.md                      # Comprehensive ViT documentation
│   ├── ViTViaHF.py                    # Vision Transformer image classification via Hugging Face
│   ├── __init__.py                    # Python package initialization
│   └── setup/                         # Environment setup and configuration
│       ├── setup.sh                   # Automated conda environment setup
│       ├── requirements.txt           # Python dependencies
│       ├── vit-env-repacss.yml        # REPACSS cluster environment
│       └── vit-env-hpcc.yml           # HPCC cluster environment
│
├── app-lstm/                          # LSTM benchmark application
│   ├── README.md                      # LSTM benchmark documentation
│   ├── lstm.py                        # Sentiment analysis benchmark
│   └── setup/                         # Environment configuration
│       ├── lstm-env-hpcc.yml          # HPCC cluster environment
│       ├── lstm-env-repacss.yml       # REPACSS cluster environment
│       ├── requirements_lstm_repacss.txt          # Python dependencies
│       └── requirements_lstm_repacss_minimal.txt  # Minimal dependencies
│
├── (planned) examples/                # Usage examples (not in this release)
│
├── tests/                             # Comprehensive test suite
│   ├── README.md                      # Test documentation and coverage
│   ├── test_integration.py            # Integration and system tests
│   ├── test_configuration.py          # Configuration and compatibility tests
│   ├── test_hardware_module.py        # Hardware detection tests
│   ├── test_utils.py                  # Utility function tests
│   └── test_python_compatibility.sh   # Python compatibility test
│
├── documentation/                     # Essential documentation (streamlined)
│   ├── README.md                      # Documentation index and quick reference
│   ├── GPU_USAGE_GUIDE.md             # Complete GPU support guide (A100/V100/H100)
│   ├── USAGE_EXAMPLES.md              # CLI usage examples and automation
│   └── SUBMIT_JOBS_README.md          # SLURM usage and HPC deployment
│
├── tools/                             # Advanced analysis and optimization tools
│   ├── README.md                      # Tools documentation and usage guide
│   ├── analysis/                      # EDP optimization and performance analysis
│   │   ├── edp_optimizer.py           # Energy-Delay Product optimization engine
│   │   ├── edp_summary_tables.py      # EDP results summarization and reporting
│   │   ├── results/                   # Analysis outputs and JSON data
│   │   │   ├── edp_optimization_results.json  # Primary optimization results
│   │   │   └── *.csv                  # Detailed analysis tables
│   │   ├── visualization/             # Data visualization and plotting tools
│   │   │   ├── visualize_edp_results.py  # Experimental data visualization (scatter plots)
│   │   │   ├── visualize_edp_summary.py  # Comprehensive summary analysis charts
│   │   │   ├── README.md              # Complete visualization system documentation
│   │   │   └── edp-plots/             # Generated visualization files (16 total)
│   │   │       ├── *_energy_performance_scatter.png      # Individual GPU-workload plots
│   │   │       ├── energy_savings_comparison.png         # EDP vs ED²P comparison
│   │   │       ├── frequency_optimization_comparison.png # Frequency analysis
│   │   │       ├── performance_impact_analysis.png       # Performance trade-offs
│   │   │       └── comprehensive_summary.png             # 4-panel overview
│   │   └── archived/                  # Historical analysis tools and reports
│   ├── ml_prediction/                 # Machine Learning Frequency Prediction System
│   │   ├── README.md                  # Comprehensive ML tools documentation
│   │   ├── build_labels.py            # Generate EDP/ED²P optimal labels
│   │   ├── build_dataset.py           # Build training datasets with probe policies
│   │   ├── train_baseline.py          # Baseline RandomForest training
│   │   ├── evaluate.py                # Cross-validation with EDP gap analysis
│   │   ├── feature_extractor.py       # Statistical features and trend analysis
│   │   ├── profile_reader.py          # DCGMI profile parsing and aggregation
│   │   ├── datasets/                  # Generated training datasets
│   │   │   ├── all_freq.csv           # Full frequency sweep dataset
│   │   │   └── max_only.csv           # Max frequency baseline dataset
│   │   ├── models/                    # Trained ML models
│   │   │   ├── random_forest_predictor.py  # RF implementation with frequency snapping
│   │   │   ├── rf_all_freq.joblib     # Trained model (all frequencies)
│   │   │   └── rf_max_only.joblib     # Trained model (max frequency only)
│   │   ├── results/                   # Feature importance analysis results
│   │   │   ├── fi_baseline/           # Baseline training feature importance
│   │   │   └── fi_eval_gpu_h100/      # Cross-GPU evaluation results
│   │   └── labels.json                # Generated optimal frequency labels
│   ├── (planned) optimal-frequency/   # Planned frequency optimization tools (not in this release)
│   ├── (planned) deployment/          # Planned deployment interfaces (not in this release)
│   ├── (planned) testing/             # Planned extra testing tools (not in this release)
│   └── (planned) utilities/           # Planned general utilities (not in this release)
│
└── sample-collection-scripts/         # Enhanced profiling framework
    ├── README.md                      # Profiling framework documentation
    ├── launch_v2.sh                   # Main experiment orchestration (enhanced CLI)
    ├── profile.py                     # DCGMI-based GPU profiler
    ├── profile_smi.py                 # nvidia-smi alternative profiler
    ├── control.sh                     # DCGMI frequency control
    ├── control_smi.sh                 # nvidia-smi frequency control
    ├── clean.sh                       # Enhanced workspace cleanup
    ├── lstm.py                        # LSTM benchmark application
    │
    ├── interactive_gpu.sh             # Unified interactive GPU session helper (V100/A100/H100)
    │
    ├── submit_job_v100.sh             # Unified V100 submission (16 configurations)
    ├── submit_job_a100.sh             # Unified A100 submission (16 configurations)
    ├── submit_job_h100.sh             # Unified H100 submission (16 configurations)
    │
    ├── submit_job_v100_baseline.sh        # Legacy V100 baseline (redirects to unified)
    ├── submit_job_v100_comprehensive.sh   # Legacy V100 comprehensive (redirects to unified)
    ├── submit_job_v100_custom_app.sh      # Legacy V100 custom app (redirects to unified)
    ├── submit_job_a100_baseline.sh        # Legacy A100 baseline (redirects to unified)
    ├── submit_job_a100_comprehensive.sh   # Legacy A100 comprehensive (redirects to unified)
    ├── submit_job_a100_custom_app.sh      # Legacy A100 custom app (redirects to unified)
    ├── submit_job_h100_baseline.sh        # Legacy H100 baseline (redirects to unified)
    ├── submit_job_h100_comprehensive.sh   # Legacy H100 comprehensive (redirects to unified)
    ├── submit_job_h100_custom_app.sh      # Legacy H100 custom app (redirects to unified)
    └── submit_job*.sh                 # Additional legacy scripts
```
- NVIDIA GPU with DCGMI support (A100/H100 recommended)
- Sufficient GPU memory for AI models (8GB+ recommended)
- CUDA-compatible driver
- Python 3.8+ (tested on Python 3.8-3.11)
- CUDA Toolkit 11.0+
- NVIDIA DCGMI tools (automatically falls back to nvidia-smi if unavailable)
- Hugging Face account with model access
Framework Note: the recommended profiling entry point is `launch_v2.sh`, an enhanced framework with a modular architecture.
- SLURM workload manager
- Environment modules (GCC, CUDA, cuDNN)
- Conda/Miniconda
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd ai-inference-energy
   ```

2. Install Python dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up Hugging Face authentication

   ```bash
   huggingface-cli login   # Follow prompts to enter your HF token
   ```

4. Verify GPU and profiling tool setup

   ```bash
   nvidia-smi               # Check GPU status
   dcgmi discovery --list   # Verify DCGMI access (optional - will fall back to nvidia-smi)

   # Use unified interactive helper for quick setup validation
   cd sample-collection-scripts
   ./interactive_gpu.sh     # Auto-detects GPU type and provides setup guidance
   ```

5. Make scripts executable

   ```bash
   chmod +x sample-collection-scripts/*.sh
   chmod +x sample-collection-scripts/profile.py
   chmod +x app-stable-diffusion/scripts/setup_stable_diffusion.sh
   ```
Run LLaMA inference:

```bash
cd app-llama
python LlamaViaHF.py
```

Run Stable Diffusion inference:

```bash
cd app-stable-diffusion
python StableDiffusionViaHF.py
```

Run Whisper speech recognition:

```bash
cd app-whisper
python WhisperViaHF.py --benchmark --num-samples 3
```

Profile a single application:

```bash
cd sample-collection-scripts
./profile.py "python ../app-llama/LlamaViaHF.py"
```

Set specific GPU frequencies:

```bash
# Set memory=1215MHz, core=1200MHz
./control.sh 1215 1200
```

Complete CLI-driven experiments:

```bash
cd sample-collection-scripts

# Show all available options
./launch_v2.sh --help

# Default A100 DVFS experiment with DCGMI
./launch_v2.sh

# V100 baseline experiment with nvidia-smi fallback
./launch_v2.sh --gpu-type V100 --profiling-mode baseline --profiling-tool nvidia-smi

# Custom application profiling
./launch_v2.sh \
    --app-name "StableDiffusion" \
    --app-executable "../app-stable-diffusion/StableDiffusionViaHF.py" \
    --app-params "--prompt 'A beautiful landscape' --steps 20"

# Quick test configuration
./launch_v2.sh --num-runs 1 --sleep-interval 0
```

Multiple SLURM submission options:
```bash
# A100 unified submission (toreador partition) - edit the script first to uncomment the desired config
sbatch submit_job_a100.sh

# V100 unified submission (matador partition) - edit the script first to uncomment the desired config
sbatch submit_job_v100.sh

# H100 unified submission (h100 partition) - edit the script first to uncomment the desired config
sbatch submit_job_h100.sh
```

Unified Script Features:
- 16+ pre-configured options in each GPU-specific script
- Easy selection: just uncomment one configuration
- Timing guidance: built-in recommendations for the SLURM `--time` parameter
- GPU-optimized: configurations tailored for each GPU architecture
- Full guide: see `sample-collection-scripts/JOB_SCRIPT_GUIDE_V100.md`
Legacy Scripts (Deprecated):
```bash
# A100 legacy scripts (redirect to unified)
sbatch submit_job_a100_baseline.sh        # → use submit_job_a100.sh config #1
sbatch submit_job_a100_comprehensive.sh   # → use submit_job_a100.sh config #8
sbatch submit_job_a100_custom_app.sh      # → use submit_job_a100.sh config #5

# V100 legacy scripts (redirect to unified)
sbatch submit_job_v100_baseline.sh        # → use submit_job_v100.sh config #1
sbatch submit_job_v100_comprehensive.sh   # → use submit_job_v100.sh config #8
sbatch submit_job_v100_custom_app.sh      # → use submit_job_v100.sh config #7

# H100 legacy scripts (redirect to unified)
sbatch submit_job_h100_baseline.sh        # → use submit_job_h100.sh config #1
sbatch submit_job_h100_comprehensive.sh   # → use submit_job_h100.sh config #4
sbatch submit_job_h100_custom_app.sh      # → use submit_job_h100.sh config #5

# Custom application profiling
sbatch submit_job_custom_app.sh
sbatch submit_job_h100_custom_app.sh

# Comprehensive DVFS study (all frequencies)
sbatch submit_job_comprehensive.sh
sbatch submit_job_v100_comprehensive.sh
sbatch submit_job_h100_comprehensive.sh
```

Run EDP (Energy-Delay Product) optimization:
```bash
cd tools/analysis

# Optimize frequencies for specific GPU and workload
python edp_optimizer.py --gpu A100 --workload llama

# Generate comprehensive summary tables
python edp_summary_tables.py --input edp_optimization_results.json

# View optimization results
cat edp_optimization_results_summary.csv
```

Create comprehensive visualizations:
```bash
cd tools/analysis/visualization

# Generate scatter plots with experimental data, outlier detection, and warm-run averaging
python visualize_edp_results.py
# Features include:
# - One point per frequency (averaged from warm runs, excluding cold runs)
# - Statistical outlier detection using IQR methods
# - Direct loading of DCGMI profiling data
# - Publication-quality plots for 12 GPU-workload combinations

# View generated plots (12 individual scatter plots)
ls edp-plots/*_energy_performance_scatter.png
```

Note: Optimal-frequency selection and deployment tooling are planned and not included in this release.
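Conceptually, these tools minimize the Energy-Delay Product, EDP = E × t, or ED²P = E × t², which penalizes slowdowns more strongly. A minimal sketch of that selection logic (illustrative numbers only, not measured data):

```python
def select_optimal_frequency(measurements, weight=1):
    """Pick the frequency minimizing E * t**weight.
    weight=1 gives the EDP optimum, weight=2 the ED2P optimum.
    `measurements` maps frequency (MHz) -> (energy_joules, time_seconds)."""
    return min(
        measurements,
        key=lambda f: measurements[f][0] * measurements[f][1] ** weight,
    )

# Made-up illustrative numbers (not from the framework's datasets)
data = {
    1410: (120.0, 1.00),  # fast but power-hungry
    1005: (92.0, 1.15),   # lower energy, modest slowdown
    510:  (85.0, 1.90),   # lowest power but slow
}
print(select_optimal_frequency(data, weight=1))  # → 1005 (EDP optimum)
print(select_optimal_frequency(data, weight=2))  # → 1410 (ED2P optimum)
```

Note how the heavier delay weighting of ED²P shifts the optimum back toward the maximum frequency, which matches the general intent of the two metrics.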
Analyze profiling results:

```bash
cd sample-collection-scripts

# Basic analysis with built-in tools
./launch_v2.sh --help   # See analysis options

# View profiling results
ls -la results_*/
head results_*/profiling_*.csv

# Use visualization tools
cd visualization
python plot_metric_vs_time.py --gpu V100 --app LLAMA --metric POWER
```

For detailed examples, see documentation/USAGE_EXAMPLES.md and documentation/SUBMIT_JOBS_README.md.
The framework supports comprehensive frequency scaling for all three GPU architectures:

A100:
- Memory Frequency: 1215 MHz (A100 default)
- Core Frequencies: 61 settings from 1410 MHz down to 510 MHz
- Frequency Control: via DCGMI interface with nvidia-smi fallback

V100:
- Memory Frequency: 877 MHz (V100 default)
- Core Frequencies: 117 settings from 1380 MHz down to 510 MHz
- Frequency Control: via nvidia-smi interface

H100:
- Memory Frequency: 2619 MHz (H100 maximum)
- Core Frequencies: 86 settings from 1785 MHz down to 510 MHz in 15 MHz steps
- Frequency Control: via DCGMI interface with nvidia-smi fallback
- Cluster: REPACSS at Texas Tech University (node: rpg-93-9)
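The frequency counts above follow from the step sizes: (1410 − 510)/15 + 1 = 61 for A100, (1785 − 510)/15 + 1 = 86 for H100, and the V100's 117 settings are consistent with uniform 7.5 MHz steps, since (1380 − 510)/7.5 + 1 = 117. A sketch (not the framework's actual code) that generates such descending sweep lists:

```python
def core_frequency_sweep(max_mhz, min_mhz, step_mhz=15):
    """Descending core-frequency list for a uniform-step DVFS sweep."""
    freqs = []
    f = max_mhz
    while f >= min_mhz:
        freqs.append(f)
        f -= step_mhz
    return freqs

a100 = core_frequency_sweep(1410, 510, 15)    # 61 settings
h100 = core_frequency_sweep(1785, 510, 15)    # 86 settings
v100 = core_frequency_sweep(1380, 510, 7.5)   # 117 settings, assuming uniform 7.5 MHz steps
print(len(a100), len(h100), len(v100))  # → 61 86 117
```

In practice the valid clock list should be queried from the driver (e.g. `nvidia-smi -q -d SUPPORTED_CLOCKS`) rather than computed, since supported steps can vary by driver version.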
The launch_v2.sh script accepts comprehensive command-line arguments for flexible experiment configuration:

```bash
./launch_v2.sh [OPTIONS]

Options:
  --gpu-type TYPE         GPU type: A100 or V100 (default: A100)
  --profiling-tool TOOL   Profiling tool: dcgmi or nvidia-smi (default: dcgmi)
  --profiling-mode MODE   Mode: dvfs or baseline (default: dvfs)
  --num-runs NUM          Number of runs per frequency (default: 2)
  --sleep-interval SEC    Sleep between runs in seconds (default: 1)
  --app-name NAME         Application display name (default: LSTM)
  --app-executable PATH   Application executable path (default: lstm)
  --app-params "PARAMS"   Application parameters (default: "")
  -h, --help              Show help and examples
```
### Experiment Parameters
Key configuration options in `config.py` (Python 3.8+ compatible):
```python
# Profiling settings
DEFAULT_NUM_RUNS = 2 # Runs per frequency
DEFAULT_INTERVAL_MS = 50 # Sampling interval
DCGMI_FIELDS = [52, 50, 155, 160, ...]  # Comprehensive GPU metrics to collect (25 fields)
# v2.1.0: consolidated comprehensive field set
# Includes: device info, power, temps, clocks, utilization, activity metrics
# Model settings
LLAMA_MODEL_NAME = "huggyllama/llama-7b"
STABLE_DIFFUSION_MODEL_NAME = "CompVis/stable-diffusion-v1-4"
# A100 GPU settings (Toreador partition)
A100_MEMORY_FREQ = 1215 # MHz
A100_DEFAULT_CORE_FREQ = 1410 # MHz
# V100 GPU settings (Matador partition)
V100_MEMORY_FREQ = 877 # MHz
V100_DEFAULT_CORE_FREQ = 1380 # MHz
```
The framework collects comprehensive GPU metrics during inference:
- Power consumption (watts)
- GPU utilization (%)
- Memory utilization (%)
- Temperature (°C)
- Clock frequencies (MHz)
- Execution time (seconds)
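Energy in joules is not sampled directly; it is typically estimated by integrating the sampled power over time. With the 50 ms sampling interval from config.py, a trapezoidal estimate looks like this (a sketch, not the framework's implementation):

```python
def energy_joules(power_w, interval_s=0.050):
    """Estimate energy by trapezoidal integration of power samples
    taken at a fixed sampling interval (default 50 ms)."""
    if len(power_w) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(power_w, power_w[1:]):
        total += 0.5 * (a + b) * interval_s
    return total

# A constant 200 W over 10 samples spans 9 intervals of 50 ms = 0.45 s -> 90 J
print(energy_joules([200.0] * 10))  # → 90.0
```

Energy-efficiency metrics such as EDP then follow directly from this energy estimate and the measured execution time.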
Results are saved in the results/ directory with structured naming:
```
results/
├── GA100-dvfs-LSTM-1410-0        # Architecture-Mode-App-Freq-Iteration
├── GA100-dvfs-LSTM-1410-1
├── GA100-dvfs-LSTM-1395-0
└── GA100-dvfs-lstm-perf.csv      # Performance summary
```
The collected data can be analyzed using standard data science tools:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load performance data
perf_data = pd.read_csv('results/GA100-dvfs-lstm-perf.csv')

# Plot frequency vs execution time
plt.plot(perf_data['frequency'], perf_data['execution_time'])
plt.xlabel('GPU Frequency (MHz)')
plt.ylabel('Execution Time (s)')
plt.title('LSTM Inference: Frequency vs Performance')
plt.show()
```

To add new AI applications to the framework:
1. Create an application script following the pattern:

   ```python
   # my_app.py
   import sys, os
   sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
   from utils import setup_logging

   def main():
       logger = setup_logging()
       # Your AI inference code here
       logger.info("Inference completed")

   if __name__ == "__main__":
       main()
   ```

2. Run with the launch script:

   ```bash
   ./launch_v2.sh \
       --app-name "MyApp" \
       --app-executable "my_app" \
       --app-params "--model bert-base --batch-size 32"

   # For Stable Diffusion (modernized)
   ./launch_v2.sh \
       --app-name "StableDiffusion" \
       --app-executable "../app-stable-diffusion/StableDiffusionViaHF.py" \
       --app-params "--model-variant sdxl --steps 30"
   ```
GPU-specific experiment examples:

```bash
# A100 DVFS experiment with DCGMI
./launch_v2.sh \
    --gpu-type A100 \
    --profiling-tool dcgmi \
    --profiling-mode dvfs

# V100 baseline experiment with nvidia-smi
./launch_v2.sh \
    --gpu-type V100 \
    --profiling-tool nvidia-smi \
    --profiling-mode baseline

# H100 DVFS experiment with DCGMI
./launch_v2.sh \
    --gpu-type H100 \
    --profiling-tool dcgmi \
    --profiling-mode dvfs
```

The framework supports intelligent profiling tool selection:
```bash
# Prefer DCGMI (will fall back to nvidia-smi if unavailable)
./launch_v2.sh --profiling-tool dcgmi

# Force nvidia-smi usage
./launch_v2.sh --profiling-tool nvidia-smi

# Test profiling tool availability
dcgmi discovery --list   # Check DCGMI
nvidia-smi               # Check nvidia-smi
```

```bash
#!/bin/bash
# Test script for multiple GPU types and applications
for gpu in A100 V100; do
    for app in "LSTM" "StableDiffusion"; do
        ./launch_v2.sh \
            --gpu-type $gpu \
            --app-name $app \
            --profiling-mode baseline \
            --num-runs 1
    done
done
```

```bash
# Check GPU visibility and type
nvidia-smi

# Check DCGMI availability (optional)
dcgmi discovery --list

# Test profiling tool fallback
./launch_v2.sh --profiling-tool dcgmi   # Will automatically fall back to nvidia-smi if needed

# Reset GPU if needed
sudo nvidia-smi --gpu-reset
```

```bash
# Check Python version (3.8+ required)
python --version

# Test config module compatibility
python -c "import config; print('Config loaded successfully')"

# Check HuggingFace authentication
huggingface-cli whoami
```

```bash
# Test visualization module imports
cd tools/analysis/visualization
python -c "import visualize_edp_results; print('Visualization modules working')"

# Check if experimental data exists
ls ../../sample-collection-scripts/results_*/

# Test matplotlib backend (for headless environments)
python -c "import matplotlib; matplotlib.use('Agg'); import matplotlib.pyplot as plt; print('Matplotlib working')"
```

```bash
# Check available partitions
sinfo

# Check A100 nodes (toreador)
sinfo -p toreador

# Check V100 nodes (matador)
sinfo -p matador

# Test SLURM job submission
sbatch --test-only submit_job.sh
```

For detailed troubleshooting, see the troubleshooting sections in documentation/GPU_USAGE_GUIDE.md.
- Ensure stable GPU temperature before experiments
- Run experiments during low system load
- Use dedicated GPU nodes when possible
- Increase sampling interval for longer workloads
- Use `--profiling-mode baseline` for quick testing
- Use `--num-runs 1` for quick tests
- Set `--sleep-interval 0` to reduce delays
- Use `--profiling-mode baseline` (single frequency)
- Test with smaller model variants first
- Use V100 nodes with `--gpu-type V100` for availability
The framework includes streamlined documentation focused on practical usage:
- GPU_USAGE_GUIDE.md: Complete GPU support guide for A100, V100, and H100 across HPCC and REPACSS clusters
- USAGE_EXAMPLES.md: Complete CLI usage examples and automation scripts
- SUBMIT_JOBS_README.md: SLURM submission guide and HPC cluster deployment
- tools/README.md: Advanced analysis and optimization tools documentation
- tools/analysis/visualization/README.md: Complete visualization system with outlier detection documentation
- sample-collection-scripts/README.md: Profiling framework documentation
- app-stable-diffusion/README.md: Modernized Stable Diffusion application with latest models
- app-whisper/README.md: OpenAI Whisper speech recognition for audio processing energy profiling
- app-llama/README.md: LLaMA text generation application for language model energy profiling
All documentation follows consistent patterns with practical examples and comprehensive troubleshooting sections.
If you use this framework in your research, please cite:
```bibtex
@misc{Side:2025:AIEnergy:GitHub,
  title  = {AI Inference Energy Profiling Framework},
  author = {Side, Mert},
  year   = {2025},
  url    = {https://github.com/mertside/ai-inference-energy}
}
```

Happy profiling!