Aging-Aware Condition-Based Maintenance System using Deep Q-Learning
This project implements a Condition-Based Maintenance (CBM) system that considers equipment aging (deterioration) using Deep Q-Learning (DQN).
- 3D State Space: Multi-dimensional state representation with (condition, temperature, normalized_age)
- Installation Date Integration: Incorporating equipment aging information into learning
- Aging Risk Model: Dynamic degradation risk adjustment through aging_factor
- Multi-Equipment Support: Comparative learning across different equipment types
- Comprehensive Visualization: Multi-faceted results visualization including aging analysis
flowchart TB
subgraph DataInput["📊 Data Input"]
A1["Equipment Specs CSV<br/>Measurement CSV<br/>Installation Date CSV"] --> A2["data_preprocessor.py"]
A2 --> A3["Equipment Age Calculation<br/>(Current Date - Installation Date)"]
A3 --> A4["Statistical Threshold Calculation<br/>Smin/Smax Estimation"]
A4 --> A5["State Classification<br/>Normal/Anomalous"]
A5 --> A6["Aging Analysis<br/>Age-Anomaly Rate Correlation"]
end
subgraph AgingModel["⏳ Aging Model"]
B1["3D State Space<br/>(condition, temperature, age)"]
B2["Aging Factor<br/>0.003-0.018"]
B3["Age Normalization<br/>age/max_age"]
B4["Dynamic Risk Adjustment<br/>transition_matrix x aging_factor"]
B1 --> B2
B2 --> B3
B3 --> B4
end
subgraph Learning["🤖 Reinforcement Learning Engine"]
C1["train_cbm_dqn_v2.py"]
C2["QR-DQN + Aging Adaptation<br/>51 quantiles"]
C3["Parallel Learning Environment<br/>16 AsyncVectorEnv"]
C4["Progressive Learning Strategy<br/>• Fast: 200ep<br/>• Standard: 400ep<br/>• Long-term: 800ep<br/>• Special: 2000ep+"]
C1 --> C2
C2 --> C3
C3 --> C4
end
subgraph Equipment["🏭 Equipment-Type Specific"]
D1["AHU Equipment<br/>Differential Pressure"]
D2["Pump Equipment<br/>Tank Level/Power"]
D3["HVAC Equipment<br/>Temperature/Pressure"]
D4["Individual aging_factor<br/>• AHU: 0.008+age x 0.0005<br/>• Pump: 0.005+age x 0.0003<br/>• Cooler: 0.015+age x 0.001"]
D1 --> D4
D2 --> D4
D3 --> D4
end
subgraph Results["📈 Visualization"]
E1["visualize_results.py"]
E2["Learning Curves<br/>Aging Analysis<br/>Transition Matrix<br/>Policy Evaluation"]
E3["compare_equipment_analysis.py"]
E4["7-Equipment Comparison<br/>• Performance Ranking<br/>• Convergence Analysis<br/>• Age Correlation"]
E1 --> E2
E3 --> E4
end
subgraph Analysis["📊 Analysis Results"]
F1["Equipment Performance Ranking<br/>1. AHU-TSK-A-2: +73.88<br/>2. Chemical Pump CP-500-5: +19.32<br/>3. OAC-TSK-F-2: +17.49<br/>..."]
F2["Key Findings<br/>• Critical Importance of Learning Time<br/>• Anomaly Rate as Performance Determinant<br/>• Individual > Type Differences"]
F3["Implementation Readiness<br/>• Level 1: Immediate (AHU)<br/>• Level 2: Monitored (4 units)<br/>• Level 3: Improvement Needed (1 unit)"]
F1 --> F2
F2 --> F3
end
subgraph Insights["💡 Validated Insights"]
G1["AgedRL_Lesson.md<br/>README.md"]
G2["7-Equipment Validated Findings<br/>• Age Correlation: +0.371<br/>• Data Quality is Critical<br/>• Equipment Type Myth Limitations"]
G1 --> G2
end
A6 --> B1
B4 --> C1
C4 --> D4
D4 --> E1
E2 --> F1
E4 --> F1
F3 --> G1
style DataInput fill:#e3f2fd
style AgingModel fill:#fff3e0
style Learning fill:#f3e5f5
style Equipment fill:#e8f5e9
style Results fill:#fce4ec
style Analysis fill:#fff9c4
style Insights fill:#e0f2f1
System Features:
- 3D State Space: Addition of equipment age to conventional 2D (condition, temperature)
- Dynamic Aging Model: Degradation risk adjustment through aging_factor
- Equipment-Type Individualization: Optimization according to AHU, Pump, HVAC characteristics
- Progressive Learning Strategy: Optimal episode numbers according to equipment characteristics
- 7-Equipment Validation Platform: Comprehensive performance verification across diverse equipment
- cbm_environment.py: 3D state space-compatible CBM environment
- data_preprocessor.py: Installation date data integration processing
- train_cbm_dqn_v2.py: Aging-aware DQN learning engine
- visualize_results.py: Comprehensive results visualization system
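The aging terms named in the diagram (age normalization as age/max_age and the per-type aging_factor formulas) can be sketched in a few lines. This is a minimal illustration, not the code in cbm_environment.py: the names `AGING_FACTOR_PARAMS`, `State`, `aging_factor`, `normalize_age`, and `make_state` are hypothetical, and encoding condition as 0 = normal / 1 = anomalous is an assumption.

```python
from typing import NamedTuple

# (base, per-year slope) pairs copied from the diagram's
# "Individual aging_factor" node. The dictionary itself is hypothetical.
AGING_FACTOR_PARAMS = {
    "AHU": (0.008, 0.0005),
    "Pump": (0.005, 0.0003),
    "Cooler": (0.015, 0.001),
}


class State(NamedTuple):
    """3D state: (condition, temperature, normalized_age)."""
    condition: int          # assumption: 0 = normal, 1 = anomalous
    temperature: float
    normalized_age: float   # age / max_age, clipped to [0, 1]


def aging_factor(equipment_type: str, age_years: float) -> float:
    """Return base + age * slope for the given equipment type."""
    base, slope = AGING_FACTOR_PARAMS[equipment_type]
    return base + age_years * slope


def normalize_age(age_years: float, max_age_years: float) -> float:
    """Scale age into [0, 1] using age / max_age."""
    if max_age_years <= 0:
        raise ValueError("max_age_years must be positive")
    return min(max(age_years / max_age_years, 0.0), 1.0)


def make_state(condition: int, temperature: float,
               age_years: float, max_age_years: float) -> State:
    """Assemble the 3D state tuple from raw inputs."""
    return State(condition, temperature,
                 normalize_age(age_years, max_age_years))
```

For example, `aging_factor("AHU", 15.6)` evaluates 0.008 + 15.6 x 0.0005. The value to use for `max_age_years` is not stated in this README, so it is left as a required argument.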
data/private_benchmark/
├── # Equipment installation date data (205 units)
├── # Equipment specification data
└── # Measurement data samples
Learning completed for 6 equipment units (2000 episodes each):
| Equipment Type | Equipment ID | Age (Years) | Final Performance | Improvement | Learning Results |
|---|---|---|---|---|---|
| Chemical Pump | 265715 | 19.7 | +19.32 | +9.02 | outputs_pump_265715 |
| Cooling Pump | 137953 | 3.0 | -3.07 | +55.33 | outputs_pump_137953 |
| Chemical Pump | 519177 | 0.5 | +11.34 | +56.54 | outputs_pump_519177 |
| AHU | 327240 | 15.6 | +73.88 | +2.53 | outputs_ahu_327240 |
| R-1-3 | 265694 | 19.7 | +13.93 | +298.83 | outputs_r13_265694 |
| OAC | 322220 | 17.7 | +17.49 | +7.19 | outputs_oac_322220 |
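The +0.371 age correlation reported in this README can be checked against the Age and Final Performance columns of the table. The sketch below assumes the reported figure is a plain Pearson coefficient over these six rows, which the README does not state explicitly; the helper name `pearson` is hypothetical.

```python
import math

# Age (years) and final performance, copied row by row from the table.
AGE_YEARS = [19.7, 3.0, 0.5, 15.6, 19.7, 17.7]
FINAL_PERFORMANCE = [19.32, -3.07, 11.34, 73.88, 13.93, 17.49]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(AGE_YEARS, FINAL_PERFORMANCE)
print(f"age vs. final performance: r = {r:+.3f}")
```

With only six data points, and one unit (the AHU) far above the rest, a coefficient of this size is best read as a tendency rather than a firm relationship.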
- Importance of Learning Time: Dramatic improvement with 2000 episodes (R-1-3: +298.83 improvement)
- Age Correlation Validation: Correlation of +0.371 between equipment age and final performance (older equipment tended to score higher)
- Universal Equipment Improvement: 5 of the 6 units reached implementation-ready levels
- Equipment Type-Specific Characteristics: AHU excellence (+73.88), Pump stability, Power system challenges
- Immediate Implementation: AHU-TSK-A-2 (Performance +73.88, Stability 11.33)
- Monitored Implementation: 2 Chemical Pumps, OAC, R-1-3 (4 equipment units)
- Continuous Improvement Needed: Cooling Pump (Special challenges with power measurement items)
- Fast Type (<200ep): Chemical Pump CP-500-3 (100ep convergence)
- Standard Type (200-400ep): AHU, OAC, Chemical Pump CP-500-5 (Recommended range)
- Long-term Type (>600ep): R-1-3 (768ep, overcoming high anomaly rates)
For detailed analysis results, see AgedRL_Lesson.md
# Create and activate Python virtual environment
python -m venv .venv
.venv\Scripts\activate
# Install dependencies
pip install torch torchvision numpy pandas matplotlib seaborn pyyaml gym
# Standard learning (recommended)
python train_cbm_dqn_v2.py --equipment_id 265715 --measurement_id 260374 --episodes 2000 --scenario balanced --aging_factor 0.018 --output_dir outputs_pump_265715
# Fast learning (for testing)
python train_cbm_dqn_v2.py --equipment_id 327240 --measurement_id 167473 --episodes 400 --scenario balanced --aging_factor 0.015 --output_dir outputs_test
# PowerShell
powershell -ExecutionPolicy Bypass -File run_6_equipment_training.ps1
# Batch file
run_6_equipment_training.bat
python visualize_results.py --output_dir outputs_pump_265715 --equipment_id 265715 --measurement_id 260374
The aging_factor is adjusted according to equipment age and measurement items:
- New Equipment (0-5 years): 0.003-0.005 (200ep recommended)
- Mid-age (10-16 years): 0.015 (400ep recommended)
- Aged (17+ years): 0.018 (800ep recommended)
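The age bands above can be expressed as a small lookup. This is only a sketch: the function name `recommend_settings` and the returned dictionary keys are hypothetical, and ages that fall outside the three listed bands (for example 5 to 10 years) return `None` because the README gives no recommendation for them.

```python
from typing import Optional


def recommend_settings(age_years: float) -> Optional[dict]:
    """Map equipment age to the aging_factor range and episode count
    listed above. Returns None for ages the list does not cover."""
    if 0 <= age_years <= 5:
        # New equipment: a range is given rather than a single value.
        return {"aging_factor_range": (0.003, 0.005), "episodes": 200}
    if 10 <= age_years <= 16:
        return {"aging_factor_range": (0.015, 0.015), "episodes": 400}
    if age_years >= 17:
        return {"aging_factor_range": (0.018, 0.018), "episodes": 800}
    return None  # band not covered by the README


for age in (0.5, 15.6, 19.7, 7.0):
    print(age, recommend_settings(age))
```

The returned values map directly onto the `--aging_factor` and `--episodes` flags of train_cbm_dqn_v2.py. For new equipment a value still has to be chosen from within the range.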
Equipment type-specific recommendations based on validation:
- Fast Learning Type: 200ep (New chemical pumps, etc.)
- Standard Learning Type: 400ep (AHU, OAC, etc., most common)
- Long-term Learning Type: 800ep (High anomaly rate equipment, R-1-3, etc.)
- Special Challenge Type: 2000ep+ (Power measurement items, etc.)
- balanced: Balanced maintenance strategy (recommended)
- cost_efficient: Cost-focused
- safety_first: Safety-focused
- AHU-TSK-A-2 (15.6 years): +73.88 🏆 (Stability: 11.33)
- Chemical Pump CP-500-5 (19.7 years): +19.32 🥈 (Improvement: +9.02)
- OAC-TSK-F-2 (17.7 years): +17.49 🥉 (Improvement: +7.19)
- R-1-3 (19.7 years): +13.93 🚀 (Dramatic improvement: +298.83)
- Chemical Pump CP-500-3 (0.5 years): +11.34 ⚡ (Dramatic improvement: +56.54)
- Cooling Pump CDP-A5 (3.0 years): -3.07 ⚠️ (Major improvement: +55.33)
- Chemical Pump CP-500-3: 100ep ⚡ (Ultra-fast convergence)
- OAC-TSK-F-2: 260ep 📈 (Fast convergence)
- Chemical Pump CP-500-5: 303ep 📊 (Standard convergence)
- AHU-TSK-A-2: 386ep 📊 (Standard convergence)
- R-1-3: 768ep 🐌 (Long-term convergence, but ultimately successful)
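The convergence figures above can be bucketed with the thresholds this README uses elsewhere (Fast: under 200ep, Standard: 200-400ep, Long-term: over 600ep). The sketch below is illustrative only: `classify_convergence` is a hypothetical helper, and values between 400 and 600 episodes are reported as unclassified because the README does not assign them to a type.

```python
# Convergence episodes copied from the list above.
CONVERGENCE_EPISODES = {
    "Chemical Pump CP-500-3": 100,
    "OAC-TSK-F-2": 260,
    "Chemical Pump CP-500-5": 303,
    "AHU-TSK-A-2": 386,
    "R-1-3": 768,
}


def classify_convergence(episodes: int) -> str:
    """Return the convergence type for a given episode count."""
    if episodes < 200:
        return "fast"
    if episodes <= 400:
        return "standard"
    if episodes > 600:
        return "long-term"
    return "unclassified"  # 401-600: no category defined in the README


for name, episodes in CONVERGENCE_EPISODES.items():
    print(f"{name}: {episodes}ep -> {classify_convergence(episodes)}")
```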
Each learning result generates 4 types of plots:
- training_history.png: Learning curves (rewards, loss)
- transition_matrix.png: State transition matrix
- aging_analysis.png: Aging analysis (age-anomaly rate correlation)
- policy_evaluation.png: Policy evaluation (action distribution)
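After a run, a quick way to confirm that all four plots were produced is to check the output directory for the file names listed above. This sketch assumes the plots are written directly into the directory passed as `--output_dir`, which the README does not confirm. `missing_plots` is a hypothetical helper, not part of visualize_results.py.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_PLOTS = (
    "training_history.png",
    "transition_matrix.png",
    "aging_analysis.png",
    "policy_evaluation.png",
)


def missing_plots(output_dir):
    """Return the expected plot files that are absent from output_dir.

    Assumption: plots sit at the top level of output_dir.
    """
    root = Path(output_dir)
    return [name for name in EXPECTED_PLOTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_plots("outputs_pump_265715")
    print("all plots present" if not missing else f"missing: {missing}")
```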
# Generate all equipment list
python generate_equipment_list.py
# Pump equipment specialized
python find_pumps.py
# Data preprocessing for specific equipment
python data_preprocessor.py --equipment_id 265715 --measurement_id 260374
- Learning Time is Critical: Dramatic improvement with sufficient episodes (1000ep→2000ep improved all equipment)
- Equipment Type-Specific Strategy: Uniform parameters have limitations, individualization is important
- Convergence Determination: Initial learning difficulties can be overcome through continuation (R-1-3 validation)
- Execution Time: Approximately 20-30 minutes for 2000 episodes (varies by equipment)
- Memory Usage: Approximately 2-3GB required during learning
- GPU Recommended: CUDA-compatible GPU enables fast learning
- AgedRL_Lesson.md: Equipment age-specific maintenance strategy analysis
- README_age.md: Technical details of aging functionality
- README_v03_JP.md: Japanese version documentation
- Scenario_Lessons.md: Scenario comparison analysis
- Progressive Implementation: AHU immediate implementation → 4 equipment monitored implementation
- Dynamic Learning Time Adjustment: Application of optimal episode numbers by equipment type
- Individualized aging_factor: Refinement based on validation data
- Hybrid Methods: For special challenge equipment like power systems
- Transfer Learning: Knowledge transfer from successful equipment (AHU)
- Real-time Adaptation: Degradation prediction utilizing age correlation +0.371
- Multi-metric Learning: Low priority as single-item improvements have been achieved for most equipment
Created: December 23, 2025
Target Equipment: 6 units (Age 0.5-19.7 years)
Learning Completed: All 6 units (2000 episodes each)
Validation Completed: Dramatic improvement effects confirmed, implementation readiness evaluated
Analysis Completed: Equipment age-specific maintenance strategy comparative analysis, convergence patterns established

