Bridge Health Level Classification using Explainable Boosting Machine
An AI-powered bridge health classification system that automatically categorizes bridge inspection reports into health levels using machine learning. The system leverages Explainable Boosting Machine (EBM) to achieve high accuracy while maintaining interpretability.
- 🚀 EBM 25倍高速化: 25分 → 63.8秒(16並列処理)
- 91.88% Test Accuracy - 実用レベル達成!
- 86.20% F1-macro score (+19.8pt improvement from v0.2)
- フルデータ学習: 8,615件処理(31倍データ活用)
- Repair-requirement F1: 80% - 実用性確保
| Version | Data Size | Test Accuracy | F1-Macro | Key Innovation |
|---|---|---|---|---|
| v0.3 | 8,615件 | 91.88% | 86.20% | 16並列高速化 + フルデータ |
| v0.2 | 276件 | 84.34% | 66.40% | 集約データでのMVP |
# Clone the repository
git clone https://github.com/YOUR_USERNAME/health-ebm-classification.git
cd health-ebm-classification
# Set up virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS/Linux
# Install dependencies
pip install -r requirements.txt
# Run the complete pipeline
cd src
python main_pipeline.py| Model | Training Time | Val F1-Macro | Test Accuracy | 特徴 |
|---|---|---|---|---|
| 🥇 EBM | 63.80秒 | 85.34% | 91.88% | 最高精度+高速化 |
| 🥈 XGBoost Enhanced | 5.42秒 | 82.91% | 89.12% | 高速高精度 |
| 🥉 CatBoost | 28.12秒 | 79.44% | 87.56% | バランス型 |
| LightGBM | 2.23秒 | 76.38% | 85.23% | 超高速 |
| Random Forest | 0.42秒 | 71.64% | 82.34% | 最高速 |
- EBM高速化: 25分 → 63.8秒(25倍高速化)
- 精度向上: Test Accuracy 84.34% → 91.88% (+7.54pt)
- F1向上: 66.40% → 86.20% (+19.80pt)
- 実用性: Repair-requirement F1 25% → 80% (+55pt)
Raw CSV Data → Preprocessing → Feature Engineering → Model Training → Evaluation
↓ ↓ ↓ ↓ ↓
9,753 records → 8,615 samples → 1,019 features → 7 models → Best: EBM (91.88%)
(31倍データ) (フル活用) (16並列) (実用レベル)
- Level Ⅰ (Healthy): 1,404 samples (16.3%)
- Level Ⅱ (Preventive): 6,332 samples (73.5%)
- Repair-required (III+): 879 samples (10.2%) - 44倍増加!
- ⚡ 16並列処理: EBMを25倍高速化(25分→63.8秒)
- 📊 フルデータ学習: 8,615件の個別記録処理
- 🎯 実行時間トラッキング: 全モデルの性能監視
- 🔧 最適化パラメータ: interactions=10, max_bins=64
- 7 Advanced Models: EBM, LightGBM, CatBoost, XGBoost, Random Forest
- Class Imbalance Handling: Strategic class consolidation and weighting
- Cross-validation: 5-fold CV for robust evaluation
- Interpretable AI: EBM provides feature importance and decision explanations
- Japanese NLP: Janome morphological analysis
- TF-IDF Vectorization: 1,000-dimensional text features
- Domain Keywords: Bridge-specific terminology extraction
- Multi-modal Features: Text + numerical + categorical data
- Automated Pipeline: End-to-end ML workflow
- Error Handling: Robust processing with fallback strategies
- Modular Design: Easily extensible components
- Performance Monitoring: Detailed metrics and reporting
health-ebm-classification/
├── src/
│ ├── main_pipeline.py # Main execution pipeline (v0.3対応)
│ ├── data_loader.py # Data loading + フルデータモード
│ ├── feature_engineering.py # Feature extraction and engineering
│ └── model_trainer.py # Model training + 16並列処理
├── 1_inspection-dataset/ # Bridge inspection data
├── docs/
│ ├── README_v0-3.md # 🆕 v0.3 成果まとめ
│ ├── README_v0-2.md # v0.2 technical documentation
│ └── QUICK_GUIDE.md # 5-minute start guide
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
└── README.md # This file
- Automated Inspection: Reduce manual assessment time
- Risk Prioritization: Identify bridges requiring immediate attention
- Maintenance Planning: Data-driven repair scheduling
- Quality Assurance: Consistent evaluation standards
- Explainable Predictions: Understand why a bridge needs repair
- Confidence Scoring: Reliability indicators for each prediction
- Comparative Analysis: Benchmark against historical data
- Expert Validation: AI recommendations with human oversight
- Python 3.11+
- 8GB+ RAM (for フルデータ処理)
- マルチコアCPU推奨 (16並列処理対応)
- Bridge inspection CSV data
pip install pandas numpy scikit-learn
pip install lightgbm xgboost catboost
pip install interpret janome # For EBM and Japanese textYour CSV files should contain:
BridgeID: Unique bridge identifierHealthLevel: Current health assessment (Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ)Diagnosis: Inspection text descriptionDamageComment: Detailed damage observations
- 25倍高速化: EBM学習時間 25分→63.8秒
- 16並列処理: CPU最適活用実装
- フルデータ学習: 8,615件処理対応
- 実用レベル達成: Test Accuracy 91.88%
- 包括的ドキュメント: README_v0-3.md作成
- REST API implementation
- Real-time prediction endpoint
- Enhanced interpretability dashboard
- Web application interface
- Web application interface
- Automated model retraining
- Integration with inspection databases
- Multi-language support
- Image analysis integration
- Geographic factor modeling
- Predictive maintenance forecasting
- Mobile app development
We welcome contributions! Please see our Contributing Guidelines for details.
# Clone with development dependencies
git clone https://github.com/YOUR_USERNAME/health-ebm-classification.git
cd health-ebm-classification
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
# Run linting
flake8 src/
black src/This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Documentation: See
docs/folder for detailed guides - Questions: Create a discussion in the repository
- Microsoft InterpretML: For the Explainable Boosting Machine implementation
- Japanese NLP Community: For Janome morphological analyzer
- Bridge Engineering Domain Experts: For validation and insights
Built with ❤️ for safer infrastructure
Last Updated: October 3, 2025 - v0.3 Release