An AI-powered network management system that predicts faults in 5G testbed environments using machine learning. This project aims to proactively detect and prevent network failures, improving service quality and reducing downtime.
| Member | Role | Responsibilities |
|---|---|---|
| Member 1 | Data Engineer | Dataset creation, preprocessing, and validation |
| Member 2 | ML Engineer | Model training, optimization, and evaluation |
| Member 3 | Backend Developer | API development and ML model integration |
| Member 4 | Frontend Developer | Dashboard creation and visualization |
```
AI-powered-fault-prediction/
│
├── data/                      # Dataset storage
│   └── synthetic_5g_fault_dataset.csv
│
├── scripts/                   # Data generation & preprocessing scripts
│   ├── generate_synthetic_data.py
│   └── data_preprocessing.py (Day 2)
│
├── notebooks/                 # Jupyter notebooks for analysis
│   └── eda_report.ipynb (Day 3)
│
├── models/                    # Trained ML models
│   └── fault_prediction_model.pkl
│
├── api/                       # Backend API code
│   └── app.py
│
├── dashboard/                 # Frontend dashboard
│   └── streamlit_app.py
│
├── requirements.txt           # Python dependencies
└── README.md                  # Project documentation
```
- Python 3.8 or higher
- pip package manager
- Navigate to the project directory:
```bash
cd AI-powered-fault-prediction
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Generate and preprocess data (Days 1-2 completed ✅):
```bash
cd scripts
python generate_synthetic_data.py
python data_preprocessing.py
```

This produces:
- `data/train.csv` - 8,000 samples for training
- `data/test.csv` - 2,000 samples for testing
- `data/scaler.pkl` - StandardScaler for deployment
- `data/label_encoder.pkl` - Label encoder for predictions
Training Set: data/train.csv
- Samples: 8,000
- Features: 17 (scaled and encoded)
- Class Distribution: 70.6% Faulty, 29.4% Normal
Test Set: data/test.csv
- Samples: 2,000
- Features: 17 (scaled and encoded)
- Class Distribution: 70.7% Faulty, 29.3% Normal
Original Dataset: data/synthetic_5g_fault_dataset.csv (10,000 samples)
Network metrics:
- `rssi_dbm`: Received Signal Strength Indicator (dBm)
- `sinr_db`: Signal-to-Interference-plus-Noise Ratio (dB)
- `throughput_mbps`: Data throughput (Mbps)
- `latency_ms`: Network latency (milliseconds)
- `jitter_ms`: Packet delay variation (milliseconds)
- `packet_loss_percent`: Packet loss (%)

Resource and load metrics:
- `cpu_usage_percent`: CPU utilization (%)
- `memory_usage_percent`: Memory utilization (%)
- `temperature_celsius`: Equipment temperature (°C)
- `active_users`: Number of connected users

Temporal and identifier fields:
- `timestamp`: Time of measurement
- `base_station_id`: Base station identifier
- `cell_id`: Cell tower identifier
- `hour`: Hour of day (0-23)
- `day_of_week`: Day of week (0-6)
- `is_peak_hour`: Peak hour indicator (9 AM - 5 PM)

Derived features:
- `network_quality_score`: Composite network health metric (0-1)
- `resource_stress`: Average of CPU and memory utilization

Target:
- `fault_status`: `Normal` or `Faulty`
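The temporal and derived fields listed above can be computed from the raw columns roughly as follows. This is a minimal sketch, not the code in `generate_synthetic_data.py`: it assumes `timestamp` is parseable by pandas, uses the pandas weekday convention (Monday = 0), treats the peak window as 9 <= hour < 17 (the README says 9 AM - 5 PM without saying whether 5 PM is included), and keeps `resource_stress` in percent. The formula for `network_quality_score` isn't documented here, so it is left out.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with temporal and resource-stress columns added."""
    out = df.copy()
    ts = pd.to_datetime(out["timestamp"])
    out["hour"] = ts.dt.hour                # 0-23
    out["day_of_week"] = ts.dt.dayofweek    # pandas convention: Monday=0 ... Sunday=6
    # Assumption: peak window runs from 09:00 up to (not including) 17:00.
    out["is_peak_hour"] = ((out["hour"] >= 9) & (out["hour"] < 17)).astype(int)
    # Plain average of CPU and memory utilization, kept in percent.
    out["resource_stress"] = (
        out["cpu_usage_percent"] + out["memory_usage_percent"]
    ) / 2
    return out


# Hypothetical sample rows for illustration.
sample = pd.DataFrame({
    "timestamp": ["2025-11-03 08:30:00", "2025-11-03 14:00:00"],
    "cpu_usage_percent": [40.0, 90.0],
    "memory_usage_percent": [60.0, 70.0],
})
print(add_derived_features(sample))
```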
Day 1: Dataset Generation ✅
- Synthetic dataset generation with 10,000 samples
- 19 columns, including network metrics and the fault label
- Data validation (5/5 checks passed)
- Deliverables: `synthetic_5g_fault_dataset.csv`, `generate_synthetic_data.py`
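To illustrate the idea behind Day 1, here is a minimal sketch of generating labeled synthetic KPI data and running sanity checks on it. Everything specific in it is hypothetical: the distributions, the rule that marks a sample `Faulty`, and the five checks are illustrative only and are not the ones in `generate_synthetic_data.py`, whose checks aren't listed in this README. Only a subset of the real columns is generated.

```python
import numpy as np
import pandas as pd


def generate_synthetic(n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Hypothetical generator: a few KPIs plus a rule-based fault label."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "rssi_dbm": rng.normal(-85, 10, n).clip(-120, -50),
        "sinr_db": rng.normal(15, 6, n).clip(-5, 30),
        "latency_ms": rng.gamma(2.0, 10.0, n),
        "packet_loss_percent": rng.exponential(1.0, n).clip(0, 100),
        "cpu_usage_percent": rng.uniform(10, 100, n),
    })
    # Hypothetical labeling rule: any degraded KPI marks the sample as faulty.
    degraded = (
        (df["sinr_db"] < 10)
        | (df["latency_ms"] > 30)
        | (df["packet_loss_percent"] > 2)
        | (df["cpu_usage_percent"] > 85)
    )
    df["fault_status"] = np.where(degraded, "Faulty", "Normal")
    return df


def validate(df: pd.DataFrame) -> dict:
    """Run simple sanity checks; each value is True when the check passes."""
    return {
        "no_missing_values": not df.isna().any().any(),
        "no_duplicate_rows": not df.duplicated().any(),
        "rssi_in_range": bool(df["rssi_dbm"].between(-120, -50).all()),
        "loss_in_range": bool(df["packet_loss_percent"].between(0, 100).all()),
        "labels_valid": set(df["fault_status"]) <= {"Normal", "Faulty"},
    }


data = generate_synthetic()
checks = validate(data)
print(f"{sum(checks.values())}/{len(checks)} checks passed")
```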
Day 2: Data Preprocessing ✅
- Data cleaning and validation
- Feature scaling (StandardScaler) and encoding
- Train-test split (80/20, stratified)
- Saved preprocessing artifacts
- Deliverables: `data_preprocessing.py`, `train.csv` (8K), `test.csv` (2K), `scaler.pkl`, `label_encoder.pkl`
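The Day 2 steps map onto a few scikit-learn calls. The sketch below is an assumption-laden illustration, not `data_preprocessing.py` itself: it encodes the target with `LabelEncoder`, keeps only numeric columns (how the real script encodes `base_station_id` or `cell_id` isn't described here), uses `random_state=42`, and fits the scaler on the training split only so test-set statistics don't leak into training. The toy frame is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame, target: str = "fault_status", seed: int = 42):
    """Clean, split 80/20 (stratified), scale numeric features, encode labels."""
    df = df.dropna().drop_duplicates()

    encoder = LabelEncoder()
    y = encoder.fit_transform(df[target])   # classes sorted: Faulty -> 0, Normal -> 1
    X = df.drop(columns=[target]).select_dtypes(include="number")

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the scaler on training data only, then apply it to both splits.
    scaler = StandardScaler()
    X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)
    return X_train, X_test, y_train, y_test, scaler, encoder


# Hypothetical toy data with roughly the same 70/30 class balance.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "latency_ms": rng.gamma(2.0, 10.0, 1000),
    "cpu_usage_percent": rng.uniform(10, 100, 1000),
    "fault_status": rng.choice(["Faulty", "Normal"], 1000, p=[0.7, 0.3]),
})
X_train, X_test, y_train, y_test, scaler, encoder = preprocess(toy)
```

The returned `scaler` and `encoder` are the objects that would then be pickled as `scaler.pkl` and `label_encoder.pkl` for deployment.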
Day 3: Exploratory Data Analysis ✅
- Feature distribution analysis
- Correlation analysis and heatmap
- Class balance visualization
- Feature importance identification
- Temporal pattern analysis
- Deliverables: `eda_report.ipynb` with 15+ visualizations
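Several of the Day 3 analyses reduce to a few pandas calls. This is a minimal sketch, assuming the target column is numerically encoded (as in `train.csv`); it ranks features by absolute Pearson correlation with the target as a quick proxy for importance, which is not necessarily the method used in `eda_report.ipynb`. The toy data and its label rule are hypothetical.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame, target: str = "fault_status"):
    """Return class shares, the correlation matrix, and a simple feature ranking."""
    class_share = df[target].value_counts(normalize=True)
    corr = df.corr(numeric_only=True)
    # Rank features by |Pearson r| with the (numerically encoded) target.
    ranking = corr[target].drop(target).abs().sort_values(ascending=False)
    return class_share, corr, ranking


rng = np.random.default_rng(1)
n = 500
latency = rng.gamma(2.0, 10.0, n)
toy = pd.DataFrame({
    "latency_ms": latency,
    "cpu_usage_percent": rng.uniform(10, 100, n),
    # Hypothetical label: faults become likelier as latency rises.
    "fault_status": (latency + rng.normal(0, 5, n) > 15).astype(int),
})
class_share, corr, ranking = eda_summary(toy)
print(class_share.round(3))
print(ranking)
```

The `corr` matrix is what a correlation heatmap (for example with seaborn's `heatmap`) would visualize.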
Day 4: Documentation & ML Handoff ✅
- Final dataset documentation
- Model training guidelines and sample code
- API integration specifications
- Complete ML team handoff documentation
- Deliverables: `HANDOFF_TO_ML_TEAM.md` - complete guide for the ML Engineer
```bash
# Generate dataset
cd scripts
python generate_synthetic_data.py

# Preprocess data
python data_preprocessing.py
```

```python
import pickle

import pandas as pd

# Load preprocessed data (run from the project root so the data/ paths resolve)
train_df = pd.read_csv('data/train.csv')
test_df = pd.read_csv('data/test.csv')

# Load scaler and encoder for deployment
with open('data/scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
with open('data/label_encoder.pkl', 'rb') as f:
    label_encoder = pickle.load(f)

# Features and target
X_train = train_df.drop('fault_status', axis=1)
y_train = train_df['fault_status']
X_test = test_df.drop('fault_status', axis=1)
y_test = test_df['fault_status']

# Start model training...
```

The preprocessed data will be used to train:
- Random Forest Classifier
- XGBoost
- Support Vector Machine (SVM)
- Neural Networks
Evaluation Metrics:
- Accuracy
- Precision
- Recall
- F1-Score
- ROC-AUC
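As a starting point for the ML Engineer, the sketch below trains one of the listed models (Random Forest) and computes the five listed metrics with scikit-learn. It uses `make_classification` with a roughly 70/30 class split as a stand-in for `train.csv`/`test.csv`; in the project, `X_train`, `y_train`, `X_test` and `y_test` would come from the loading snippet above. Hyperparameters are illustrative. Note that scikit-learn's `LabelEncoder` sorts labels alphabetically, so if `fault_status` was encoded that way, `Faulty` is 0 and `POS` should be set to 0 to score the fault class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Stand-in data: 17 features, about 70% of samples in class 1.
X, y = make_classification(
    n_samples=2000, n_features=17, n_informative=8,
    weights=[0.3, 0.7], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of class 1

POS = 1  # label treated as the positive ("fault") class when scoring
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, pos_label=POS),
    "recall": recall_score(y_test, y_pred, pos_label=POS),
    "f1": f1_score(y_test, y_pred, pos_label=POS),
    "roc_auc": roc_auc_score(y_test, y_proba),
}
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```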
Backend API will provide:
- `/predict` - Real-time fault prediction
- `/upload` - Bulk data upload
- `/health` - System health check
- `/metrics` - Network metrics dashboard
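This README doesn't say which web framework `api/app.py` uses, so rather than assume one, this sketch shows only the framework-independent core a `/predict` route could wrap: validate a JSON-like dict of feature values, scale it with the saved scaler, predict, and decode the label. The three-feature `FEATURES` list, the response field names, and the toy fitted objects are hypothetical; in the project the scaler, encoder and model would be unpickled from `data/scaler.pkl`, `data/label_encoder.pkl` and `models/fault_prediction_model.pkl`, and the feature order must match training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical feature order; it must match the order used during training.
FEATURES = ["sinr_db", "latency_ms", "packet_loss_percent"]


def predict_handler(payload: dict, scaler, model, encoder) -> dict:
    """Core of a /predict endpoint: validate, scale, predict, decode."""
    missing = [f for f in FEATURES if f not in payload]
    if missing:
        return {"error": f"missing features: {missing}"}
    row = np.array([[float(payload[f]) for f in FEATURES]])
    scaled = scaler.transform(row)
    label_idx = model.predict(scaled)[0]
    # Find the predict_proba column that corresponds to the encoded "Faulty" label.
    fault_code = encoder.transform(["Faulty"])[0]
    fault_col = list(model.classes_).index(fault_code)
    return {
        "fault_status": str(encoder.inverse_transform([label_idx])[0]),
        "fault_probability": float(model.predict_proba(scaled)[0][fault_col]),
    }


# Toy stand-ins for the pickled artifacts (hypothetical training data and rule).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(15, 6, 400), rng.gamma(2.0, 10.0, 400), rng.exponential(1.0, 400)
])
labels = np.where((X[:, 0] < 10) | (X[:, 1] > 30), "Faulty", "Normal")
encoder = LabelEncoder().fit(labels)
scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), encoder.transform(labels))

print(predict_handler(
    {"sinr_db": 2.0, "latency_ms": 80.0, "packet_loss_percent": 0.5},
    scaler, model, encoder,
))
```

Keeping this logic outside the route function makes it testable without starting a server.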
Interactive dashboard will display:
- Real-time network health status
- Fault probability visualization
- Alert notifications
- Historical trend analysis
- Network KPI monitoring
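The alert and trend views could be driven by a small helper that the dashboard calls on recent predictions. This is a sketch under stated assumptions: the thresholds (0.5 for a warning, 0.8 for critical), the rolling window of 3 readings, the `fault_probability` column, and the station IDs `BS1`/`BS2` are all hypothetical and not taken from `streamlit_app.py`.

```python
import pandas as pd

# Hypothetical alert thresholds on predicted fault probability.
WARNING, CRITICAL = 0.5, 0.8


def alert_level(p: float) -> str:
    """Map a fault probability to an alert level."""
    if p >= CRITICAL:
        return "critical"
    if p >= WARNING:
        return "warning"
    return "ok"


def add_alerts_and_trend(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add an alert column and a per-station rolling mean of fault probability."""
    out = df.sort_values(["base_station_id", "timestamp"]).copy()
    out["alert"] = out["fault_probability"].map(alert_level)
    out["trend"] = (
        out.groupby("base_station_id")["fault_probability"]
        .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return out


readings = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2025-11-04 10:00", "2025-11-04 10:05", "2025-11-04 10:10"] * 2
    ),
    "base_station_id": ["BS1"] * 3 + ["BS2"] * 3,
    "fault_probability": [0.2, 0.6, 0.9, 0.1, 0.1, 0.4],
})
print(add_alerts_and_trend(readings))
```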
Each team member works on their designated area:
- Create feature branch from main
- Commit changes with clear messages
- Test thoroughly before merge
- Document all changes
This is an academic project for 5G network fault prediction research.
Team Members:
- Data Engineer: Dataset & Preprocessing
- ML Engineer: Model Development
- Backend Developer: API Integration
- Frontend Developer: Dashboard & UI
Last Updated: November 4, 2025
Status: Days 1-4 Complete ✅ | Data Engineering Finished | Ready for ML Training
All data work is finished! The ML team has everything needed:
- ✅ Clean, preprocessed datasets (train/test)
- ✅ Comprehensive EDA with insights
- ✅ Deployment artifacts (scaler, encoder)
- ✅ Complete handoff documentation

ML Team: Start with `HANDOFF_TO_ML_TEAM.md`.