Urban City Analysis ML is a menu-driven data science project for analyzing urban city data with machine learning. It predicts crime occurrence, accident severity, and passenger count from real-world urban datasets, and pairs each analysis with detailed explanations in a user-friendly interface.
- Smart Data Preprocessing: Automatically handles missing values, encodes categorical features, and scales numerical data
- Detailed Statistics Display: Shows dataset overview with unique values, ranges, and data quality metrics
- Exploratory Data Analysis: Interactive visualizations including box plots, correlation heatmaps, and scatter plots
- Purpose: Predicts likelihood of crime occurrence based on urban environmental factors
- Algorithm: Ensemble stacking model combining Random Forest and Gradient Boosting
- Features Used:
- Passenger count and trip duration
- Geographic location (district, latitude, longitude)
- Weather conditions and vehicle count
- Time-based features (hour, day of week, weekend indicator)
- Output: Accuracy metrics, feature importance, confusion matrix, and classification report
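The stacking setup described above could look roughly like the following sketch. It uses scikit-learn's `StackingClassifier` with Random Forest and Gradient Boosting as base learners. The logistic-regression meta-learner, the synthetic data, and every hyperparameter are assumptions made for illustration and are not taken from `PROJECTEND.py`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crime dataset (assumption: binary target).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=3,  # base-model predictions for the meta-learner come from 3-fold CV
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
crime_accuracy = accuracy_score(y_test, y_pred)
crime_confusion = confusion_matrix(y_test, y_pred)
print(f"accuracy: {crime_accuracy:.3f}")
print(crime_confusion)
```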
- Purpose: Classifies accident severity levels (low/medium/high)
- Algorithms: Multiple classification models (Random Forest, Decision Tree, Logistic Regression)
- Features Used: Weather conditions, vehicle count, location data, traffic factors
- Output: Comparative model performance, best model selection, and detailed metrics
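A minimal sketch of comparing the three listed classifiers and picking the best one is shown below. It assumes that "best" means the highest mean cross-validated accuracy. The project may select on a different metric or on a single train/test split, and the data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for low/medium/high severity.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # Logistic regression benefits from scaled inputs, so it gets a pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```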
- Purpose: Estimates passenger count using advanced regression analysis
- Algorithm: Random Forest Regression with hyperparameter tuning via GridSearchCV
- Features Used: Weather conditions, time factors, stop locations, accident severity, crime category
- Advanced Features: Feature engineering with interaction terms and cross-validation
- Output: R² scores, MSE, RMSE, MAPE, and feature importance rankings
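The tuning and reporting steps above can be sketched as follows. The parameter grid, the synthetic data, and the 3-fold setting are assumptions chosen to keep the example fast. The grid actually used by the project is not documented here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in; the shift keeps targets away from zero so MAPE is meaningful.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=1)
y = y - y.min() + 10.0
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}  # assumed grid
search = GridSearchCV(
    RandomForestRegressor(random_state=1), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = search.best_estimator_.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mape = mean_absolute_percentage_error(y_test, y_pred)  # a fraction, not a percent
r2 = r2_score(y_test, y_pred)

print("best params:", search.best_params_)
print(f"R2={r2:.3f} MSE={mse:.1f} RMSE={rmse:.1f} MAPE={mape:.3f}")
```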
- Menu-Driven Interface: Easy navigation with colored, descriptive menu options
- Progress Indicators: Real-time feedback with emojis and colored status messages
- Detailed Explanations: Each analysis includes methodology descriptions and result interpretations
- Error Handling: Graceful handling of missing data and user input validation
- Optional Visualizations: Choose whether to display data exploration charts
git clone https://github.com/dharm1123/Urban-City-Analysis-ML.git
cd Urban-City-Analysis-ML
pip install -r requirements.txt
Main Dependencies:
- pandas (data manipulation)
- numpy (numerical computations)
- scikit-learn (machine learning algorithms)
- matplotlib & seaborn (data visualization)
- termcolor (colored terminal output)
- Ensure `final_crime_dataset.csv` is in the project root directory
- For passenger count analysis, `passenger_count_dataset_modified.csv` is also recommended
- The system will automatically handle missing files and adapt accordingly
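One way the file handling described above could work is sketched below. The function name `load_datasets` is hypothetical, and treating the crime dataset as required while the passenger dataset is optional is an assumption based on the wording above. The real program may adapt differently.

```python
from pathlib import Path

import pandas as pd


def load_datasets(root="."):
    """Load the crime dataset and, if present, the passenger dataset."""
    root = Path(root)
    crime_path = root / "final_crime_dataset.csv"
    passenger_path = root / "passenger_count_dataset_modified.csv"

    # Assumption: the primary dataset is required, so fail with a clear message.
    if not crime_path.exists():
        raise FileNotFoundError(f"Required dataset not found: {crime_path}")
    crime_df = pd.read_csv(crime_path)

    # Optional file: return None so callers can skip or adapt that analysis.
    passenger_df = pd.read_csv(passenger_path) if passenger_path.exists() else None
    return crime_df, passenger_df
```

A caller would then check `passenger_df is None` before offering the passenger count analysis.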
python PROJECTEND.py
- Data Loading: The program automatically loads and preprocesses your data
- Visualization Choice: Decide whether to see exploratory data analysis charts
- Analysis Selection: Choose from:
  - 1: Crime Prediction Analysis
  - 2: Accident Severity Prediction
  - 3: Passenger Count Prediction
  - 0: Exit
- Results Review: Examine detailed performance metrics and interpretations
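The menu flow above can be sketched as a small dispatch loop. This is an illustrative outline, not the code in `PROJECTEND.py`: the function name `run_menu` and its `handlers`, `read`, and `write` parameters are hypothetical, and colored output is omitted.

```python
def run_menu(handlers, read=input, write=print):
    """Show the menu until the user enters 0; dispatch valid choices to handlers."""
    labels = {
        "1": "Crime Prediction Analysis",
        "2": "Accident Severity Prediction",
        "3": "Passenger Count Prediction",
        "0": "Exit",
    }
    while True:
        for key, label in labels.items():
            write(f"{key} - {label}")
        choice = read("Select an option: ").strip()
        if choice == "0":
            write("Goodbye.")
            return
        handler = handlers.get(choice)
        if handler is None:
            write(f"Invalid choice: {choice!r}")  # re-prompt instead of crashing
            continue
        handler()
```

Calling `run_menu({"1": crime_fn, "2": accident_fn, "3": passenger_fn})` with three analysis functions of your own starts the loop. Passing `read` and `write` explicitly makes the loop easy to test without a terminal.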
Urban-City-Analysis-ML/
├── PROJECTEND.py # 🎯 Main program (menu-driven interface)
├── final_crime_dataset.csv # 📊 Primary dataset (user provided)
├── passenger_count_dataset_modified.csv # 🚌 Passenger count dataset (optional)
├── requirements.txt # 📦 Python dependencies
├── README.md # 📚 This documentation
├── About project.docx # 📄 Project overview document
├── model.docx # 📈 Model documentation
└── [Additional model scripts] # 🔧 Individual analysis scripts
- Missing Value Handling: Smart imputation using random sampling from existing values
- Feature Engineering: Time-based features, interaction terms, categorical encoding
- Data Scaling: StandardScaler for numerical features, one-hot encoding for categorical
- Data Quality Checks: Outlier detection and removal, data type validation
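The imputation, scaling, and encoding steps listed above might be combined as in this sketch. The helper name `impute_by_random_sampling` and the column names are hypothetical. Only the general techniques (random-sample imputation, `StandardScaler`, one-hot encoding) come from the list above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def impute_by_random_sampling(df, seed=0):
    """Fill each column's missing values with random draws from its observed values."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in out.columns:
        missing = out[col].isna()
        observed = out.loc[~missing, col].to_numpy()
        if missing.any() and len(observed) > 0:
            out.loc[missing, col] = rng.choice(observed, size=missing.sum())
    return out


# Tiny illustrative frame; column names are assumptions.
df = pd.DataFrame({
    "vehicle_count": [12.0, np.nan, 30.0, 18.0],
    "weather": ["rain", "clear", None, "clear"],
})
clean = impute_by_random_sampling(df)

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["vehicle_count"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["weather"]),
])
features = preprocess.fit_transform(clean)
print(clean)
print(features.shape)
```

Sampling from observed values preserves each column's empirical distribution, unlike mean or mode imputation, at the cost of adding randomness. Fixing the seed keeps runs reproducible.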
- Classification: Accuracy, Precision, Recall, F1-Score, Confusion Matrix
- Regression: R² Score, MSE, RMSE, MAPE, Cross-validation scores
- Feature Analysis: Feature importance rankings and interpretations
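For reference, the classification metrics named above can all be computed with `sklearn.metrics`. The labels below are hand-made for illustration, and macro averaging is an assumption. The project may report per-class or weighted averages instead.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hand-made labels purely for illustration.
y_true = ["low", "low", "medium", "high", "medium", "high", "low", "high"]
y_pred = ["low", "medium", "medium", "high", "medium", "low", "low", "high"]
labels = ["low", "medium", "high"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)
# Rows are true classes, columns are predicted classes, both in `labels` order.
matrix = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(matrix)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```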
- Ensemble Methods: Stacking classifier for improved prediction accuracy
- Hyperparameter Tuning: GridSearchCV for optimal model parameters
- Cross-Validation: Robust model evaluation with multiple data splits
- Balanced Datasets: Automatic handling of imbalanced data through sampling
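The list above does not say which sampling strategy is used to balance classes. The sketch below assumes random oversampling of minority classes with `sklearn.utils.resample`. The function name `balance_by_oversampling` and the column names are hypothetical.

```python
import pandas as pd
from sklearn.utils import resample


def balance_by_oversampling(df, target, seed=0):
    """Upsample minority classes (with replacement) to the largest class size."""
    largest = df[target].value_counts().max()
    parts = [
        group if len(group) == largest
        else resample(group, replace=True, n_samples=largest, random_state=seed)
        for _, group in df.groupby(target)
    ]
    # Shuffle so classes are not grouped together in the result.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


df = pd.DataFrame({
    "vehicle_count": range(10),
    "crime_occurred": [0] * 8 + [1] * 2,  # assumed column names
})
balanced = balance_by_oversampling(df, "crime_occurred")
print(balanced["crime_occurred"].value_counts())
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.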
Primary Author: DHARM DUDHAGARA
Contributors:
For questions, suggestions, feature requests, or technical support:
- GitHub Issues: Report bugs or request features
- Repository: View source code and documentation
- ✅ Menu-driven interface for better user experience
- ✅ Comprehensive explanations for each analysis type
- ✅ Error handling and validation for robust operation
- ✅ Detailed progress indicators with colored output
- ✅ Advanced feature engineering and model optimization
- ✅ Professional documentation with clear methodology descriptions
- ✅ Modular code structure for easier maintenance and extension
Built with ❤️ for urban data science and smart city analysis