A machine learning project that predicts CO2 emissions of cars based on engine size using linear regression analysis.
This project analyzes the relationship between car engine size and CO2 emissions using a linear regression model. The analysis includes data visualization, model training, evaluation, and performance metrics to understand how engine specifications correlate with environmental impact.
- Predict CO2 emissions based on car engine size
- Visualize the correlation between engine specifications and emissions
- Build and evaluate a linear regression model
- Provide insights into automotive environmental impact
The project uses the FuelConsumptionCo2.csv dataset, summarized below:
- Source: IBM Developer Skills Network
- Records: 1000+ vehicle entries from 2014
- Key Features:
  - ENGINESIZE: Engine displacement in liters
  - CYLINDERS: Number of engine cylinders
  - FUELCONSUMPTION_COMB: Combined fuel consumption (L/100 km)
  - CO2EMISSIONS: CO2 emissions in grams per kilometer
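| MODELYEAR | MAKE  | MODEL | ENGINESIZE | CYLINDERS | FUELCONSUMPTION_COMB | CO2EMISSIONS |
|-----------|-------|-------|------------|-----------|----------------------|--------------|
| 2014      | ACURA | ILX   | 2.0        | 4         | 8.5                  | 196          |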
- Python 3.x
- Pandas: Data manipulation and analysis
- NumPy: Numerical computing
- Matplotlib: Data visualization
- Scikit-learn: Machine learning algorithms
- Requests: HTTP library for data downloading
- Clone the repository:

      git clone https://github.com/yourusername/ML_Car_CO2_Emission_Linear_Regression_model.git
      cd ML_Car_CO2_Emission_Linear_Regression_model

- Install required dependencies:

      pip install pandas numpy matplotlib scikit-learn requests

- Run the model:

      python model.py
- Automatic download of fuel consumption dataset from IBM Cloud
- Data validation and error handling
- Feature selection (Engine Size, Cylinders, Fuel Consumption, CO2 Emissions)
- Data exploration and statistical analysis
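
The validation and feature-selection steps above might look roughly like the following. This is a minimal sketch, not the project's actual code: the helper name `load_features`, the choice to drop rows with missing values, and the second sample row are assumptions made for illustration. Only the column names come from the dataset description.

```python
import io

import pandas as pd

# Columns the analysis relies on (names taken from the dataset description).
REQUIRED_COLUMNS = ["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION_COMB", "CO2EMISSIONS"]


def load_features(csv_source):
    """Read a CSV, check the required columns exist, and keep only those columns.

    `csv_source` can be a file path or any file-like object accepted by pandas.
    (Hypothetical helper; model.py may organize this differently.)
    """
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Dataset is missing required columns: {missing}")
    # Assumption: rows with missing values in the selected columns are dropped.
    return df[REQUIRED_COLUMNS].dropna().reset_index(drop=True)


# In-memory sample: the first row mirrors the sample row shown earlier,
# the second row is made up and has a missing fuel consumption value.
sample_csv = io.StringIO(
    "MODELYEAR,MAKE,MODEL,ENGINESIZE,CYLINDERS,FUELCONSUMPTION_COMB,CO2EMISSIONS\n"
    "2014,ACURA,ILX,2.0,4,8.5,196\n"
    "2014,EXAMPLE,CAR,3.5,6,,250\n"
)
features = load_features(sample_csv)
print(features.shape)
```

Passing the path of the downloaded CSV instead of `sample_csv` would apply the same checks to the real dataset.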
- Histograms: Distribution analysis of key features
- Scatter Plots:
- Engine Size vs CO2 Emissions
- Fuel Consumption vs CO2 Emissions
- Number of Cylinders vs CO2 Emissions
- Data Split: 80% training, 20% testing
- Algorithm: Linear Regression
- Features: Engine Size (primary predictor)
- Target: CO2 Emissions
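
As a rough illustration of this configuration, the sketch below performs an 80/20 split and fits a single-feature linear regression with scikit-learn. The data is synthetic: the slope, intercept, and noise level are made-up values, not results from the real dataset, and `random_state=42` is an arbitrary choice for reproducibility.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (hypothetical coefficients and noise).
rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 8.0, size=100)
co2 = 125.0 + 39.0 * engine_size + rng.normal(0.0, 10.0, size=100)

# scikit-learn expects a 2D feature matrix, even for a single feature.
X = engine_size.reshape(-1, 1)

# 80% of the rows go to training, 20% to testing.
train_x, test_x, train_y, test_y = train_test_split(
    X, co2, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(train_x, train_y)

print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```

With the real data, `X` would hold the ENGINESIZE column and `co2` the CO2EMISSIONS column.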
- Model fitting on training data
- Performance evaluation using:
- Mean Squared Error (MSE)
- R² Score (Coefficient of Determination)
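
Both metrics can be computed directly from their definitions, which may help when interpreting the numbers. The sketch below uses NumPy only. The function names `mse` and `r2` and the four sample values are illustrative assumptions; scikit-learn offers equivalent functions in `sklearn.metrics`.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, the share of variance explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Made-up CO2 values (g/km), for illustration only.
actual = [196.0, 221.0, 136.0, 255.0]
predicted = [200.0, 215.0, 140.0, 250.0]

print("MSE:", mse(actual, predicted))  # 23.25
print("R2:", r2(actual, predicted))
```

A lower MSE means predictions are closer to the observed values, while an R² closer to 1 means the model explains more of the variance in CO2 emissions.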
- Training data scatter plot with regression line
- Test data predictions with fitted line
- Model performance visualization
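
A plot of this kind can be sketched with Matplotlib as shown below. The data is synthetic with made-up coefficients, and `np.polyfit` is used here as a stand-in for the fitted scikit-learn model, so this is not the exact plotting code in `model.py`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for engine size and CO2 emissions (hypothetical values).
rng = np.random.default_rng(1)
engine_size = rng.uniform(1.0, 8.0, size=50)
co2 = 125.0 + 39.0 * engine_size + rng.normal(0.0, 10.0, size=50)

# Degree-1 least-squares fit; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(engine_size, co2, deg=1)

fig, ax = plt.subplots()
ax.scatter(engine_size, co2, color="blue", label="Data")

# Draw the fitted line across the observed range of engine sizes.
line_x = np.linspace(engine_size.min(), engine_size.max(), 100)
ax.plot(line_x, intercept + slope * line_x, color="red", label="Fitted line")

ax.set_xlabel("Engine size (L)")
ax.set_ylabel("CO2 emissions (g/km)")
ax.legend()
```

Calling `fig.savefig("fit.png")` (the file name is arbitrary) writes the figure to disk. For an interactive window, remove the `matplotlib.use("Agg")` line and call `plt.show()`.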
- R² Score: ~0.85-0.90 (indicating strong correlation)
- MSE: Average squared difference between predicted and actual CO2 emissions (lower is better)
- Correlation: Strong positive relationship between engine size and CO2 emissions
- Larger engines typically produce higher CO2 emissions
- The relationship between engine displacement and CO2 emissions is approximately linear
- Higher combined fuel consumption goes along with higher CO2 emissions
ML_Car_CO2_Emission_Linear_Regression_model/
│
├── model.py # Main analysis script
├── FuelConsumption.csv # Dataset (auto-downloaded)
├── README.md # Project documentation
└── .git/ # Git repository files
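# Import required libraries
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Load and preprocess data
df = pd.read_csv("FuelConsumption.csv")
df = df[['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB', 'CO2EMISSIONS']]

# Split into 80% training and 20% test data
# (assumed approach; see model.py for the actual split)
train, test = train_test_split(df, test_size=0.2, random_state=42)

# Train model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)

# Make predictions
test_x = np.asanyarray(test[['ENGINESIZE']])
predictions = regr.predict(test_x)

pandas>=1.3.0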
numpy>=1.21.0
matplotlib>=3.4.0
scikit-learn>=1.0.0
requests>=2.25.0
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit changes (git commit -am 'Add new feature')
- Push to branch (git push origin feature/improvement)
- Create a Pull Request
Devansh Tomar
- GitHub: @devanshtomar
- LinkedIn: Devansh Tomar
- IBM Developer Skills Network for providing the dataset
- Scikit-learn community for machine learning tools
- Open source community for Python libraries
- Multiple linear regression with additional features
- Cross-validation for better model validation
- Feature engineering for improved accuracy
- Web application for interactive predictions
- Comparison with other regression algorithms
- Time series analysis for emission trends
⭐ Star this repository if you found it helpful!