Car CO2 Emission Prediction using Linear Regression

A machine learning project that predicts CO2 emissions of cars based on engine size using linear regression analysis.

📋 Project Overview

This project analyzes the relationship between car engine size and CO2 emissions using a linear regression model. The analysis includes data visualization, model training, evaluation, and performance metrics to understand how engine specifications correlate with environmental impact.

🎯 Objectives

  • Predict CO2 emissions based on car engine size
  • Visualize the correlation between engine specifications and emissions
  • Build and evaluate a linear regression model
  • Provide insights into automotive environmental impact

📊 Dataset

The project uses the FuelConsumptionCo2.csv dataset (stored locally as FuelConsumption.csv after download), which contains:

  • Source: IBM Developer Skills Network
  • Records: 1000+ vehicle entries from 2014
  • Key Features:
    • ENGINESIZE: Engine displacement in liters
    • CYLINDERS: Number of engine cylinders
    • FUELCONSUMPTION_COMB: Combined fuel consumption (L/100km)
    • CO2EMISSIONS: CO2 emissions in grams per kilometer

Sample Data Structure

MODELYEAR | MAKE  | MODEL | ENGINESIZE | CYLINDERS | FUELCONSUMPTION_COMB | CO2EMISSIONS
2014      | ACURA | ILX   | 2.0        | 4         | 8.5                  | 196

🛠️ Technologies Used

  • Python 3.x
  • Pandas: Data manipulation and analysis
  • NumPy: Numerical computing
  • Matplotlib: Data visualization
  • Scikit-learn: Machine learning algorithms
  • Requests: HTTP library for data downloading

🚀 Installation & Setup

  1. Clone the repository:

    git clone https://github.com/DevanshTomar/ML_Car_CO2_Emission_Linear_Regression_model.git
    cd ML_Car_CO2_Emission_Linear_Regression_model
  2. Install required dependencies:

    pip install pandas numpy matplotlib scikit-learn requests
  3. Run the model:

    python model.py

📈 Project Workflow

1. Data Collection

  • Automatic download of fuel consumption dataset from IBM Cloud
  • Data validation and error handling
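
The validation step could look roughly like the sketch below. It is an illustration only, not the code in model.py: the function name `validate_csv_text` and the exact checks are assumptions, and the download itself is left out. The text passed in would typically be the body of the HTTP response (for example the `.text` attribute of a Requests response).

```python
import csv
import io

# Columns the rest of the analysis relies on
REQUIRED_COLUMNS = ["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION_COMB", "CO2EMISSIONS"]


def validate_csv_text(text, required=REQUIRED_COLUMNS):
    """Parse CSV text and return its rows, or raise ValueError if it looks unusable.

    Hypothetical helper for illustration; not part of the original script.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    if not rows:
        raise ValueError("dataset has a header but no rows")
    return rows


# One-row sample in the same shape as the dataset
sample = (
    "MODELYEAR,MAKE,MODEL,ENGINESIZE,CYLINDERS,FUELCONSUMPTION_COMB,CO2EMISSIONS\n"
    "2014,ACURA,ILX,2.0,4,8.5,196\n"
)
rows = validate_csv_text(sample)
```

Failing early with a clear error keeps a partial or wrong download from surfacing later as a confusing error during training.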

2. Data Preprocessing

  • Feature selection (Engine Size, Cylinders, Fuel Consumption, CO2 Emissions)
  • Data exploration and statistical analysis
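
As a rough sketch of feature selection and summary statistics, the example below uses only the standard library so it runs anywhere. The helper name `summarize` is hypothetical. The first data row is the sample row from the Sample Data Structure section; the other two rows are made up for illustration. With Pandas, selecting the same columns and calling `describe()` gives a fuller summary.

```python
import csv
import io
import statistics

# The four columns kept for analysis
FEATURES = ["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION_COMB", "CO2EMISSIONS"]


def summarize(csv_text, columns=FEATURES):
    """Return min, mean and max for each selected column (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in columns:
        values = [float(row[name]) for row in rows]
        summary[name] = {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
        }
    return summary


# First row: the sample row from this README. Remaining rows: invented values.
sample = (
    "MODELYEAR,MAKE,MODEL,ENGINESIZE,CYLINDERS,FUELCONSUMPTION_COMB,CO2EMISSIONS\n"
    "2014,ACURA,ILX,2.0,4,8.5,196\n"
    "2014,EXAMPLE,A,3.0,6,10.5,242\n"
    "2014,EXAMPLE,B,4.0,8,12.5,288\n"
)
stats = summarize(sample)
```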

3. Exploratory Data Analysis (EDA)

  • Histograms: Distribution analysis of key features
  • Scatter Plots:
    • Engine Size vs CO2 Emissions
    • Fuel Consumption vs CO2 Emissions
    • Number of Cylinders vs CO2 Emissions

4. Model Development

  • Data Split: 80% training, 20% testing
  • Algorithm: Linear Regression
  • Feature: Engine Size (the single predictor used by the model)
  • Target: CO2 Emissions
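
An 80/20 split can be sketched as follows. This is a minimal standard-library version under assumed names (`split_rows`, the seed value, and the made-up rows are all illustrative); the actual script may split differently. Scikit-learn's `train_test_split` with `test_size=0.2` is a ready-made alternative.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts.

    Hypothetical helper; the seed only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up (engine size, CO2 emissions) pairs, for illustration only
rows = [(1.0 + 0.1 * i, 150 + 5 * i) for i in range(50)]
train, test = split_rows(rows)
```

Shuffling before cutting matters when the file is ordered (for example by make), since an unshuffled cut would put whole groups of vehicles in only one of the two sets.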

5. Model Training & Evaluation

  • Model fitting on training data
  • Performance evaluation using:
    • Mean Squared Error (MSE)
    • R² Score (Coefficient of Determination)
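
To make the fitting and the two metrics concrete, here is a small from-scratch sketch. It fits y = intercept + slope * x by ordinary least squares and then computes MSE and R² on held-out points. The function names and all numbers are invented for illustration. In the project itself the same quantities would come from scikit-learn (`LinearRegression`, and `mean_squared_error` and `r2_score` in `sklearn.metrics`).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def compute_mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def compute_r2(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented numbers, not taken from the dataset
train_x = [1.0, 2.0, 3.0, 4.0]          # engine size (L)
train_y = [150.0, 190.0, 230.0, 270.0]  # CO2 emissions (g/km)
test_x = [2.5, 3.5]
test_y = [215.0, 245.0]

intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * x for x in test_x]
mse = compute_mse(test_y, predictions)
r2 = compute_r2(test_y, predictions)
```

MSE is expressed in squared target units, so it is best compared across models on the same data, while R² is unitless and easier to read on its own.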

6. Visualization

  • Training data scatter plot with regression line
  • Test data predictions with fitted line
  • Model performance visualization

📊 Key Findings

Model Performance Metrics

  • R² Score: ~0.85-0.90 (engine size alone explains most of the variance in CO2 emissions)
  • MSE: Mean squared difference between predicted and actual emissions, in (g/km)²; lower is better
  • Correlation: Strong positive relationship between engine size and CO2 emissions

Insights

  • Larger engines typically produce higher CO2 emissions
  • The relationship between engine displacement and CO2 emissions is approximately linear
  • Fuel consumption patterns align with emission trends

📁 Project Structure

ML_Car_CO2_Emission_Linear_Regression_model/
│
├── model.py                 # Main analysis script
├── FuelConsumption.csv     # Dataset (auto-downloaded)
├── README.md               # Project documentation
└── .git/                   # Git repository files

🔧 Usage Example

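# Import required libraries
import numpy as np
import pandas as pd
from sklearn import linear_model

# Load data and keep the columns of interest
df = pd.read_csv("FuelConsumption.csv")
df = df[['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB', 'CO2EMISSIONS']]

# Split into 80% training and 20% testing rows
# (illustrative split; the seed is arbitrary and only makes the result reproducible)
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

# Train model on engine size only
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)

# Make predictions on the held-out test rows
test_x = np.asanyarray(test[['ENGINESIZE']])
predictions = regr.predict(test_x)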

📋 Requirements

pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
scikit-learn>=1.0.0
requests>=2.25.0

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Commit changes (git commit -am 'Add new feature')
  4. Push to branch (git push origin feature/improvement)
  5. Create a Pull Request

👨‍💻 Author

Devansh Tomar

🙏 Acknowledgments

  • IBM Developer Skills Network for providing the dataset
  • Scikit-learn community for machine learning tools
  • Open source community for Python libraries

🔮 Future Enhancements

  • Multiple linear regression with additional features
  • Cross-validation for better model validation
  • Feature engineering for improved accuracy
  • Web application for interactive predictions
  • Comparison with other regression algorithms
  • Time series analysis for emission trends

Star this repository if you found it helpful!
