CO₂ Car Emissions Prediction

🏗️ Project Context

This was a research project developed for the second term of the Post-Degree Diploma in Data Analytics in Langara College.

📌 Project Overview

This project builds linear regression models in R to predict car CO₂ emissions (grams/km) using the Government of Canada’s Fuel Consumption dataset. The analysis includes descriptive statistics, feature evaluation, and regression modeling to identify the best predictors of emissions and compare model performance.

🎯 Objective

Analyze fuel consumption, engine size, cylinders, transmission type, and fuel type to understand their relationship with CO₂ emissions.
Develop regression models to predict emissions and assess their accuracy.
Compare stepwise, forward, backward, and subset selection methods to optimize the model.

🛠 Tools & Technologies

Language: R
Libraries: MASS, leaps, ggplot2, dplyr
Techniques: Stepwise regression, subset selection, interaction effects, model validation

📊 Key Steps

Data Preparation
- Cleaned and filtered the Government of Canada dataset.
- Split into training (80%) and testing (20%) sets.
Descriptive Analysis
- Explored numerical (engine size, cylinders, consumption) and categorical (fuel type, transmission, class, gears) variables.
- Identified strong correlations (e.g., CO₂ vs fuel consumption: 0.935).
Modeling
- Built multiple regression models (stepwise, forward, backward, exhaustive search).
- Tested in total 9 models with and without interaction terms.
- Compared Model number 8 (full model) vs. Model number 9 (reduced model).
Evaluation
- Achieved high predictive accuracy (Adjusted R² ≈ 0.9992).
- Residual analysis confirmed improved variance stability after including interaction terms.

🚀 Results

The full model, which included consumption, fuel type, year, gears, transmission, engine, and the interaction term (consumption × fuel), emerged as the strongest predictor of CO₂ emissions (RMSE = 1.83, R² ≈ 0.9992).
The reduced model, using only consumption, fuel type, and their interaction, offered a simpler structure while still maintaining high accuracy (RMSE = 2.47, R² = 0.9985), making it the most effective for practical use.
Demonstrated that smaller models can still deliver reliable predictive performance.

📂 Repository Structure

├── Data/              # Raw and cleaned datasets (train/test splits)
├── Src/           # R scripts for analysis and modelling
├── Documentation/f   # Full project report
└── README.md          # Project documentation

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Data		Data
Documentation		Documentation
Src		Src
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CO₂ Car Emissions Prediction

🏗️ Project Context

📌 Project Overview

🎯 Objective

🛠 Tools & Technologies

📊 Key Steps

🚀 Results

📂 Repository Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

CO₂ Car Emissions Prediction

🏗️ Project Context

📌 Project Overview

🎯 Objective

🛠 Tools & Technologies

📊 Key Steps

🚀 Results

📂 Repository Structure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages