
House Price Prediction - Multivariable Regression

This project predicts house sale prices using multiple numerical features from the House Prices - Advanced Regression Dataset (Kaggle). The notebook explores data preprocessing, feature selection, regression modeling, and evaluation.

Project Overview

Goal: Predict house prices from numerical property features using regression techniques.

Dataset: House Prices - Advanced Regression (Kaggle)

Key Steps:

  1. Select numeric features
  2. Handle missing values
  3. Explore correlations with target (SalePrice)
  4. Scale features and compare models
  5. Train & evaluate Linear Regression and XGBoost models
  6. Visualize residuals, predictions, and feature importance

Key Components

1. Data Loading and Exploration

  • Loads training (1460 rows, 81 columns) and test (1459 rows, 80 columns) datasets
  • Initial exploration shows 35 numeric and 43 categorical features
  • Target variable: SalePrice (house sale price)
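A minimal sketch of the dtype-based split between numeric and categorical columns. Since the Kaggle CSVs are not bundled here, a tiny stand-in DataFrame replaces `pd.read_csv("../data/train.csv")`; the column names are real Ames fields, but the rows are illustrative.

```python
import pandas as pd

# In the notebook the data comes from the Kaggle CSVs, e.g.:
#   train = pd.read_csv("../data/train.csv")
# A tiny stand-in frame illustrates the same dtype-based split.
train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "GrLivArea": [1710, 1262, 1786],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr"],
    "SalePrice": [208500, 181500, 223500],
})

# Numeric columns feed the regression models; object columns are categorical.
numeric_cols = train.select_dtypes(include="number").columns
categorical_cols = train.select_dtypes(include="object").columns
print(len(numeric_cols), len(categorical_cols))
```

On the full training set this same `select_dtypes` call produces the numeric/categorical counts reported above.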

2. Target Variable Analysis

  • SalePrice distribution shows right-skewness
  • Statistical summary (mean: $180,921, std: $79,442)
  • Visualized using histogram with KDE plot
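The right-skew can be confirmed numerically as well as visually. A sketch using simulated log-normal prices (real sale prices behave similarly; the actual notebook works on the `SalePrice` column):

```python
import numpy as np
import pandas as pd

# Log-normal draws are right-skewed, standing in for SalePrice here.
rng = np.random.default_rng(0)
prices = pd.Series(np.exp(rng.normal(12.0, 0.4, size=1000)))

# Positive sample skewness confirms the long right tail.
print(prices.mean(), prices.std(), prices.skew())
```

The notebook renders the same distribution with a histogram plus KDE overlay (e.g. seaborn's `histplot(..., kde=True)`).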

3. Outlier Detection

  • Identified outliers using GrLivArea vs SalePrice plot
  • Removed houses with GrLivArea > 4000 (extremely large houses with low prices)
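The removal step is a single boolean filter. A sketch with made-up rows, two of which exceed the 4000 sq ft threshold:

```python
import pandas as pd

df = pd.DataFrame({
    "GrLivArea": [1500, 2200, 4500, 5600],
    "SalePrice": [180000, 250000, 160000, 184750],
})

# Keep only rows at or below the area threshold flagged in the scatter plot.
df_clean = df[df["GrLivArea"] <= 4000].reset_index(drop=True)
print(len(df_clean))
```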

4. Feature Engineering

  • Selected numerical features only
  • Handled missing values using SimpleImputer
  • Explored feature correlations with SalePrice
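The imputation and correlation steps can be sketched as follows. The strategy shown (`median`) is an assumption; the column values are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],  # has missing values
    "GrLivArea": [1710, 1262, 1786, 2198],
    "SalePrice": [208500, 181500, 223500, 250000],
})

# Fill missing numeric values with the column median.
features = df.drop(columns="SalePrice")
imputer = SimpleImputer(strategy="median")
X = pd.DataFrame(imputer.fit_transform(features), columns=features.columns)

# Rank features by absolute correlation with the target.
corr = X.assign(SalePrice=df["SalePrice"]).corr()["SalePrice"].drop("SalePrice")
print(corr.abs().sort_values(ascending=False))
```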

5. Model Training

  • Split data into train/test sets (80/20)
  • Scaled features using StandardScaler
  • Implemented:
    • Linear Regression
    • XGBoost (gradient boosting)
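The split–scale–fit sequence above can be sketched end to end. Synthetic linear data stands in for the prepared feature matrix; note that the scaler is fitted on the training split only and reused on the test split:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data with a known linear signal plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([50.0, 30.0, -20.0]) + 180.0 + rng.normal(scale=1.0, size=200)

# 80/20 split, as in the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit scaling statistics on the training data only, then apply to both splits.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model = LinearRegression().fit(X_train_s, y_train)
print(model.score(X_test_s, y_test))  # R² on the held-out split
```

An `xgboost.XGBRegressor` slots into the same place with the identical `fit`/`predict`/`score` interface.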

6. Evaluation Metrics

  • Mean Squared Error (MSE)
  • R² Score
  • Mean Absolute Error (MAE)
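All three metrics come straight from `sklearn.metrics`. A sketch on hand-picked prices (every prediction is off by $10,000, which makes the values easy to check):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([200000.0, 150000.0, 250000.0])
y_pred = np.array([210000.0, 140000.0, 240000.0])

mse = mean_squared_error(y_true, y_pred)   # average squared error
mae = mean_absolute_error(y_true, y_pred)  # average absolute error, in dollars
r2 = r2_score(y_true, y_pred)              # fraction of variance explained
print(mse, mae, r2)
```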

7. Visualization

  • Residual plots
  • Prediction vs actual values
  • Feature importance analysis
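A sketch of the first two diagnostic plots, using small made-up arrays for the held-out targets and predictions (the headless `Agg` backend is only so the script runs without a display; in the notebook the figures render inline):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

y_test = np.array([200000.0, 150000.0, 250000.0, 180000.0])
y_pred = np.array([195000.0, 160000.0, 240000.0, 185000.0])
residuals = y_test - y_pred  # should scatter evenly around zero

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(y_pred, residuals)
ax1.axhline(0, color="red")
ax1.set(xlabel="Predicted", ylabel="Residual", title="Residuals")
ax2.scatter(y_test, y_pred)
ax2.set(xlabel="Actual", ylabel="Predicted", title="Predicted vs actual")
fig.savefig("diagnostics.png")
```

Feature importance comes from the fitted models themselves: standardized coefficients for Linear Regression, `feature_importances_` for XGBoost.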

Technical Details

Libraries Used:

  • pandas, numpy: Data manipulation
  • matplotlib, seaborn: Visualization
  • scikit-learn: Preprocessing and machine learning
  • xgboost: Gradient boosting implementation

Key Techniques:

  • Correlation analysis for feature selection
  • Handling missing values with imputation
  • Feature scaling for model performance
  • Residual analysis for model diagnostics

How to Run

  1. Install the required libraries (pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost)
  2. Place dataset files in ../data/ directory:
    • train.csv
    • test.csv
  3. Run the notebook sequentially
  4. Models will be trained and evaluated automatically

Competition Context

This project addresses the Kaggle House Prices competition, which challenges participants to predict residential home prices in Ames, Iowa using 79 explanatory features describing various aspects of the properties.
