This project is a structured, end-to-end SQL analytics case study analyzing the Nashville housing market using Microsoft SQL Server.
The objective was to transform raw transactional housing data into a clean, analysis-ready dataset and extract meaningful business insights using advanced SQL techniques.
This repository emphasizes SQL-based analytical modeling rather than dashboard visualization.
- Structured data cleaning workflows
- Feature engineering
- Window functions (PERCENTILE_CONT, LAG, ROW_NUMBER)
- Market share modeling
- Time-series analysis
- Segmentation logic
- Business interpretation from raw data
The project follows a structured, production-style workflow:
Raw Dataset
↓
Data Profiling
↓
Data Cleaning
↓
Feature Engineering
↓
Analytical Modeling
↓
Business Insights
Each stage is separated into dedicated SQL files to reflect professional project organization and clear analytical layering.
nashville-housing-sql-analytics/
│
├── dataset/
│ ├── 01_database_setup.sql
│ ├── 02_table_creation.sql
│ ├── 03_data_profiling.sql
│ ├── 04_data_cleaning.sql
│ ├── 05_feature_engineering.sql
│ └── 06_analysis.sql
│
└── README.md
Key transformations performed:
- Standardizing categorical values (`SoldAsVacant`)
- Converting `SalePrice` into numeric format
- Imputing missing `PropertyAddress` values using self-join logic
- Splitting composite address fields into structured columns
- Removing duplicate records using `ROW_NUMBER()`
- Creating a structured analytical table (`nashville_clean`)
The resulting dataset is normalized and analysis-ready.
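Three of the transformations listed above can be sketched in T-SQL as follows. This is an illustrative sketch, not the repository's actual script: the raw table name `dbo.nashville_raw`, the columns `UniqueID`, `ParcelID`, `SaleDate`, and `LegalReference`, and the `'Y'`/`'N'` source values are assumptions; only `SoldAsVacant`, `SalePrice`, and `PropertyAddress` are named in this README.

```sql
-- Assumed (hypothetical) names: dbo.nashville_raw, UniqueID, ParcelID,
-- SaleDate, LegalReference. Adjust to the real schema before running.

-- 1. Standardize SoldAsVacant, assuming the raw data mixes 'Y'/'N' with 'Yes'/'No'.
UPDATE dbo.nashville_raw
SET SoldAsVacant = CASE SoldAsVacant
                       WHEN 'Y' THEN 'Yes'
                       WHEN 'N' THEN 'No'
                       ELSE SoldAsVacant
                   END;

-- 2. Impute missing PropertyAddress via a self-join:
--    borrow the address from another row that shares the same ParcelID.
UPDATE a
SET a.PropertyAddress = b.PropertyAddress
FROM dbo.nashville_raw AS a
JOIN dbo.nashville_raw AS b
    ON  a.ParcelID = b.ParcelID
    AND a.UniqueID <> b.UniqueID
WHERE a.PropertyAddress IS NULL
  AND b.PropertyAddress IS NOT NULL;

-- 3. Remove duplicates: number the rows within each group of identical
--    business keys and delete everything after the first row.
WITH ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY ParcelID, PropertyAddress, SaleDate,
                            SalePrice, LegalReference
               ORDER BY UniqueID
           ) AS rn
    FROM dbo.nashville_raw
)
DELETE FROM ranked
WHERE rn > 1;
```

Deleting through the CTE works in SQL Server because the CTE references a single base table. Running the same CTE with `SELECT *` instead of `DELETE` first is a cheap way to inspect which rows would be removed.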
Engineered analytical features to support segmentation and time-series modeling:
- `SaleYear`, `SaleMonth`, `SaleQuarter`
- `PropertyAgeAtSale`
- `PropertyAgeGroup` classification
- Time-based modeling fields
- Market share calculations
This step transformed raw transactional data into a structured analytical layer suitable for advanced modeling.
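A minimal sketch of how these features might be derived is shown below. It assumes `nashville_clean` has a `SaleDate` column of a date type and an integer `YearBuilt` column, plus a `UniqueID` key; none of these are confirmed by this README. The age thresholds and group labels are hypothetical placeholders, not the cut-offs used in the project.

```sql
-- Assumed (hypothetical) columns: UniqueID, SaleDate (date), YearBuilt (int).
-- The age-group boundaries and labels below are illustrative only.
SELECT
    UniqueID,
    SaleDate,
    YEAR(SaleDate)              AS SaleYear,
    MONTH(SaleDate)             AS SaleMonth,
    DATEPART(QUARTER, SaleDate) AS SaleQuarter,
    YEAR(SaleDate) - YearBuilt  AS PropertyAgeAtSale,
    CASE
        WHEN YearBuilt IS NULL                THEN 'Unknown'
        WHEN YEAR(SaleDate) - YearBuilt <  5  THEN 'New Construction'
        WHEN YEAR(SaleDate) - YearBuilt < 30  THEN 'Modern'
        WHEN YEAR(SaleDate) - YearBuilt < 80  THEN 'Older Resale'
        ELSE                                       'Historic'
    END                         AS PropertyAgeGroup
FROM nashville_clean;
```

The `CASE` branches are evaluated top to bottom, so each row falls into the first band it satisfies, and rows without a build year are kept in an explicit `'Unknown'` group rather than silently dropped.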
Performed trend and growth analysis using window functions.
- `PERCENTILE_CONT` for median pricing
- `LAG()` for Year-over-Year growth modeling
- Partitioned aggregations for yearly summaries
- Strong expansion phase between 2013–2015
- Pricing growth outpaced volume growth in certain periods
- Post-2015 stabilization reflects cyclical normalization rather than structural collapse
- Median price was used over average due to right-skewed pricing distribution
The market demonstrates cyclical growth behavior consistent with normal economic patterns.
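The yearly median and Year-over-Year calculation described above could look roughly like the query below. It is a sketch under the assumption that `nashville_clean` already carries a `SaleYear` column and a numeric `SalePrice`; the output column names are made up for illustration.

```sql
-- Assumes nashville_clean has SaleYear and a numeric SalePrice.
-- In SQL Server, PERCENTILE_CONT is an analytic function and needs OVER(),
-- so SELECT DISTINCT collapses the per-row results to one row per year.
WITH yearly AS (
    SELECT DISTINCT
        SaleYear,
        COUNT(*) OVER (PARTITION BY SaleYear) AS TransactionCount,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SalePrice)
            OVER (PARTITION BY SaleYear)      AS MedianSalePrice
    FROM nashville_clean
)
SELECT
    SaleYear,
    TransactionCount,
    MedianSalePrice,
    -- Previous year's median; NULL for the first year in the data.
    LAG(MedianSalePrice) OVER (ORDER BY SaleYear) AS PrevMedianSalePrice,
    -- Year-over-Year change in percent; NULLIF guards against division by zero.
    100.0 * (MedianSalePrice - LAG(MedianSalePrice) OVER (ORDER BY SaleYear))
          / NULLIF(LAG(MedianSalePrice) OVER (ORDER BY SaleYear), 0)
                                                  AS MedianYoYPct
FROM yearly
ORDER BY SaleYear;
```

Applying the same `LAG()` pattern to `TransactionCount` gives volume growth, which is one way to compare pricing growth against volume growth year by year.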
Segmented properties by lifecycle stage to understand pricing and demand concentration.
- ~42% of transactions occur in older resale inventory
- New construction and historic homes command pricing premiums
- Pricing dispersion varies significantly across age segments
- Older homes dominate volume, while newer homes dominate premium positioning
The housing market is structurally segmented by property age.
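One way to compute each age segment's share of transactions with a window sum is sketched below. It assumes `PropertyAgeGroup` is stored on `nashville_clean`; the output column names are illustrative.

```sql
-- Assumes nashville_clean has a PropertyAgeGroup column.
WITH segment AS (
    SELECT
        PropertyAgeGroup,
        COUNT(*) AS TransactionCount
    FROM nashville_clean
    GROUP BY PropertyAgeGroup
)
SELECT
    PropertyAgeGroup,
    TransactionCount,
    -- SUM(...) OVER () is the grand total across all segments,
    -- so each row is expressed as a percentage of the whole market.
    CAST(100.0 * TransactionCount / SUM(TransactionCount) OVER ()
         AS DECIMAL(5, 2)) AS TransactionSharePct
FROM segment
ORDER BY TransactionCount DESC;
```

Aggregating first and applying the window sum afterwards keeps the share calculation to one row per segment, instead of repeating it for every transaction.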
Analyzed transaction volume and pricing behavior across municipalities.
- Nashville accounts for ~71% of total transaction volume
- Suburban cities (e.g., Nolensville, Brentwood) command higher median prices
- Pricing dispersion varies materially across cities
- Urban core shows wider price spread due to mixed-income and luxury outliers
The market exhibits clear geographic segmentation rather than uniform pricing behavior.
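A city-level comparison of volume share, median price, and price spread might be sketched as follows. The column name `PropertyCity` is an assumption (for example, a field produced when the address was split), and the interquartile range is used here as one possible dispersion measure, not necessarily the one used in the project.

```sql
-- Assumed (hypothetical) column: PropertyCity. SalePrice is assumed numeric.
WITH city_stats AS (
    SELECT DISTINCT
        PropertyCity,
        COUNT(*) OVER (PARTITION BY PropertyCity) AS TransactionCount,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY SalePrice)
            OVER (PARTITION BY PropertyCity)      AS P25SalePrice,
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY SalePrice)
            OVER (PARTITION BY PropertyCity)      AS MedianSalePrice,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY SalePrice)
            OVER (PARTITION BY PropertyCity)      AS P75SalePrice
    FROM nashville_clean
)
SELECT
    PropertyCity,
    TransactionCount,
    -- Share of all transactions that took place in this city.
    CAST(100.0 * TransactionCount / SUM(TransactionCount) OVER ()
         AS DECIMAL(5, 2))        AS VolumeSharePct,
    MedianSalePrice,
    -- Interquartile range as a simple, outlier-resistant spread measure.
    P75SalePrice - P25SalePrice   AS InterquartileRange
FROM city_stats
ORDER BY TransactionCount DESC;
```

Cities with only a handful of sales will produce unstable medians and quartiles, so a minimum `TransactionCount` filter is worth considering before comparing them.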
This project showcases strong SQL depth including:
- Common Table Expressions (CTEs)
- Window Functions:
  - `ROW_NUMBER()`
  - `PERCENTILE_CONT()`
  - `LAG()`
- Partitioned aggregations
- Market share modeling using window sums
- Duplicate detection via partition logic
- Time-series growth modeling
- Structured query layering
- Is the housing market growing?
- Is pricing increasing?
- Is transaction volume increasing?
- Which property age segments drive demand?
- Which age segments command pricing premiums?
- Is the market geographically concentrated?
- Does pricing distribution vary across locations?
Each question is answered using structured SQL modeling and analytical reasoning.
This project demonstrates the ability to:
- Transform messy raw data into structured analytical datasets
- Apply statistical reasoning within SQL
- Design production-grade query architecture
- Use window functions effectively
- Derive meaningful business insights from transactional data
This repository serves as a SQL depth showcase, emphasizing analytical thinking and structured query design rather than visualization tools.