Skip to content

dennis20413/sql-sales-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Sales Analytics

A portfolio of analytical SQL on a realistic e-commerce dataset — the kind of queries asked in data-science and analyst interviews. Builds a SQLite database from a clean schema, populates it with synthetic sales data, and answers business questions with progressively more advanced SQL.

⚠️ Synthetic data, generated deterministically for reproducible results.


🗃️ Schema

A compact star-style schema:

customers ──< orders ──< order_items >── products
  • customers — id, name, country, signup_date
  • products — id, name, category, unit_price
  • orders — id, customer_id, order_date, status (Shipped/Pending/Returned)
  • order_items — id, order_id, product_id, quantity

📊 Query portfolio (sql/analytics_queries.sql)

# Question SQL skill
Q1 Revenue by product category Multi-table JOIN, GROUP BY, aggregation
Q2 Top 3 products per category RANK() OVER (PARTITION BY ...), CTE filtering
Q3 Monthly revenue + MoM growth Date bucketing, LAG(), growth rate
Q4 New-customer cohort retention Cohort logic, first-order date, CTEs
Q5 Customer RFM + value quartiles Per-customer metrics, NTILE() segmentation
Q6 Return rate by category Conditional aggregation with CASE

These cover the interview staples: joins, aggregation, window functions (RANK/LAG/NTILE), CTEs, cohort analysis, and conditional aggregation.

📈 Example output

Q1. Revenue by product category
   category  n_orders   revenue
Electronics       278 449234.85
       Home       288 136024.94
    Apparel       293  46703.05
       Toys       292  26235.62
      Books       305  18482.39

⚙️ Install & run

pip install -r requirements.txt

# Build the DB and run every query with results printed
python examples/run_queries.py
from build_db import build
import sqlite3, pandas as pd

conn = sqlite3.connect(build("sales.db"))
pd.read_sql_query("SELECT category, SUM(...) ... GROUP BY category", conn)

✅ Validation

python tests/test_queries.py

Confirms the database builds with data in every table, revenue aggregates are positive, and the SQLite build supports the window functions used.

🗂️ Project layout

sql/
  schema.sql              # tables, keys, indexes
  analytics_queries.sql   # the Q1-Q6 portfolio
src/
  build_db.py             # create + populate the SQLite database
examples/
  run_queries.py          # run all queries and print results
tests/
  test_queries.py         # build + query sanity checks

🔭 Roadmap

  • Add a customer lifetime-value (LTV) projection query
  • Add funnel / conversion analysis
  • Port queries to PostgreSQL syntax variant

Author: Dennis Chen (陳雲皓) · LinkedIn · GitHub

About

Analytical SQL portfolio on an e-commerce dataset — joins, window functions (RANK/LAG/NTILE), CTEs, cohort retention, and RFM segmentation.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages