A portfolio of analytical SQL on a realistic e-commerce dataset — the kind of queries asked in data-science and analyst interviews. Builds a SQLite database from a clean schema, populates it with synthetic sales data, and answers business questions with progressively more advanced SQL.
⚠️ Synthetic data, generated deterministically for reproducible results.
A compact star-style schema:
customers ──< orders ──< order_items >── products
- customers — id, name, country, signup_date
- products — id, name, category, unit_price
- orders — id, customer_id, order_date, status (Shipped/Pending/Returned)
- order_items — id, order_id, product_id, quantity
| # | Question | SQL skill |
|---|---|---|
| Q1 | Revenue by product category | Multi-table JOIN, GROUP BY, aggregation |
| Q2 | Top 3 products per category | RANK() OVER (PARTITION BY ...), CTE filtering |
| Q3 | Monthly revenue + MoM growth | Date bucketing, LAG(), growth rate |
| Q4 | New-customer cohort retention | Cohort logic, first-order date, CTEs |
| Q5 | Customer RFM + value quartiles | Per-customer metrics, NTILE() segmentation |
| Q6 | Return rate by category | Conditional aggregation with CASE |
These cover the interview staples: joins, aggregation, window functions (RANK/LAG/NTILE), CTEs, cohort analysis, and conditional aggregation.
Q1. Revenue by product category
category n_orders revenue
Electronics 278 449234.85
Home 288 136024.94
Apparel 293 46703.05
Toys 292 26235.62
Books 305 18482.39
pip install -r requirements.txt
# Build the DB and run every query with results printed
python examples/run_queries.pyfrom build_db import build
import sqlite3, pandas as pd
conn = sqlite3.connect(build("sales.db"))
pd.read_sql_query("SELECT category, SUM(...) ... GROUP BY category", conn)python tests/test_queries.pyConfirms the database builds with data in every table, revenue aggregates are positive, and the SQLite build supports the window functions used.
sql/
schema.sql # tables, keys, indexes
analytics_queries.sql # the Q1-Q6 portfolio
src/
build_db.py # create + populate the SQLite database
examples/
run_queries.py # run all queries and print results
tests/
test_queries.py # build + query sanity checks
- Add a customer lifetime-value (LTV) projection query
- Add funnel / conversion analysis
- Port queries to PostgreSQL syntax variant