A data science project that models NBA game probabilities and compares them against bookmaker odds to identify market inefficiencies.
No real money is involved — this is purely a data science and modeling exercise.
Bookmakers set odds that imply a probability for each game outcome. Those implied probabilities always sum to more than 100% — the excess is the bookmaker's margin (the "vig"). This project:
- Pulls live NBA odds from multiple bookmakers via the Odds API
- Strips out the vig to get clean implied probabilities
- Computes a consensus probability across all books
- Builds independent models to estimate true win probabilities
- Compares model estimates against the market to find edges
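As a rough illustration of the first three steps, here is a minimal sketch of vig removal and consensus. It assumes two-way decimal odds, proportional normalization, and a plain average across books as the consensus. The book names and prices are made up for the example, and the notebooks may use a different method.

```python
from statistics import mean

def implied_probabilities(decimal_odds):
    """Convert decimal odds to raw implied probabilities (still including the vig)."""
    return [1.0 / o for o in decimal_odds]

def remove_vig(raw_probs):
    """Rescale raw implied probabilities so they sum to 1 (proportional method)."""
    total = sum(raw_probs)
    return [p / total for p in raw_probs]

# Hypothetical (home, away) decimal prices from three books; not real quotes.
books = {
    "book_a": (1.80, 2.10),
    "book_b": (1.83, 2.05),
    "book_c": (1.77, 2.15),
}

# Vig-free probabilities per book.
fair = {name: remove_vig(implied_probabilities(odds)) for name, odds in books.items()}

# Consensus = simple mean of the vig-free home probabilities (an assumption).
home_probs = [probs[0] for probs in fair.values()]
consensus_home = mean(home_probs)

# Bookmaker spread = disagreement between the highest and lowest book.
spread = max(home_probs) - min(home_probs)

# Overround (the vig) for one book: how far the raw probabilities exceed 100%.
overround_a = sum(implied_probabilities(books["book_a"])) - 1.0

print(f"consensus home win: {consensus_home:.3f}, spread: {spread:.3f}")
print(f"book_a overround: {overround_a:.3%}")
```

Proportional normalization is the simplest way to strip the margin. It assumes the book spreads its vig evenly across both outcomes, which is a simplification.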
1\sports_odds_analysis.ipynb # Day 1 (Stage 1 & 2)
2\nba_odds_analysis.ipynb # Day 2 (Stage 3 & 4)
3\nba_odds_analysis_3.ipynb # Day 3 (Stage 5)
app.py # Streamlit dashboard
requirements.txt # required modules
notes.txt # personal notes to recall and record important things I've learnt in this project
README.md
- Pull live NBA odds from The Odds API across 9+ bookmakers
- Convert decimal odds to implied probabilities
- Strip the vig and normalize to clean probabilities
- Compute consensus probability and bookmaker spread
- Visualize tonight's games with team colors
- Pull current NBA standings via nba_api
- Build model v1 using season-average net rating
- Build model v2 using recent form (last 10 games)
- Compare both models against market consensus
- Flag large gaps between model and market for manual review
- Build rolling standings to avoid look-ahead bias
- Simulate a full season of flat bets ($10 per game)
- Track ROI, max drawdown, and losing streaks
- Result: 70.1% accuracy, 305.9% ROI, 4.3% max drawdown
- Build attack and defense ratings for all 30 teams
- Model expected scores using a Poisson distribution
- Backtest against V1 and V2
- Result: 69.3% accuracy — more complex but not more accurate than V1
- Build a dynamic Elo rating system that updates after every game
- Tune K factor and home advantage across 30 parameter combinations
- Result: 66.4% accuracy (best at K=15, home advantage=50)
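The Elo stage can be sketched roughly as follows. This uses the textbook Elo formula with a 400-point scale and a 1500 starting rating, both of which are assumptions. Only K=15 and home advantage=50 come from the results above, and the function names are illustrative rather than taken from the notebooks.

```python
def expected_home_win(home_elo, away_elo, home_advantage=50.0):
    """Probability that the home team wins under the standard Elo curve."""
    diff = home_elo + home_advantage - away_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def update_elo(home_elo, away_elo, home_won, k=15.0, home_advantage=50.0):
    """Return both teams' new ratings after one game (zero-sum update)."""
    expected = expected_home_win(home_elo, away_elo, home_advantage)
    actual = 1.0 if home_won else 0.0
    delta = k * (actual - expected)
    return home_elo + delta, away_elo - delta

# Two evenly rated teams (1500 is an assumed starting rating).
p_home = expected_home_win(1500.0, 1500.0)
new_home, new_away = update_elo(1500.0, 1500.0, home_won=True)
print(f"pre-game home win probability: {p_home:.3f}")
print(f"ratings after a home win: {new_home:.1f}, {new_away:.1f}")
```

A larger K makes ratings react faster to each result, and a smaller K smooths them out. Tuning would then amount to re-running the season for each (K, home advantage) pair and keeping the most accurate one.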
| Model | Accuracy | Notes |
|---|---|---|
| V1 season average | 70.1% | simplest, most accurate |
| V3 Poisson | 69.3% | useful for score prediction |
| V4 Elo (tuned) | 66.4% | dynamic but noisy |
| V2 recent form | 65.4% | too sensitive to hot/cold streaks |
| Baseline (always home) | 54.9% | |
Key finding: Occam's Razor holds. The simplest model (season-average net rating) outperforms all more complex approaches. In the NBA, 82-game averages are stable enough that dynamic updates add noise rather than signal.
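For reference, a season-average net rating model can be as small as the sketch below. It assumes the win probability is a sigmoid of the net rating gap plus a home court bonus. The `home_edge` and `scale` values are placeholder assumptions, not the parameters used in the notebooks.

```python
import math

def home_win_probability(home_net_rating, away_net_rating, home_edge=3.0, scale=6.0):
    """Map an expected point margin to a win probability with a sigmoid.

    home_edge (points added for playing at home) and scale (how quickly the
    probability saturates) are hypothetical values chosen for illustration.
    """
    margin = home_net_rating - away_net_rating + home_edge
    return 1.0 / (1.0 + math.exp(-margin / scale))

# Made-up net ratings: a +5.0 team hosting a -2.0 team.
p = home_win_probability(5.0, -2.0)
print(f"home win probability: {p:.3f}")
```

With only two free parameters there is little room to overfit, which is consistent with the simple model holding up against the more dynamic ones.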
A live Streamlit dashboard brings the full pipeline into a single interface. It pulls live odds and NBA data automatically and refreshes every 30 minutes.
- Tonight's games — for each game on the schedule, the dashboard shows the model's predicted win probability alongside the market consensus, highlights the edge, displays the best available odds across all books, and gives a plain-English verdict: worth considering, skip, or caution. The caution flag triggers automatically when the model-market gap exceeds 15%, which almost always indicates injuries or missing roster context.
- Team ratings — a full 30-team table sorted by Elo rating, with net points per game, win percentage, and a progress bar for Elo strength. A side panel surfaces the top five offenses, defenses, and biggest Elo movers of the season.
- Backtest — the full season simulation with a bankroll curve, accuracy, ROI, max drawdown, and win/loss count. The model leaderboard is shown at the bottom for context.
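The backtest metrics could be computed along these lines. This is a sketch under stated assumptions: ROI is defined here as net profit divided by total amount staked, max drawdown as the largest peak-to-trough fall relative to the peak, and the starting bankroll and sample results are invented. The project's own definitions may differ. To avoid look-ahead bias, each prediction feeding `results` should use only data available before that game.

```python
def backtest_flat_bets(results, stake=10.0, starting_bankroll=1000.0):
    """Replay settled bets in order and summarise the bankroll curve.

    results: iterable of (won, decimal_odds) pairs, oldest first.
    """
    bankroll = peak = starting_bankroll
    curve = []
    max_drawdown = 0.0
    streak = longest_losing_streak = 0
    wins = n_bets = 0
    for won, odds in results:
        n_bets += 1
        if won:
            bankroll += stake * (odds - 1.0)  # profit on a winning bet
            wins += 1
            streak = 0
        else:
            bankroll -= stake  # a losing bet costs the full stake
            streak += 1
            longest_losing_streak = max(longest_losing_streak, streak)
        peak = max(peak, bankroll)
        max_drawdown = max(max_drawdown, (peak - bankroll) / peak)
        curve.append(bankroll)
    total_staked = stake * n_bets
    roi = (bankroll - starting_bankroll) / total_staked if total_staked else 0.0
    return {
        "curve": curve,
        "accuracy": wins / n_bets if n_bets else 0.0,
        "roi": roi,
        "max_drawdown": max_drawdown,
        "longest_losing_streak": longest_losing_streak,
    }

# Invented sequence of four settled bets: (won, decimal odds taken).
sample = [(True, 1.9), (False, 2.0), (False, 1.8), (True, 2.2)]
summary = backtest_flat_bets(sample)
print(summary)
```

The `curve` list is what a bankroll chart would plot, one point per settled bet.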
- Go to https://nbaoddsanalysis.streamlit.app
- Get your Odds API key from https://the-odds-api.com and enter it in the dashboard.
- Implied probability and vig removal
- Market consensus and bookmaker spread
- Net rating as a proxy for team strength
- Home court advantage modeling
- Sigmoid function for probability estimation
- Recent form vs season averages
- Look-ahead bias and rolling backtests
- Kelly Criterion for bet sizing
- Poisson distribution for score modeling
- Elo rating systems and K factor tuning
- Occam's Razor in model selection
- Why large model-market gaps signal missing context (injuries, rest, back-to-backs)
- The Odds API — live and historical bookmaker odds
- nba_api — NBA standings and game logs via the official NBA stats API
pip install requests pandas matplotlib scipy nba_api
Get a free API key at the-odds-api.com and add it to the notebook.
This project is for educational purposes only. Sports betting involves real financial risk. The models built here are simple and should not be used as the basis for actual betting decisions.
MIT License — see LICENSE file.