This is a data science project analyzing League of Legends champion balance changes (buffs and nerfs) across Season 11. The project uses machine learning to predict which champions are likely to receive balance changes based on their performance statistics across different rank tiers.
Project Goal: Predict champion buffs/nerfs using historical performance data (win rate, pick rate, ban rate) across different skill levels.
Key Achievement: Decision Tree classifier achieving 82.6% accuracy in predicting balance changes.
The project follows standard data science best practices with clear separation between raw data, processed data, analysis notebooks, and outputs.
buff_project/
├── data/ # All data files
│ ├── raw/ # Immutable original data (never modify)
│ │ ├── season_11/ # Season 11 raw data by patch
│ │ │ ├── Challenger/ # 11.1-11.17 challenger tier CSVs
│ │ │ ├── irontogold/ # 11.1-11.19 low rank CSVs
│ │ │ ├── plattogm/ # Platinum to Grandmaster CSVs
│ │ │ ├── changes/ # 11.1-11.19 patch change CSVs
│ │ │ ├── latestpatch/ # Most recent patch data
│ │ │ └── combined/ # Alternative combined datasets
│ │ └── historic/ # Legacy data (Patches 8.24, 9.24, 10.25)
│ │ ├── [patch]challenger.csv
│ │ ├── [patch]changes.csv
│ │ └── [patch][rank-range].csv
│ ├── processed/ # Cleaned and merged datasets
│ │ ├── S11combined.csv # Combined Season 11 (all ranks)
│ │ ├── S11challenger.csv # Challenger tier only
│ │ ├── S11irontogm.csv # Iron to Grandmaster
│ │ ├── S11buffandnerf.csv # Labeled dataset for ML training
│ │ ├── 11.18combined.csv # Patch 11.18 specific combined
│ │ └── allrankseason11.csv # All ranks aggregated
│ └── reference/ # Reference and lookup files
│ └── champs.csv # Master list of all League champions
│
├── notebooks/ # Jupyter notebooks organized by pipeline stage
│ ├── 01_data_collection/ # Web scraping and data acquisition
│ │ ├── champ_historic_data_webscraping.ipynb
│ │ └── patch_history_webscraping.ipynb
│ ├── 02_data_cleaning/ # Data preprocessing and merging
│ │ └── data_cleaning.ipynb
│ └── 03_modeling/ # Model training and evaluation
│ ├── decision_tree_modeling.ipynb # Primary model (82.6% accuracy)
│ ├── SVM model.ipynb # Alternative SVM classifier
│ └── Untitled.ipynb # Experimental/scratch notebook
│
├── models/ # Trained models and model artifacts
│
├── reports/ # Generated analysis and documentation
│ └── figures/ # Visualizations and plots
│ └── bufftree.png # Decision tree visualization (971KB)
│
├── README.md # Project documentation
File: notebooks/01_data_collection/champ_historic_data_webscraping.ipynb
- Source: metasrc.com
- Function:
getdataframe(patch)- Scrapes champion statistics for a given patch - Rank Tiers Scraped:
- Iron/Bronze/Silver/Gold (low tier)
- Platinum/Diamond/Master/Grandmaster (high tier)
- Challenger (elite tier)
- Data Collected: Champion name, role, win rate, ban rate, pick rate
- Output Format: Separate CSVs for each rank tier and patch
- Note: Champion names are duplicated in HTML and need to be split (uses
champ[len(champ)/2:])
File: notebooks/01_data_collection/patch_history_webscraping.ipynb
- Source: pcgamesn.com
- Function:
getchanges(patch)- Identifies buffs/nerfs from patch notes - Change Types:
buff- Champion received buffsnerf- Champion received nerfstweak- Champion changed but not explicitly buffed/nerfedno change- Champion unchanged
- Uses:
data/reference/champs.csvas reference list to check against patch notes
File: notebooks/02_data_cleaning/data_cleaning.ipynb
- Function:
dataclean(patch)- Merges champion stats with patch change data - Process:
- Load three rank-tier CSVs for a patch
- Load next patch's changes (to see what happened AFTER these stats)
- Merge stats with change labels using champion name
- Output labeled datasets
- Function:
dataclean2(patch)- Combines all three rank tiers into single dataset - Key Logic: Stats from patch N are labeled with changes from patch N+1 (predictive modeling)
File: notebooks/03_modeling/decision_tree_modeling.ipynb (Primary Model)
- Algorithm: Decision Tree Classifier (sklearn)
- Features:
winrate- Champion win percentagerank- Encoded rank tier (0=irontogold, 1=plattogm)
- Target:
change- Buff/nerf/no change classification - Hyperparameters:
criterion="entropy"- Information gain splittingmax_depth=8- Prevents overfitting
- Performance: 82.6% accuracy on test set
- Train/Test Split: 70/30, random_state=3
- Visualization: Exports decision tree to
reports/figures/bufftree.png
File: notebooks/03_modeling/SVM model.ipynb (Alternative Model)
- Algorithm: Support Vector Machine with RBF kernel
- Features:
winrate,rank,banrate,pickrate(more comprehensive) - Target: Binary classification (buff=2, nerf=4)
- Performance:
- F1-score: 0.7284
- Jaccard score: 0.6864
- Observation: Better at predicting buffs than nerfs
champ,role,winrate,banrate,pickrate,change
Aatrox,TOP,48.17,4.48,4.68,no changeColumns:
champ: Champion name (string)role: TOP, JUNGLE, MID, ADC, SUPPORTwinrate: Win percentage as float (e.g., 48.17 = 48.17%)banrate: Ban percentage as floatpickrate: Pick percentage as floatrank: Rank tier (irontogold, plattogm, challenger) - added during mergingchange: buff, nerf, tweak, no change - label for ML
- Format:
[season].[patch_number](e.g.,11.18= Season 11, Patch 18) - Range in this project: Patches 11.1 through 11.19
- Python 3.x
- Data Collection: requests, BeautifulSoup (html5lib parser)
- Data Processing: pandas, numpy
- Machine Learning: scikit-learn
- DecisionTreeClassifier
- SVM (Support Vector Machines)
- LabelEncoder for categorical features
- Visualization: matplotlib, pydotplus (decision tree graphs)
- Development: Jupyter Notebook
- Function Naming: Lowercase with no underscores (e.g.,
getdataframe,getchanges) - DataFrame Operations: Uses deprecated
df.append()- consider updating topd.concat() - File Naming:
[patch][datatype].csvpattern (e.g.,11.18challenger.csv) - String Replacement: Uses regex replace to clean percentage signs from scraped data
- Champion Name Parsing: Web scraping duplicates champion names in HTML; code splits at midpoint
- Rate Limiting: pcgamesn.com may return "too many requests" error - add delays if scraping bulk data
- Missing Data: Some champion/role combinations may not exist in certain rank tiers
- Label Alignment: Patch N stats are labeled with patch N+1 changes (intentional for prediction)
From SVM model.ipynb analysis (cell 12 markdown):
"The model is good at predicting when a champ is likely to be buffed - winrate, playrate and banrate are good indicators. But nerfing is more complex."
Why Nerf Prediction is Harder:
- Complexity: Some champions are kept at low win rates intentionally (high skill ceiling)
- Pro Play Impact: Champions strong in professional play get nerfed despite average solo queue stats
- Role-Specific Nerfs: A champion might be overpowered in one role but balanced overall
- Mastery Curves: Win rate among champion experts differs from general population
Suggested Improvements:
- Add professional play statistics
- Include win rate among champion masters (high games played)
- Track win rates across all five roles separately
- Consider pick/ban rates in professional matches
# 1. Scrape champion data for a patch
exec(open('notebooks/01_data_collection/champ_historic_data_webscraping.ipynb').read())
getdataframe('11.19') # Creates irontogold, plattogm, challenger CSVs
# 2. Scrape patch changes
exec(open('notebooks/01_data_collection/patch_history_webscraping.ipynb').read())
getchanges('1119') # Creates changes CSV
# 3. Clean and merge data
exec(open('notebooks/02_data_cleaning/data_cleaning.ipynb').read())
dataclean(19) # Merges stats with changes from patch 20# Decision Tree approach
from decision_tree_modeling import *
# Data is already split and model trained in notebook
# Accuracy available via: metrics.accuracy_score(y_testset, predTree)
# SVM approach
from SVM_model import *
# Uses more features (winrate, rank, banrate, pickrate)- Add champion name to
data/reference/champs.csv(one name per line) - Re-run web scraping functions to collect stats
- Re-run data cleaning to generate merged datasets
- Retrain models with updated data
- Update URL patterns in scraping notebooks (change season number)
- Create new directory:
data/raw/season_[N]/ - Update file paths in cleaning notebooks
- Verify champion list is current (new champions released)
- README.md is Empty: Project documentation should be added there for users
- Deprecated pandas Methods: Code uses
df.append()which is deprecated in pandas 2.0+ - Web Scraping Reliability: URLs and HTML structure may change; verify sources are still accessible
- Rate Limits: Be respectful of web scraping sources; add delays between requests
- Data Freshness: Project focuses on Season 11 (2021); League of Legends is actively updated
- Checkpoint Files:
.ipynb_checkpoints/contains auto-saved notebook versions - Image Files:
reports/figures/bufftree.pngis a large (971KB) decision tree visualization
- Branch Naming: Use
claude/prefix for AI-generated branches - Commit Style: Project has minimal commit history; use descriptive messages
- Recent Commits: "added some commentary to model after 4 years!" suggests dormant project
- Models use
random_state=3for reproducibility - Test set size: 30% of data (decision tree), 20% (SVM)
- No unit tests present; validation is done through notebook outputs
- Confusion matrices available in SVM notebook for detailed performance analysis
Based on analysis in the notebooks, consider:
- Feature Engineering: Add professional play statistics, champion mastery curves
- Multi-Class Classification: Better handling of buff/nerf/tweak/no change (currently binary)
- Time Series Analysis: Predict multiple patches ahead
- API Integration: Use Riot Games API instead of web scraping for more reliable data
- Role-Specific Models: Separate models for each role (TOP, JUNGLE, etc.)
- Automated Pipeline: Schedule regular data collection and model retraining
- Web Dashboard: Visualize predictions and model confidence
- Master champion list:
data/reference/champs.csv - Latest combined dataset:
data/processed/S11combined.csv - Primary model notebook:
notebooks/03_modeling/decision_tree_modeling.ipynb - Raw Season 11 data:
data/raw/season_11/ - Model visualization:
reports/figures/bufftree.png - Training data for SVM:
data/processed/S11buffandnerf.csv
Last Updated: 2025-12-10 Project Status: Research/Analysis (dormant since ~4 years ago based on commit messages) Contact: See git commit history for original author