GitHub - mattspooner1/buff_project: Predicting buffs and nerfs in League of Legends

Project Overview

This is a data science project analyzing League of Legends champion balance changes (buffs and nerfs) across Season 11. The project uses machine learning to predict which champions are likely to receive balance changes based on their performance statistics across different rank tiers.

Project Goal: Predict champion buffs/nerfs using historical performance data (win rate, pick rate, ban rate) across different skill levels.

Key Achievement: A Decision Tree classifier reaches 82.6% accuracy on the held-out test set (70/30 split) when predicting balance changes.

Repository Structure

The project keeps raw data, processed data, analysis notebooks, and outputs in separate directories.

buff_project/
├── data/                                    # All data files
│   ├── raw/                                 # Immutable original data (never modify)
│   │   ├── season_11/                       # Season 11 raw data by patch
│   │   │   ├── Challenger/                  # 11.1-11.17 challenger tier CSVs
│   │   │   ├── irontogold/                  # 11.1-11.19 low rank CSVs
│   │   │   ├── plattogm/                    # Platinum to Grandmaster CSVs
│   │   │   ├── changes/                     # 11.1-11.19 patch change CSVs
│   │   │   ├── latestpatch/                 # Most recent patch data
│   │   │   └── combined/                    # Alternative combined datasets
│   │   └── historic/                        # Legacy data (Patches 8.24, 9.24, 10.25)
│   │       ├── [patch]challenger.csv
│   │       ├── [patch]changes.csv
│   │       └── [patch][rank-range].csv
│   ├── processed/                           # Cleaned and merged datasets
│   │   ├── S11combined.csv                  # Combined Season 11 (all ranks)
│   │   ├── S11challenger.csv                # Challenger tier only
│   │   ├── S11irontogm.csv                  # Iron to Grandmaster
│   │   ├── S11buffandnerf.csv               # Labeled dataset for ML training
│   │   ├── 11.18combined.csv                # Patch 11.18 specific combined
│   │   └── allrankseason11.csv              # All ranks aggregated
│   └── reference/                           # Reference and lookup files
│       └── champs.csv                       # Master list of all League champions
│
├── notebooks/                               # Jupyter notebooks organized by pipeline stage
│   ├── 01_data_collection/                  # Web scraping and data acquisition
│   │   ├── champ_historic_data_webscraping.ipynb
│   │   └── patch_history_webscraping.ipynb
│   ├── 02_data_cleaning/                    # Data preprocessing and merging
│   │   └── data_cleaning.ipynb
│   └── 03_modeling/                         # Model training and evaluation
│       ├── decision_tree_modeling.ipynb     # Primary model (82.6% accuracy)
│       ├── SVM model.ipynb                  # Alternative SVM classifier
│       └── Untitled.ipynb                   # Experimental/scratch notebook
│
├── models/                                  # Trained models and model artifacts
│
├── reports/                                 # Generated analysis and documentation
│   └── figures/                             # Visualizations and plots
│       └── bufftree.png                     # Decision tree visualization (971KB)
│
└── README.md                                # Project documentation

Data Pipeline & Workflow

1. Data Collection (Web Scraping)

File: notebooks/01_data_collection/champ_historic_data_webscraping.ipynb

Source: metasrc.com
Function: getdataframe(patch) - Scrapes champion statistics for a given patch
Rank Tiers Scraped:
- Iron/Bronze/Silver/Gold (low tier)
- Platinum/Diamond/Master/Grandmaster (high tier)
- Challenger (elite tier)
Data Collected: Champion name, role, win rate, ban rate, pick rate
Output Format: Separate CSVs for each rank tier and patch
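Note: Champion names are duplicated in the HTML and need to be split at the midpoint (champ[len(champ)//2:]; Python 3 needs the integer division, since champ[len(champ)/2:] raises a TypeError)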
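The midpoint split described in the note can be sketched as below. The helper name split_duplicated_name is hypothetical and not taken from the notebook; it uses integer division so the slice index is an int, and leaves a name untouched if its two halves do not match.

```python
def split_duplicated_name(raw: str) -> str:
    """Return one copy of a doubled name such as 'AatroxAatrox'.

    Hypothetical helper for illustration only. If the two halves differ,
    the input is assumed not to be duplicated and is returned unchanged.
    """
    raw = raw.strip()
    half = len(raw) // 2  # integer division: slice indices must be ints
    if len(raw) % 2 == 0 and raw[:half] == raw[half:]:
        return raw[half:]
    return raw


cleaned = split_duplicated_name("AatroxAatrox")
```

The equality check is a defensive addition; the note above only describes the plain slice.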

File: notebooks/01_data_collection/patch_history_webscraping.ipynb

Source: pcgamesn.com
Function: getchanges(patch) - Identifies buffs/nerfs from patch notes
Change Types:
- buff - Champion received buffs
- nerf - Champion received nerfs
- tweak - Champion changed but not explicitly buffed/nerfed
- no change - Champion unchanged
Uses: data/reference/champs.csv as reference list to check against patch notes

2. Data Cleaning & Merging

File: notebooks/02_data_cleaning/data_cleaning.ipynb

Function: dataclean(patch) - Merges champion stats with patch change data
Process:
1. Load three rank-tier CSVs for a patch
2. Load next patch's changes (to see what happened AFTER these stats)
3. Merge stats with change labels using champion name
4. Output labeled datasets
Function: dataclean2(patch) - Combines all three rank tiers into single dataset
Key Logic: Stats from patch N are labeled with changes from patch N+1 (predictive modeling)
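A minimal sketch of the patch N to patch N+1 labeling, assuming pandas and a join on the champ column. The function name label_with_next_patch and the sample rows are illustrative, not taken from data_cleaning.ipynb; filling unmatched champions with "no change" is also an assumption.

```python
import pandas as pd

# Illustrative rows only; not values from the project's CSV files.
stats_patch_n = pd.DataFrame({
    "champ": ["Aatrox", "Ahri", "Akali"],
    "role": ["TOP", "MID", "MID"],
    "winrate": [48.0, 51.0, 47.5],
})
changes_patch_n_plus_1 = pd.DataFrame({
    "champ": ["Aatrox", "Ahri"],
    "change": ["buff", "nerf"],
})


def label_with_next_patch(stats: pd.DataFrame, changes: pd.DataFrame) -> pd.DataFrame:
    """Attach the NEXT patch's change label to the CURRENT patch's stats."""
    labeled = stats.merge(changes, on="champ", how="left")
    # Assumption: a champion missing from the change list was left untouched.
    labeled["change"] = labeled["change"].fillna("no change")
    return labeled


labeled = label_with_next_patch(stats_patch_n, changes_patch_n_plus_1)
```

A left join keeps every stats row, so champions absent from the change list are not silently dropped.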

3. Model Training

File: notebooks/03_modeling/decision_tree_modeling.ipynb (Primary Model)

Algorithm: Decision Tree Classifier (sklearn)
Features:
- winrate - Champion win percentage
- rank - Encoded rank tier (0=irontogold, 1=plattogm)
Target: change - Buff/nerf/no change classification
Hyperparameters:
- criterion="entropy" - Information gain splitting
- max_depth=8 - Prevents overfitting
Performance: 82.6% accuracy on test set
Train/Test Split: 70/30, random_state=3
Visualization: Exports decision tree to reports/figures/bufftree.png
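The configuration listed above can be sketched with scikit-learn as follows. The data here is synthetic and generated only so the example runs; the labeling rule and sample size are assumptions, and the resulting accuracy says nothing about the project's 82.6% figure.

```python
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): winrate plus an encoded rank tier.
rng = np.random.default_rng(0)
n_rows = 300
winrate = rng.normal(50.0, 2.0, n_rows)
rank = rng.integers(0, 2, n_rows)  # 0=irontogold, 1=plattogm
X = np.column_stack([winrate, rank])
# Made-up labeling rule, used only to give the tree something to learn.
y = np.where(winrate < 48.5, "buff", np.where(winrate > 51.5, "nerf", "no change"))

# 70/30 split with the seed documented above.
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=8)
tree.fit(X_trainset, y_trainset)
predTree = tree.predict(X_testset)
accuracy = metrics.accuracy_score(y_testset, predTree)
```

The variable names y_testset and predTree follow the ones mentioned later in this document.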

File: notebooks/03_modeling/SVM model.ipynb (Alternative Model)

Algorithm: Support Vector Machine with RBF kernel
Features: winrate, rank, banrate, pickrate (more comprehensive)
Target: Binary classification (buff=2, nerf=4)
Performance:
- F1-score: 0.7284
- Jaccard score: 0.6864
Observation: Better at predicting buffs than nerfs
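A rough sketch of the SVM setup, assuming scikit-learn's SVC, a 20% test split, and random_state=3. The data is synthetic, and the averaging choices (weighted F1, Jaccard for the buff class) are assumptions; the notebook's exact metric settings are not stated here.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import f1_score, jaccard_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), one column per documented feature.
rng = np.random.default_rng(1)
n_rows = 200
winrate = rng.normal(50.0, 2.0, n_rows)
rank = rng.integers(0, 2, n_rows)
banrate = rng.uniform(0.0, 30.0, n_rows)
pickrate = rng.uniform(0.5, 15.0, n_rows)
X = np.column_stack([winrate, rank, banrate, pickrate])
# Made-up rule producing the documented label codes: buff=2, nerf=4.
y = np.where(winrate + 0.05 * banrate < 50.5, 2, 4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3
)

clf = svm.SVC(kernel="rbf")
clf.fit(X_train, y_train)
yhat = clf.predict(X_test)

f1 = f1_score(y_test, yhat, average="weighted")
jaccard = jaccard_score(y_test, yhat, pos_label=2)
```

Because the labels are 2 and 4 rather than 0 and 1, both metrics need an explicit average or pos_label argument.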

Data Schema

Champion Statistics CSV Format

champ,role,winrate,banrate,pickrate,change
Aatrox,TOP,48.17,4.48,4.68,no change

Columns:

champ: Champion name (string)
role: TOP, JUNGLE, MID, ADC, SUPPORT
winrate: Win percentage as float (e.g., 48.17 = 48.17%)
banrate: Ban percentage as float
pickrate: Pick percentage as float
rank: Rank tier (irontogold, plattogm, challenger) - added during merging
change: buff, nerf, tweak, no change - label for ML
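One way to check a stats file against this schema, using only the standard library. The function name read_stats is hypothetical; the expected header matches the sample row above, so the rank column (added during merging) is not required here.

```python
import csv
import io

EXPECTED_COLUMNS = ["champ", "role", "winrate", "banrate", "pickrate", "change"]
ROLES = {"TOP", "JUNGLE", "MID", "ADC", "SUPPORT"}
CHANGES = {"buff", "nerf", "tweak", "no change"}


def read_stats(handle):
    """Parse a stats CSV, converting rates to floats and validating labels."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        if row["role"] not in ROLES:
            raise ValueError(f"unknown role: {row['role']}")
        if row["change"] not in CHANGES:
            raise ValueError(f"unknown change label: {row['change']}")
        for column in ("winrate", "banrate", "pickrate"):
            row[column] = float(row[column])
        rows.append(row)
    return rows


sample = io.StringIO(
    "champ,role,winrate,banrate,pickrate,change\n"
    "Aatrox,TOP,48.17,4.48,4.68,no change\n"
)
rows = read_stats(sample)
```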

Patch Naming Convention

Format: [season].[patch_number] (e.g., 11.18 = Season 11, Patch 18)
Range in this project: Patches 11.1 through 11.19
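Patch identifiers should be handled as two integers rather than as floats, since 11.1 and 11.10 are different patches. A small sketch, with hypothetical helper names parse_patch and next_patch (season rollover is not handled):

```python
from typing import Tuple


def parse_patch(patch: str) -> Tuple[int, int]:
    """Split '11.18' into (season, patch_number) = (11, 18)."""
    season, number = patch.split(".")
    return int(season), int(number)


def next_patch(patch: str) -> str:
    """Return the following patch within the same season."""
    season, number = parse_patch(patch)
    return f"{season}.{number + 1}"


# Sorting by the parsed tuple avoids the string order '11.1' < '11.10' < '11.2'.
patches = sorted(["11.10", "11.2", "11.1", "11.19"], key=parse_patch)
```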

Technologies Used

Python 3.x
Data Collection: requests, BeautifulSoup (html5lib parser)
Data Processing: pandas, numpy
Machine Learning: scikit-learn
- DecisionTreeClassifier
- SVM (Support Vector Machines)
- LabelEncoder for categorical features
Visualization: matplotlib, pydotplus (decision tree graphs)
Development: Jupyter Notebook

Development Conventions

Code Style

Function Naming: Lowercase with no underscores (e.g., getdataframe, getchanges)
DataFrame Operations: Uses df.append(), which was deprecated in pandas 1.4 and removed in 2.0 - update to pd.concat()
File Naming: [patch][datatype].csv pattern (e.g., 11.18challenger.csv)
String Replacement: Uses regex replace to clean percentage signs from scraped data
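A sketch of the pd.concat() replacement for df.append(), combined with the percent-sign cleanup. The frames are illustrative stand-ins for the per-tier CSVs, and a literal (non-regex) replace is used here for simplicity, which differs from the regex replace the notebooks are described as using.

```python
import pandas as pd

tiers = ["irontogold", "plattogm", "challenger"]

# Collect per-tier frames in a list, then concatenate once.
frames = []
for tier in tiers:
    # Illustrative row standing in for a scraped per-tier CSV.
    frame = pd.DataFrame({"champ": ["Aatrox"], "winrate": ["48.17%"]})
    frame["rank"] = tier
    frames.append(frame)  # list.append, not the removed DataFrame.append

combined = pd.concat(frames, ignore_index=True)

# Strip the trailing percent sign and convert to float.
combined["winrate"] = (
    combined["winrate"].str.replace("%", "", regex=False).astype(float)
)
```

Concatenating once at the end also avoids copying the growing frame on every iteration.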

Data Quality Considerations

Champion Name Parsing: Web scraping duplicates champion names in HTML; code splits at midpoint
Rate Limiting: pcgamesn.com may return "too many requests" error - add delays if scraping bulk data
Missing Data: Some champion/role combinations may not exist in certain rank tiers
Label Alignment: Patch N stats are labeled with patch N+1 changes (intentional for prediction)
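For the rate-limiting point, a delay-and-retry wrapper might look like the sketch below. It assumes the "too many requests" error surfaces as HTTP status 429; fetch_with_backoff is a hypothetical helper, and get is any callable with the shape of requests.get.

```python
import time


def fetch_with_backoff(get, url, retries=3, delay=2.0, sleep=time.sleep):
    """Call get(url), waiting and retrying while the server answers 429.

    get:   callable returning an object with a .status_code attribute,
           e.g. requests.get (assumption: rate limiting shows up as 429).
    delay: initial wait in seconds, doubled after each rate-limited attempt.
    """
    for attempt in range(retries + 1):
        response = get(url)
        if response.status_code != 429:
            return response
        if attempt < retries:
            sleep(delay * (2 ** attempt))
    return response  # still rate limited after all retries
```

Usage would be along the lines of fetch_with_backoff(requests.get, url), with the response checked by the caller as before.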

Known Issues & Model Limitations

From SVM model.ipynb analysis (cell 12 markdown):

"The model is good at predicting when a champ is likely to be buffed - winrate, playrate and banrate are good indicators. But nerfing is more complex."

Why Nerf Prediction is Harder:

Complexity: Some champions are kept at low win rates intentionally (high skill ceiling)
Pro Play Impact: Champions strong in professional play get nerfed despite average solo queue stats
Role-Specific Nerfs: A champion might be overpowered in one role but balanced overall
Mastery Curves: Win rate among champion experts differs from general population

Suggested Improvements:

Add professional play statistics
Include win rate among champion masters (high games played)
Track win rates across all five roles separately
Consider pick/ban rates in professional matches

Common Tasks for AI Assistants

Running the Full Data Pipeline

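import json

def run_notebook(path):
    # Hypothetical helper: .ipynb files are JSON, so exec(open(path).read()) fails.
    # Runs each code cell instead; assumes plain Python cells (no % or ! magics).
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            exec("".join(cell["source"]), globals())

# 1. Scrape champion data for a patch
run_notebook('notebooks/01_data_collection/champ_historic_data_webscraping.ipynb')
getdataframe('11.19')  # Creates irontogold, plattogm, challenger CSVs

# 2. Scrape patch changes
run_notebook('notebooks/01_data_collection/patch_history_webscraping.ipynb')
getchanges('1119')  # Creates changes CSV

# 3. Clean and merge data
run_notebook('notebooks/02_data_cleaning/data_cleaning.ipynb')
dataclean(19)  # Merges stats with changes from patch 20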

Training a New Model

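# Notebooks are not importable Python modules; execute them instead, e.g. from a shell.

# Decision Tree approach
#   jupyter nbconvert --to notebook --execute notebooks/03_modeling/decision_tree_modeling.ipynb
# Data is split and the model is trained inside the notebook
# Accuracy available via: metrics.accuracy_score(y_testset, predTree)

# SVM approach (uses more features: winrate, rank, banrate, pickrate)
#   jupyter nbconvert --to notebook --execute "notebooks/03_modeling/SVM model.ipynb"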

Adding New Champion Data

Add champion name to data/reference/champs.csv (one name per line)
Re-run web scraping functions to collect stats
Re-run data cleaning to generate merged datasets
Retrain models with updated data

Updating to New Season

Update URL patterns in scraping notebooks (change season number)
Create new directory: data/raw/season_[N]/
Update file paths in cleaning notebooks
Verify champion list is current (new champions released)

Important Notes for AI Assistants

README.md is Empty: Project documentation should be added there for users
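Deprecated pandas Methods: Code uses df.append(), which was deprecated in pandas 1.4 and removed in pandas 2.0; the notebooks will not run on pandas 2.0+ until these calls are replaced with pd.concat()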
Web Scraping Reliability: URLs and HTML structure may change; verify sources are still accessible
Rate Limits: Be respectful of web scraping sources; add delays between requests
Data Freshness: Project focuses on Season 11 (2021); League of Legends is actively updated
Checkpoint Files: .ipynb_checkpoints/ contains auto-saved notebook versions
Image Files: reports/figures/bufftree.png is a large (971KB) decision tree visualization

Git Workflow

Branch Naming: Use claude/ prefix for AI-generated branches
Commit Style: Project has minimal commit history; use descriptive messages
Recent Commits: "added some commentary to model after 4 years!" suggests dormant project

Testing & Validation

Models use random_state=3 for reproducibility
Test set size: 30% of data (decision tree), 20% (SVM)
No unit tests present; validation is done through notebook outputs
Confusion matrices available in SVM notebook for detailed performance analysis

Future Enhancements

Based on analysis in the notebooks, consider:

Feature Engineering: Add professional play statistics, champion mastery curves
Multi-Class Classification: Better handling of buff/nerf/tweak/no change (the SVM model is currently binary: buff vs. nerf)
Time Series Analysis: Predict multiple patches ahead
API Integration: Use Riot Games API instead of web scraping for more reliable data
Role-Specific Models: Separate models for each role (TOP, JUNGLE, etc.)
Automated Pipeline: Schedule regular data collection and model retraining
Web Dashboard: Visualize predictions and model confidence

Quick Reference: File Locations

Master champion list: data/reference/champs.csv
Latest combined dataset: data/processed/S11combined.csv
Primary model notebook: notebooks/03_modeling/decision_tree_modeling.ipynb
Raw Season 11 data: data/raw/season_11/
Model visualization: reports/figures/bufftree.png
Training data for SVM: data/processed/S11buffandnerf.csv

Last Updated: 2025-12-10
Project Status: Research/Analysis (dormant since ~4 years ago based on commit messages)
Contact: See git commit history for original author
