Backtesting a retail sentiment-driven trading strategy against SPY using Reddit r/WallStreetBets data and an ensemble NLP pipeline (VADER · Custom Lexicon · FinBERT).
Overview
This project investigates whether crowd sentiment on r/WallStreetBets contains exploitable signal for short-term SPY trading. The pipeline:
Collects r/WSB comments via the Reddit PRAW API.
Cleans & enriches text (entity extraction, ticker detection).
Scores sentiment using a 4-model ensemble:
VADER (rule-based NLP)
Simple keyword lexicon
Weighted financial lexicon
FinBERT (ProsusAI/finbert, transformer-based)
Merges sentiment signals with intraday SPY OHLCV data.
Backtests three sentiment-driven strategies against three baselines using Backtrader.
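The scoring step can be illustrated with a minimal, dependency-free sketch. It covers only the two lexicon scorers and a combination step. The word lists, the weights, and the equal-weight averaging are assumptions made for illustration, not the project's actual lexicons or ensemble rule. VADER and FinBERT scores would be supplied as further entries in the same dictionary.

```python
import re

# Hypothetical word lists for illustration only; the project's real
# lexicons are not reproduced here.
BULLISH = {"moon", "calls", "buy", "bullish", "rally"}
BEARISH = {"puts", "sell", "bearish", "crash", "dump"}
WEIGHTED = {"moon": 2.0, "rally": 1.0, "calls": 1.0,
            "crash": -2.0, "puts": -1.0, "dump": -1.5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_score(text):
    """Simple lexicon: (bullish hits - bearish hits) / total hits, in [-1, 1]."""
    tokens = tokenize(text)
    pos = sum(t in BULLISH for t in tokens)
    neg = sum(t in BEARISH for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def weighted_score(text):
    """Weighted lexicon: mean term weight, scaled into [-1, 1]."""
    hits = [WEIGHTED[t] for t in tokenize(text) if t in WEIGHTED]
    if not hits:
        return 0.0
    max_abs = max(abs(w) for w in WEIGHTED.values())
    return sum(hits) / (len(hits) * max_abs)


def ensemble_score(scores, weights=None):
    """Weighted mean of per-model scores (equal weights assumed by default)."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


comment = "SPY calls to the moon, no crash today"
scores = {
    "keyword": keyword_score(comment),
    "weighted": weighted_score(comment),
    # "vader" and "finbert" entries would be added here from the real models.
}
print(scores, ensemble_score(scores))
```

Keeping every scorer on the same [-1, 1] scale is what makes a plain average meaningful. If the models used different ranges, each score would need rescaling before being combined.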
4. Configure Reddit API credentials (optional – only needed to re-scrape data)
cp .env.example .env
# Edit .env with your CLIENT_ID, CLIENT_SECRET, and DEV_NAME
Register a Reddit app at https://www.reddit.com/prefs/apps (choose script type).
Skip this step if you are using the pre-collected data already committed to data/.
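As a rough sketch of how the credentials in .env might reach PRAW, the helper below parses the file with the standard library and maps the three variables onto the keyword arguments of praw.Reddit. The function names (load_env, reddit_kwargs, make_client), the user-agent format, and the use of DEV_NAME inside it are assumptions for illustration. The project's own loading code may differ.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def reddit_kwargs(env):
    """Map .env variables to praw.Reddit keyword arguments."""
    return {
        "client_id": env["CLIENT_ID"],
        "client_secret": env["CLIENT_SECRET"],
        # Assumption: DEV_NAME is the Reddit username shown in the user agent.
        "user_agent": f"wsb-sentiment script by u/{env['DEV_NAME']}",
    }


def make_client(path=".env"):
    """Create a read-only Reddit client (requires `pip install praw`)."""
    import praw  # imported lazily so the helpers above work without PRAW
    return praw.Reddit(**reddit_kwargs(load_env(path)))
```

Calling make_client() from the directory that contains .env returns a read-only client, which is sufficient for collecting public comments.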
TL;DR: Processed data (full_manual_spy.csv, manual_merged_df.csv, manual_spy_df.csv) is already
committed to data/. You can jump straight to notebooks/02_main.ipynb without running
the data-collection notebook or configuring any API credentials.
Results Snapshot
Backtest charts are saved under results/<regime>/ for each of the four market regimes evaluated.
The naming convention is:
Generating figures yourself: run notebooks/02_main.ipynb end-to-end. The notebooks call two
functions from src/strategy.py:
run_and_plot(..., results_dir='../results/full')
run_strategy(..., results_dir='../results/full')
and three plotting functions from src/processing.py:
plot_sentiment(df, ..., save_path='../results/sentiment_dist.png')
plot_sector_sentiment_trends(df, save_path='../results/sector_trends.png')
plot_sentiment_vs_price(df, save_path='../results/price_vs_sentiment.png')