WallStreetBets Sentiment Trading Strategy

Backtesting a retail sentiment-driven trading strategy against SPY using Reddit r/WallStreetBets data and an ensemble NLP pipeline (VADER · two custom lexicons · FinBERT).

Overview

This project investigates whether crowd sentiment on r/WallStreetBets contains exploitable signal for short-term SPY trading. The pipeline:

1. Collects r/WSB comments from the Reddit API via PRAW.
2. Cleans and enriches the text (entity extraction, ticker detection).
3. Scores sentiment using a 4-model ensemble:
   - VADER (rule-based NLP)
   - Simple keyword lexicon
   - Weighted financial lexicon
   - FinBERT (ProsusAI/finbert, transformer-based)
4. Merges the sentiment signals with intraday SPY OHLCV data.
5. Backtests three sentiment-driven strategies against three baselines using Backtrader.
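
As an illustration of the ticker-detection step, here is a minimal standard-library sketch. It is not the code in src/processing.py: the regular expressions, the `KNOWN_TICKERS` whitelist, and the function name `detect_tickers` are all assumptions made for this example.

```python
import re

# Hypothetical pattern: a cashtag such as "$SPY" (1 to 5 capital letters).
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")

# Hypothetical whitelist. Bare uppercase words count only if listed here,
# so slang such as "YOLO" or "HODL" is not mistaken for a ticker.
KNOWN_TICKERS = {"SPY", "GME", "AMC", "TSLA", "NVDA"}
BARE_WORD = re.compile(r"\b[A-Z]{2,5}\b")


def detect_tickers(comment: str) -> list[str]:
    """Return the sorted, de-duplicated tickers mentioned in a comment."""
    found = set(CASHTAG.findall(comment))  # cashtags are always accepted
    found.update(w for w in BARE_WORD.findall(comment) if w in KNOWN_TICKERS)
    return sorted(found)


print(detect_tickers("YOLO into $GME and SPY calls, HODL"))
```

A whitelist keeps precision high on all-caps slang, at the cost of missing tickers that are not listed and not written as cashtags.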

Repository Structure

wallstreetbets-sentiment-trading-strategy/
│
├── docs/
│   ├── Final Presentation.pdf      # Full project write-up & results
│   └── Midterm Presentation.pdf    # Intermediate progress report
│
├── notebooks/
│   ├── 01_data_collection.ipynb    # Reddit scraper (PRAW) + SPY fetcher (yfinance) – optional
│   ├── 02_main.ipynb               # End-to-end: merge → analyse → backtest
│   └── 03_algo.ipynb               # Experimental algorithmic trading sandbox
│
├── src/                            # Python source package
│   ├── __init__.py
│   ├── processing.py               # NLP pipeline (cleaning, entity extraction,
│   │                               #   sentiment scoring, visualisation helpers)
│   └── strategy.py                 # Backtrader strategy classes + runner helpers
│
├── data/
│   ├── full_manual_spy.csv         # ✅ committed – merged SPY price data
│   ├── manual_merged_df.csv        # ✅ committed – merged sentiment + price data
│   ├── manual_spy_df.csv           # ✅ committed – processed SPY price data
│   ├── reddit/                     # raw daily WSB comment CSVs  (gitignored – re-scrape if needed)
│   └── spy/                        # raw intraday SPY CSVs       (gitignored – re-fetch if needed)
│
├── results/
│   ├── bear/                       # Backtest charts – bear market regime
│   ├── bull/                       # Backtest charts – bull market regime
│   ├── flat/                       # Backtest charts – flat/sideways regime
│   └── full/                       # Backtest charts – full evaluation period
│
├── requirements.txt
├── .env.example                    # ← Copy to .env and fill in credentials
└── .gitignore

Strategies Tested

#	Strategy	Description
B1	Buy & Hold	Buy once, hold for the full period
B2	Dollar Cost Averaging	Invest 5% of cash every trading day
B3	Technical Analysis	EMA-5 + RSI-14 crossover signals
S1	Pure Sentiment	Buy when FinBERT score > 0.2; sell when it is < -0.2
S2	Sentiment + TA	Buy when price > EMA-20 AND sentiment > 0.1
S3	Inverse Sentiment (Contrarian)	Fade the crowd: buy on panic, sell on euphoria
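
The entry and exit rules in the table can be sketched in plain Python. This is an illustration only, not the Backtrader classes in src/strategy.py. The S1 and S2 thresholds come from the table; everything else is an assumption: the function names, the EMA seeding with the first price, the "hold" fallback for S2 (the table gives no exit rule), and the S3 thresholds, which are simply mirrored from S1.

```python
def ema(prices, span):
    """Exponential moving average, alpha = 2 / (span + 1).

    Assumption: the series is seeded with the first price.
    """
    alpha = 2.0 / (span + 1)
    out = []
    for price in prices:
        out.append(price if not out else alpha * price + (1 - alpha) * out[-1])
    return out


def pure_sentiment_signal(score, buy_above=0.2, sell_below=-0.2):
    """S1: buy above +0.2, sell below -0.2, otherwise do nothing."""
    if score > buy_above:
        return "buy"
    if score < sell_below:
        return "sell"
    return "hold"


def sentiment_plus_ta_signal(price, ema_value, score, min_score=0.1):
    """S2: buy only when price is above its EMA and sentiment exceeds 0.1.

    Assumption: no exit rule is given in the table, so anything else is "hold".
    """
    return "buy" if price > ema_value and score > min_score else "hold"


def contrarian_signal(score, panic_below=-0.2, euphoria_above=0.2):
    """S3: fade the crowd. Thresholds are assumed, mirrored from S1."""
    if score < panic_below:
        return "buy"
    if score > euphoria_above:
        return "sell"
    return "hold"


closes = [100.0, 101.0, 102.5, 101.5, 103.0]
ema_20 = ema(closes, span=20)
print(sentiment_plus_ta_signal(closes[-1], ema_20[-1], score=0.15))
```

Keeping each rule as a pure function of the current bar makes it easy to unit-test the thresholds separately from the backtesting engine.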

Sentiment Ensemble

The ensemble score is the unweighted mean of the four model scores:

ensemble = (VADER + Lexicon + FinLex + FinBERT) / 4

Model	Approach	Handles WallStreetBets Slang
VADER	Rule-based polarity	✓ (general)
Keyword Lexicon	Bullish/bearish word sets	✓ (custom)
Financial Lexicon	Weighted domain vocabulary	✓✓ (WSB-specific words)
FinBERT	Transformer (ProsusAI)	✓✓✓ (fine-tuned on financial text)
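
The averaging formula above can be written as a short function. This is a sketch under two assumptions that the README does not spell out: every model score has already been mapped to a common signed scale (for example -1 to 1), and the FinBERT class probabilities are collapsed to one number as P(positive) minus P(negative). The function names are hypothetical.

```python
from statistics import fmean


def finbert_signed_score(p_positive, p_negative):
    """Collapse FinBERT class probabilities into one signed score.

    Assumption: positive minus negative probability; the neutral class
    only shrinks the magnitude.
    """
    return p_positive - p_negative


def ensemble_score(vader, lexicon, finlex, finbert):
    """Unweighted mean of the four model scores."""
    return fmean([vader, lexicon, finlex, finbert])


score = ensemble_score(
    vader=0.5,
    lexicon=0.0,
    finlex=1.0,
    finbert=finbert_signed_score(p_positive=0.1, p_negative=0.6),
)
print(round(score, 4))
```

An unweighted mean treats all four models as equally reliable; a weighted mean would be the natural extension if one model proved more predictive in backtests.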

Quickstart

1. Clone the repo

git clone https://github.com/<your-username>/wallstreetbets-sentiment-trading-strategy.git
cd wallstreetbets-sentiment-trading-strategy

2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt
python -m spacy download en_core_web_lg

Apple Silicon / CUDA users: install the appropriate PyTorch wheel before running the commands above: https://pytorch.org/get-started/locally/

4. Configure Reddit API credentials (optional – only needed to re-scrape data)

cp .env.example .env
# Edit .env with your CLIENT_ID, CLIENT_SECRET, and DEV_NAME

Register a Reddit app at https://www.reddit.com/prefs/apps (choose script type).
Skip this step if you are using the pre-collected data already committed to data/.
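
To check that .env is filled in before running the scraper, a standard-library sketch like the one below may help. The three key names come from the comment in step 4; the parser itself and the function names are assumptions, and the notebook may load the file differently (for example with a dotenv package).

```python
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "DEV_NAME")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = '# Reddit credentials\nCLIENT_ID=abc123\nCLIENT_SECRET="s3cret"\n'
print(missing_keys(parse_env(example)))
```

In practice you would pass the contents of your own file, for instance `parse_env(open(".env").read())`, and stop early if `missing_keys` returns anything.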

5. Run the notebooks

Notebook	Purpose	Required?
`notebooks/01_data_collection.ipynb`	Re-scrape WSB comments (PRAW) and/or re-fetch SPY bars (yfinance)	Optional
`notebooks/02_main.ipynb`	Full analysis pipeline and backtests	Yes
`notebooks/03_algo.ipynb`	⚠️ Non-functional — experimental IBKR TWS live paper-trading POC (requires TWS + ibapi)	Optional

TL;DR: Processed data (full_manual_spy.csv, manual_merged_df.csv, manual_spy_df.csv) is already committed to data/. You can jump straight to notebooks/02_main.ipynb without running the data-collection notebook or configuring any API credentials.

Results Snapshot

Backtest charts are saved under results/<regime>/, one folder for each of the four evaluation windows (bear, bull, flat, and the full period).
The naming convention is:

File pattern	Description
`Baseline: Buy and Hold-backtest.png`	Buy-and-hold benchmark
`Baseline: Dollar Cost Averaging-backtest.png`	DCA benchmark
`Baseline: Technical Analysis (EMA, BB, RSI, VWAP)-backtest.png`	TA benchmark
`Strategy 1: Pure Sentiment-backtest.png`	S1 – FinBERT score only
`Strategy 2: Sentiment + Technical Analysis-backtest.png`	S2 – FinBERT + EMA/RSI
`Strategy 3: Inverse Sentiment (Contrarian)-backtest.png`	S3 – Contrarian
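
The naming convention can be expressed as a small helper that builds the expected chart path. The folder names and the `-backtest.png` suffix come from this README; the function name `chart_path` and the `results_root` parameter are hypothetical.

```python
from pathlib import Path

REGIMES = ("bear", "bull", "flat", "full")


def chart_path(regime, strategy_name, results_root="results"):
    """Build results/<regime>/<strategy name>-backtest.png."""
    if regime not in REGIMES:
        raise ValueError(f"unknown regime: {regime!r}")
    return Path(results_root) / regime / f"{strategy_name}-backtest.png"


print(chart_path("bull", "Strategy 1: Pure Sentiment").as_posix())
```

Note that these file names contain colons, which are not valid in Windows file names, so the charts may need renaming when the repository is checked out there.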

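Full Period

[Figures: backtest charts for Buy & Hold, Dollar Cost Averaging, Technical Analysis, Pure Sentiment, Sentiment + TA, and Inverse Sentiment; see results/full/]

Bull Regime

[Figures: backtest charts for Buy & Hold, Dollar Cost Averaging, Technical Analysis, Pure Sentiment, Sentiment + TA, and Inverse Sentiment; see results/bull/]

Bear Regime

[Figures: backtest charts for Buy & Hold, Dollar Cost Averaging, Technical Analysis, Pure Sentiment, Sentiment + TA, and Inverse Sentiment; see results/bear/]

Flat/Sideways Regime

[Figures: backtest charts for Buy & Hold, Dollar Cost Averaging, Technical Analysis, Pure Sentiment, Sentiment + TA, and Inverse Sentiment; see results/flat/]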

Generating figures yourself: run notebooks/02_main.ipynb end-to-end. The notebooks call the following helpers:

- From src/strategy.py: run_and_plot(..., results_dir='../results/full') and run_strategy(..., results_dir='../results/full')
- From src/processing.py: plot_sentiment(df, ..., save_path='../results/sentiment_dist.png'), plot_sector_sentiment_trends(df, save_path='../results/sector_trends.png'), and plot_sentiment_vs_price(df, save_path='../results/price_vs_sentiment.png')

Dependencies

See requirements.txt for the full list. Key packages:

NLP: nltk, spacy, transformers (FinBERT via HuggingFace)
Data: pandas, numpy, praw
Backtesting: backtrader
Visualisation: matplotlib, wordcloud

Presentations

docs/Final Presentation.pdf — full project write-up and results
docs/Midterm Presentation.pdf — intermediate progress report

License

This project is for educational and research purposes only. Nothing here constitutes financial advice.
