Skip to content

apy-h/financial-sentiment-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

34 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“ˆ Finance Sentiment Analysis Dashboard

A production-quality financial sentiment intelligence platform that analyzes Reddit discussions using state-of-the-art FinBERT AI to provide real-time market sentiment insights across stocks, industries, and sectors.


✨ Features

πŸ€– AI-Powered Sentiment Analysis

  • FinBERT Model - Specialized BERT model fine-tuned for financial sentiment analysis (ProsusAI/finbert)
  • Real-time classification into Positive, Neutral, and Negative sentiment with confidence scores
  • Context-aware analysis understanding financial terminology and market language

πŸ“Š Comprehensive Market Intelligence

  • Market Pulse Dashboard - Real-time overview of market sentiment trends
  • Most Discussed Stocks - Track which tickers are generating the most conversation
  • Industry & Sector Analysis - Aggregated sentiment across market segments
  • Sentiment Heatmaps - Visual representation of industry-level market mood

πŸ“‘ Reddit RSS Integration

  • Fetches posts from multiple finance subreddits (r/stocks, r/investing, r/wallstreetbets, etc.)
  • No API keys required - Uses RSS feeds for seamless data collection
  • Configurable subreddit sources and search queries
  • Automatic deduplication and post tracking

🎯 Advanced Ticker Intelligence

  • Smart Ticker Extraction - Identifies stock symbols in posts with high accuracy
  • False Positive Filtering - Eliminates common words like "CEO", "PR", "DD"
  • Company Metadata - Maps tickers to company names, industries, and sectors
  • Support for 100+ major stocks with extensible ticker mappings

πŸ” Powerful Filtering & Analytics

  • Filter by ticker symbol, industry, sector, sentiment, or date range
  • Multi-dimensional data exploration
  • Pagination support for large datasets
  • Custom time-range analysis (daily/weekly granularity)

πŸ“‰ Rich Data Visualizations

  • Sentiment Trends - Track sentiment changes over time
  • Stock Comparison Charts - Compare sentiment across multiple tickers
  • Volume-Sentiment Correlation - Analyze post volume vs. market mood
  • Industry Heatmaps - Sector-level sentiment distribution
  • Interactive Charts - Built with Chart.js for dynamic data exploration

🎨 Modern User Interface

  • Responsive React-based frontend
  • Clean, intuitive three-panel layout (Overview, Analytics, Posts)
  • Clickable posts with direct Reddit links
  • Loading states and error handling
  • Mobile-friendly responsive design

πŸ—οΈ Architecture

Backend Stack

  • Framework: Flask 3.0 with CORS support
  • AI/ML: HuggingFace Transformers + PyTorch (CPU-optimized)
  • Database: SQLite with optimized schema
  • Data Source: Reddit RSS feeds (no authentication required)
  • API: RESTful v1 API with standardized responses

Frontend Stack

  • Framework: React 19 with modern hooks
  • Build Tool: Vite 7.3 for fast development
  • Charts: Chart.js 4.5 + react-chartjs-2
  • HTTP Client: Axios for API communication
  • Styling: Modern CSS with responsive design

Database Schema

Enhanced relational schema with:

  • Posts - Analyzed Reddit posts with full metadata
  • Tickers - Stock symbols with company info, sector, industry
  • Industries & Sectors - Hierarchical classification
  • Post-Ticker Relationships - Many-to-many associations
  • Optimized Indexes - Fast querying on filters and aggregations

πŸš€ Getting Started

Prerequisites

  • Python 3.8+ with pip
  • Node.js 16+ with npm
  • ~2GB disk space for FinBERT model and dependencies

Backend Setup

  1. Navigate to backend directory:

    cd backend
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Initialize the database:

    python migrations.py
  4. Start the Flask server:

    python app.py

    The backend will start on http://localhost:5000

    ⚠️ First Run Note: FinBERT model (~500MB) will be downloaded automatically from HuggingFace. This may take 5-10 minutes depending on your connection.

Frontend Setup

  1. Navigate to frontend directory:

    cd frontend
  2. Install Node dependencies:

    npm install
  3. Start the development server:

    npm run dev

    The frontend will start on http://localhost:5173 (Vite default)

Verify Installation

  1. Open http://localhost:5173 in your browser
  2. Click "Fetch New Posts" button
  3. Wait for Reddit posts to be fetched and analyzed
  4. Explore the dashboard, filters, and visualizations

πŸ“– Usage Guide

Fetching Posts

  1. Click the "Fetch New Posts" button in the header
  2. Posts are fetched from configured subreddits via RSS
  3. Each post is analyzed for sentiment and ticker mentions
  4. Results appear in real-time as analysis completes

Navigating the Dashboard

Overview Tab

  • Market Pulse - Quick market sentiment snapshot
  • Key Metrics - Total posts, sentiment distribution, top stocks
  • Most Discussed/Positive/Negative Stocks - Ranked lists

Analytics Tab

  • Sentiment Trends - Line charts showing sentiment over time
  • Stock Comparison - Side-by-side ticker sentiment analysis
  • Industry Heatmap - Sector-level aggregated sentiment
  • Volume-Sentiment Correlation - Post activity vs. market mood

Posts Tab

  • Paginated list of all analyzed posts
  • Individual sentiment scores and extracted tickers
  • Direct links to original Reddit discussions
  • Sorting and filtering options

Using Filters

Ticker Filter: Select one or multiple stock symbols

  • Example: Filter for "AAPL" to see all Apple-related posts

Industry Filter: Focus on specific industries

  • Example: "Software", "Semiconductors", "Banking"

Sector Filter: Broader market segment filtering

  • Example: "Technology", "Financial", "Healthcare"

Sentiment Filter: Show only positive, negative, or neutral posts

Date Range: Analyze specific time periods

  • Uses YYYY-MM-DD format
  • Supports both start and end dates

Chart Interactions

  • Hover over data points for detailed tooltips
  • Click legend items to toggle datasets
  • Charts auto-update when filters change

βš™οΈ Configuration

config.json

Located in backend/config.json:

{
  "reddit": {
    "subreddits": ["stocks", "StockMarket", "investing", "wallstreetbets", "finance", "options"],
    "user_agent": "finance-sentiment-dashboard/2.0",
    "default_query": "stocks OR finance OR investing OR trading"
  },
  "sentiment": {
    "model": "ProsusAI/finbert",
    "confidence_threshold": 0.6
  },
  "api": {
    "version": "v1",
    "default_page_size": 50,
    "max_page_size": 1000
  },
  "server": {
    "port": 5000,
    "debug": false
  }
}

Key Settings:

  • subreddits - List of subreddits to fetch from
  • confidence_threshold - Minimum confidence for sentiment classification
  • default_page_size - API pagination size

ticker_mappings.json

Maps ticker symbols to company metadata:

{
  "AAPL": {
    "company": "Apple Inc.",
    "sector": "Technology",
    "industry": "Consumer Electronics"
  }
}

Add New Tickers: Simply extend this file with new entries following the same structure.


πŸ”Œ API Documentation

Full API documentation available in API.md

Quick Examples

Analyze Custom Text:

curl -X POST http://localhost:5000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "NVDA earnings beat expectations! Stock surging."}'

Get Posts for Specific Ticker:

curl "http://localhost:5000/api/v1/posts?ticker=TSLA&limit=20"

Compare Multiple Stocks:

curl "http://localhost:5000/api/v1/sentiment-comparison?tickers=AAPL,MSFT,GOOGL"

Market Pulse Overview:

curl "http://localhost:5000/api/v1/market-pulse"

πŸ“ Project Structure

sentiment-analysis-finance-posts/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ app.py                    # Main Flask application
β”‚   β”œβ”€β”€ api_utils.py              # API response utilities
β”‚   β”œβ”€β”€ database.py               # Database layer (repositories pattern)
β”‚   β”œβ”€β”€ migrations.py             # Database schema migrations
β”‚   β”œβ”€β”€ sentiment_analyzer.py     # FinBERT sentiment analysis engine
β”‚   β”œβ”€β”€ ticker_extractor.py       # Stock symbol extraction logic
β”‚   β”œβ”€β”€ industry_classifier.py    # Industry/sector classification
β”‚   β”œβ”€β”€ reddit_rss_client.py      # Reddit RSS feed parser
β”‚   β”œβ”€β”€ config.json               # Application configuration
β”‚   β”œβ”€β”€ ticker_mappings.json      # Ticker metadata (100+ stocks)
β”‚   β”œβ”€β”€ known_tickers.json        # Ticker validation list
β”‚   └── requirements.txt          # Python dependencies
β”‚
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ Dashboard.jsx              # Main dashboard component
β”‚   β”‚   β”‚   β”œβ”€β”€ MarketPulse.jsx            # Market overview widget
β”‚   β”‚   β”‚   β”œβ”€β”€ SentimentChart.jsx         # Trend line charts
β”‚   β”‚   β”‚   β”œβ”€β”€ StockComparisonChart.jsx   # Multi-ticker comparison
β”‚   β”‚   β”‚   β”œβ”€β”€ IndustryHeatmap.jsx        # Sector heatmap visualization
β”‚   β”‚   β”‚   β”œβ”€β”€ VolumeSentimentChart.jsx   # Volume correlation chart
β”‚   β”‚   β”‚   β”œβ”€β”€ SentimentByStockChart.jsx  # Per-ticker breakdown
β”‚   β”‚   β”‚   β”œβ”€β”€ PostsList.jsx              # Paginated posts view
β”‚   β”‚   β”‚   β”œβ”€β”€ TickerFilter.jsx           # Ticker selection dropdown
β”‚   β”‚   β”‚   β”œβ”€β”€ IndustryFilter.jsx         # Industry filter
β”‚   β”‚   β”‚   β”œβ”€β”€ SectorFilter.jsx           # Sector filter
β”‚   β”‚   β”‚   β”œβ”€β”€ DateRangeFilter.jsx        # Date range picker
β”‚   β”‚   β”‚   β”œβ”€β”€ LoadingSpinner.jsx         # Loading state component
β”‚   β”‚   β”‚   β”œβ”€β”€ ErrorMessage.jsx           # Error display
β”‚   β”‚   β”‚   β”œβ”€β”€ EmptyState.jsx             # Empty data state
β”‚   β”‚   β”‚   └── Tooltip.jsx                # Custom tooltip
β”‚   β”‚   β”œβ”€β”€ App.jsx                        # Root React component
β”‚   β”‚   β”œβ”€β”€ main.jsx                       # React entry point
β”‚   β”‚   └── index.css                      # Global styles
β”‚   β”œβ”€β”€ index.html                         # HTML template
β”‚   β”œβ”€β”€ vite.config.js                     # Vite build configuration
β”‚   └── package.json                       # Node dependencies
β”‚
β”œβ”€β”€ API.md                        # Complete API documentation
β”œβ”€β”€ README.md                     # This file
└── .gitignore                    # Git ignore rules

πŸ› οΈ Technologies Used

Core Technologies

  • Python 3.8+ - Backend runtime
  • React 19 - Frontend framework
  • Flask 3.0 - Web framework
  • SQLite - Embedded database
  • Vite 7.3 - Frontend build tool

AI/ML Libraries

  • transformers 4.48+ - HuggingFace library for FinBERT
  • torch 2.2+ - PyTorch (CPU-optimized)
  • numpy 1.26+ - Numerical computing

Frontend Libraries

  • axios 1.13 - HTTP client
  • chart.js 4.5 - Charting library
  • react-chartjs-2 5.3 - React Chart.js wrapper
  • react-dom 19.2 - React DOM renderer

Backend Libraries

  • flask-cors 4.0 - CORS middleware
  • requests 2.31 - HTTP requests
  • python-dateutil 2.8 - Date parsing

🎯 Features in Detail

Ticker Extraction

The ticker extraction system uses a multi-stage approach:

  1. Pattern Matching - Identifies potential ticker symbols using regex ($SYMBOL or uppercase words)
  2. Known Ticker Validation - Checks against a database of 100+ major stocks
  3. False Positive Filtering - Removes common words (CEO, PR, DD, ETA, etc.)
  4. Context Analysis - Validates tickers appear in financial context

Supported Formats:

  • $AAPL - Dollar sign prefix
  • AAPL - Plain uppercase symbols
  • AAPL, MSFT - Comma-separated lists

Industry Classification

Automatically categorizes stocks into industries and sectors:

  • Sectors: Technology, Financial, Healthcare, Energy, Consumer Cyclical, etc.
  • Industries: Software, Semiconductors, Banking, Pharmaceuticals, etc.
  • Mapping: Configured via ticker_mappings.json
  • Extensibility: Easy to add new classifications

Market Pulse

Real-time market overview providing:

  • Most Discussed Stocks - Ranked by post volume
  • Most Positive Stocks - Highest average sentiment scores
  • Most Negative Stocks - Lowest sentiment scores
  • Sentiment by Sector - Aggregated sector-level metrics
  • Overall Market Sentiment - Platform-wide average

Advanced Charts

Sentiment Trends Chart

  • Daily/weekly sentiment distribution over time
  • Stacked area chart showing positive/neutral/negative
  • Filterable by ticker, industry, sector, date range

Stock Comparison Chart

  • Side-by-side sentiment comparison for multiple tickers
  • Grouped bar chart visualization
  • Shows sentiment score distributions

Industry Heatmap

  • Color-coded sentiment intensity per industry
  • Aggregated metrics with percentage breakdowns
  • Quick visual identification of hot/cold sectors

Volume-Sentiment Correlation

  • Dual-axis chart showing post volume and sentiment
  • Identifies trending periods and sentiment shifts
  • Useful for spotting market momentum changes

πŸ—„οΈ Database Schema

Tables

posts

  • Primary post data (title, text, author, subreddit)
  • Sentiment analysis results (label, score, confidence)
  • Reddit metadata (URL, created_at, reddit_id)

tickers

  • Stock symbols with company names
  • Sector and industry classifications
  • Unique constraint on symbol

industries

  • Industry names and descriptions
  • Foreign key relationships to tickers

sectors

  • Sector names and descriptions
  • Foreign key relationships to tickers

post_tickers (Junction Table)

  • Many-to-many relationship between posts and tickers
  • Enables multi-ticker posts

Indexes

Optimized for common query patterns:

  • idx_posts_sentiment - Fast sentiment filtering
  • idx_posts_created_at - Date range queries
  • idx_post_tickers_lookup - Ticker-post associations

πŸ’» Development

Running in Development Mode

Backend (with auto-reload):

cd backend
FLASK_ENV=development python app.py

Frontend (with HMR):

cd frontend
npm run dev

Testing Approach

Manual Testing:

  1. Fetch posts from various subreddits
  2. Verify sentiment classifications are reasonable
  3. Test all filters independently and in combination
  4. Validate chart data accuracy
  5. Check API responses against documentation

Recommended Test Scenarios:

  • Positive sentiment: "AAPL earnings beat! Stock soaring πŸš€"
  • Negative sentiment: "TSLA recalls, stock plummeting"
  • Neutral sentiment: "Fed announces rate decision today"
  • Multi-ticker: "Comparing MSFT vs GOOGL performance"

Building for Production

Frontend Build:

cd frontend
npm run build

Output: frontend/dist/ (static files ready for deployment)

Backend Deployment:

  • Use production WSGI server (gunicorn, uWSGI)
  • Set debug: false in config.json
  • Enable database backups
  • Configure reverse proxy (nginx)

πŸ› Troubleshooting

Common Issues

FinBERT Model Download Fails

# Manually download model
python -c "from transformers import AutoTokenizer, AutoModelForSequenceClassification; \
AutoTokenizer.from_pretrained('ProsusAI/finbert'); \
AutoModelForSequenceClassification.from_pretrained('ProsusAI/finbert')"

Database Locked Error

  • Ensure only one backend instance is running
  • Close any SQLite browser connections
  • Check file permissions on finance_sentiment.db

CORS Errors

  • Verify Flask CORS is enabled in app.py
  • Check frontend is accessing correct backend URL
  • Ensure no conflicting CORS headers

No Posts Fetched from Reddit

  • Verify internet connection
  • Check subreddit names in config.json are valid
  • Reddit RSS feeds may have rate limits (retry after delay)

Charts Not Rendering

  • Check browser console for JavaScript errors
  • Verify Chart.js is loaded (check network tab)
  • Ensure data format matches chart component expectations

Ticker Not Detected

  • Add ticker to ticker_mappings.json
  • Add to known_tickers.json validation list
  • Restart backend server to reload configs

Performance Optimization

Slow Sentiment Analysis:

  • Use CPU-optimized PyTorch build (already configured)
  • Reduce batch size if memory constrained
  • Consider GPU support for high-volume deployments

Database Query Optimization:

  • Indexes are pre-configured for common filters
  • Use pagination (limit parameter) for large datasets
  • Archive old posts to reduce table size

πŸš€ Future Enhancements

Planned Features

  • πŸ” User Authentication - Multi-user support with saved preferences
  • πŸ”” Real-time Alerts - Notify on sentiment threshold changes
  • πŸ“₯ Export Functionality - CSV/JSON export of filtered data
  • πŸ” Custom Search Queries - User-defined Reddit searches
  • πŸ“Š Historical Backtesting - Sentiment vs. actual stock performance
  • 🌐 WebSocket Updates - Real-time dashboard updates
  • πŸ“± Mobile App - Native iOS/Android applications
  • πŸ€– Advanced AI Models - Support for GPT-based analysis
  • πŸ“ˆ Price Integration - Live stock price overlays
  • πŸ—‚οΈ Watchlists - Save and track custom ticker lists

Extensibility Ideas

  • Multi-language sentiment support
  • Cryptocurrency ticker detection
  • News aggregation from financial websites
  • Social media integration (beyond Reddit)
  • Sentiment-based trading signals

About

Analytical dashboard pipeline applying FinBERT to study evolving stock sentiment on Reddit

Topics

Resources

Stars

Watchers

Forks

Contributors