A comprehensive, production-ready algorithmic trading backtesting system with sentiment analysis, machine learning models, and real-time monitoring capabilities.
- News-based Sentiment Analysis: Process financial news to predict market sentiment
- Machine Learning Models: LightGBM-based prediction models for 1d, 3d, and 7d horizons
- Intraday Backtesting: Realistic backtesting with slippage and commission modeling
- Multi-horizon Predictions: Support for multiple prediction timeframes
- Interactive Charts: Real-time OHLC charts with AI prediction overlays
- Advanced Visualization: Volume histograms, confidence bands, and prediction aggregation
- Data Caching: Optimized performance with intelligent data caching
- Configuration Management: Centralized config with environment variable support
- Comprehensive Logging: Structured logging with different levels and output destinations
- Error Handling & Recovery: Circuit breakers, retry mechanisms, and graceful degradation
- Database Migrations: Version-controlled schema evolution
- Data Validation: Real-time data quality monitoring and anomaly detection
- Feature Engineering: Automated feature extraction and selection pipeline
- Model Versioning: Model lifecycle management and A/B testing framework
- REST API: FastAPI-based endpoints for external integration
- Performance Metrics: System and application performance monitoring
- Health Checks: Comprehensive health check endpoints
- Alerting System: Real-time alerting for critical issues
- Quality Monitoring: Data quality dashboards and trend analysis
- Authentication: API key and JWT-based authentication
- CI/CD Pipeline: Automated testing, building, and deployment

Note: This repository does not include a `.github/workflows` directory by default. If you want automated CI, add your GitHub Actions workflows under `.github/workflows/`.
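For reference, a minimal workflow that runs the test suite on every push might look like the following. The file name and steps are illustrative only; this workflow is not shipped with the repository:

```yaml
# .github/workflows/ci.yml -- illustrative example, not part of this repo
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r backend/requirements.txt
      - run: pytest tests/
```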
The system features a new modular model architecture that enables seamless integration of various machine learning models for trading predictions. This design supports both pre-trained joblib models and custom Python-based models, providing flexibility for different use cases.
All models are stored in a standardized bundle format:

```json
{
  "meta": {
    "name": "str",
    "type": "str",
    "version": "str",
    "description": "str",
    "config_schema": {}
  },
  "model": "estimator",
  "extras": {}
}
```

Save your trained model in the canonical format using `joblib.dump()`.
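As an illustration, a bundle in the format above can be written with `joblib.dump()`. This is a sketch only: the scikit-learn `LinearRegression` estimator and all metadata values are stand-ins, not artifacts from this repository.

```python
# Hypothetical example of saving a model bundle in the canonical format.
# The LinearRegression estimator and metadata values are stand-ins.
import os
import tempfile

import joblib
from sklearn.linear_model import LinearRegression

# Train a trivial estimator as a placeholder for a real model.
estimator = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

bundle = {
    "meta": {
        "name": "example_linear",
        "type": "regression",
        "version": "0.1.0",
        "description": "Toy bundle demonstrating the canonical format",
        "config_schema": {},
    },
    "model": estimator,
    "extras": {},
}

# In the real project this would go under models/; use a temp dir here.
path = os.path.join(tempfile.mkdtemp(), "example_linear.joblib")
joblib.dump(bundle, path)

# Loading it back returns the same dict, estimator included.
loaded = joblib.load(path)
print(loaded["meta"]["name"])  # example_linear
```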
Create a new class in backend/models/ that inherits from BaseModel and implements the required methods (e.g., predict, train).
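A custom model class might look like the sketch below. The real `BaseModel` lives in `backend/models/` and its exact interface may differ; the stand-in base class and the `MeanReversionModel` strategy here are purely illustrative.

```python
# Hypothetical sketch of a custom model class; the real BaseModel lives in
# backend/models/ and its exact interface may differ.
from typing import Sequence


class BaseModel:  # stand-in for backend.models.BaseModel
    def train(self, features: Sequence[Sequence[float]], targets: Sequence[float]) -> None:
        raise NotImplementedError

    def predict(self, features: Sequence[Sequence[float]]) -> list[float]:
        raise NotImplementedError


class ConstantMeanModel(BaseModel):
    """Toy model that always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean_target = 0.0

    def train(self, features: Sequence[Sequence[float]], targets: Sequence[float]) -> None:
        self.mean_target = sum(targets) / len(targets)

    def predict(self, features: Sequence[Sequence[float]]) -> list[float]:
        # Predict the historical mean for every input row.
        return [self.mean_target for _ in features]


model = ConstantMeanModel()
model.train([[1.0], [2.0], [3.0]], [1.0, 2.0, 3.0])
print(model.predict([[4.0]]))  # [2.0]
```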
- Model Discovery: Use the model registry to list available models
- API Integration: Access models through dedicated API endpoints
- Version Management: Track model versions and performance metrics
- Activate the virtual environment: `& .venv\Scripts\Activate.ps1`
- Start the backend from repo root: `python main.py`
- Start the frontend: `cd frontend && npm run dev`
- Run tests: `pytest`
```
trading-backtesting/
├── main.py                  # Top-level import shim that re-exports backend app
├── backend/
│   ├── main.py              # FastAPI app entry point
│   ├── schemas/             # Pydantic models (e.g., schemas/udf.py)
│   ├── routes/              # API route modules (health, predictions, backtests, scripts, websocket, ...)
│   ├── config.py            # Configuration management
│   ├── logging_config.py    # Comprehensive logging setup
│   ├── error_handling.py    # Error handling and recovery
│   ├── data_processing.py   # ETL / data processing utilities
│   ├── data_validation.py   # Data quality monitoring
│   ├── routes/monitoring.py # Performance metrics and monitoring
│   ├── requirements.txt     # Backend Python dependencies
│   └── scripts/             # Original trading and data ingestion scripts
├── db/                      # Database files & schema
│   └── schema.sql           # Database schema
├── frontend/                # React/TypeScript frontend
│   ├── package.json
│   ├── src/
│   └── README.md
├── models/                  # Trained model artifacts (.joblib files)
├── tests/                   # Comprehensive test suite (API, integration, unit tests)
├── htmlcov/                 # Generated coverage report
├── .venv/                   # Local development virtual environment (not committed by policy)
└── README.md                # This README
```
```shell
# In one terminal: start backend (ensure .venv is activated)
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# In a second terminal: start the frontend
cd frontend
npm run dev
```

This repository includes Docker support for local development with both backend and frontend services.

```shell
cp .env.example .env
docker compose up --build
```

- Backend: http://localhost:8000
- Frontend: http://localhost:5173
The backend service mounts ./backend, ./data, and .env for live development.
The frontend service mounts ./frontend and supports hot reload.
```shell
docker compose down
```

If you use VS Code, open the repo in the dev container. The `.devcontainer/devcontainer.json` configuration forwards ports 8000 and 5173.
- Python 3.10+
- SQLite (or PostgreSQL for production)
- Clone and Setup:

```shell
git clone <repository-url>
cd trading-backtesting
python -m venv .venv

# PowerShell
& .venv\Scripts\Activate.ps1
# Or use the cross-platform activation for bash/macOS:
# source .venv/bin/activate

# Install backend Python requirements
pip install -r backend/requirements.txt

# Install frontend dependencies (optional, if you will run the frontend)
cd frontend
npm install
cd ..
```

- Environment Configuration:

```shell
cp .env.example .env
# Edit .env with your configuration
```

- Initialize Database:

```shell
# Run schema migration to create the database schema
python backend/scripts/apply_schema.py

# Optionally run the ingestion & pipeline scripts to populate sample data
python backend/scripts/run_pipeline.py
```

- Start API Server:

```shell
# From repo root (after activating .venv):
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# Or run the module directly from backend:
cd backend
python main.py
```
The system uses a hierarchical configuration system:
- Default Configuration (config.py)
- Environment Variables (.env file)
- Runtime Configuration (API calls)
Key configuration sections:
- Database: Connection settings, pool sizes, timeouts
- API: Server settings, CORS, authentication
- Trading: Capital, commissions, slippage, exposure limits
- Models: Model paths, training parameters
- Logging: Log levels, output formats, destinations
- Monitoring: Alert thresholds, performance metrics
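As an illustration, a `.env` fragment covering some of the sections above might look like the following. The variable names are hypothetical; check `backend/config.py` and `.env.example` for the names the system actually reads:

```shell
# Hypothetical .env fragment -- consult backend/config.py for real variable names
DATABASE_URL=sqlite:///db/trading.db
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=INFO
INITIAL_CAPITAL=100000
COMMISSION_RATE=0.001
```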
- `GET /health` - System health check
- `GET /metrics` - Performance metrics
- `GET /docs` - Interactive API documentation

- `POST /predict` - Make trading predictions
- `GET /predictions/recent` - Get recent predictions
- `GET /models` - List available models

- `POST /backtest` - Run backtest
- `GET /backtest/{id}` - Get backtest results

- `GET /data/prices/{ticker}` - Get price data
- `GET /portfolio/current` - Get current portfolio

- `POST /scripts/execute` - Execute data processing or ML script
- `GET /scripts/status/{execution_id}` - Get script execution status
- `GET /scripts/executions` - List all script executions
- `POST /scripts/pipeline/run` - Run the full data processing pipeline
- `GET /scripts/pipeline/status/{execution_id}` - Get pipeline execution status
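As a sketch, kicking off a script and polling its status could look like the following. The request payload and response fields (`script`, `execution_id`, `status`) are assumptions; consult `GET /docs` for the real schema:

```python
# Hypothetical client for the script-execution endpoints; the payload and
# response fields are assumptions -- see GET /docs for the real schema.
import json
import urllib.request


def execute_script(base_url: str, script_name: str) -> str:
    """POST to /scripts/execute and return the execution id."""
    body = json.dumps({"script": script_name}).encode()
    req = urllib.request.Request(
        f"{base_url}/scripts/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["execution_id"]


def execution_status(base_url: str, execution_id: str) -> str:
    """GET /scripts/status/{execution_id} and return its status field."""
    with urllib.request.urlopen(
        f"{base_url}/scripts/status/{execution_id}", timeout=10
    ) as resp:
        return json.load(resp)["status"]


# Example (requires a running backend):
# exec_id = execute_script("http://localhost:8000", "run_pipeline")
# print(execution_status("http://localhost:8000", exec_id))
```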
API endpoints support authentication via:
- API Key (header: `Authorization: Bearer <key>`)
- JWT Tokens (for advanced use cases)
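A minimal sketch of attaching an API key as a bearer token, using only the standard library (the key value and endpoint path are placeholders):

```python
# Hypothetical sketch: attach an API key as a bearer token.
# The key value and endpoint path are placeholders.
import urllib.request

API_KEY = "your-api-key"
request = urllib.request.Request(
    "http://localhost:8000/predictions/recent",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# No network call is made here; we only inspect the prepared request.
print(request.get_header("Authorization"))  # Bearer your-api-key
```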
This repository includes a comprehensive test suite that covers API endpoints, the backtesting engine, data processing and integrations. Run the tests locally to verify the current status and coverage.
- API endpoints (health, predictions, backtests, data, portfolio, scripts, monitoring, websockets)
- Backtesting engine functionality
- Script execution and pipeline management
- Integration workflows
```shell
# Run all tests (make sure .venv is activated)
pytest tests/

# Run with coverage
pytest --cov=backend --cov-report=html

# Run a specific test file
pytest tests/test_backtesting.py

# Run with verbose output
pytest -v
```

- Unit Tests: Individual component testing (API endpoints, data validation, utilities)
- Integration Tests: End-to-end workflow testing (pipeline execution, backtest flows)
- WebSocket Tests: Real-time communication testing
- Script Execution Tests: Background task and pipeline validation
```python
import requests

response = requests.post('http://localhost:8000/predict', json={
    'ticker': 'AAPL',
    'horizon': '1d',
    'context': {'market_conditions': 'normal'}
})
prediction = response.json()
print(f"Predicted return: {prediction['predicted_return']:.4f}")
```

```python
response = requests.post('http://localhost:8000/backtest', json={
    'strategy_name': 'sentiment_momentum',
    'start_date': '2025-01-01',
    'end_date': '2025-12-31',
    'initial_capital': 100000,
    'parameters': {'sentiment_threshold': 0.02}
})
backtest_id = response.json()['id']
```

```python
from data_validation import create_data_quality_monitor

monitor = create_data_quality_monitor()
reports = monitor.run_quality_checks(['price_daily', 'sentiment_predictions'])
for table, report in reports.items():
    print(f"{table}: {report.quality_level.value} ({report.quality_score:.2%})")
```

The system provides multiple health check endpoints:
- Database connectivity
- Model availability
- Data freshness
- System resources
- API response times
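The checks above can be exercised with a small polling sketch. The JSON shape of the `/health` response is an assumption here; consult `GET /docs` for the actual schema:

```python
# Hypothetical health-check poller; the JSON shape of /health is an assumption.
import json
import urllib.request


def fetch_health(base_url: str) -> dict:
    """Fetch and decode the /health endpoint payload."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def is_healthy(payload: dict) -> bool:
    """Treat any status other than 'ok'/'healthy' as a failure."""
    return str(payload.get("status", "")).lower() in {"ok", "healthy"}


# Example (requires a running backend):
# print(is_healthy(fetch_health("http://localhost:8000")))
print(is_healthy({"status": "healthy"}))  # True
```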
Track key metrics:
- Prediction latency
- Model accuracy over time
- Data quality scores
- System resource usage
- Error rates
Configure alerts for:
- Model accuracy degradation
- Data quality issues
- System resource constraints
- API performance degradation
- Prediction confidence thresholds
- Type Hints: All functions should have type annotations
- Documentation: Comprehensive docstrings for all public APIs
- Testing: Minimum 80% test coverage
- Logging: Appropriate logging for debugging and monitoring
- Error Handling: Comprehensive error handling and recovery
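In practice, the standards above might look like the following helper. The function itself is illustrative, not part of the codebase; it simply demonstrates type hints, a full docstring, logging, and explicit error handling in one place:

```python
# Illustrative only: a helper written to the standards above
# (type hints, docstring, logging, explicit error handling).
import logging

logger = logging.getLogger(__name__)


def simple_return(entry_price: float, exit_price: float) -> float:
    """Compute the simple return of a round-trip trade.

    Args:
        entry_price: Price at which the position was opened (must be > 0).
        exit_price: Price at which the position was closed.

    Returns:
        The fractional return, e.g. 0.05 for a 5% gain.

    Raises:
        ValueError: If entry_price is not positive.
    """
    if entry_price <= 0:
        logger.error("Invalid entry price: %r", entry_price)
        raise ValueError("entry_price must be positive")
    return (exit_price - entry_price) / entry_price


print(simple_return(100.0, 105.0))  # 0.05
```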
- Built with FastAPI, pandas, scikit-learn, and LightGBM
- Uses TA-Lib for technical analysis indicators
- Inspired by modern MLOps best practices
- Designed for production financial trading systems