Mmadrb/lima-lstm-tse-forecast

Boosted Forecast for Tehran Stock Exchange (TSE)

Overview

This project presents a quantitative comparative study of time-series forecasting models applied to the Tehran Stock Exchange (TSE). It evaluates ARIMA (linear baseline), MLP (shallow neural network), XGBoost, LightGBM, and a Stacking Ensemble for predicting the daily closing prices of Khalij Fars (PGPIC) and Moj ETF.

The boosted pipeline (forecast_boosted.py) replaces the earlier notebook-only study with a reproducible, Optuna-tuned pipeline built on a rich feature set. Its best result is an 86% reduction in MAE relative to the ARIMA baseline (XGBoost on Khalij Fars: 1,730 IRR down to 235 IRR).

The study focuses on an out-of-sample evaluation over a 60-trading-day horizon (approximately 3 months), analysing predictive accuracy, directional movement, and the impact of market volatility on model performance.


Project Structure

lima-lstm-forecast/
├── Khalij.Fars.csv               # Historical price data for Khalij Fars (PGPIC)
├── Moj.ETF.csv                   # Historical price data for Moj ETF
├── forecast_boosted.py           # *** BOOSTED PIPELINE (new) ***
├── Methodology.docx              # Detailed research methodology and findings
├── Moj_ETF_Forecast_Report.docx  # Quantitative research report for Moj ETF
├── files1/                       # EDA visualizations
│   ├── 01_price_moving_averages.png
│   ├── 02_volume_analysis.png
│   └── ...
├── files2/                       # Original ARIMA/MLP forecast charts
│   ├── B_arima_forecast.png
│   ├── C_mlp_forecast.png
│   └── ...
├── files3/                       # Boosted pipeline outputs (generated)
│   ├── {ticker}_A_forecast_all_models.png
│   ├── {ticker}_B_metrics_comparison.png
│   ├── {ticker}_C_residuals.png
│   ├── {ticker}_D_feature_importance.png
│   ├── {ticker}_E_optuna_history.png
│   ├── {ticker}_F_scatter_actual_pred.png
│   ├── {ticker}_G_improvement_over_arima.png
│   └── {ticker}_metrics.csv
├── .gitignore                    # Git ignore file
├── LICENSE                       # MIT License
└── README.md                     # Project documentation (this file)

Key Features

  • Boosted Pipeline (forecast_boosted.py): XGBoost + LightGBM + MLP + Stacking Ensemble, all Optuna-tuned.
  • Rich Feature Engineering (42 features): Lag matrix (t-1…t-20), rolling mean/std (5/10/20/60 d), EWMA (12/26 d), RSI-14, MACD signal, ROC-5/10, Bollinger band width, realised volatility, volume ratios, calendar effects.
  • Bayesian Hyperparameter Search: Optuna TPE sampler for XGBoost and LightGBM.
  • Comprehensive EDA: Moving averages, volume analysis, returns distribution, rolling volatility.
  • Performance Metrics: MAE, RMSE, MAPE, R², and Directional Accuracy.
  • Real-world Data: Tehran Stock Exchange data from 2013–2026.
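As a rough illustration of the feature families listed above, the sketch below builds a subset of them (lags, rolling mean/std, ROC, Bollinger band width, RSI-14) from a plain list of closing prices. It is a dependency-free approximation, not the code in forecast_boosted.py: the function names, the feature names, the use of population standard deviation, and the simple-average RSI variant are all assumptions made for the example.

```python
from statistics import mean, pstdev


def rsi(closes, period=14):
    """Simple-average RSI over the last `period` price changes.

    Assumption: plain averages of gains and losses, not Wilder's smoothing.
    """
    changes = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)


def build_features(closes, t, n_lags=20, windows=(5, 10, 20, 60)):
    """Features for predicting closes[t], using only closes[:t] (no look-ahead)."""
    min_history = max(n_lags, max(windows), 20, 15)
    if t < min_history or t > len(closes):
        raise ValueError(f"need {min_history} <= t <= {len(closes)}")
    history = closes[:t]
    feats = {}

    # Lag matrix: t-1 ... t-n_lags
    for k in range(1, n_lags + 1):
        feats[f"lag_{k}"] = history[-k]

    # Rolling mean and (population) standard deviation
    for w in windows:
        window = history[-w:]
        feats[f"roll_mean_{w}"] = mean(window)
        feats[f"roll_std_{w}"] = pstdev(window)

    # Rate of change over 5 and 10 days
    for n in (5, 10):
        feats[f"roc_{n}"] = history[-1] / history[-1 - n] - 1.0

    # Bollinger band width: (upper - lower) / middle with 2-sigma bands
    last20 = history[-20:]
    feats["bb_width_20"] = 4.0 * pstdev(last20) / mean(last20)

    feats["rsi_14"] = rsi(history, 14)
    return feats
```

Every feature for day t is computed from prices up to day t-1 only, which is the property that keeps an out-of-sample evaluation honest. This sketch yields 32 features; the EWMA, MACD, volume, volatility, and calendar features from the list are left out for brevity.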

Methodology Highlights

1. Data Preprocessing

  • Normalization: LSTM/MLP inputs are scaled using MinMaxScaler to [0, 1].
  • Sliding Window: A 60-day look-back window is used for neural network training.
  • Train/Test Split: Chronological split with the last 60 days reserved for out-of-sample testing.
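The three preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the helper names are hypothetical, and fitting the min-max bounds on the training portion only is an assumption (the source states only that MinMaxScaler maps inputs to [0, 1]).

```python
def chronological_split(series, test_size=60):
    """Hold out the last `test_size` observations, preserving time order."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def fit_minmax(train):
    """Return the (min, max) of the training data; assumed to be fit on train only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values, lo, hi):
    """Apply min-max scaling with previously fitted bounds."""
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(values, lookback=60):
    """Turn a series into (X, y) pairs: `lookback` past values -> next value."""
    X, y = [], []
    for end in range(lookback, len(values)):
        X.append(values[end - lookback:end])
        y.append(values[end])
    return X, y


# Example with synthetic prices 1..200
prices = [float(p) for p in range(1, 201)]
train, test = chronological_split(prices, test_size=60)
lo, hi = fit_minmax(train)
X_train, y_train = make_windows(scale(train, lo, hi), lookback=60)
```

With bounds fitted on the training data alone, scaled test values can fall outside [0, 1] when the test period trades above or below the training range. That is expected and avoids leaking test-period information into the scaler.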

2. Boosted Model Performance

Moj ETF – 60-Day Out-of-Sample

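Metric         | ARIMA (1,1,1) | MLP   | XGBoost | LightGBM | Ensemble
---------------|---------------|-------|---------|----------|---------
MAE (IRR)      | 9,568         | 1,973 | 7,869   | 7,735    | 7,749
RMSE (IRR)     | 10,848        | 2,597 | 9,331   | 9,173    | 9,214
MAPE (%)       | 24.12         | 4.93  | 19.57   | 19.23    | 19.25
Direction Acc. | 42.4%         | 61.0% | 55.9%   | 59.3%    | 59.3%
R²             | −3.50         | 0.74  | −2.33   | −2.22    | −2.25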

Khalij Fars – 60-Day Out-of-Sample

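Metric         | ARIMA (1,1,1) | MLP   | XGBoost | LightGBM | Ensemble
---------------|---------------|-------|---------|----------|---------
MAE (IRR)      | 1,730         | 302   | 235     | 239      | 298
RMSE (IRR)     | 2,050         | 386   | 289     | 282      | 388
MAPE (%)       | 14.58         | 2.63  | 2.04    | 2.09     | 2.57
Direction Acc. | 25.4%         | 54.2% | 50.9%   | 52.5%    | 49.2%
R²             | −2.45         | 0.88  | 0.93    | 0.93     | 0.88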

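Conclusion: On Khalij Fars, all four machine-learning models far outperform ARIMA; XGBoost cuts MAPE from 14.6% to 2.0% (an 86% relative reduction). On Moj ETF, only the MLP beats ARIMA by a wide margin (MAPE 4.9% vs 24.1%); XGBoost, LightGBM, and the Ensemble lower MAE by only about 18–19% and still have negative R². The MLP has the highest directional accuracy on both tickers.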
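For reference, the reported metrics and the 86% figure can be reproduced with a few lines of standard Python. The sketch below uses the textbook definitions; the directional-accuracy rule (comparing the sign of consecutive actual changes with the sign of consecutive predicted changes) is an assumption, since the exact definition used by the pipeline is not stated here.

```python
import math


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual, pred):
    """Mean absolute percentage error, in percent. Requires non-zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def r2(actual, pred):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def _sign(x):
    return (x > 0) - (x < 0)


def directional_accuracy(actual, pred):
    """Share of day-over-day moves whose direction is predicted correctly.

    Assumed definition: sign(actual[t] - actual[t-1]) == sign(pred[t] - pred[t-1]).
    """
    hits = sum(
        _sign(actual[t] - actual[t - 1]) == _sign(pred[t] - pred[t - 1])
        for t in range(1, len(actual))
    )
    return hits / (len(actual) - 1)


def relative_reduction(baseline, model):
    """Fractional error reduction of `model` relative to `baseline`."""
    return (baseline - model) / baseline


# Khalij Fars, XGBoost vs ARIMA (figures taken from the table above)
mape_cut = relative_reduction(14.58, 2.04)  # about 0.86
mae_cut = relative_reduction(1730, 235)     # about 0.86
```

A negative R², as reported for ARIMA on both tickers and for the tree models on Moj ETF, means the forecast has a larger squared error than a constant prediction equal to the test-period mean. A 60-day test window yields 59 day-over-day comparisons, which is consistent with the reported directional-accuracy percentages.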


Quick Start

Prerequisites

  • Python 3.10+
  • pip install pandas numpy scikit-learn statsmodels xgboost lightgbm optuna pmdarima matplotlib seaborn

Running the Boosted Pipeline

python forecast_boosted.py

Outputs are written to files3/. Optuna trial count is controlled by TRIALS = 10 at the bottom of the script (raise to 30–50 for a deeper search).

Data

The project includes two primary datasets in CSV format:

  1. Khalij.Fars.csv: Persian Gulf Petrochemical Industries Co. (2013–2026, ~2,900 rows)
  2. Moj.ETF.csv: Moj Equity ETF (2023–2026, ~534 rows)

Summary

This project presents a boosted forecasting pipeline for the Tehran Stock Exchange that includes ARIMA, MLP, XGBoost, LightGBM, and a Stacking Ensemble. The engineered features include a lag matrix (t-1 to t-20), moving averages, RSI, MACD, realised volatility, and calendar effects. Hyperparameters are optimised with Optuna (TPE sampler). The results show that the boosted models reduce MAE by up to 86% relative to the linear ARIMA baseline.


License

This project is licensed under the MIT License - see the LICENSE file for details.

References

  1. Box-Jenkins Methodology for ARIMA Models.
  2. Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.
  3. Ke, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS.
  4. Akiba, T. et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD.
  5. Diebold–Mariano test for predictive accuracy comparison.
