This project presents a quantitative comparative study of time-series forecasting models applied to the Tehran Stock Exchange (TSE). It evaluates ARIMA (linear baseline), MLP (shallow neural network), XGBoost, LightGBM, and a Stacking Ensemble for predicting the daily closing prices of Khalij Fars (PGPIC) and Moj ETF.
The boosted pipeline (forecast_boosted.py) replaces the earlier notebook-only study with a fully reproducible, Optuna-tuned, feature-rich pipeline that achieves up to 86% reduction in MAE vs the ARIMA baseline.
The study focuses on an out-of-sample evaluation over a 60-trading-day horizon (approximately 3 months), analysing predictive accuracy, directional movement, and the impact of market volatility on model performance.
```
lima-lstm-forecast/
├── Khalij.Fars.csv               # Historical price data for Khalij Fars (PGPIC)
├── Moj.ETF.csv                   # Historical price data for Moj ETF
├── forecast_boosted.py           # *** BOOSTED PIPELINE (new) ***
├── Methodology.docx              # Detailed research methodology and findings
├── Moj_ETF_Forecast_Report.docx  # Quantitative research report for Moj ETF
├── files1/                       # EDA visualizations
│   ├── 01_price_moving_averages.png
│   ├── 02_volume_analysis.png
│   └── ...
├── files2/                       # Original ARIMA/MLP forecast charts
│   ├── B_arima_forecast.png
│   ├── C_mlp_forecast.png
│   └── ...
├── files3/                       # Boosted pipeline outputs (generated)
│   ├── {ticker}_A_forecast_all_models.png
│   ├── {ticker}_B_metrics_comparison.png
│   ├── {ticker}_C_residuals.png
│   ├── {ticker}_D_feature_importance.png
│   ├── {ticker}_E_optuna_history.png
│   ├── {ticker}_F_scatter_actual_pred.png
│   ├── {ticker}_G_improvement_over_arima.png
│   └── {ticker}_metrics.csv
├── .gitignore                    # Git ignore file
├── LICENSE                       # MIT License
└── README.md                     # Project documentation (this file)
```
- Boosted Pipeline (`forecast_boosted.py`): XGBoost + LightGBM + MLP + Stacking Ensemble, all Optuna-tuned.
- Rich Feature Engineering (42 features): Lag matrix (t-1…t-20), rolling mean/std (5/10/20/60 d), EWMA (12/26 d), RSI-14, MACD signal, ROC-5/10, Bollinger band width, realised volatility, volume ratios, calendar effects.
- Bayesian Hyperparameter Search: Optuna TPE sampler for XGBoost and LightGBM.
- Comprehensive EDA: Moving averages, volume analysis, returns distribution, rolling volatility.
- Performance Metrics: MAE, RMSE, MAPE, R², and Directional Accuracy.
- Real-world Data: Tehran Stock Exchange data from 2013–2026.
- Normalization: LSTM/MLP inputs are scaled to [0, 1] with `MinMaxScaler`.
- Sliding Window: A 60-day look-back window is used for neural-network training.
- Train/Test Split: Chronological split with the last 60 days reserved for out-of-sample testing.
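The lag/rolling portion of the feature matrix can be sketched with pandas as follows. This is a minimal illustration covering a subset of the 42 features; the function name and column names are hypothetical, and the authoritative implementation is in `forecast_boosted.py`:

```python
import numpy as np
import pandas as pd

def build_features(close: pd.Series, volume: pd.Series) -> pd.DataFrame:
    """Sketch of a lag/rolling feature matrix (column names illustrative)."""
    df = pd.DataFrame(index=close.index)
    # Lag matrix: t-1 ... t-20
    for k in range(1, 21):
        df[f"lag_{k}"] = close.shift(k)
    # Rolling mean/std over 5/10/20/60-day windows
    for w in (5, 10, 20, 60):
        df[f"roll_mean_{w}"] = close.rolling(w).mean()
        df[f"roll_std_{w}"] = close.rolling(w).std()
    # EWMA (12/26 d) and the MACD line derived from them
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    df["macd"] = ema12 - ema26
    # RSI-14 (simple rolling-mean variant)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)
    # Volume relative to its 20-day average
    df["vol_ratio_20"] = volume / volume.rolling(20).mean()
    # Drop warm-up rows where long windows are still undefined
    return df.dropna()
```

Because the longest window is 60 days, the first 59 rows are consumed as warm-up before the matrix is fully populated.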
**Moj ETF (60-day out-of-sample test)**

| Metric | ARIMA (1,1,1) | MLP | XGBoost | LightGBM | Ensemble |
|---|---|---|---|---|---|
| MAE (IRR) | 9,568 | 1,973 | 7,869 | 7,735 | 7,749 |
| RMSE (IRR) | 10,848 | 2,597 | 9,331 | 9,173 | 9,214 |
| MAPE (%) | 24.12 | 4.93 | 19.57 | 19.23 | 19.25 |
| Direction Acc. | 42.4% | 61.0% | 55.9% | 59.3% | 59.3% |
| R² | −3.50 | 0.74 | −2.33 | −2.22 | −2.25 |
**Khalij Fars / PGPIC (60-day out-of-sample test)**

| Metric | ARIMA (1,1,1) | MLP | XGBoost | LightGBM | Ensemble |
|---|---|---|---|---|---|
| MAE (IRR) | 1,730 | 302 | 235 | 239 | 298 |
| RMSE (IRR) | 2,050 | 386 | 289 | 282 | 388 |
| MAPE (%) | 14.58 | 2.63 | 2.04 | 2.09 | 2.57 |
| Direction Acc. | 25.4% | 54.2% | 50.9% | 52.5% | 49.2% |
| R² | −2.45 | 0.88 | 0.93 | 0.93 | 0.88 |
Conclusion: The boosted models (XGBoost/LightGBM/MLP) dramatically outperform ARIMA. On Khalij Fars, XGBoost reduces MAPE from 14.6% → 2.0% (−86%). The MLP leads on directional accuracy across both tickers.
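For reference, the metrics reported in the tables above can be computed as in the self-contained sketch below (the `forecast_metrics` helper is illustrative; the script's own implementation may differ in detail). Note that R² is negative whenever a forecast's squared error exceeds that of simply predicting the test-set mean, which is why ARIMA scores below zero:

```python
import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, MAPE, R², and directional accuracy for a price forecast."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot  # negative if worse than the mean predictor
    # Directional accuracy: share of days where predicted and actual
    # day-over-day moves have the same sign
    same_dir = np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape,
            "R2": r2, "DirAcc": same_dir.mean() * 100}
```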
- Python 3.10+

Install dependencies:

```bash
pip install pandas numpy scikit-learn statsmodels xgboost lightgbm optuna pmdarima matplotlib seaborn
```

Run the pipeline:

```bash
python forecast_boosted.py
```

Outputs are written to `files3/`. The Optuna trial count is controlled by `TRIALS = 10` at the bottom of the script (raise it to 30–50 for a deeper search).
The project includes two primary datasets in CSV format:
- Khalij.Fars.csv: Persian Gulf Petrochemical Industries Co. (2013–2026, ~2,900 rows)
- Moj.ETF.csv: Moj Equity ETF (2023–2026, ~534 rows)
This project provides a boosted forecasting pipeline for the Tehran Stock Exchange comprising ARIMA, MLP, XGBoost, LightGBM, and a Stacking Ensemble. The engineered features include a lag matrix (t-1 to t-20), moving averages, RSI, MACD, realised volatility, and calendar effects. Hyperparameters are optimised with Optuna (TPE sampler). Results show the boosted models achieve up to an 86% reduction in MAE relative to the linear ARIMA baseline.
This project is licensed under the MIT License - see the LICENSE file for details.
- Box, G. E. P. & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day. (Box–Jenkins methodology for ARIMA models.)
- Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.
- Ke, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS.
- Akiba, T. et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD.
- Diebold, F. X. & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics. (Diebold–Mariano test for predictive accuracy comparison.)