This project presents a quantitative comparative study of time-series forecasting models applied to the Tehran Stock Exchange (TSE). It evaluates ARIMA (linear baseline), MLP (shallow neural network), XGBoost, LightGBM, and a Stacking Ensemble for predicting the daily closing prices of Khalij Fars (PGPIC) and Moj ETF.
The boosted pipeline (forecast_boosted.py) replaces the earlier notebook-only study with a fully reproducible, Optuna-tuned, feature-rich pipeline that achieves up to 86% reduction in MAE vs the ARIMA baseline.
The study focuses on an out-of-sample evaluation over a 60-trading-day horizon (approximately 3 months), analysing predictive accuracy, directional movement, and the impact of market volatility on model performance.
```
lima-lstm-forecast/
├── Khalij.Fars.csv               # Historical price data for Khalij Fars (PGPIC)
├── Moj.ETF.csv                   # Historical price data for Moj ETF
├── forecast_boosted.py           # *** BOOSTED PIPELINE (new) ***
├── Methodology.docx              # Detailed research methodology and findings
├── Moj_ETF_Forecast_Report.docx  # Quantitative research report for Moj ETF
├── files1/                       # EDA visualizations
│   ├── 01_price_moving_averages.png
│   ├── 02_volume_analysis.png
│   └── ...
├── files2/                       # Original ARIMA/MLP forecast charts
│   ├── B_arima_forecast.png
│   ├── C_mlp_forecast.png
│   └── ...
├── files3/                       # Boosted pipeline outputs (generated)
│   ├── {ticker}_A_forecast_all_models.png
│   ├── {ticker}_B_metrics_comparison.png
│   ├── {ticker}_C_residuals.png
│   ├── {ticker}_D_feature_importance.png
│   ├── {ticker}_E_optuna_history.png
│   ├── {ticker}_F_scatter_actual_pred.png
│   ├── {ticker}_G_improvement_over_arima.png
│   └── {ticker}_metrics.csv
├── .gitignore                    # Git ignore file
├── LICENSE                       # MIT License
└── README.md                     # Project documentation (this file)
```
- Boosted Pipeline (`forecast_boosted.py`): XGBoost + LightGBM + MLP + Stacking Ensemble, all Optuna-tuned.
- Rich Feature Engineering (42 features): Lag matrix (t-1…t-20), rolling mean/std (5/10/20/60 d), EWMA (12/26 d), RSI-14, MACD signal, ROC-5/10, Bollinger band width, realised volatility, volume ratios, calendar effects.
- Bayesian Hyperparameter Search: Optuna TPE sampler for XGBoost and LightGBM.
- Comprehensive EDA: Moving averages, volume analysis, returns distribution, rolling volatility.
- Performance Metrics: MAE, RMSE, MAPE, R², and Directional Accuracy.
- Real-world Data: Tehran Stock Exchange data from 2013–2026.
- Normalization: LSTM/MLP inputs are scaled to [0, 1] with `MinMaxScaler`.
- Sliding Window: A 60-day look-back window is used for neural-network training.
- Train/Test Split: Chronological split with the last 60 days reserved for out-of-sample testing.
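The lag/rolling portion of the feature matrix can be sketched with pandas as follows. This is a minimal illustration covering a subset of the 42 features; the function name and column names are hypothetical, and the authoritative implementation is in `forecast_boosted.py`:

```python
import numpy as np
import pandas as pd

def build_features(close: pd.Series, volume: pd.Series) -> pd.DataFrame:
    """Sketch of a lag/rolling feature matrix (column names illustrative)."""
    df = pd.DataFrame(index=close.index)
    # Lag matrix: t-1 ... t-20
    for k in range(1, 21):
        df[f"lag_{k}"] = close.shift(k)
    # Rolling mean/std over 5/10/20/60-day windows
    for w in (5, 10, 20, 60):
        df[f"roll_mean_{w}"] = close.rolling(w).mean()
        df[f"roll_std_{w}"] = close.rolling(w).std()
    # EWMA (12/26 d) and the MACD line derived from them
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    df["macd"] = ema12 - ema26
    # RSI-14 (simple rolling-mean variant)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)
    # Volume relative to its 20-day average
    df["vol_ratio_20"] = volume / volume.rolling(20).mean()
    # Drop warm-up rows where long windows are still undefined
    return df.dropna()
```

Because the longest window is 60 days, the first 59 rows are consumed as warm-up before the matrix is fully populated.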
**Moj ETF (60-day out-of-sample test)**

| Metric | ARIMA (1,1,1) | MLP | XGBoost | LightGBM | Ensemble |
|---|---|---|---|---|---|
| MAE (IRR) | 9,568 | 1,973 | 7,869 | 7,735 | 7,749 |
| RMSE (IRR) | 10,848 | 2,597 | 9,331 | 9,173 | 9,214 |
| MAPE (%) | 24.12 | 4.93 | 19.57 | 19.23 | 19.25 |
| Direction Acc. | 42.4% | 61.0% | 55.9% | 59.3% | 59.3% |
| R² | −3.50 | 0.74 | −2.33 | −2.22 | −2.25 |
**Khalij Fars / PGPIC (60-day out-of-sample test)**

| Metric | ARIMA (1,1,1) | MLP | XGBoost | LightGBM | Ensemble |
|---|---|---|---|---|---|
| MAE (IRR) | 1,730 | 302 | 235 | 239 | 298 |
| RMSE (IRR) | 2,050 | 386 | 289 | 282 | 388 |
| MAPE (%) | 14.58 | 2.63 | 2.04 | 2.09 | 2.57 |
| Direction Acc. | 25.4% | 54.2% | 50.9% | 52.5% | 49.2% |
| R² | −2.45 | 0.88 | 0.93 | 0.93 | 0.88 |
Conclusion: The boosted models (XGBoost/LightGBM/MLP) dramatically outperform ARIMA. On Khalij Fars, XGBoost reduces MAPE from 14.6% → 2.0% (−86%). The MLP leads on directional accuracy across both tickers.
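For reference, the metrics reported in the tables above can be computed as in the self-contained sketch below (the `forecast_metrics` helper is illustrative; the script's own implementation may differ in detail). Note that R² is negative whenever a forecast's squared error exceeds that of simply predicting the test-set mean, which is why ARIMA scores below zero:

```python
import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, MAPE, R², and directional accuracy for a price forecast."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot  # negative if worse than the mean predictor
    # Directional accuracy: share of days where predicted and actual
    # day-over-day moves have the same sign
    same_dir = np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape,
            "R2": r2, "DirAcc": same_dir.mean() * 100}
```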
- Python 3.10+

Install dependencies:

```bash
pip install pandas numpy scikit-learn statsmodels xgboost lightgbm optuna pmdarima matplotlib seaborn
```

Run the pipeline:

```bash
python forecast_boosted.py
```

Outputs are written to `files3/`. The Optuna trial count is controlled by `TRIALS = 10` at the bottom of the script (raise it to 30–50 for a deeper search).
The project includes two primary datasets in CSV format:
- Khalij.Fars.csv: Persian Gulf Petrochemical Industries Co. (2013–2026, ~2,900 rows)
- Moj.ETF.csv: Moj Equity ETF (2023–2026, ~534 rows)
This project provides a boosted forecasting pipeline for the Tehran Stock Exchange comprising ARIMA, MLP, XGBoost, LightGBM, and a Stacking Ensemble. The engineered features include a lag matrix (t-1 to t-20), moving averages, RSI, MACD, realised volatility, and calendar effects. Hyperparameters are optimised with Optuna (TPE sampler). Results show the boosted models achieve up to an 86% reduction in MAE relative to the linear ARIMA baseline.
This project is licensed under the MIT License - see the LICENSE file for details.
- Box, G. E. P. & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day. (Box–Jenkins methodology for ARIMA models.)
- Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.
- Ke, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS.
- Akiba, T. et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD.
- Diebold, F. X. & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics. (Diebold–Mariano test for predictive accuracy comparison.)