This project implements a Double Deep Q-Network (Double-DQN) to optimize inventory management for a retail environment (Walmart-style simulation). The AI agent learns to balance the cost of holding inventory against the penalty of stockouts, specifically anticipating surges during Weekends and Festivals.
The agent uses a 3-layer neural network implemented in pure NumPy for high-speed inference. It learns by interacting with a 10-year demand dataset (walmart_demand.csv).
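A minimal sketch of such a 3-layer forward pass in pure NumPy (the hidden width of 64 and the weight initialization here are illustrative assumptions, not the project's actual values):

```python
import numpy as np

STATE_SIZE, HIDDEN, N_ACTIONS = 6, 64, 11  # 6 state features -> 11 order quantities

# Randomly initialized weights for a 3-layer MLP (sizes assumed for illustration)
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (STATE_SIZE, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, HIDDEN));     b2 = np.zeros(HIDDEN)
W3 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS));  b3 = np.zeros(N_ACTIONS)

def forward(state):
    """Map a 6-dim state to 11 Q-values using two ReLU hidden layers."""
    h1 = np.maximum(0, state @ W1 + b1)
    h2 = np.maximum(0, h1 @ W2 + b2)
    return h2 @ W3 + b3

q = forward(np.zeros(STATE_SIZE))
print(q.shape)  # (11,)
```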
The model observes the following 6 variables to make a decision:
- Current Inventory: Scaled to 0.0–1.0 relative to warehouse capacity.
- Is Today Weekend?: Indicates if a weekend surge is happening now.
- Is Tomorrow Weekend?: Allows the agent to "Pre-stock" ahead of time.
- Days Since Weekend: Normalized time feature to catch weekly patterns.
- Is Today Festival?: Indicates a massive holiday/festival surge.
- Is Tomorrow Festival?: Crucial for large volume pre-ordering to avoid stockouts.
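The six features above can be assembled into a state vector along these lines (the feature order and the 7-day normalization are assumptions for illustration):

```python
import numpy as np

def build_state(inventory, capacity, today_weekend, tomorrow_weekend,
                days_since_weekend, today_festival, tomorrow_festival):
    """Assemble the 6-feature observation vector the agent sees each day."""
    return np.array([
        inventory / capacity,          # current inventory, scaled 0.0-1.0
        float(today_weekend),          # weekend surge happening now?
        float(tomorrow_weekend),       # weekend tomorrow -> chance to pre-stock
        days_since_weekend / 7.0,      # normalized weekly-cycle feature
        float(today_festival),         # festival surge today?
        float(tomorrow_festival),      # festival tomorrow -> large pre-order
    ], dtype=np.float32)

state = build_state(inventory=40, capacity=100,
                    today_weekend=False, tomorrow_weekend=True,
                    days_since_weekend=3,
                    today_festival=False, tomorrow_festival=False)
print(state)
```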
The model outputs Q-Values for 11 discrete order quantities:
[0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60]
During exploitation, it selects the quantity that maximizes the expected future "Reward" (Profit).
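Greedy selection is an argmax over the 11 Q-values, mapped back to an order quantity. A minimal epsilon-greedy sketch (the function name and epsilon handling are assumptions):

```python
import numpy as np

ACTIONS = [0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60]  # discrete order quantities

def select_quantity(q_values, epsilon, rng=np.random.default_rng()):
    """Epsilon-greedy: random quantity with prob. epsilon, else greedy argmax."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)              # exploration
    return ACTIONS[int(np.argmax(q_values))]    # exploitation

q = np.array([1.0, 2.0, 0.5, 3.2, 1.1, 0.0, 0.3, 0.9, 2.5, 1.7, 0.8])
print(select_quantity(q, epsilon=0.0))  # greedy pick: index 3 -> quantity 15
```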
The RL agent's performance is governed by several critical settings that balance profit and warehouse safety:
- `STATE_SIZE = 6`: The "eyes" of the AI. Increasing this from the previous 4 to 6 gives the model the ability to "see" festivals a full day in advance, allowing preemptive stocking.
- `GAMMA = 0.90` (Discount Factor): Determines the agent's foresight. A value of 0.90 is high, meaning the agent values future stockouts almost as much as today's costs. It learns to "save for a rainy day" (surges).
- `EPS_DECAY = 0.995`: Controls the transition from Exploration (trying random actions) to Exploitation (using its brain). It ensures the model tries every possible order quantity before settling on the most profitable one.
- `BATCH = 128`: During training, the model doesn't just learn from the current day. It replays 128 random past experiences so its learning is stable and not biased by a single bad day.
- `LR = 0.0001` (Learning Rate): A very careful step size. This prevents exploding gradients, where the AI might overreact to a single holiday and start over-ordering every day.
- `STOCK_PENALTY = 80`: Set intentionally high ($80 per unit). In the retail world, losing a customer to a stockout is far more expensive than paying for warehouse shelf space.
- `TARGET_SYNC = 10`: We use a Double-DQN architecture. Every 10 episodes, the "Target Brain" is synced with the "Online Brain" to prevent the model from "chasing its own tail" during learning.
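The Double-DQN update that `GAMMA` and `TARGET_SYNC` govern can be sketched as follows: the online network *chooses* the next action, while the target network *evaluates* it. This is a generic illustration of the technique, not the project's exact code (the array shapes and function name are assumptions):

```python
import numpy as np

GAMMA = 0.90  # discount factor from the settings above

def double_dqn_targets(rewards, dones, q_online_next, q_target_next):
    """Compute Double-DQN targets for a batch of transitions.

    q_online_next / q_target_next: (batch, n_actions) Q-values for the
    next states from the online and target networks respectively.
    """
    best = np.argmax(q_online_next, axis=1)             # online net selects
    q_eval = q_target_next[np.arange(len(best)), best]  # target net evaluates
    return rewards + GAMMA * q_eval * (1.0 - dones)     # zero future value at done

# One-transition example: online net prefers action 1; target net scores it 4.0
rewards = np.array([10.0])
dones = np.array([0.0])
q_on = np.array([[1.0, 5.0, 2.0]])
q_tg = np.array([[0.5, 4.0, 3.0]])
print(double_dqn_targets(rewards, dones, q_on, q_tg))  # [13.6] = 10 + 0.9 * 4.0
```

Decoupling selection from evaluation this way is what reduces the over-estimation bias of vanilla DQN.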
- Simulation: The dashboard compares the RL Agent against a "Traditional" fixed-order model.
- SHAP (Explainability): The project uses SHAP values to show which factor (e.g., Tomorrow Festival) caused the AI to order a specific quantity.
- Learning Logic: The agent learns that if Tomorrow Festival is True, it must order 50+ units even if current stock is high, to cover the predicted surge.
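The incentive behind that behavior comes from the reward signal. A hedged sketch of how such a daily reward might be shaped (only `STOCK_PENALTY = 80` comes from the settings above; the profit and holding-cost figures are made-up assumptions):

```python
STOCK_PENALTY = 80    # $ per unit of unmet demand (from the settings above)
HOLDING_COST = 2      # assumed $ per leftover unit held overnight
UNIT_PROFIT = 10      # assumed $ profit per unit sold

def daily_reward(inventory, order_qty, demand):
    """Profit from sales minus stockout penalty and holding cost."""
    available = inventory + order_qty
    sold = min(available, demand)
    stockout = max(demand - available, 0)   # unmet demand -> lost customers
    leftover = available - sold             # units carried on the shelf
    return UNIT_PROFIT * sold - STOCK_PENALTY * stockout - HOLDING_COST * leftover

# A festival-day surge of 40 units with only 30 available is painful:
print(daily_reward(inventory=10, order_qty=20, demand=40))
# 10*30 - 80*10 - 2*0 = -500
```

With the penalty dwarfing the holding cost, pre-ordering ahead of a predicted surge is clearly the profitable move.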
- Generate data: `python generate_data.py`
- Train model: `python inventory_rl.py`
- Start Dashboard: `python app.py` (visit http://localhost:5000)