AdithyaJesuman/RL_PROJECT

AI Inventory Optimizer: Deep Reinforcement Learning (DQN)

This project implements a Double Deep Q-Network (Double-DQN) to optimize inventory management for a retail environment (Walmart-style simulation). The AI agent learns to balance the cost of holding inventory against the penalty of stockouts, specifically anticipating surges during Weekends and Festivals.

🧠 Model Architecture: Deep Q-Learning

The agent uses a 3-layer neural network implemented in pure NumPy for high-speed inference. It learns by interacting with a 10-year demand dataset (walmart_demand.csv).
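As a minimal sketch of such a network (only the 6-dimensional input and 11 Q-value outputs are fixed by this README; the hidden-layer width of 64 and the random initialization are assumptions), the forward pass in pure NumPy might look like:

```python
import numpy as np

# Hypothetical layer sizes: 6 state inputs -> 64 hidden -> 64 hidden -> 11 Q-values.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (6, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (64, 64)), np.zeros(64)
W3, b3 = rng.normal(0, 0.1, (64, 11)), np.zeros(11)

def q_values(state):
    """Forward pass: state vector of shape (6,) -> Q-values for the 11 order quantities."""
    h1 = np.maximum(0, state @ W1 + b1)   # ReLU hidden layer 1
    h2 = np.maximum(0, h1 @ W2 + b2)      # ReLU hidden layer 2
    return h2 @ W3 + b3                   # linear output layer (one Q-value per action)

q = q_values(np.zeros(6))
assert q.shape == (11,)
```

Because the whole forward pass is three matrix multiplications, inference stays fast without any deep-learning framework.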

📥 Inputs (The "State" - 6 Dimensions)

The model observes the following 6 variables to make a decision:

  1. Current Inventory: Current stock scaled to the range 0.0–1.0 by warehouse capacity.
  2. Is Today Weekend?: Indicates if a weekend surge is happening now.
  3. Is Tomorrow Weekend?: Allows the agent to "Pre-stock" ahead of time.
  4. Days Since Weekend: Normalized time feature to catch weekly patterns.
  5. Is Today Festival?: Indicates a massive holiday/festival surge.
  6. Is Tomorrow Festival?: Crucial for large volume pre-ordering to avoid stockouts.
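Assembling these six variables into a state vector could look like the sketch below (the function name `build_state` and the divide-by-7 normalization of the day counter are assumptions; the README only fixes the six features and the 0–1 inventory scaling):

```python
import numpy as np

def build_state(inventory, capacity, today_weekend, tomorrow_weekend,
                days_since_weekend, today_festival, tomorrow_festival):
    """Pack the six observed variables into the (6,) state vector the network sees."""
    return np.array([
        inventory / capacity,        # 1. current inventory, scaled to [0, 1]
        float(today_weekend),        # 2. weekend surge happening now
        float(tomorrow_weekend),     # 3. weekend tomorrow -> pre-stock
        days_since_weekend / 7.0,    # 4. normalized weekly-pattern clock (assumed /7)
        float(today_festival),       # 5. festival surge today
        float(tomorrow_festival),    # 6. festival tomorrow -> large pre-order
    ])

state = build_state(120, 200, False, True, 4, False, False)
# -> [0.6, 0.0, 1.0, ~0.571, 0.0, 0.0]
```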

📤 Outputs (The Actions - 11 Choices)

The model outputs Q-values for 11 discrete order quantities: [0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60]. It always selects the quantity that maximizes the expected future "Reward" (profit).


⚙️ Hyper-Parameter Deep Dive

The RL agent's performance is governed by several critical settings that balance profit and warehouse safety:

  • STATE_SIZE = 6: The "eyes" of the AI. By increasing this from the previous 4 to 6, we've given the model the ability to "see" festivals up to 24 hours in advance, allowing for preemptive stocking.
  • GAMMA = 0.90 (Discount Factor): This determines the agent's foresight. A value of 0.90 is high, meaning the agent values future stockouts almost as much as today's costs. It learns to "save for a rainy day" (surges).
  • EPS_DECAY = 0.995: Controls the gradual transition from Exploration (trying random actions) to Exploitation (using what it has learned). The slow decay gives the model time to try every order quantity before settling on the most profitable one.
  • BATCH = 128: During training, the model doesn't just learn from the current day. It looks back at 128 random past experiences to ensure its learning is stable and not biased by a single bad day.
  • LR = 0.0001 (Learning Rate): A very careful step size. This prevents the "Exploding Gradient" problem where the AI might overreact to a single holiday and start ordering too much every day.
  • STOCK_PENALTY = 80: Set intentionally high ($80 per unit). In the retail world, losing a customer due to a stockout is far more expensive than paying for warehouse shelf space.
  • TARGET_SYNC = 10: We use a Double-DQN architecture. Every 10 episodes, we sync the "Target Brain" with the "Online Brain" to prevent the model from "chasing its own tail" during learning.
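The Double-DQN pieces described above can be sketched as follows. Only GAMMA, EPS_DECAY, and TARGET_SYNC come from the settings listed here; the function name, batch shapes, and the 0.01 epsilon floor are assumptions. The key Double-DQN idea is that the online network *chooses* the next action while the target network *evaluates* it:

```python
import numpy as np

GAMMA, EPS_DECAY, TARGET_SYNC = 0.90, 0.995, 10

def double_dqn_targets(q_online_next, q_target_next, rewards, dones):
    """TD targets for a batch: online net picks best next action, target net scores it."""
    best = np.argmax(q_online_next, axis=1)                    # action selection (online)
    next_q = q_target_next[np.arange(len(best)), best]         # action evaluation (target)
    return rewards + GAMMA * next_q * (1.0 - dones)            # zero bootstrap at episode end

targets = double_dqn_targets(
    q_online_next=np.array([[1.0, 2.0], [3.0, 0.0]]),
    q_target_next=np.array([[5.0, 6.0], [7.0, 8.0]]),
    rewards=np.array([1.0, 2.0]),
    dones=np.array([0.0, 1.0]),
)
# -> [1 + 0.9 * 6, 2] = [6.4, 2.0]

# Exploration schedule and target sync, per episode:
eps = 1.0
for episode in range(1, 31):
    eps = max(0.01, eps * EPS_DECAY)       # multiplicative epsilon decay (floor assumed)
    if episode % TARGET_SYNC == 0:
        pass  # here: copy the online network's weights into the target network
```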

📊 Evaluation & Explainability

  • Simulation: The dashboard compares the RL Agent against a "Traditional" fixed-order model.
  • SHAP (Explainability): The project uses SHAP values to show which factor (e.g., Tomorrow Festival) caused the AI to order a specific quantity.
  • Learning Logic: The agent learns that if Tomorrow Festival is True, it must order 50+ units even if current stock is high, to overcome the predicted surge.

🚀 How to Run

  1. Generate data: `python generate_data.py`
  2. Train model: `python inventory_rl.py`
  3. Start Dashboard: `python app.py` (visit http://localhost:5000)
