
PixelRec Production Deployment - Index & Getting Started

🚀 Ready to move from notebook to real pipeline!


📋 Reading Order (Recommended)

Start with this file, then follow the links in order:

1️⃣ Quick Overview (5 min)

  • This file you're reading
  • Understand what's available

2️⃣ Deployment Plan (10 min)

  • DEPLOYMENT_PLAN.md
  • High-level overview of 3 phases
  • File organization
  • Differences from notebook

3️⃣ Run Quick Test (10 min)

  • Execute: python quickstart_deployment.py
  • This automates Phase 1 setup
  • Verifies entire pipeline works

4️⃣ Full Production Guide (30 min)

  • PRODUCTION_DEPLOYMENT_GUIDE.md
  • Detailed reference for Phases 2 and 3

5️⃣ Command Reference (Bookmark)

  • COMMAND_REFERENCE.md
  • Copy-paste commands for each phase
  • Monitoring and analysis tools
  • Integration options

🎯 What You Can Do Now

| Task | Time | File | Command |
| --- | --- | --- | --- |
| Test pipeline (quickest) | 10 min | quickstart_deployment.py | python quickstart_deployment.py |
| Manual test (step by step) | 15 min | COMMAND_REFERENCE.md | Follow Phase 1 commands |
| Full dataset (if available) | 2-4 hrs | PRODUCTION_DEPLOYMENT_GUIDE.md | Phase 2 section |
| Add PixelNet (if images exist) | 3-5 hrs | PRODUCTION_DEPLOYMENT_GUIDE.md | Phase 3 section |

📁 New Files Created For You

Documentation (Read These)

✓ DEPLOYMENT_PLAN.md (Overview of the three phases)
✓ PRODUCTION_DEPLOYMENT_GUIDE.md (MAIN REFERENCE - Detailed guide)
✓ COMMAND_REFERENCE.md (Copy-paste commands)

Code (Run These)

✓ dataset/create_sample.py (Auto-generate test data)
✓ code/IDNet/sample_mini.yaml (Pre-configured baseline)
✓ quickstart_deployment.py (Automated Phase 1)

🚀 Right Now: Start Here (Pick One)

Option A: Automatic (Recommended)

cd D:\Project\PixelRec
python quickstart_deployment.py
  • Automatically generates data
  • Verifies config/model/data
  • Runs training
  • Sets up checkpoint
  • Estimated time: 15 minutes

Option B: Manual Steps (Learning)

# 1. Generate data
python dataset/create_sample.py

# 2. Run training
cd code
python ../main.py --device 0 --config_file IDNet/sample_mini.yaml
  • Follow each step deliberately
  • Understand what's happening
  • Estimated time: 20 minutes

Option C: Full Documentation First (Thorough)

  1. Read DEPLOYMENT_PLAN.md (10 min)
  2. Read PRODUCTION_DEPLOYMENT_GUIDE.md (30 min)
  3. Run quickstart_deployment.py (15 min)
  • Total time: 55 minutes, deep understanding

✅ Expected Outcome After Phase 1

After running the quick test successfully, you will have:

✓ Working repo structure verified
✓ Data pipeline tested (CSV → PyTorch DataLoader)
✓ Model loading confirmed (SASRec instantiated)
✓ Training loop working (loss decreases, metrics improve)
✓ Validation/test evaluation functional
✓ Checkpoint saving confirmed
✓ Log files with training history
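The data-pipeline item above (CSV → PyTorch DataLoader) boils down to grouping each user's interactions in timestamp order into an item sequence. A minimal stdlib sketch of that grouping step (function and column names are illustrative, not the repo's actual API):

```python
import csv
from collections import defaultdict

def build_sequences(csv_path):
    """Group interactions per user, sorted by timestamp, into item-ID sequences."""
    by_user = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            by_user[row["user_id"]].append((int(row["timestamp"]), row["item_id"]))
    # Sort each user's interactions chronologically, then keep only the item IDs
    return {u: [item for _, item in sorted(events)] for u, events in by_user.items()}
```

The real load_data() additionally remaps string IDs to contiguous integers before batching, but the per-user chronological grouping is the core idea.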

Output example:

🎉 DEPLOYMENT SUCCESSFUL!

Best valid result: Recall@10: 0.3456
Test result: Recall@5: 0.2134, Recall@10: 0.3421, NDCG@10: 0.2198

✅ Model checkpoint: log/{timestamp}/best_model.pth
✅ Training logs: log/{timestamp}/INFO.log
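For reference, the Recall@K and NDCG@K numbers in the output above can be computed as follows. This is a hedged pure-Python sketch for the common leave-one-out setup (one held-out target item per user); the repo's Evaluator may differ in details:

```python
import math

def recall_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k ranking, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """DCG of the single relevant item, normalized by the ideal DCG (rank 1)."""
    if target in ranked_items[:k]:
        rank = ranked_items.index(target)  # 0-based position in the ranking
        return 1.0 / math.log2(rank + 2)
    return 0.0
```

The reported dataset-level metrics are simply these per-user values averaged over all test users.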

📊 Three Phases of Deployment

Phase 1: Test Pipeline (NOW - 15 min)

python quickstart_deployment.py
  • Uses sample data (10K interactions)
  • Small model (embedding_size=64, 2 layers)
  • Quick verification
  • Success: Logs show metrics improve

Phase 2: Real Dataset (2-4 hours)

python main.py --device 0,1,2,3 --config_file IDNet/pixelrec50k.yaml
  • Uses full PixelRec50K dataset (~700K interactions)
  • Production config (embedding_size=256, 4 layers)
  • Multi-GPU training recommended
  • Prerequisite: Download dataset from Google Drive

Phase 3: PixelNet with Images (3-5 hours)

python main.py --device 0 --config_file PixelNet/pixelrec50k_pixel.yaml
  • End-to-end learning with visual encoder
  • Real item images required
  • LMDB index must be generated first
  • Prerequisite: Item cover images in dataset/covers/

🔍 Key Repo Structure At A Glance

Pipeline Architecture:

main.py (launcher)
  ↓
run.py (trainer)
  ├── Config(YAML) → Load configuration
  ├── load_data() → Load CSV & build sequences  
  ├── get_model() → Instantiate model class
  ├── Trainer.fit() → Training loop
  │   ├── train_epoch() → BPR loss
  │   ├── validate() → Eval metrics
  │   └── save_checkpoint() → Best model
  └── evaluate(test_loader)
      └── Return test metrics to log

Data Flow:
  CSV file (item_id, user_id, timestamp)
    ↓
  load_data() - remap IDs, build sequences
    ↓
  build_dataloader() - batch into PyTorch tensors
    ↓
  Model forward() - return logits
    ↓
  BPR loss - optimize embeddings
    ↓
  Evaluator.metrics() - Recall@K, NDCG@K
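The BPR step in the flow above maximizes the score gap between each observed (positive) item and a sampled negative item. A minimal sketch of the loss in plain Python (the repo's actual implementation works on batched PyTorch tensors):

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Bayesian Personalized Ranking loss: -mean log sigmoid(pos - neg)."""
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        # -log(sigmoid(x)) rewritten as log(1 + exp(-x)) via log1p
        total += math.log1p(math.exp(-(pos - neg)))
    return total / len(pos_scores)
```

The loss shrinks toward zero as positive items are scored further above their sampled negatives, which is what pushes the embeddings toward a useful ranking.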

⚠️ Before You Start: Check Requirements

# Python version
python --version  # Should be >= 3.8

# PyTorch
python -c "import torch; print(torch.__version__)"

# CUDA
python -c "import torch; print(torch.cuda.is_available())"

# Pandas
python -c "import pandas; print(pandas.__version__)"

If any fail, install: pip install -r requirements.txt
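The four checks above can also be run from a single script. A stdlib-only sketch (the function name and defaults are illustrative, not part of the repo):

```python
import importlib.util
import sys

def check_environment(min_python=(3, 8), packages=("torch", "pandas")):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor}"
            f" < required {min_python[0]}.{min_python[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("✗", problem)
```

Note this only confirms the packages are importable; it does not check CUDA availability, so keep the torch.cuda.is_available() check above.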


🎓 Learning Path

Goal: Understand repo structure while setting it up

Day 1 (30 min):
  - Read DEPLOYMENT_PLAN.md
  - Run quickstart_deployment.py
  - Confirm that basic training works

Day 2 (1 hour):
  - Read PRODUCTION_DEPLOYMENT_GUIDE.md
  - Understand config system
  - Know data format/requirements
  - Understand 3 model types

Day 3 (2-4 hours):
  - Run Phase 2 on real data
  - See convergence on PixelRec50K
  - Understand scaling/performance tuning

Day 4+ (ongoing):
  - Phase 3 PixelNet with images
  - Experiment with different models
  - Optimize hyperparameters
  - Compare architectures

💡 Pro Tips

  1. Start with Phase 1 first - Takes 15 minutes and confirms everything works
  2. Check logs frequently - tail -f log/*/INFO.log shows live progress
  3. Save checkpoints - Best models auto-saved, never lost
  4. Use multi-GPU for speed - --device 0,1,2,3 reduces time ~4x
  5. Monitor VRAM - If OOM, reduce batch_size in config
  6. Keep sample config - Use for quick testing, then copy for experiments
  7. Compare models systematically - Train each baseline once before PixelNet

🆘 Getting Help

Common Issues → Quick Fixes

# "CSV not found"
→ python dataset/create_sample.py

# "Model not found"  
→ Check file exists: code/REC/model/IDNet/sasrec.py

# "Config error"
→ Validate YAML syntax: python -c "import yaml; yaml.safe_load(open('IDNet/sample_mini.yaml'))"

# "CUDA out of memory"
→ Reduce batch_size in config (256 → 64)

# "Training not improving"
→ Normal for sample data - try real dataset in Phase 2

Full Troubleshooting

→ See PRODUCTION_DEPLOYMENT_GUIDE.md section 7


✨ What's Different From Notebook

| Aspect | Notebook | Pipeline |
| --- | --- | --- |
| Starting point | Notebook cells | Command-line entry |
| Configuration | Hardcoded vars | YAML files |
| Data | Synthetic only | Real CSV support |
| GPU | Single GPU | Multi-GPU (DDP) |
| Checkpointing | Manual | Automatic |
| Logging | Print statements | Structured logs |
| Reproducibility | Limited | Full |
| Production ready | No | Yes ✓ |

🎯 Your Next Action

Pick ONE:

# FASTEST (15 min, fully automated)
python quickstart_deployment.py

# LEARNING (20 min, manual steps)
python dataset/create_sample.py
cd code && python ../main.py --device 0 --config_file IDNet/sample_mini.yaml

# THOROUGH (55 min, full understanding)
Read DEPLOYMENT_PLAN.md → Read PRODUCTION_DEPLOYMENT_GUIDE.md → Run quickstart

Made for you: April 2026
Status: Ready to deploy
Time to first success: 15 minutes

👉 START NOW: python quickstart_deployment.py

Good luck! 🚀