🚀 Ready to move from notebook to real pipeline!
Start with this file, then follow the links in order:
- This file you're reading
  - Understand what's available
- DEPLOYMENT_PLAN.md
  - High-level overview of the 3 phases
  - File organization
  - Differences from the notebook
- Execute: `python quickstart_deployment.py`
  - Automates Phase 1 setup
  - Verifies the entire pipeline works
- PRODUCTION_DEPLOYMENT_GUIDE.md
  - Complete step-by-step instructions
  - Data format specifications
  - Config parameter reference
  - Troubleshooting guide
- COMMAND_REFERENCE.md
  - Copy-paste commands for each phase
  - Monitoring and analysis tools
  - Integration options
| Task | Time | File | Command |
|---|---|---|---|
| Test Pipeline (Quickest) | 10 min | quickstart_deployment.py | python quickstart_deployment.py |
| Manual Test (Step by step) | 15 min | COMMAND_REFERENCE.md | Follow Phase 1 commands |
| Full Dataset (if available) | 2-4 hrs | PRODUCTION_DEPLOYMENT_GUIDE.md | Phase 2 section |
| Add PixelNet (if images exist) | 3-5 hrs | PRODUCTION_DEPLOYMENT_GUIDE.md | Phase 3 section |
✓ DEPLOYMENT_PLAN.md (YOU ARE HERE - Overview)
✓ PRODUCTION_DEPLOYMENT_GUIDE.md (MAIN REFERENCE - Detailed guide)
✓ COMMAND_REFERENCE.md (Copy-paste commands)
✓ dataset/create_sample.py (Auto-generate test data)
✓ code/IDNet/sample_mini.yaml (Pre-configured baseline)
✓ quickstart_deployment.py (Automated Phase 1)
```shell
cd D:\Project\PixelRec
python quickstart_deployment.py
```
- Automatically generates data
- Verifies config/model/data
- Runs training
- Sets up a checkpoint
- Estimated time: 15 minutes
```shell
# 1. Generate data
python dataset/create_sample.py

# 2. Run training
cd code
python ../main.py --device 0 --config_file IDNet/sample_mini.yaml
```
- Follow each step deliberately
- Understand what's happening
- Estimated time: 20 minutes
- Read DEPLOYMENT_PLAN.md (10 min)
- Read PRODUCTION_DEPLOYMENT_GUIDE.md (30 min)
- Run quickstart_deployment.py (15 min)
- Total: 55 minutes, with a solid understanding of the system
After running the quick test successfully, you will have:
✓ Working repo structure verified
✓ Data pipeline tested (CSV → PyTorch DataLoader)
✓ Model loading confirmed (SASRec instantiated)
✓ Training loop working (loss decreases, metrics improve)
✓ Validation/test evaluation functional
✓ Checkpoint saving confirmed
✓ Log files with training history
Output example:
🎉 DEPLOYMENT SUCCESSFUL!
Best valid result: Recall@10: 0.3456
Test result: Recall@5: 0.2134, Recall@10: 0.3421, NDCG@10: 0.2198
✅ Model checkpoint: log/{timestamp}/best_model.pth
✅ Training logs: log/{timestamp}/INFO.log
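The Recall@K and NDCG@K figures in that log are standard top-K ranking metrics. A minimal stdlib-only sketch of how they are typically computed (not the repo's actual Evaluator code):

```python
import math

def recall_at_k(ranked_items, relevant, k):
    """Fraction of relevant items that appear in the top-k ranking."""
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Normalized DCG: log-discounted hits divided by the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

# One user, one held-out target item (the usual leave-one-out setup):
ranking = [42, 7, 13, 99, 5]            # model's top-5 item ids
target = {13}
print(recall_at_k(ranking, target, 5))          # 1.0 — target is in the top 5
print(round(ndcg_at_k(ranking, target, 5), 4))  # 0.5 — hit at rank 3
```

In leave-one-out evaluation there is a single held-out item per user, so Recall@K reduces to a hit rate averaged over users.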
```shell
python quickstart_deployment.py
```
- Uses sample data (10K interactions)
- Small model (embedding_size=64, 2 layers)
- Quick verification
- Success: logs show metrics improving
```shell
python main.py --device 0,1,2,3 --config_file IDNet/pixelrec50k.yaml
```
- Uses the full PixelRec50K dataset (~700K interactions)
- Production config (embedding_size=256, 4 layers)
- Multi-GPU training recommended
- Prerequisite: download the dataset from Google Drive
```shell
python main.py --device 0 --config_file PixelNet/pixelrec50k_pixel.yaml
```
- End-to-end learning with a visual encoder
- Real item images required
- The LMDB index must be generated first
- Prerequisite: item cover images in dataset/covers/
Pipeline Architecture:
```text
main.py (launcher)
  ↓
run.py (trainer)
├── Config(YAML) → Load configuration
├── load_data() → Load CSV & build sequences
├── get_model() → Instantiate model class
├── Trainer.fit() → Training loop
│   ├── train_epoch() → BPR loss
│   ├── validate() → Eval metrics
│   └── save_checkpoint() → Best model
└── evaluate(test_loader)
    └── Return test metrics to log
```
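train_epoch() optimizes a BPR (Bayesian Personalized Ranking) loss. A pure-Python sketch of the pairwise objective, independent of the repo's PyTorch implementation:

```python
import math

def bpr_loss(pos_score, neg_score):
    """BPR pairwise loss: -log(sigmoid(pos - neg)).
    Small when the positive item is scored well above a sampled negative."""
    return -math.log(1.0 / (1.0 + math.exp(-(pos_score - neg_score))))

# The better the model separates positive from negative, the lower the loss:
print(round(bpr_loss(4.0, 1.0), 4))  # 0.0486 — positive well above negative
print(round(bpr_loss(1.0, 1.0), 4))  # 0.6931 — undecided (log 2)
print(round(bpr_loss(1.0, 4.0), 4))  # 3.0486 — ranked the wrong way
```

During training the loss is averaged over (user, positive item, sampled negative item) triples, pushing embeddings so that observed items outrank unobserved ones.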
Data Flow:
```text
CSV file (item_id, user_id, timestamp)
  ↓
load_data() - remap IDs, build sequences
  ↓
build_dataloader() - batch into PyTorch tensors
  ↓
Model forward() - return logits
  ↓
BPR loss - optimize embeddings
  ↓
Evaluator.metrics() - Recall@K, NDCG@K
```
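The first two steps are the important ones: load_data() remaps raw string ids to contiguous integers (what an embedding table expects) and groups each user's interactions into a time-ordered sequence. A stdlib-only sketch of that idea (the repo's actual implementation differs in detail):

```python
import csv
import io
from collections import defaultdict

# A tiny stand-in for the interaction CSV (columns as in the doc):
raw = """item_id,user_id,timestamp
i3,u1,100
i1,u1,50
i2,u2,70
i5,u1,200
"""

item2idx, user2idx = {}, {}

def idx(table, key):
    # Assign the next integer id on first sight; 0 is reserved for padding.
    return table.setdefault(key, len(table) + 1)

events = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    events[idx(user2idx, row["user_id"])].append(
        (int(row["timestamp"]), idx(item2idx, row["item_id"])))

# Sort each user's events by timestamp to get the training sequence.
sequences = {u: [i for _, i in sorted(evs)] for u, evs in events.items()}
print(sequences)  # {1: [2, 1, 4], 2: [3]}
```

A sequential model such as SASRec is then trained to predict each next item in these per-user sequences.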
```shell
# Python version
python --version  # Should be >= 3.8

# PyTorch
python -c "import torch; print(torch.__version__)"

# CUDA
python -c "import torch; print(torch.cuda.is_available())"

# Pandas
python -c "import pandas; print(pandas.__version__)"
```
If any of these fail, install the requirements: `pip install -r requirements.txt`
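The same checks can be bundled into a single hypothetical helper script (check_prereqs is not part of the repo, just an illustration):

```python
import importlib.util
import sys

def check_prereqs(min_python=(3, 8), packages=("torch", "pandas")):
    """Return a list of problems; an empty list means all prerequisites pass."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {sys.version.split()[0]} < {min_python}")
    for pkg in packages:
        # find_spec checks availability without importing the package.
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"missing package: {pkg}")
    return problems

issues = check_prereqs()
print("OK" if not issues else "\n".join(issues))
```

Note this only confirms the packages are importable; GPU availability still needs the `torch.cuda.is_available()` check above.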
Goal: Understand repo structure while setting it up
Day 1 (30 min):
- Read DEPLOYMENT_PLAN.md
- Run quickstart_deployment.py
- See basic training works
Day 2 (1 hour):
- Read PRODUCTION_DEPLOYMENT_GUIDE.md
- Understand config system
- Know data format/requirements
- Understand 3 model types
Day 3 (2-4 hours):
- Run Phase 2 on real data
- See convergence on PixelRec50K
- Understand scaling/performance tuning
Day 4+ (ongoing):
- Phase 3 PixelNet with images
- Experiment with different models
- Optimize hyperparameters
- Compare architectures
- Start with Phase 1 first - Takes 15 minutes and confirms everything works
- Check logs frequently - `tail -f log/*/INFO.log` shows live progress
- Save checkpoints - Best models are auto-saved, never lost
- Use multi-GPU for speed - `--device 0,1,2,3` reduces training time ~4x
- Monitor VRAM - If you hit OOM, reduce batch_size in the config
- Keep the sample config - Use it for quick testing, then copy it for experiments
- Compare models systematically - Train each baseline once before PixelNet
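Several of these tips point at YAML config knobs. A purely illustrative fragment; the key names here are assumptions, so check IDNet/sample_mini.yaml for the real ones:

```yaml
# Illustrative only — key names may differ from the real config files.
embedding_size: 64   # Phase 1 baseline; Phase 2 configs use 256
n_layers: 2          # number of transformer blocks
batch_size: 256      # halve this first if you hit CUDA OOM (256 → 128 → 64)
```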
```shell
# "CSV not found"
→ python dataset/create_sample.py

# "Model not found"
→ Check the file exists: code/REC/model/IDNet/sasrec.py

# "Config error"
→ Validate the YAML syntax: code IDNet/sample_mini.yaml

# "CUDA out of memory"
→ Reduce batch_size in the config (256 → 64)

# "Training not improving"
→ Normal for sample data; try a real dataset in Phase 2
→ See PRODUCTION_DEPLOYMENT_GUIDE.md section 7
```
- Full Guide: PRODUCTION_DEPLOYMENT_GUIDE.md (comprehensive)
- Quick Commands: COMMAND_REFERENCE.md (copy-paste)
- Official Paper: https://arxiv.org/pdf/2309.06789.pdf
- Original Repo: https://github.com/westlake-repl/PixelRec
- Dataset: https://drive.google.com/drive/folders/1vR1lgQUZCy1cuhzPkM2q7AsdYRP43feQ
| Aspect | Notebook | Pipeline |
|---|---|---|
| Starting Point | Notebook cells | Command line entry |
| Configuration | Hardcoded vars | YAML files |
| Data | Synthetic only | Real CSV support |
| GPU | Single GPU | Multi-GPU (DDP) |
| Checkpointing | Manual | Automatic |
| Logging | Print statements | Structured logs |
| Reproducibility | Limited | Full |
| Production Ready | No | Yes ✓ |
Pick ONE:
```shell
# FASTEST (15 min, fully automated)
python quickstart_deployment.py

# LEARNING (20 min, manual steps)
python dataset/create_sample.py
cd code && python ../main.py --device 0 --config_file IDNet/sample_mini.yaml

# THOROUGH (55 min, full understanding)
# Read DEPLOYMENT_PLAN.md → read PRODUCTION_DEPLOYMENT_GUIDE.md → run the quickstart
```
Made for you: April 2026
Status: Ready to deploy
Time to first success: 15 minutes
👉 START NOW: python quickstart_deployment.py
Good luck! 🚀