This project implements and compares Faster R-CNN (two-stage detector) and YOLOv5 (single-stage detector) on the Pascal VOC 2012 dataset.
The goal is to analyze the trade-off between:
- Detection Accuracy (mAP)
- Inference Speed (FPS)
- Bounding Box Quality
Compare the two detector families on the same dataset (VOC2012):
- YOLOv5 (one-stage): optimized for speed/real-time detection.
- Faster R-CNN (two-stage): typically higher-quality detections, often slower inference.
- Predicts bounding boxes + class probabilities in a single forward pass
- Typically:
  - Very fast inference
  - Strong baseline for real-time detection
- Stage 1: Region Proposal Network (RPN)
- Stage 2: Classifies/refines proposals
- Typically:
  - Strong accuracy
  - Heavier compute and slower inference than YOLO-style models
python>=3.7
torch>=1.9.0
torchvision>=0.10.0
yolov5
opencv-python
matplotlib
seaborn
scikit-learn
pillow
tqdm
pandas
numpy

Pascal VOC 2012 (20 object classes).
This project was run on Kaggle Notebooks, so the dataset is expected to be attached to the Kaggle notebook session via "Add data".
Typical Kaggle input location pattern:
/kaggle/input/<dataset-name>/...
- Enable GPU (recommended).
- Add a Pascal VOC 2012 dataset through the Kaggle sidebar ("Add data").
- Confirm the path by listing:
!ls /kaggle/input

Open and run:
YOLO V5/yolov5-execution.ipynb
What it produces (as committed here):
- CSV: YOLO V5/yolo_results_csv.csv
- Plot images under YOLO V5/yolo_results_plots/
Open and run:
faster_rcnn_execution.ipynb
What it produces (as committed here):
- CSV files under rcnn_results_csv/
- Plot images under rcnn_results_plots/
- 20 object classes (person, car, bicycle, etc.)
- Dataset split:
  - Train: 70%
  - Validation: 20%
  - Test: 10%
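A split like the one above can be produced with a seeded shuffle. The following is a minimal sketch; the function name and image-ID format are illustrative, not taken from the notebooks:

```python
import random

def split_dataset(image_ids, seed=42):
    """Shuffle image IDs and split 70/20/10 into train/val/test."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n = len(ids)
    n_train = int(0.70 * n)
    n_val = int(0.20 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_dataset([f"img_{i:04d}" for i in range(100)])
print(len(train), len(val), len(test))  # 70 20 10
```

Using the same seed for both models ensures YOLOv5 and Faster R-CNN see identical train/val/test partitions.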
- Faster R-CNN: Pascal VOC XML
- YOLOv5: YOLO TXT format
- Backbone: ResNet50 + FPN
- Stage 1: Region Proposal Network (RPN)
- Stage 2: Classification + Bounding Box Regression
- Real-time object detector
- Direct prediction of:
  - Bounding boxes
  - Class probabilities
  - Confidence scores
- Optimizer: SGD
- Loss components:
  - Classification loss
  - Bounding box regression loss
- Epochs: 10
- Output:
  - Model weights (faster_rcnn_voc2012.pt)
  - CSV metrics
  - Detection visualizations
- Pretrained weights: yolov5s.pt
- Epochs: ~100
- Automatic logging of:
  - Loss curves
  - mAP metrics
  - Validation performance
The Faster R-CNN workflow is provided primarily via:
faster_rcnn_execution.ipynb
In the notebook, you generally need:
- PyTorch + torchvision (already available in many Kaggle GPU images)
- Common packages:
numpy, pandas, matplotlib, opencv-python
!python -V
!nvidia-smi

If you are using the official YOLOv5 repo approach:
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

YOLO expects:
- images in images/train, images/val
- labels in YOLO txt format in labels/train, labels/val
- a dataset YAML file like voc.yaml
You need a conversion step if your dataset is in VOC XML format.
Checklist:
- Confirm image paths
- Convert VOC XML annotations → YOLO txt labels
- Create voc.yaml with:
  - train: ...
  - val: ...
  - nc: 20
  - names: [ ...VOC class names... ]
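The XML → TXT conversion step from the checklist can be sketched with the standard library alone. YOLO labels are `class cx cy w h`, normalized to image size; the function name and I/O shape below are illustrative (the notebooks may implement this differently):

```python
import xml.etree.ElementTree as ET

# The 20 Pascal VOC classes, in the order used for YOLO class indices.
VOC_CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
               "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
               "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

def voc_xml_to_yolo(xml_text):
    """Convert one VOC annotation (XML string) to a list of YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls = VOC_CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # VOC uses corner coordinates; YOLO uses normalized center + size.
        cx = (xmin + xmax) / 2 / img_w
        cy = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

One such .txt file per image goes in labels/train or labels/val, named after the image.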
Example (adjust model size + epochs based on runtime budget):
!python train.py --img 640 --batch 16 --epochs 50 --data voc.yaml --weights yolov5s.pt

!python detect.py --weights runs/train/exp/weights/best.pt --img 640 --source <path-to-test-images>

YOLOv5 provides evaluation metrics (including mAP) via validation:

!python val.py --weights runs/train/exp/weights/best.pt --data voc.yaml --img 640

The training loop typically includes:
- dataset loader (VOC)
- model initialization
- optimizer + scheduler
- epoch loop
- checkpoint saving
Recommended to log:
- training loss breakdown (RPN objectness, bbox regression, classifier loss, etc.)
- evaluation metric(s) each epoch (mAP if implemented)
- wall-clock time per epoch
Inference typically:
- loads model weights/checkpoint
- runs prediction on validation/test images
- applies confidence thresholding
- optionally applies NMS (usually handled internally)
- renders predicted boxes on images
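Confidence thresholding and NMS (normally handled inside the model, as noted above) can be illustrated in plain Python. This is a didactic sketch, not the notebook's implementation:

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def filter_detections(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Confidence thresholding followed by greedy NMS; returns kept indices."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not heavily overlap an already-kept one.
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(filter_detections(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 suppressed by NMS
```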
For Faster R-CNN evaluation on VOC, you may implement:
- VOC mAP calculation (VOC07 11-point or VOC2010+ style)
- or COCO-style mAP if you convert annotations to COCO
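For the VOC07 option, per-class AP reduces to averaging interpolated precision at 11 recall thresholds. A minimal sketch (assumes you have already matched detections to ground truth and computed the precision/recall curve):

```python
def voc07_ap(recalls, precisions):
    """VOC2007 11-point interpolated average precision.

    `recalls`/`precisions` are parallel lists ordered by descending
    detection confidence (so recall is non-decreasing).
    """
    ap = 0.0
    for t in [i / 10 for i in range(11)]:  # recall thresholds 0.0, 0.1, ..., 1.0
        # Interpolation: max precision over all points with recall >= t.
        p = max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0)
        ap += p / 11
    return ap
```

mAP@0.5 is then the mean of this value over the 20 VOC classes; the VOC2010+ variant integrates the full precision/recall curve instead of sampling 11 points.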
Your repo indicates you save:
- CSV metrics → rcnn_results_csv/
- plots → rcnn_results_plots/
- YOLOv5 notebook: YOLO V5/yolov5-execution.ipynb
- Faster R-CNN notebook: faster_rcnn_execution.ipynb
- Results CSV: YOLO V5/yolo_results_csv.csv
- Plots under YOLO V5/yolo_results_plots/:
  - detection_results.png
  - training_metrics_curves.png
  - training_metrics_curves_2.png
- CSVs under rcnn_results_csv/:
  - faster_rcnn_results.csv
  - faster_rcnn_training_metrics.csv
  - faster_rcnn_inference_speed.csv
  - faster_rcnn_detections.csv
  - faster_rcnn_class_distribution.csv
- Plots under rcnn_results_plots/:
  - training_loss.png
  - training_metrics_curves.png
  - class_distribution.png
  - confidence_distribution.png
  - sample_detections.png
  - test_set_predictions.png
- mAP@0.5 (VOC standard)
- mAP@0.5:0.95 (COCO-style)
- Time per image
- Frames Per Second (FPS)
- False positives / negatives
- Bounding box precision
- Confidence scores
- Numeric summary: YOLO V5/yolo_results_csv.csv
- Training curves + detection visualizations:
  - YOLO V5/yolo_results_plots/training_metrics_curves.png
  - YOLO V5/yolo_results_plots/training_metrics_curves_2.png
  - YOLO V5/yolo_results_plots/detection_results.png
- Final metrics summary: rcnn_results_csv/faster_rcnn_results.csv
- Training metrics tracking: rcnn_results_csv/faster_rcnn_training_metrics.csv
- Inference speed logging: rcnn_results_csv/faster_rcnn_inference_speed.csv
- Detection-level outputs: rcnn_results_csv/faster_rcnn_detections.csv
- Visualizations under rcnn_results_plots/:
  - training_loss.png
  - training_metrics_curves.png
  - class_distribution.png
  - confidence_distribution.png
  - sample_detections.png
  - test_set_predictions.png
To ensure the YOLOv5 vs Faster R-CNN comparison is meaningful:
- Keep consistent dataset split (train/val/test) and class mapping (VOC 20 classes).
- Use clearly stated evaluation criteria (e.g., IoU threshold(s), confidence thresholds).
- Report timing with warm-up (first inference often slower).
- Mention the Kaggle GPU type used if available (affects speed).
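Timing with warm-up can be done as follows; the function name and dummy workload are illustrative. For GPU inference, also call torch.cuda.synchronize() before each clock read so queued kernels are counted:

```python
import time

def measure_fps(infer_fn, inputs, warmup=5):
    """Average FPS of `infer_fn` over `inputs`, discarding `warmup` initial runs."""
    for x in inputs[:warmup]:
        infer_fn(x)  # warm-up: first calls pay for caching, JIT, allocator setup
    start = time.perf_counter()
    for x in inputs[warmup:]:
        infer_fn(x)
    elapsed = time.perf_counter() - start
    n = len(inputs) - warmup
    return n / elapsed if elapsed > 0 else float("inf")
```

Report the mean over many images, on the same hardware and input resolution for both detectors.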
- Smooth convergence from 0.39 → 0.15
- Stable learning without overfitting
- mAP@0.5: 0.6569
- mAP@0.5:0.95: 0.6569
- Strong bias toward person class (1512 detections)
- Lower detection counts for rare classes (train, aeroplane)
- Mean confidence: 0.885
- Most predictions fall in 0.9β1.0 range
- Indicates high model certainty
- Accurate localization in most cases
- Handles multiple objects well
- Some overlapping detections in dense scenes
- Strong performance in:
  - Multi-object scenes
  - Occlusion handling
- Minor false positives in crowded areas
- Smooth decreasing trend
- Faster convergence than Faster R-CNN
- mAP@0.5: ~0.73
- mAP@0.5:0.95: ~0.60
- Faster inference (real-time capable)
- Slightly lower localization precision vs Faster R-CNN
- Better scalability for deployment
| Metric | Faster R-CNN | YOLOv5 |
|---|---|---|
| mAP@0.5 | 0.6569 | ~0.73 |
| mAP@0.5:0.95 | 0.6569 | ~0.60 |
| Inference Speed | Slow | Fast |
| Localization | High Precision | Moderate |
| Real-time Capability | No | Yes |
- Faster R-CNN
  - Better for accuracy-focused tasks
  - Strong bounding box precision
  - Suitable for research & analysis
- YOLOv5
  - Best for real-time applications
  - Faster and lightweight
  - Slight trade-off in precision
This project demonstrates the accuracy vs speed trade-off in object detection:
- Use Faster R-CNN when precision is critical
- Use YOLOv5 when speed is essential
- Implement Faster R-CNN model
- Add quantization for mobile deployment
- Include confidence threshold analysis
- Add inference time comparison
- Implement model ensemble
- Add data augmentation experiments
- Pascal VOC: http://host.robots.ox.ac.uk/pascal/VOC/
- YOLOv5 (Ultralytics): https://github.com/ultralytics/yolov5
- Torchvision detection models (Faster R-CNN): https://pytorch.org/vision/stable/models.html
Gubba Sai Ananya

