This project implements and evaluates three object detection models (YOLOv8 Nano, SSD MobileNet, and Faster R-CNN) on the COCO128 dataset.
The goal is to compare the speed and accuracy trade-offs between lightweight, mobile-friendly, and high-precision architectures. The results indicate which models are best suited to real-time detection, mobile deployment, and precision-focused offline tasks.
- Select a real-world dataset (COCO128) relevant for everyday object detection.
- Apply data preprocessing (resizing, normalization, train-validation split).
- Implement three different architectures: YOLOv8, SSD MobileNet, and Faster R-CNN.
- Evaluate performance using precision, recall, mAP50, mAP50-95, and inference time.
- Conduct a comparative analysis with case-specific recommendations.
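
The evaluation metrics listed above all build on intersection over union (IoU) between predicted and ground-truth boxes. The following is a minimal sketch of that building block, not the project's evaluation code. It assumes boxes in `(x1, y1, x2, y2)` pixel format, a single class, and a single image; the function names are hypothetical. A full mAP additionally integrates precision over recall and averages across classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Precision and recall for one class on one image.

    predictions: list of (confidence, box) tuples.
    ground_truth: list of boxes.
    Predictions are matched greedily, highest confidence first, and each
    ground-truth box can be matched at most once.
    """
    matched = set()
    true_pos = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_idx, best_iou = None, iou_threshold
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# mAP50 uses a single IoU threshold of 0.50; mAP50-95 averages over
# the ten thresholds 0.50, 0.55, ..., 0.95.
MAP50_95_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

Calling `precision_recall` once per threshold in `MAP50_95_THRESHOLDS` shows why mAP50-95 is the stricter metric: a box that only loosely overlaps its target counts at 0.50 but not at 0.95.
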
- Dataset Used: COCO128
- Lightweight subset of the COCO dataset (128 images, 929 annotations).
- Covers diverse object categories: people, vehicles, animals, and everyday items.
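
With only 128 images, the train-validation split leaves a small validation set, so it helps to make the split reproducible. The sketch below is illustrative only: the 80/20 ratio, the fixed seed, the placeholder image IDs, and the divide-by-255 normalization are all assumptions, since the project does not state which values it used.

```python
import random


def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle deterministically, then split into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def normalize_pixels(pixels):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0] (assumed scheme)."""
    return [p / 255.0 for p in pixels]


# Placeholder IDs standing in for the 128 COCO128 images.
image_ids = [f"img_{i:03d}" for i in range(128)]
train_ids, val_ids = train_val_split(image_ids)
print(len(train_ids), len(val_ids))
```

Using a seeded `random.Random` instance rather than the global generator keeps the split stable across runs, which matters when three models are compared on the same validation images.
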
- Type: YOLOv8 Nano, the lightweight, speed-optimised version of YOLOv8
- Strengths: High confidence on distinct objects (e.g., Cat 0.94, Couch 0.91), low computational cost, suitable for real-time GPU deployments
- Weaknesses: Moderate recall, lower confidence on small/overlapping objects (e.g., remotes <0.34)
- Inference Time (CPU): ~0.64s/image
- Type: SSD MobileNet, a mobile-optimised Single Shot Detector
- Strengths: Designed for speed on CPU/GPU and lightweight enough for mobile or edge deployments
- Weaknesses: Struggles with accuracy on complex or small objects (e.g., a cat labeled "Egyptian cat" at 0.29 confidence)
- Inference Time (CPU): ~3.8s/image (slower than YOLOv8 Nano's ~0.64s/image in this evaluation)
- Type: Faster R-CNN, a region-based convolutional neural network
- Strengths: High accuracy with precise bounding boxes
- Weaknesses: Slow inference (~9.5s/image on CPU), occasional confident misclassifications (remote → microwave at 0.99–1.00 confidence)
- Best Use Case: Offline or batch processing when accuracy is critical
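
The per-image CPU inference times quoted for each model depend on how they are measured. The sketch below shows one possible timing harness, assuming a warm-up pass and wall-clock timing with `time.perf_counter`; it is not the project's measurement code. `predict` is a hypothetical callable standing in for any of the three models, and `dummy_predict` simply sleeps to simulate work.

```python
import statistics
import time


def time_inference(predict, inputs, warmup=1):
    """Return (mean, median) seconds per call of predict over inputs.

    The first `warmup` inputs are run once untimed so that one-off costs
    such as lazy initialisation do not skew the measurement.
    """
    for item in inputs[:warmup]:
        predict(item)
    timings = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.median(timings)


def dummy_predict(image):
    """Hypothetical stand-in for a detector; sleeps instead of inferring."""
    time.sleep(0.01)
    return []


mean_s, median_s = time_inference(dummy_predict, list(range(5)))
print(f"mean {mean_s:.3f}s/image, median {median_s:.3f}s/image")
```

Reporting the median alongside the mean makes it easier to spot a few unusually slow images dominating the average.
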
| Model | Inference Time (CPU) | Accuracy | Key Observations |
|---|---|---|---|
| YOLOv8 Nano | ~0.64s/image | High | Balanced speed & accuracy; strong on distinct objects, weaker on small/overlapping ones |
| SSD MobileNet | ~3.8s/image | Moderate | Very lightweight; lower precision, and slower than YOLOv8 Nano on CPU in this evaluation |
| Faster R-CNN | ~9.5s/image | High | Most accurate but slowest; occasional confident misclassifications (e.g., remote → microwave) |
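
To put the CPU timings in the table on a common footing, they can be converted to throughput and to latency relative to the fastest model. This is a small arithmetic sketch that uses only the approximate figures reported above; the function name `summarize` is hypothetical.

```python
# Approximate CPU seconds per image, taken from the comparison table.
CPU_SECONDS_PER_IMAGE = {
    "YOLOv8 Nano": 0.64,
    "SSD MobileNet": 3.8,
    "Faster R-CNN": 9.5,
}


def summarize(seconds_per_image, dataset_size=128):
    """Derive throughput, relative latency, and full-dataset time per model."""
    fastest = min(seconds_per_image.values())
    summary = {}
    for name, seconds in seconds_per_image.items():
        summary[name] = {
            "images_per_second": 1.0 / seconds,
            "relative_latency": seconds / fastest,
            "dataset_seconds": seconds * dataset_size,
        }
    return summary


for name, stats in summarize(CPU_SECONDS_PER_IMAGE).items():
    print(
        f"{name}: {stats['images_per_second']:.2f} images/s, "
        f"{stats['relative_latency']:.1f}x the fastest latency, "
        f"{stats['dataset_seconds']:.0f}s for 128 images"
    )
```

On these figures, Faster R-CNN takes roughly 15 times as long per image as YOLOv8 Nano, and SSD MobileNet roughly 6 times as long, so none of the three reaches real-time rates on CPU.
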