MertKilincer/Tiny-Detr

Tiny-DETR Object Detection

Overview

This project implements a lightweight version of DETR (DEtection TRansformer) for pedestrian detection using the PennFudan dataset.

The code includes the full pipeline:

  • Dataset loading and preprocessing
  • Model (backbone + transformer)
  • Training with Hungarian matching
  • Evaluation using mAP@0.5
  • Experiment comparisons

Dataset

  • Dataset: PennFudanPed
  • Converts segmentation masks to bounding boxes
  • Bounding boxes are stored as (cx, cy, w, h), normalized by image width and height
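
The mask-to-box conversion can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `mask_to_boxes` is hypothetical, and it assumes the PennFudan mask convention where 0 is background and each pedestrian has its own integer id. Treating the max pixel index as inclusive (hence the `+ 1`) is also an assumption.

```python
import numpy as np

def mask_to_boxes(mask: np.ndarray) -> np.ndarray:
    """Convert an instance mask (0 = background, 1..N = pedestrians) into
    normalized (cx, cy, w, h) boxes, one row per instance."""
    height, width = mask.shape
    instance_ids = np.unique(mask)
    instance_ids = instance_ids[instance_ids != 0]  # drop the background id
    boxes = []
    for instance_id in instance_ids:
        ys, xs = np.nonzero(mask == instance_id)
        # +1 so that a single-pixel mask gets width/height 1 (assumed convention)
        x_min, x_max = xs.min(), xs.max() + 1
        y_min, y_max = ys.min(), ys.max() + 1
        boxes.append([
            (x_min + x_max) / 2 / width,   # cx
            (y_min + y_max) / 2 / height,  # cy
            (x_max - x_min) / width,       # w
            (y_max - y_min) / height,      # h
        ])
    return np.asarray(boxes, dtype=np.float32).reshape(-1, 4)

# Toy mask: one pedestrian covering x in [50, 90) and y in [20, 60)
mask = np.zeros((100, 200), dtype=np.uint8)
mask[20:60, 50:90] = 1
boxes = mask_to_boxes(mask)
print(boxes)
```

The final `reshape(-1, 4)` keeps the output shape well defined for images without any pedestrian.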

Augmentations

  • Horizontal flip
  • Scale jitter
  • Optional color jitter
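
Geometric augmentations have to transform the boxes together with the image. As a rough sketch for the horizontal flip (the helper `hflip` is hypothetical, and it assumes an H×W×C array plus normalized (cx, cy, w, h) boxes), only the center x-coordinate changes:

```python
import numpy as np

def hflip(image: np.ndarray, boxes: np.ndarray):
    """Flip an HxWxC image left-right and mirror normalized (cx, cy, w, h) boxes."""
    flipped_image = image[:, ::-1].copy()
    flipped_boxes = boxes.copy()
    flipped_boxes[:, 0] = 1.0 - boxes[:, 0]  # cx is mirrored; cy, w, h are unchanged
    return flipped_image, flipped_boxes

image = np.arange(2 * 4 * 3).reshape(2, 4, 3)
boxes = np.array([[0.25, 0.5, 0.2, 0.4]], dtype=np.float32)
flipped_image, flipped_boxes = hflip(image, boxes)
print(flipped_boxes)
```

Because the boxes are normalized, the flip does not need to know the image width.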

Model Architecture

Backbone

  • ResNet18 or MobileNetV2
  • Extracts feature maps from images

Transformer

  • Encoder: self-attention over the feature map to capture global context
  • Decoder: predicts objects using learned queries

Object Queries

  • Fixed number of queries (e.g., 10)
  • Each query predicts one object

Prediction Heads

  • Classification (object / no-object)
  • Bounding box regression (cx, cy, w, h)

Training

Matching

  • Hungarian algorithm for one-to-one assignment
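
A minimal sketch of the matching step, using `scipy.optimize.linear_sum_assignment`. The function name `hungarian_match` and the cost weights are assumptions, and the cost here combines only a classification term and an L1 box term; a GIoU term is left out to keep the example short.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_boxes, gt_boxes, l1_weight=5.0):
    """Return (query_indices, gt_indices) that minimize the total matching cost.

    pred_probs: (Q,) probability of the "object" class for each query
    pred_boxes: (Q, 4) and gt_boxes: (G, 4), both normalized (cx, cy, w, h)
    """
    class_cost = -pred_probs[:, None]  # (Q, 1): confident queries are cheaper
    l1_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (Q, G)
    cost = class_cost + l1_weight * l1_cost
    return linear_sum_assignment(cost)

pred_probs = np.array([0.9, 0.2, 0.8])
pred_boxes = np.array([
    [0.5, 0.5, 0.2, 0.4],
    [0.1, 0.1, 0.1, 0.1],
    [0.3, 0.6, 0.1, 0.3],
])
gt_boxes = np.array([
    [0.3, 0.6, 0.1, 0.3],
    [0.5, 0.5, 0.2, 0.4],
])
rows, cols = hungarian_match(pred_probs, pred_boxes, gt_boxes)
matches = dict(zip(rows.tolist(), cols.tolist()))
print(matches)  # query 0 -> GT 1, query 2 -> GT 0
```

Queries that receive no ground-truth box (query 1 in this toy example) are trained toward the "no object" class.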

Loss Functions

  • Cross-Entropy (classification)
  • L1 Loss (bounding boxes)
  • GIoU Loss
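
The two box terms can be sketched for already-matched prediction/target pairs as below. The helper names and the weights (5.0 for L1, 2.0 for GIoU) are assumptions for illustration, and the code assumes boxes with positive area.

```python
import torch

def cxcywh_to_xyxy(boxes):
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

def box_loss(pred, target, l1_weight=5.0, giou_weight=2.0):
    """L1 + GIoU loss for matched pairs; pred and target are (N, 4) in (cx, cy, w, h)."""
    l1 = (pred - target).abs().sum(-1)
    p, t = cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target)
    # Intersection rectangle (clamped to zero when the boxes do not overlap)
    inter_wh = (torch.min(p[:, 2:], t[:, 2:]) - torch.max(p[:, :2], t[:, :2])).clamp(min=0)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    union = area_p + area_t - inter
    # Smallest box enclosing both prediction and target
    hull_wh = torch.max(p[:, 2:], t[:, 2:]) - torch.min(p[:, :2], t[:, :2])
    hull = hull_wh[:, 0] * hull_wh[:, 1]
    giou = inter / union - (hull - union) / hull
    return (l1_weight * l1 + giou_weight * (1 - giou)).mean()

# Two side-by-side boxes that touch but do not overlap
pred = torch.tensor([[0.25, 0.5, 0.5, 1.0]])
target = torch.tensor([[0.75, 0.5, 0.5, 1.0]])
loss = box_loss(pred, target)
print(loss.item())  # 4.5 = 5.0 * 0.5 (L1) + 2.0 * 1.0 (1 - GIoU)
```

The classification term would be an ordinary cross-entropy over all queries, with unmatched queries labeled as "no object".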

Strategy

  • Early training: frozen backbone, weak augmentation
  • Later training: unfrozen backbone, stronger augmentation
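
One way to express this two-stage strategy is sketched below. Both helpers are hypothetical, and the switch epoch (10) is an assumed value, not one taken from the repository.

```python
from torch import nn

def set_backbone_trainable(backbone: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every backbone parameter."""
    for param in backbone.parameters():
        param.requires_grad = trainable

def stage_for_epoch(epoch: int, unfreeze_epoch: int = 10) -> dict:
    """Hypothetical two-stage schedule; the switch epoch is an assumption."""
    early = epoch < unfreeze_epoch
    return {"backbone_trainable": not early, "strong_augmentation": not early}

backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())  # stand-in backbone
for epoch in (0, 10):
    stage = stage_for_epoch(epoch)
    set_backbone_trainable(backbone, stage["backbone_trainable"])
    trainable = sum(p.requires_grad for p in backbone.parameters())
    print(epoch, stage, trainable)
```

If the optimizer is built over all model parameters from the start, frozen parameters simply receive no gradient and are skipped until they are unfrozen.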

Optimization

  • Optimizer: AdamW
  • Scheduler: OneCycleLR
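
A minimal sketch of how AdamW and OneCycleLR are typically wired together in PyTorch. The model is a stand-in, and the learning rate, weight decay, and epoch counts are placeholder values, not the repository's settings.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the detector
epochs, steps_per_epoch = 3, 5  # placeholder values

# OneCycleLR overrides the optimizer's lr, so the lr given here is not used as-is
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, epochs=epochs, steps_per_epoch=steps_per_epoch
)

learning_rates = []
for _ in range(epochs * steps_per_epoch):
    learning_rates.append(optimizer.param_groups[0]["lr"])  # lr used for this step
    loss = model(torch.randn(8, 4)).pow(2).mean()  # dummy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch, not once per epoch

print(learning_rates[0], max(learning_rates), learning_rates[-1])
```

The learning rate warms up from a small initial value toward `max_lr` and then anneals to a value far below the starting point.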

Evaluation

  • Metric: mAP@0.5
  • Implemented using TorchMetrics
  • Evaluated on validation and test sets
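
Before computing mAP, the model outputs have to be converted to absolute pixel boxes with scores and labels. The sketch below shows one way to do that; the function name `to_metric_format`, the class order (index 0 = pedestrian, index 1 = no-object), and the top-k filtering are assumptions.

```python
import torch

def to_metric_format(logits, boxes, image_size, top_k=5):
    """Convert one image's outputs to a dict of absolute xyxy boxes, scores, labels.

    logits: (Q, 2), class 0 = pedestrian, class 1 = no-object (assumed order)
    boxes:  (Q, 4) normalized (cx, cy, w, h); image_size: (height, width)
    """
    height, width = image_size
    scores = logits.softmax(-1)[:, 0]  # pedestrian probability per query
    scores, keep = scores.topk(min(top_k, scores.numel()))
    cx, cy, w, h = boxes[keep].unbind(-1)
    xyxy = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)
    xyxy = xyxy * torch.tensor([width, height, width, height], dtype=xyxy.dtype)
    labels = torch.zeros(len(keep), dtype=torch.long)  # single foreground class
    return {"boxes": xyxy, "scores": scores, "labels": labels}

logits = torch.tensor([[2.0, 0.0], [0.0, 2.0]])
boxes = torch.tensor([[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]])
prediction = to_metric_format(logits, boxes, image_size=(100, 200), top_k=1)
print(prediction["boxes"])  # approximately [[80., 30., 120., 70.]]
```

A list of such dicts, together with ground-truth dicts holding `boxes` and `labels`, can be passed to `MeanAveragePrecision(iou_thresholds=[0.5])` from `torchmetrics.detection` via `update(preds, targets)`. Depending on the TorchMetrics version, this metric may need an extra COCO evaluation backend such as `pycocotools`.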

Experiments

The following experiments are included:

  • Backbone comparison (ResNet18 vs MobileNetV2)
  • Query count (5, 10, 20)
  • Augmentation (on vs off)

Each experiment:

  • Trains a model
  • Saves the best checkpoint
  • Reports validation and test performance

Results

  • MobileNetV2 achieved the best performance (~0.507 mAP@0.5)
  • Optimal query count is around 10
  • Increasing queries too much degrades performance
  • Removing augmentation causes a significant performance drop

Visualization

  • Ground truth visualization
  • Prediction visualization (top-k predictions vs GT)

Installation

pip install torch torchvision torchmetrics scipy matplotlib

Key Points

  • No anchor boxes or NMS required
  • Hungarian matching is central to training
  • Query count is an important hyperparameter
  • Data augmentation is critical for generalization

About

A lightweight implementation of the DETR architecture for object detection
