YOWOv3 for the spatio-temporal action detection task, using the UCF101-24 dataset. This repository is an extension of https://github.com/Hope1337/YOWOv3 (paper: https://arxiv.org/pdf/2408.02623).
Clone this repository:
git clone https://github.com/irfan112/yowov3-multistreaming-inferencing.git
Use Python 3.8 or Python 3.9, and then install the dependencies:
pip install -r requirements.txt
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Download from: Google Drive Link
To train or evaluate YOWO (I3D / ResNet), you need to download the pretrained weights and checkpoints provided here:
Google Drive - YOWO (I3D / ResNet) Checkpoints
After downloading, place the files into the corresponding weights/ or checkpoints/
folder in this repository (create them if they don’t exist).
yowov3-multistreaming-inferencing/
│── weights/
│   ├── yowo_i3d.pth
│   ├── yowo_resnet.pth
│── checkpoints/
│   ├── checkpoint_epoch_XX.pth
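Before training or evaluating, it can help to confirm that the folders and files are where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `check_layout` is hypothetical, the weight file names are copied from the tree above, and because the checkpoint name keeps its `XX` placeholder it is matched with a glob pattern rather than a specific epoch.

```python
from pathlib import Path

# File names copied from the directory tree above; adjust if yours differ.
EXPECTED_WEIGHTS = ["yowo_i3d.pth", "yowo_resnet.pth"]


def check_layout(repo_root="."):
    """Create weights/ and checkpoints/ if missing and report absent files.

    Hypothetical helper for illustration, not part of the repository.
    Returns a list of human-readable problems (empty when nothing is missing).
    """
    root = Path(repo_root)
    weights_dir = root / "weights"
    checkpoints_dir = root / "checkpoints"

    # Create the folders if they do not exist yet.
    weights_dir.mkdir(parents=True, exist_ok=True)
    checkpoints_dir.mkdir(parents=True, exist_ok=True)

    problems = []
    for name in EXPECTED_WEIGHTS:
        if not (weights_dir / name).is_file():
            problems.append(f"missing weights/{name}")

    # Checkpoints are named checkpoint_epoch_XX.pth; accept any epoch value.
    if not any(checkpoints_dir.glob("checkpoint_epoch_*.pth")):
        problems.append("no checkpoints/checkpoint_epoch_*.pth found")
    return problems
```

Calling `check_layout()` from the repository root returns an empty list once all downloaded files are in place; otherwise each entry names what is still missing.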
You can specify your video inputs in a .env file.
These sources can be either local video files or RTSP streams from live cameras.
# Primary video sources (can be RTSP streams or video files)
VIDEO_SOURCE_1=ucf24/videos/Basketball/v_Basketball_g22_c01.mp4

# Example RTSP live camera stream
VIDEO_SOURCE_1=rtsp://admin:password@192.168.1.100:554/cam/realmonitor?channel=1&subtype=1
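To illustrate how entries like these can be told apart, here is a small sketch that reads `.env`-style text and labels each `VIDEO_SOURCE_<n>` value as an RTSP stream or a local file. It uses only the standard library; the function name `parse_video_sources` is hypothetical, and the repository may load its `.env` file differently. In this sketch a later assignment to the same key replaces an earlier one.

```python
import re

# Matches keys of the form VIDEO_SOURCE_<n>, as in the example above.
_KEY_PATTERN = re.compile(r"^VIDEO_SOURCE_(\d+)$")


def parse_video_sources(env_text):
    """Return {index: (kind, source)} parsed from .env-style text.

    Hypothetical helper for illustration only. kind is "rtsp" or "file".
    """
    sources = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so "=" inside an RTSP query string
        # (e.g. channel=1&subtype=1) stays part of the value.
        key, sep, value = line.partition("=")
        match = _KEY_PATTERN.match(key.strip())
        if not sep or not match:
            continue
        value = value.strip().strip("'\"")
        kind = "rtsp" if value.lower().startswith("rtsp://") else "file"
        sources[int(match.group(1))] = (kind, value)
    return sources
```

For example, passing the contents of the `.env` file via `parse_video_sources(open(".env").read())` gives a dictionary keyed by source number, which makes it easy to check that every stream points where you intended.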
Once your .env is configured, run YOWOv3 with one of the following modes:
python main.py -m multistreaming_live -cf config/cf2/ucf_config.yaml
python main.py -m live -cf config/cf2/ucf_config.yaml
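This README does not describe how the `multistreaming_live` mode is implemented. As a general illustration of handling several sources at once, the sketch below uses one reader thread per source feeding a shared queue. It is a generic pattern, not the repository's code: `stream_worker` and `merge_streams` are hypothetical names, and each source is represented by a plain Python iterable of frames (in a real setup it might wrap a video capture object).

```python
import queue
import threading


def stream_worker(name, frames, out_queue):
    """Push every frame from one source into the shared queue."""
    try:
        for frame in frames:
            out_queue.put((name, frame))
    finally:
        # A None frame signals that this source is exhausted.
        out_queue.put((name, None))


def merge_streams(named_frames):
    """Yield (name, frame) pairs from several frame iterables concurrently.

    named_frames maps a stream name to an iterable of frames. Frames from
    different streams interleave in arrival order; frames from the same
    stream keep their original order.
    """
    # Bounded queue so a fast source cannot fill memory if the consumer lags.
    out_queue = queue.Queue(maxsize=64)
    threads = [
        threading.Thread(
            target=stream_worker, args=(name, frames, out_queue), daemon=True
        )
        for name, frames in named_frames.items()
    ]
    for thread in threads:
        thread.start()

    remaining = len(threads)
    while remaining:
        name, frame = out_queue.get()
        if frame is None:
            remaining -= 1
            continue
        yield name, frame
```

A consumer would loop over `merge_streams({...})` and pass each frame, tagged with its stream name, to the detector. Keeping a single consumer avoids running the model from several threads at once.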
[Demo clips: Basketball Stream, Diving Stream]
YOWO was trained with two different 3D backbones: I3D and ResNet-3D.
- I3D Backbone started with a lower loss and converged more smoothly.
- ResNet-3D Backbone had a higher initial loss but showed consistent improvement and comparable convergence by epoch 7.
- Both models benefited from gradual learning rate decay.
- Extension of Hope1337 / YOWOv3
- YOWOv2 implementation — yjh0410 / YOWOv2
- Related document / paper overview — “YOWOv3: A Lightweight Framework for Spatio-Temporal Action Detection” on Studocu
(https://www.studocu.vn/vn/document/van-lang-university/tu-tuong-ho-chi-minh/yowov3-a-lightweight-framework-for-spatio-temporal-action-detection/130031369)


