🕵🏻‍♂️ DLA DEEPFAKE DETECTION 2024/25 - UNICA

(Figure: deepfake examples)

Deepfake Detection Project using the OpenForensics dataset


📑 Summary

  1. 🧑🏻‍🎓 Students
  2. 📌 Description
  3. 📥 Download the Dataset
  4. 📄 Documentation
  5. 🚀 Installation
  6. 🛠️ Test the DataLoader
  7. 🎯 Train the Model
  8. 📊 Evaluate the Model
  9. 📂 Project Structure
  10. 📊 Project Goals
  11. 🖥️ Hardware and Limitations
  12. 🤝 Contributions
  13. ❓ How to Cite

🧑🏻‍🎓 Students

Francesco Congiu

Student ID: 60/73/65300

E-Mail: f.congiu38@studenti.unica.it

Simone Giuffrida

Student ID: 60/73/65301

E-Mail: s.giuffrida2@studenti.unica.it

Fabio Littera

Student ID: 60/73/65310

E-Mail: f.littera3@studenti.unica.it


📌 Description

This repository contains the code for training and evaluating deepfake detection models using the OpenForensics dataset. The project follows two approaches:

  1. Transfer Learning with pre-trained models (e.g., MobileNet, Xception).
  2. Training from Scratch with a custom neural network.

📥 Download the Dataset

The OpenForensics dataset required for the project can be downloaded from the following link:
🔗 OpenForensics Dataset - Zenodo


📄 Documentation

Below are links to the full project documentation:

📂 Folder with all the documentation


🚀 Installation

To run the project locally, follow these steps:

1️⃣ Clone the Repository

Open the terminal and run:

git clone git@github.com:wakaflocka17/DLA_DEEPFAKEDETECTION.git
cd DLA_DEEPFAKEDETECTION

(Or, if using HTTPS)

git clone https://github.com/wakaflocka17/DLA_DEEPFAKEDETECTION.git
cd DLA_DEEPFAKEDETECTION

2️⃣ Create and Activate a Virtual Environment

It is recommended to create a virtual environment to isolate dependencies:

python3 -m venv openforensics_env
source openforensics_env/bin/activate  # macOS/Linux

(On Windows, use: openforensics_env\Scripts\activate)

3️⃣ Install Dependencies

Install all necessary libraries:

pip install -r requirements.txt

4️⃣ Set Up the Project Structure

First, make the script executable:

chmod +x setup_folders.sh

Then run the script to create the required folders:

./setup_folders.sh

This will create:

DLA_DEEPFAKEDETECTION/
│── data/
│   ├── Train/
│   ├── Val/
│   ├── Test-Dev/
│   ├── Test-Challenge/
│   ├── dataset/
│
│── processed_data/
│   ├── Train/
│   │   ├── real/
│   │   ├── fake/
│   ├── Val/
│   │   ├── real/
│   │   ├── fake/
│   ├── Test-Dev/
│   │   ├── real/
│   │   ├── fake/
│   ├── Test-Challenge/
│   │   ├── real/
│   │   ├── fake/

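If running the shell script is not an option (e.g., on Windows), the same layout can be created with a short Python snippet; the folder names below are taken from the tree above:

```python
import os

# Recreate the directory layout produced by setup_folders.sh.
SPLITS = ["Train", "Val", "Test-Dev", "Test-Challenge"]

for split in SPLITS:
    # Raw dataset folders.
    os.makedirs(os.path.join("data", split), exist_ok=True)
    # Preprocessed real/fake face folders.
    for label in ("real", "fake"):
        os.makedirs(os.path.join("processed_data", split, label), exist_ok=True)

# Folder where the original dataset archives are saved.
os.makedirs(os.path.join("data", "dataset"), exist_ok=True)
```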
5️⃣ Download the Dataset

To automatically download the OpenForensics dataset, use the provided script:

python3 scripts/download_dataset.py

💡 Ensure you have a stable internet connection, as the dataset is large (60GB+).

6️⃣ Move Images and JSON Files to Their Correct Directories

Now that all files have been extracted, we need to organize them into the correct dataset folders (Train, Val, Test-Dev, Test-Challenge). Run:

python3 scripts/extract_dataset.py

💡 This will:

  • Move training images to data/Train/Train/ and the corresponding Train_poly.json to data/Train/.
  • Move validation images to data/Val/Val/ and Val_poly.json to data/Val/.
  • Move test-dev images to data/Test-Dev/Test-Dev/ and Test-Dev_poly.json to data/Test-Dev/.
  • Move test-challenge images to data/Test-Challenge/Test-Challenge/ and Test-Challenge_poly.json to data/Test-Challenge/.
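The moves above follow one pattern per split: images go into a nested split folder, the annotation JSON one level up. The real logic lives in scripts/extract_dataset.py, so treat the helper below as a hypothetical sketch of that layout, not the script's actual code:

```python
import shutil
from pathlib import Path

def organize_split(split: str, src: Path, root: Path = Path("data")) -> None:
    """Move one split's images into root/<split>/<split>/ and its
    <split>_poly.json into root/<split>/ (hypothetical helper; paths
    mirror the layout described above)."""
    img_dir = root / split / split          # e.g. data/Train/Train/
    img_dir.mkdir(parents=True, exist_ok=True)

    # Move every extracted image into the nested split folder.
    for img in src.glob("*.jpg"):
        shutil.move(str(img), str(img_dir / img.name))

    # Move the annotation file one level up, next to the image folder.
    ann = src / f"{split}_poly.json"
    if ann.exists():
        shutil.move(str(ann), str(root / split / ann.name))
```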

7️⃣ Delete Unnecessary ZIP Files

After extraction and organization, the original .zip files are no longer needed. Delete them using:

python3 scripts/delete_all_zips.py

💡 This will clean up the dataset directory, saving storage space.

8️⃣ Verify Installation

To check if everything works correctly, run:

python3 -c "import torch; print(torch.__version__)"
python3 -c "import cv2; print(cv2.__version__)"

If no errors appear, the setup is complete! 🎯


🛠️ Test the DataLoader

Before training, verify that the dataset is correctly loaded:

python3 scripts/dataloader.py --dataset Train --batch_size 32

💡 This should display a batch of images and labels.
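Under the hood, a loader over the processed_data layout starts by mapping each file in real/ and fake/ to a (path, label) pair. A stdlib-only sketch of that mapping (the actual dataloader.py implementation may differ; the label convention here is an assumption):

```python
from pathlib import Path

# Assumed convention: 0 = real, 1 = fake.
LABELS = {"real": 0, "fake": 1}

def list_samples(root: str, split: str):
    """Collect (image_path, label) pairs from <root>/<split>/{real,fake}/.
    This mirrors what an ImageFolder-style loader does before batching."""
    samples = []
    for name, label in LABELS.items():
        for img in sorted((Path(root) / split / name).glob("*")):
            samples.append((img, label))
    return samples
```

Note that torchvision's `ImageFolder` pointed at processed_data/Train would instead assign class indices alphabetically (fake = 0, real = 1); the mapping above is an illustrative choice.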

🎯 Train the Model

Train the model using either MobileNet or Xception:

✅ Train with MobileNet:

python3 scripts/train.py --model mobilenet

✅ Train with Xception:

python3 scripts/train.py --model xception

✅ Train with Custom network:

python3 scripts/train.py --model custom

💡 The trained model will be saved in the models/ directory.

📊 Evaluate the Model

After training, evaluate the model on Test-Dev and Test-Challenge:

✅ Evaluate MobileNet on Test-Dev:

python3 scripts/evaluate.py --model mobilenet --dataset Test-Dev

✅ Evaluate MobileNet on Test-Challenge:

python3 scripts/evaluate.py --model mobilenet --dataset Test-Challenge

✅ Evaluate Xception on Test-Dev:

python3 scripts/evaluate.py --model xception --dataset Test-Dev

✅ Evaluate Xception on Test-Challenge:

python3 scripts/evaluate.py --model xception --dataset Test-Challenge

✅ Evaluate Custom network on Test-Dev:

python3 scripts/evaluate.py --model custom --dataset Test-Dev

✅ Evaluate Custom network on Test-Challenge:

python3 scripts/evaluate.py --model custom --dataset Test-Challenge

💡 The script will print Accuracy, Precision, Recall, and F1-score.
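For reference, all four reported metrics can be computed from raw predictions with no dependencies. A minimal sketch for binary labels (1 = fake is an assumed convention):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Example: 5 samples, 3 correct -> accuracy 0.6, the rest 2/3.
acc, prec, rec, f1 = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Equivalent results come from scikit-learn's `accuracy_score` and `precision_recall_fscore_support` if it is already installed.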


📂 Project Structure

DLA_DEEPFAKEDETECTION/
│── .github/            # Dependency bot configuration
│── data/               # OpenForensics dataset (original, unmodified)
│   ├── Train/          # Training data
│   ├── Val/            # Validation data
│   ├── Test-Dev/       # Test-Dev data
│   ├── Test-Challenge/ # Test-Challenge data
│   ├── dataset/        # Where to save the original dataset
│
│── processed_data/     # Preprocessing output (cropped faces)
│   ├── Train/
│   │   ├── real/       # Real faces extracted from the training set
│   │   ├── fake/       # Fake faces extracted from the training set
│   ├── Val/
│   │   ├── real/       # Real faces extracted for validation
│   │   ├── fake/       # Fake faces extracted for validation
│   ├── Test-Dev/
│   │   ├── real/       # Real faces extracted for Test-Dev
│   │   ├── fake/       # Fake faces extracted for Test-Dev
│   ├── Test-Challenge/
│   │   ├── real/       # Real faces extracted for Test-Challenge
│   │   ├── fake/       # Fake faces extracted for Test-Challenge
│
│── documentation/      # Documentation, reports, extra material
│── logs/               # Training loss and evaluation accuracy logs
│── models/             # Saved models (e.g., .pth files)
│── scripts/            # Scripts (training, preprocessing, etc.)
│── notebooks/          # Jupyter notebooks for debugging and testing
│── utils/              # Generic utilities and support functions
│── requirements.txt    # Project dependencies
│── setup_folders.sh    # Script for automatic folder creation
│── README.md           # Project documentation

📊 Project Goals

  • Face extraction from images using bounding boxes.
  • Binary classification (real/fake) of the extracted faces.
  • Training with transfer learning using MobileNet or Xception.
  • Development of a custom CNN for classification.
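The first goal, cropping faces from bounding boxes, reduces to array slicing once the box coordinates are known. A minimal sketch assuming an (x, y, w, h) pixel box (the actual annotation format comes from the *_poly.json files, which store polygon/box data):

```python
import numpy as np

def crop_face(image: np.ndarray, bbox) -> np.ndarray:
    """Crop a face region from an H x W x 3 image given an (x, y, w, h)
    box, clamping the box to the image borders."""
    x, y, w, h = bbox
    h_img, w_img = image.shape[:2]
    # Clamp the box so the slice never leaves the image.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(w_img, x + w), min(h_img, y + h)
    return image[y0:y1, x0:x1]
```

Each crop is then saved under processed_data/<split>/real/ or .../fake/ according to its label, which is the layout the classifier trains on.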

🖥️ Hardware and Limitations

Note

The experiments were performed on a MacBook Pro (2024) with the following specifications:

  • Operating system: macOS Sonoma
  • Processor: Apple M4 Pro
  • GPU: Apple integrated GPU (M4 Pro)
  • RAM: 32 GB (unified memory)

Warning

Due to the dataset's size and computational demands, some experiments may run slowly or be difficult to execute on systems with fewer resources or less powerful hardware.


🤝 Contributions

Feel free to contribute to the project! 💡

📌 How to Contribute

  1. Fork the repository.
  2. Create a new branch:
     git checkout -b new-feature
  3. Commit your changes:
     git commit -m "Add new feature"
  4. Push your changes:
     git push origin new-feature
  5. Open a Pull Request on GitHub.

❓ How to Cite

If you use this repository (or part of its code) in your research, a scholarly publication, or a project, please cite it. You can use the following BibTeX entry:

@misc{Deepfake-Project,
  author       = {Congiu, Francesco and Giuffrida, Simone and Littera, Fabio},
  title        = {Deepfake Detection Project using the OpenForensics dataset},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_DEEPFAKEDETECTION}},
  year         = {2025}
}

Or, if you prefer not to use BibTeX, feel free to mention the authors and the link to the repository in the acknowledgments or bibliography of your paper.
