This repository contains the dataset, baseline code, and example notebooks for the AN2DL 2025-2026 Challenge 2 (Image Classification). It is intended to provide a reproducible starting point for developing and evaluating deep-learning models that predict molecular subtypes from histology images.
The repository includes:
- Dataset files and labels used in the competition (`train_data`, `test_data`, `train_labels.csv`), including optional auxiliary binary masks.
- Exploratory data analysis and preprocessing notebooks in `notebooks/` and `notebooks-v2/`.
- Baseline modeling code, training utilities, and reference implementations in `submitted-notebook/` and `models/`.
- Internal helper modules and data-processing scripts used by the notebooks (`notebooks/internal`, `notebooks-v2/internal`).
Quick start
- Follow the `Prerequisites` section below to install `git-lfs` and create a Python virtual environment.
- Pull the large data files with `git lfs pull` (see the `Prerequisites` steps).
- Run `notebooks/00_prerequisites.ipynb` or `notebooks-v2/00_prerequisites.ipynb` to install remaining dependencies and prepare the environment.
- Inspect the preprocessing notebooks (`01*`) and the model notebooks (`03_model.ipynb`, `04_model_training.ipynb`) to reproduce or adapt the baseline training pipeline.
Team members:
Since the data files are large, you need to install `git-lfs` to clone this repository:
```bash
# Install git-lfs (if not already installed)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs

# Initialize git-lfs (once per machine)
git lfs install

# Clone the repository
git clone <repository-url>
cd AN2DL-Challenge-2

# Pull the large data files
git lfs pull
```

Tip:
If you have already cloned the repository without git-lfs, run the following commands inside the repository folder:
```bash
git lfs install
git lfs pull
```

Now you should have all the data files in place. To run the provided notebooks, make sure you have the required Python packages installed:
```bash
# Create a virtual environment
cd AN2DL-Challenge-2  # if not already in the repo folder
python3 -m venv .venv
source .venv/bin/activate  # On Windows use .venv\Scripts\activate

# Install jupyter
pip install jupyter ipykernel
```

If you are still having issues, please refer to the official Python venv documentation for more details on setting up virtual environments.
Once the environment is set up, run the prerequisite notebook to install all other dependencies (CPU or GPU version):
```bash
jupyter notebook
```

Then navigate to the notebook files in your web browser (usually at http://localhost:8888).
Competition hosted at AN2DL 2025-2026.
Welcome aboard, engineer! You’ve been assigned to the Iron-Guts Hospital, a state-of-the-art medical facility staffed entirely by orcs with questionable bedside manners.
Your mission: design a deep learning model that can classify diseased human tissue samples. Success means better prognostics for our fragile human patients - and possibly a promotion to Chief Slag-Wrangler.
Your task is to analyze microscopic tissue morphology and predict the correct molecular subtype. These labels tell our orc surgeons which surgical instrument to swing next:
- Luminal A: Usually the squishiest
- Luminal B: A bit tougher
- HER2(+): Requires heavy ordnance
- Triple Negative: The tricky ones; bring the precision club
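For training, the four subtypes above need to be encoded as integer class ids. A minimal sketch follows; note that the exact label strings stored in `train_labels.csv` are an assumption here, so check the CSV and adjust the keys if they differ:

```python
# Map the four molecular subtypes to integer class ids.
# NOTE: the exact label strings in train_labels.csv are an assumption;
# inspect the CSV and adjust these keys accordingly.
SUBTYPES = ["Luminal A", "Luminal B", "HER2(+)", "Triple Negative"]
LABEL_TO_ID = {name: i for i, name in enumerate(SUBTYPES)}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}

def encode(label: str) -> int:
    """Return the integer class id for a subtype label."""
    return LABEL_TO_ID[label]
```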
The dataset contains 1,272 images of different sizes, each paired with a binary mask crafted by our team of dedicated doctogres. These masks identify the regions most likely to contain the diseased tissue. Our staff guarantees that the dataset has been collected in a completely orc-skin-free, booger-free, and absolutely sterile environment.
| File Location | Description |
|---|---|
| train_data.zip | 691 image/mask pairs for model training |
| test_data.zip | 477 image/mask pairs for final evaluation (no labels provided) |
| train_labels.csv | Ground-truth molecular subtype labels for the training set |
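Since each image comes with a mask, a training pipeline needs to pair the two sets of files. The sketch below matches them by file stem; the actual naming convention inside `train_data.zip` is an assumption, and the file names shown are hypothetical placeholders:

```python
from pathlib import PurePath

def pair_by_stem(image_files, mask_files):
    """Pair image files with mask files that share the same stem.

    NOTE: matching image and mask by identical file stem is an
    assumption about the archive layout; adjust to the real naming.
    """
    masks = {PurePath(m).stem: m for m in mask_files}
    return [(img, masks[PurePath(img).stem])
            for img in image_files if PurePath(img).stem in masks]

# Hypothetical file names standing in for the archive contents.
pairs = pair_by_stem(["images/s1.png", "images/s2.png"],
                     ["masks/s1.png", "masks/s2.png"])
```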
The following is an example image with the corresponding auxiliary mask. The use of masks is optional for classification purposes, but may be helpful. Ogres do not waste.
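One simple way to exploit the auxiliary masks is to zero out pixels outside the annotated region before feeding the image to the network. A minimal NumPy sketch on synthetic data (actual image/mask loading, e.g. with PIL, is omitted):

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only pixels where the binary mask is nonzero.

    image: (H, W, C) uint8 array; mask: (H, W) binary array.
    """
    return image * (mask[..., None] > 0)

# Synthetic example standing in for a real image/mask pair from train_data.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
msk = np.zeros((4, 4), dtype=np.uint8)
msk[1:3, 1:3] = 1  # region flagged as likely diseased tissue

out = apply_mask(img, msk)
```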
No validation sacrifice split is provided by default.
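You will therefore want to carve a validation set out of the training data yourself, keeping the class proportions intact. A minimal stratified-split sketch in plain Python follows (scikit-learn's `train_test_split(..., stratify=labels)` does the same job; the labels below are placeholders for the contents of `train_labels.csv`):

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split sample indices into train/val, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = max(1, int(len(idxs) * val_fraction))  # at least 1 per class
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx

# Placeholder labels standing in for the real training labels.
labels = ["Luminal A"] * 10 + ["HER2(+)"] * 5
train_idx, val_idx = stratified_split(labels)
```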
