This repository accompanies a manuscript currently under review. The codebase, documentation and dataset access instructions may be updated during the peer-review and publication process. Additional resources, including the full dataset and reproducibility scripts, will be finalized upon acceptance of the manuscript.
This repository integrates CLAM with Additive Multiple Instance Learning (AddMIL) [2]. It was developed to support the study:
Title: Automated detection of cutaneous squamous cell carcinoma (CSCC) in whole slide images of skin biopsies using weakly-supervised learning approaches
Authors: Catherine Chia, Stephan Dooper, Antien Mooyaart, Avital Amir, Marlies Wakkee, and Geert Litjens
Two internal datasets are used in this study. Both are hosted in the same AWS S3 bucket:
s3://cobra-pathology/
The COBRA dataset is publicly accessible via the bcc directory:
- Install the AWS CLI
- S3 path:
  s3://cobra-pathology/packages/bcc/
- To browse:
  aws s3 ls --no-sign-request s3://cobra-pathology/packages/bcc/
- To download (the --recursive flag is required to copy a whole directory):
  aws s3 cp --recursive --no-sign-request s3://cobra-pathology/packages/bcc/ <destination_path>
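Because the objects are readable without credentials, the bucket can in principle also be browsed over plain HTTPS through the S3 ListObjectsV2 REST API, for example on a machine without the AWS CLI. The sketch below uses only the Python standard library. It assumes the bucket answers at the virtual-hosted endpoint `https://cobra-pathology.s3.amazonaws.com/`; the helpers `list_keys` and `parse_listing` are hypothetical and not part of this repository.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public bucket is reachable at this virtual-hosted endpoint.
BUCKET_URL = "https://cobra-pathology.s3.amazonaws.com/"
# XML namespace used by S3 in ListBucketResult responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_bytes):
    """Return (keys, next_token) from one ListObjectsV2 XML page (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    token = None
    if truncated == "true":
        # More pages exist; S3 returns a token to request the next one.
        token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    return keys, token


def list_keys(prefix, bucket_url=BUCKET_URL):
    """Yield every object key under `prefix`, following continuation tokens."""
    token = None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = bucket_url + "?" + urllib.parse.urlencode(params)
        # Unsigned GET request, the HTTP counterpart of --no-sign-request.
        with urllib.request.urlopen(url) as resp:
            keys, token = parse_listing(resp.read())
        yield from keys
        if token is None:
            break
```

For example, `for key in list_keys("packages/bcc/"): print(key)` should print the same object keys that `aws s3 ls --recursive` would report for that prefix. For routine use the AWS CLI commands listed above remain the simpler option.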
Relevant directory structure:
cobra-pathology
└── packages
    ├── bcc
    │   ├── annotations
    │   └── images
    └── ood
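After a large download, a quick check that the local copy contains the expected sub-folders from the directory structure listed above can catch an interrupted transfer. This is a minimal sketch, assuming the package prefix was copied recursively into one local folder so that `annotations/` and `images/` sit directly under it; the function name `missing_subdirs` is hypothetical.

```python
from pathlib import Path

# Sub-folders expected under a downloaded package directory (from the tree above).
EXPECTED_SUBDIRS = ("annotations", "images")


def missing_subdirs(root, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-folders that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for name in expected:
        sub = root / name
        # An empty folder is treated the same as a missing one.
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing
```

An empty list from, say, `missing_subdirs("<destination_path>")` means both sub-folders are present and non-empty; it does not verify individual files.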
The CSCC dataset consists of two batches. The 2016-2020 batch is accessible via the ood directory, whereas the later 2021-2024 batch will be made publicly available upon acceptance of the manuscript.
- Install the AWS CLI
- S3 path:
  s3://cobra-pathology/packages/ood/
- To browse:
  aws s3 ls --no-sign-request s3://cobra-pathology/packages/ood/
- To download (the --recursive flag is required to copy a whole directory):
  aws s3 cp --recursive --no-sign-request s3://cobra-pathology/packages/ood/ <destination_path>
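Whole-slide images are large, so it may be convenient to fetch a single sub-folder first, for example only the annotations. The sketch below builds and optionally runs an `aws s3 sync` call from Python. `aws s3 sync` and `--no-sign-request` are standard AWS CLI options; the helpers `build_sync_command` and `run_sync`, and the choice of sub-folder, are illustrative assumptions.

```python
import shlex
import subprocess

BASE_URI = "s3://cobra-pathology/packages/"


def build_sync_command(package, destination, subfolder=None):
    """Build an anonymous `aws s3 sync` command for a package such as "ood".

    If `subfolder` is given (e.g. "annotations"), only that prefix is synced.
    """
    source = f"{BASE_URI}{package}/"
    if subfolder:
        source += f"{subfolder.strip('/')}/"
    return ["aws", "s3", "sync", "--no-sign-request", source, str(destination)]


def run_sync(package, destination, subfolder=None):
    """Run the command; needs the AWS CLI on PATH and network access."""
    cmd = build_sync_command(package, destination, subfolder)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

`sync` is used here instead of `cp --recursive` because it is recursive by default and skips files that are already up to date locally, so an interrupted download can simply be re-run.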
Relevant directory structure:
cobra-pathology
└── packages
    ├── bcc
    └── ood
        ├── annotations
        └── images
If you use this repository, please cite:
@software{diagnijmegen_pathology_clam_addmil,
title={CLAM and Additive MIL},
author={Chia, Catherine and Slootweg, Ivan and Dooper, Stephan and Litjens, Geert},
url={https://github.com/DIAGNijmegen/pathology-clam-addmil},
year={2025}
}
and CLAM:
@article{lu2021data,
title={Data-efficient and weakly supervised computational pathology on whole-slide images},
author={Lu, Ming Y and Williamson, Drew FK and Chen, Tiffany Y and Chen, Richard J and Barbieri, Matteo and Mahmood, Faisal},
journal={Nature Biomedical Engineering},
volume={5},
number={6},
pages={555--570},
year={2021},
publisher={Nature Publishing Group}
}
If you use the CSCC dataset, please cite:
@article{chia2026cscc,
title={Automated detection of cutaneous squamous cell carcinoma (CSCC) in whole slide images of skin biopsies using weakly-supervised learning approaches},
author={Chia, Catherine and others},
year={2026}
}
If you use the BCC dataset, please cite:
@article{geijs2023bcc,
title={Detection and subtyping of basal cell carcinoma in whole-slide histopathology using weakly-supervised learning},
author={Geijs, Daan and others},
year={2023}
}