This repository supports analysis of how useful multispectral bands are for classifying coffee leaves as healthy or affected by rust (Hemileia vastatrix).

The project uses the dataset presented by Aroca et al. (2025, "Colombian coffee tree leaves multispectral images dataset"), which contains 6,726 multispectral images of coffee leaves. Each image is labeled to indicate whether lesions caused by Hemileia vastatrix (coffee rust) are present. The data were collected on farms in Colombia to support the development of models for early disease detection, which helps reduce potential economic losses. All images were acquired under controlled conditions with uniform lighting and accurate alignment between the RGB and multispectral data, so no additional co-registration is required.

The dataset provides five spectral bands: blue (450–500 nm), green (500–620 nm), red (620–750 nm), red-edge (~840 nm), and near-infrared (750–900 nm). Combined with the three RGB channels, this gives eight input channels in total.
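To make the eight-channel layout concrete, the sketch below shows how modality subsets (for example RGB only, or RGB plus one spectral band) could be expressed as channel indices. The channel names and their order are assumptions made for illustration; the layout actually used by this repository's configs may differ.

```python
# Hypothetical channel layout: three RGB channels followed by the five
# multispectral bands. The real order in this repository may differ.
CHANNELS = ["rgb_r", "rgb_g", "rgb_b",
            "ms_blue", "ms_green", "ms_red", "ms_red_edge", "ms_nir"]

RGB = ["rgb_r", "rgb_g", "rgb_b"]
SPECTRAL = [c for c in CHANNELS if c not in RGB]


def channel_indices(names):
    """Map channel names to their positions in the stacked input."""
    return [CHANNELS.index(n) for n in names]


def modality_subsets():
    """Build named subsets: RGB only, RGB plus one band, and all bands."""
    subsets = {"rgb": channel_indices(RGB)}
    for band in SPECTRAL:
        subsets[f"rgb+{band}"] = channel_indices(RGB + [band])
    subsets["allbands"] = list(range(len(CHANNELS)))
    return subsets


if __name__ == "__main__":
    for name, idx in modality_subsets().items():
        print(f"{name:>18}: {idx}")
```

Each index list can then be used to slice the channel axis of an image before it is passed to a model.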
The data are stored in the ./data directory. They can be downloaded from https://www.kaggle.com/datasets/jorgearoca/coffee-rust?resource=download
Different research approaches are tested:
- training multiple models on different subsets of image modalities (e.g. only on RGB images)
- visualizing which parts of the image are most important for the model (with an activation map)
- statistical analysis of model training results
- Python 3.10.12
- Create a venv and install the requirements
- Activate the env:
  . venv/bin/activate
- GPU with 8 GB of VRAM
Use the script notebooks/generate_split_files.py to generate dataset splits for k-fold cross-validation.
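As a rough illustration of what k-fold split files can look like, the sketch below shuffles a list of sample IDs, partitions it into k folds, and writes one JSON file per fold. It is not the code of notebooks/generate_split_files.py: the function names, the JSON keys ("train", "val"), and the placeholder sample IDs are assumptions. Only the file-name prefix follows the split pattern used in the experiment commands.

```python
import json
import random
import tempfile
from pathlib import Path


def make_kfold_splits(items, k=5, seed=0):
    """Shuffle items once, then return k train/val partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append({"train": train, "val": val})
    return splits


def write_splits(splits, out_dir, prefix="a_shuffled_fold_"):
    """Write one JSON file per fold; the schema here is an assumption."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, split in enumerate(splits):
        path = out_dir / f"{prefix}{i}.json"
        path.write_text(json.dumps(split, indent=2))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Placeholder sample IDs; real entries would be image paths from ./data.
    sample_ids = [f"leaf_{n:04d}" for n in range(20)]
    splits = make_kfold_splits(sample_ids, k=5, seed=0)
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_splits(splits, tmp):
            print(p.name, len(json.loads(p.read_text())["val"]))
```

This sketch does not stratify by label; if the healthy and rust classes are imbalanced, a stratified split may be preferable.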
Models are configured through config files. To run the training, use the script scripts/run_experiments.py.
See the scripts in notebooks/.
See the scripts notebooks/model_xai.py and notebooks/prepare_plots__oclussion_map.py.
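For readers unfamiliar with occlusion maps, the sketch below illustrates the general idea of occlusion sensitivity: mask one patch of the image at a time and record how much the model score drops. It is a generic, pure-Python illustration and not the implementation in the scripts above; `occlusion_map`, `toy_score`, and the patch size are hypothetical, and a real version would operate on image tensors with the trained classifier.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Slide a patch over a 2D image and record the score drop per patch."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heatmap = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy before masking
            for y in rows:
                for x in cols:
                    occluded[y][x] = fill
            drop = baseline - score_fn(occluded)
            for y in rows:
                for x in cols:
                    heatmap[y][x] = drop
    return heatmap


def toy_score(image):
    """Stand-in for a model: responds only to the top-left 2x2 region."""
    return sum(image[y][x] for y in range(2) for x in range(2))


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    for row in occlusion_map(image, toy_score, patch=2):
        print(row)
```

Patches whose occlusion causes a large score drop are the regions the model relies on most.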

- Swin, all bands, shuffled fold 0, 100 seeds
python scripts/run_experiments.py \
--config-dir configs/tmp \
--experiment-group e10_seeds_for_swin_after_augmentations_100 \
--split-pattern "a_shuffled_fold_0.json" \
--seeds $(seq 1 100)
- Comparison of architectures, on all bands
python scripts/run_experiments.py \
--config-dir configs/architecture_comparison_allbands \
--experiment-group e104_architectures \
--split-pattern "a_shuffled_*.json" \
--seeds 42 43
- Comparison of architectures, on RGB
python scripts/run_experiments.py \
--config-dir configs/architecture_comparison_rgb \
--experiment-group e105_architectures_rgb \
--split-pattern "a_shuffled_*.json" \
--seeds 42 43
- Comparison of modalities on the Swin model (one additional spectral band), v6
python scripts/run_experiments.py \
--config-dir configs/modalities_comparison_swin \
--experiment-group e106_modalities_swin_ext \
--split-pattern "a_shuffled_*.json" \
--seeds 42 43 |& tee output11.log
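
Runs repeated over many seeds, as in the commands above, produce a distribution of scores per configuration. The sketch below shows one simple way to summarize such a distribution and to compare two configurations seed by seed. It is an illustration under assumptions, not the analysis performed in notebooks/: the function names are hypothetical, the scores are synthetic, and the interval uses a normal approximation (for a small number of seeds a t-distribution would be more appropriate).

```python
import math
import random
import statistics


def summarize(scores, confidence=0.95):
    """Return mean, sample std, and a normal-approximation CI of the mean."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(len(scores))
    return mean, std, (mean - half_width, mean + half_width)


def paired_difference(scores_a, scores_b):
    """Summarize per-seed differences between two configurations."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both configurations need one score per seed")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return summarize(diffs)


if __name__ == "__main__":
    # Synthetic scores for illustration only; these are not results
    # from this repository.
    rng = random.Random(0)
    allbands = [rng.gauss(0.90, 0.02) for _ in range(100)]
    rgb_only = [rng.gauss(0.88, 0.02) for _ in range(100)]
    mean, std, (low, high) = summarize(allbands)
    print(f"all bands: mean={mean:.3f} std={std:.3f} CI=({low:.3f}, {high:.3f})")
    d_mean, _, (d_low, d_high) = paired_difference(allbands, rgb_only)
    print(f"difference: mean={d_mean:.3f} CI=({d_low:.3f}, {d_high:.3f})")
```

If the confidence interval of the paired difference excludes zero, the gap between the two configurations is unlikely to be explained by seed variance alone.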



