To run experiments on MNIST-Addition, Kandinsky, BDD-OIA, and SDD-OIA, open a Linux terminal, create a conda environment, and install the dependencies with pip:
```shell
conda create -n rs python=3.8
conda activate rs
pip install -r requirements.txt
```

We recommend using Python 3.8, though newer versions should also be compatible.
BDD-OIA is a dataset containing dashcam images for autonomous driving predictions. It includes annotations for input-level objects (such as bounding boxes for pedestrians) and concept-level entities (like "road is clear"). The original dataset can be found here.
The dataset has been preprocessed using a Faster R-CNN pretrained on BDD-100k and the initial module from CBM-AUC (Sawada and Nakamura, IEEE Access 2022), resulting in embeddings of dimension 2048. These embeddings are provided in the `bdd_2048.zip` file. The original CBM-AUC repository is available here.
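As a sketch of how such precomputed embeddings might be consumed, assuming each split is stored as an `(N, 2048)` NumPy array inside the archive (the actual file layout of `bdd_2048.zip` may differ), the snippet below builds a tiny in-memory stand-in and reads it back; with the real data you would open the zip file directly:

```python
import io
import zipfile

import numpy as np

# Hypothetical layout: one (N, 2048) float32 array per split inside the zip.
# We create a small in-memory archive purely for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    payload = io.BytesIO()
    np.save(payload, np.zeros((4, 2048), dtype=np.float32))
    zf.writestr("train.npy", payload.getvalue())

# Reading the embeddings back from the archive.
with zipfile.ZipFile(buf) as zf, zf.open("train.npy") as f:
    train_embeddings = np.load(f)

print(train_embeddings.shape)  # (4, 2048)
```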
When using this dataset, please consider citing the original dataset creators and Sawada and Nakamura.
```bibtex
@InProceedings{xu2020cvpr,
  author    = {Xu, Yiran and Yang, Xiaoyin and Gong, Lihang and Lin, Hsuan-Chu and Wu, Tz-Ying and Li, Yunsheng and Vasconcelos, Nuno},
  title     = {Explainable Object-Induced Action Decision for Autonomous Vehicles},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

@Article{sawada2022cbm-auc,
  author  = {Sawada, Yoshihide and Nakamura, Keigo},
  journal = {IEEE Access},
  title   = {Concept Bottleneck Model With Additional Unsupervised Concepts},
  year    = {2022},
  volume  = {10},
  pages   = {41758-41765},
  doi     = {10.1109/ACCESS.2022.3167702}
}
```
SDD-OIA is a synthetic dataset generated using Blender. This synthetic data is inspired by BDD-OIA and mimics images taken from car dashcams. The concept-level annotations are similar to those in BDD-OIA, but the knowledge and object distributions in the scene are fully customizable. For further information, please refer to the paper or the data generation repository.
This repository includes several MNIST variations. The most notable ones are:
MNIST-Even-Odd:
The MNIST-Even-Odd dataset is a variant of MNIST-Addition introduced by Marconato et al. (2023b). It includes only specific combinations of digits, featuring only even or only odd digits, such as 0+6=6, 2+8=10, and 1+5=6. The dataset contains 6,720 fully annotated samples in the training set, 1,920 samples in the validation set, and 960 samples in the in-distribution test set. Additionally, there are 5,040 samples in the out-of-distribution test set, covering sums not observed during training. This dataset is affected by reasoning shortcuts (RSs); the number of deterministic RSs was computed to be 49 by solving a linear system.
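The linear-system computation is described in the paper, but the underlying idea can be illustrated with a toy brute-force count: a deterministic RS is any relabeling of the digits that still satisfies every observed sum. The sketch below uses only the three example sums listed above rather than the full set of training combinations, so its count is purely illustrative (and much larger than 49):

```python
from itertools import product

# Observed (digit1, digit2, sum) triples from the examples above.
observed = [(0, 6, 6), (2, 8, 10), (1, 5, 6)]
digits = sorted({d for a, b, _ in observed for d in (a, b)})

# Count every map m: digits -> {0..9} consistent with all observed sums.
count = 0
for values in product(range(10), repeat=len(digits)):
    m = dict(zip(digits, values))
    if all(m[a] + m[b] == s for a, b, s in observed):
        count += 1

print(count)  # 441: many relabelings besides the identity satisfy these sums
```

With the full training distribution the constraints are far tighter, which is what the linear system in the paper exploits to arrive at 49 deterministic RSs.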
MNIST-Half:
MNIST-Half is a biased version of MNIST-Addition introduced in Marconato et al. (2024), focusing on digits from 0 to 4. It includes digit combinations like 0+0=0, 0+1=1, 2+3=5, and 2+4=6. Unlike MNIST-Even-Odd, two digits (0 and 1) are not affected by reasoning shortcuts, while digits 2, 3, and 4 can be predicted differently. The dataset consists of 2,940 fully annotated samples in the training set, 840 samples in the validation set, and 420 samples in the test set. Additionally, there are 1,080 samples in the out-of-distribution test set, covering the remaining sums of the included digits.
The Kandinsky dataset, introduced by Müller and Holzinger in 2021, features visual patterns inspired by the works of Wassily Kandinsky. Each pattern is constructed with geometric figures and includes two main concepts: shape and color. This dataset offers a variant where each image contains a fixed number of figures, each with one of three possible colors (red, blue, yellow) and one of three possible shapes (square, circle, triangle).
In this setting, which is the same as the one presented in Marconato et al. (2024), the task is to predict the pattern of a third image given two images that share a common pattern. During inference, a model, such as the NeSy model mentioned in the experiments, computes a series of predicates like "same_cs" (same colors) and "same_ss" (same shapes), and must select the third image that completes the pattern based on these predicates. For example, if the first two images each contain figures of the same shape but of differing colors, the model should choose the candidate exhibiting the same property. This dataset presents a challenging task that tests a model's ability to generalize and infer relationships between visual elements.
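As an illustrative sketch of this pattern-completion logic (the image representation and predicate names here are hypothetical, not the repository's actual API):

```python
# Hypothetical representation: an image is a list of (shape, color) figures.
def same_shapes(image):
    return len({shape for shape, _ in image}) == 1

def same_colors(image):
    return len({color for _, color in image}) == 1

def pattern(image):
    return (same_shapes(image), same_colors(image))

def complete_pattern(img1, img2, candidates):
    """Return the candidates matching the pattern shared by img1 and img2."""
    target = pattern(img1)
    assert target == pattern(img2), "context images must share a pattern"
    return [c for c in candidates if pattern(c) == target]

ctx1 = [("square", "red"), ("square", "blue")]    # same shapes, mixed colors
ctx2 = [("circle", "red"), ("circle", "yellow")]  # same shapes, mixed colors
options = [
    [("triangle", "red"), ("circle", "red")],     # same colors -> rejected
    [("triangle", "blue"), ("triangle", "red")],  # same shapes -> selected
]
print(complete_pattern(ctx1, ctx2, options))
```

Note that a reasoning shortcut can arise here exactly as in MNIST-Addition: a model may satisfy the pattern predicates while internally confusing the shape and color concepts.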
The code structure follows that of Marconato et al. (2024), bears:

- `backbones` contains the architectures of the NNs used.
- `data` should contain the data.
- `datasets` contains the dataset classes used for evaluation. If you want to add a dataset, it has to be located here.
- `models` contains all models used to benchmark the presence of RSs. Here you can find DPL, LTN, CBMs, standard NNs, and CLIP.
- `utils` contains the training loop, the losses, the metrics, and the (wandb-only) loggers. It also contains `tcav`, the classes used to extract TCAV scores from neural models, and `tcav/notebook` for evaluation.
- `notebooks` contains some notebooks for evaluation.
- `preprocessing` contains the classes used for CLIP embedding preprocessing.
- `run_start.sh` runs a single experiment.
To get started with training your models, navigate to the rss directory and use the following commands. Adjust the hyperparameters to suit your specific needs.
DPL Model on MNIST-Even-Odd:
```shell
python main.py --dataset shortmnist --model mnistdpl --n_epochs 2 --lr 0.001 --seed 0 \
    --batch_size 64 --exp_decay 0.9 --c_sup 0 --task addition --backbone conceptizer
```

This command runs the DPL model on the MNIST-Even-Odd dataset. You can modify hyperparameters like `--n_epochs` or `--lr` for different training conditions.
LTN Model on MNIST-Even-Odd:
```shell
python main.py --dataset shortmnist --model mnistltn --n_epochs 2 --lr 0.001 --seed 0 \
    --batch_size 64 --exp_decay 0.9 --c_sup 0 --task addition --backbone conceptizer
```

Execute this to train the LTN model on the MNIST-Even-Odd dataset. Customize the parameters as needed to suit your model's requirements.
CBM Model on MNIST-Even-Odd:
```shell
python main.py --dataset shortmnist --model mnistcbm --n_epochs 2 --lr 0.001 --seed 0 \
    --batch_size 64 --exp_decay 0.9 --c_sup 0.05 --task addition --backbone conceptizer
```

This command runs the CBM model on the MNIST-Even-Odd dataset. The `--c_sup` parameter is set to 0.05 here to give the model a small amount of concept supervision; you can adjust it based on your experiment needs.
NN Model on MNIST-Even-Odd:
```shell
python main.py --dataset shortmnist --model mnistnn --n_epochs 2 --lr 0.001 --seed 0 \
    --batch_size 64 --exp_decay 0.9 --c_sup 0.05 --task addition --backbone neural
```

Run the NN model on MNIST-Even-Odd with this command. Notice that `--backbone` is set to `neural`.
CLIP Model on MNIST-Even-Odd:
```shell
python main.py --dataset clipshortmnist --model mnistnn --n_epochs 2 --lr 0.001 --seed 0 \
    --batch_size 64 --exp_decay 0.9 --c_sup 0 --task addition --backbone neural --joint
```

Use this to run the CLIP model on the MNIST-Even-Odd dataset. The dataset here is preprocessed with CLIP embeddings (`clipshortmnist`), while the model parameter remains `mnistnn`.
To evaluate different models or datasets, follow this pattern:
- `--dataset` should be set to the dataset you're testing, like `shortmnist` or `clipshortmnist`.
- `--model` should match the dataset's prefix plus the technique (`dpl`, `ltn`, `cbm`, `nn`).
- Use `--backbone conceptizer` for the `dpl`, `ltn`, and `cbm` models.
- Use `--backbone neural` for the `nn` model.
- For CLIP, set `--model` to `mnistnn` but choose a dataset with a `clip` prefix, like `clipshortmnist`.
To evaluate your model, start by training several instances with different seed values. This will ensure a robust evaluation by averaging results across various seeds. We provide an easy-to-use notebook in the notebooks directory for this purpose. You can find the evaluation notebook here. Simply follow the instructions within the notebook to assess your model's performance.
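Aggregating a metric across seeds then amounts to, for instance (the scores below are placeholders, not real results):

```python
import statistics

# Hypothetical F1 scores collected from runs with different seeds.
f1_by_seed = {0: 0.81, 1: 0.79, 2: 0.83}

scores = list(f1_by_seed.values())
mean, std = statistics.mean(scores), statistics.stdev(scores)
print(f"F1: {mean:.3f} +/- {std:.3f}")  # F1: 0.810 +/- 0.020
```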
For NN/CLIP models, concept-level metrics cannot be extracted with evaluate.ipynb; instead:

- Train the model.
- Run the TCAV main.py.
- Run analysis.ipynb.
- Extract concept accuracy, F1, and collapse.
Our repository also supports hyperparameter tuning using a Bayesian search strategy. To begin tuning, use the --tuning flag:
```shell
python main.py --dataset shortmnist --model mnistdpl --n_epochs 20 --lr 0.001 \
    --batch_size 64 --exp_decay 0.99 --c_sup 0 --checkout --task addition \
    --proj_name MNIST-DPL --tuning --val_metric f1
```

This command runs a Bayesian hyperparameter search, optimizing for the F1 score under the project name MNIST-DPL. The `--tuning` flag triggers the tuning process, and wandb is used to log the performance of different hyperparameter configurations. You must log in to wandb to use this feature, where you can monitor the hyperparameter performance on their platform. The example tunes the hyperparameters of the DPL model on the MNIST-Even-Odd dataset. Note that the seed is intentionally left unspecified to allow for variability during tuning.
To learn more about the available command-line arguments, use the --help option:
```shell
python main.py --help
```

This command provides detailed information on the different options you can use with the main.py script, helping you customize your model training and evaluation further.
For any kind of problem, do not hesitate to contact me. If you have additional mitigation strategies that you would like to include for others to test, please send me a pull request.
To see the available Makefile targets, call the help target with GNU Make:

```shell
make help
```

The Makefile provides a simple and convenient way to manage Python virtual environments (see `venv`).
To create the virtual environment and install the requirements, make sure you have Python 3.9 (it should work with more recent versions too, but I have only tested it with 3.9):

```shell
make env
source ./venv/reasoning-shortcut/bin/activate
make install
```

Remember to deactivate the virtual environment once you have finished working on the project:
```shell
deactivate
```

The automatic code documentation is generated with Sphinx v4.5.0.
To have the code documentation available, you need to install the development requirements:

```shell
pip install --upgrade pip
pip install -r requirements.dev.txt
```

Since the Sphinx commands are quite verbose, I suggest using the following Makefile targets.
```shell
make doc-layout
make doc
```

The generated documentation will be accessible by opening docs/build/html/index.html in your browser, or equivalently by running:

```shell
make open-doc
```

However, for the sake of completeness, one may want to run the full Sphinx commands listed here:
```shell
sphinx-quickstart docs --sep --no-batchfile --project bears --author "X" -r 0.1 --language en \
    --extensions sphinx.ext.autodoc --extensions sphinx.ext.napoleon \
    --extensions sphinx.ext.viewcode --extensions myst_parser
sphinx-apidoc -P -o docs/source .
cd docs; make html
```

This code is adapted from Marconato et al. (2024), bears.


