This repository contains code to reproduce the paper "If You Like Shapley Then You’ll Love the Core" for the ML Reproducibility Challenge 2022.
We use Python 3.10 for this repository and Poetry (version 1.2.0) for dependency management.
After installing Poetry, run the following command to create a virtual environment and install all dependencies:
poetry install

You can then activate the virtual environment using:
poetry shell

We use DVC to run the experiments and track their results.
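Before running any DVC stage, it can help to confirm that you are in the intended environment. The following is a minimal sketch, not part of this repository (the function name `check_environment` is hypothetical), that reports a problem if the interpreter is not Python 3.10 or if no virtual environment appears to be active. It assumes Poetry is left at its default of creating a virtual environment.

```python
import sys


def check_environment(version_info=sys.version_info,
                      prefix=sys.prefix,
                      base_prefix=sys.base_prefix):
    """Return a list of environment problems (empty if none were found)."""
    problems = []
    # This README targets Python 3.10.
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 10):
        problems.append(f"expected Python 3.10, found {major}.{minor}")
    # Inside a virtual environment (such as the one Poetry creates),
    # sys.prefix points at the environment and differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no virtual environment is active; run `poetry shell` first")
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter; the parameters only exist so the check can be exercised with fixed values.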
To reproduce all results, run:
dvc repro

To reproduce the results of the feature valuation (least core) experiment, run:

dvc repro feature-valuation-least-core

You can find the results under output/feature_valuation_least_core.
To reproduce the results of the synthetic data valuation experiment, run:

dvc repro data-valuation-synthetic

You can find the results under output/data_valuation_synthetic.
Note: The dog vs. fish data valuation experiment requires downloading the imagenet-1k dataset from Hugging Face Datasets. To do so, first create a Hugging Face account and then log in with the huggingface-cli tool (huggingface-cli login).
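To avoid a failed download midway through the pipeline, you might check for credentials first. This is a heuristic sketch (the function `hf_token_available` is hypothetical) and rests on assumptions about `huggingface_hub` that depend on its version: that recent releases read a token from the `HF_TOKEN` environment variable, and that `huggingface-cli login` writes the token to `$HF_HOME/token`, with `HF_HOME` defaulting to `~/.cache/huggingface`. Older releases stored the token elsewhere, so a `False` result is not conclusive.

```python
import os
from pathlib import Path


def hf_token_available(env=None, home=None):
    """Heuristically check whether a Hugging Face access token is available."""
    env = os.environ if env is None else env
    # Assumption: recent huggingface_hub versions honour HF_TOKEN.
    if env.get("HF_TOKEN"):
        return True
    # Assumption: `huggingface-cli login` stores the token in $HF_HOME/token,
    # with HF_HOME defaulting to ~/.cache/huggingface (version dependent).
    home = Path.home() if home is None else Path(home)
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    return (hf_home / "token").is_file()
```

For example, `if not hf_token_available(): print("Run huggingface-cli login first")` before starting the dog vs. fish stage.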
To reproduce the results of the dog vs. fish data valuation experiment, run:

dvc repro data-valuation-dog-vs-fish

You can find the results under output/data_valuation_dog_vs_fish.
To reproduce the results of the fixing mislabeled data experiment, run:

dvc repro fixing-mislabeled-data

You can find the results under output/fixing_mislabeled_data.
To reproduce the results of the noisy data experiment, run:

dvc repro noisy-data

You can find the results under output/noisy_data.
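The DVC stage names and result directories listed in this README can also be driven from a script, for instance to rerun only the experiments whose outputs are not present yet. The sketch below is illustrative and not part of the repository: the helper names `repro` and `missing_results` are hypothetical, it assumes it is run from the repository root, and treating an existing output directory as "done" is a simplification (DVC itself tracks staleness, which `dvc status` reports).

```python
import subprocess
from pathlib import Path

# DVC stage names and result directories, as listed in this README.
EXPERIMENTS = {
    "feature-valuation-least-core": "output/feature_valuation_least_core",
    "data-valuation-synthetic": "output/data_valuation_synthetic",
    "data-valuation-dog-vs-fish": "output/data_valuation_dog_vs_fish",
    "fixing-mislabeled-data": "output/fixing_mislabeled_data",
    "noisy-data": "output/noisy_data",
}


def repro(stage, dry_run=False):
    """Build the `dvc repro` command for one stage and run it unless dry_run."""
    if stage not in EXPERIMENTS:
        raise ValueError(f"unknown stage: {stage!r}")
    command = ["dvc", "repro", stage]
    if not dry_run:
        # check=True raises CalledProcessError if the stage fails.
        subprocess.run(command, check=True)
    return command


def missing_results(root="."):
    """Return the stages whose result directory does not exist under root."""
    return [stage for stage, out in EXPERIMENTS.items()
            if not (Path(root) / out).is_dir()]
```

A typical use is `for stage in missing_results(): repro(stage)`; passing `dry_run=True` only returns the command, which is handy for checking what would run.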
Before committing changes, make sure to install the pre-commit hooks:
pre-commit install