Participatory AI audit tool for evaluating vision-language models (CLIP) in real community contexts. Participants use a semantic image retrieval interface to design a leaflet, selecting and ranking images returned by CLIP. Their choices are then compared with scores from the FAIR fairness metric to test whether the metric predicts how people actually select and rank images.
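CLIP retrieval is usually a nearest-neighbour search in a shared embedding space. The prompt and every image are embedded, and images are ranked by cosine similarity to the prompt. The sketch below shows only that ranking step and assumes the embeddings are already computed. The helpers `cosine` and `rank_images` are hypothetical and are not the API of `retrieval.py`. The toy 2-D vectors stand in for real ViT-B-16 embeddings.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    text_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper (not retrieval.py): rank image IDs by similarity, best first."""
    scored = [
        (image_id, cosine(text_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy 2-D vectors standing in for real CLIP image and prompt embeddings.
pool = {"a.jpg": [1.0, 0.0], "b.jpg": [0.7, 0.7], "c.jpg": [0.0, 1.0]}
print([image_id for image_id, _ in rank_images([1.0, 0.1], pool)])
# ['a.jpg', 'b.jpg', 'c.jpg']
```

In the real pool, the same ordering would run over all 280 images (160 curated + 120 distractors).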
pip install -r requirements.txt
# First time only — generate annotations.json from the curated image CSV
python scripts/prepare_annotations.py
# First time only — pre-calculate FAIR metrics
python scripts/precalculate_metrics.py \
--curated-folder data/situated-usecase-image-pool-v01/images_v01 \
--distractor-folder data/fhibe_sample/images \
--queries data/queries.json \
--annotations data/annotations.json \
--output data/metrics/query_metrics.json
# Start the server
python server.py \
--curated-folder data/situated-usecase-image-pool-v01/images_v01 \
--distractor-folder data/fhibe_sample/images

In the participant interface:
- Enter a nickname, then start.
- Type a prompt — CLIP retrieves and ranks images from the pool (160 curated + 120 distractors).
- Select up to 9 images.
- Drag to rank them (1 = most relevant), then submit.
- A flipbook is generated with the chosen images.
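The selection rules above are: at most 9 images, each ranked exactly once, with 1 = most relevant. A small validator can check them. This is a hypothetical sketch. The function names, and the assumption that an empty selection is invalid, are illustrative and are not taken from `server.py`.

```python
from typing import Dict, List, Sequence

MAX_SELECTION = 9  # from the participant flow: "Select up to 9 images"


def validate_submission(selected: Sequence[str], ranking: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the submission is valid.

    `selected` holds the chosen image IDs; `ranking` holds the same IDs in the
    participant's drag order, where position 0 is rank 1 (most relevant).
    """
    problems = []
    if not selected:  # assumption: at least one image is required
        problems.append("no images selected")
    if len(selected) > MAX_SELECTION:
        problems.append(f"too many images: {len(selected)} > {MAX_SELECTION}")
    if len(set(ranking)) != len(ranking):
        problems.append("duplicate image in ranking")
    if set(ranking) != set(selected):
        problems.append("ranking does not match the selection")
    return problems


def to_rank_map(ranking: Sequence[str]) -> Dict[str, int]:
    """Map each image ID to its 1-based participant rank."""
    return {image_id: position + 1 for position, image_id in enumerate(ranking)}


print(validate_submission(["a.jpg", "b.jpg"], ["b.jpg", "a.jpg"]))  # []
print(to_rank_map(["b.jpg", "a.jpg"]))  # {'b.jpg': 1, 'a.jpg': 2}
```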
The admin page is at http://127.0.0.1:8080/admin:
- Start recording before participants begin, filling in the workshop name and community context.
- Stop recording when done.
- Download CSV at any time to export all session data (nickname, prompt, model rank, participant rank).
Each workshop's data is kept separate in the database.
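The CSV export pairs each selected image's model rank with the participant's rank. The sketch below groups those pairs per participant and prompt. The column headers `nickname`, `prompt`, `model_rank` and `participant_rank` are assumptions based on the field list above. Check them against a real export before use. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

Key = Tuple[str, str]  # (nickname, prompt)


def load_rank_pairs(csv_file: TextIO) -> Dict[Key, List[Tuple[int, int]]]:
    """Group (model_rank, participant_rank) pairs by (nickname, prompt).

    Assumes one row per selected image, with the hypothetical headers
    nickname, prompt, model_rank, participant_rank.
    """
    groups: Dict[Key, List[Tuple[int, int]]] = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["nickname"], row["prompt"])
        groups[key].append((int(row["model_rank"]), int(row["participant_rank"])))
    return dict(groups)


# Invented rows, for illustration only.
sample = io.StringIO(
    "nickname,prompt,model_rank,participant_rank\n"
    "kim,community garden,1,2\n"
    "kim,community garden,4,1\n"
)
print(load_rank_pairs(sample))
# {('kim', 'community garden'): [(1, 2), (4, 1)]}
```

For a downloaded file, open it with `newline=""` as the `csv` module recommends. Each group is then a list of paired ranks, which is the shape a rank-correlation routine expects.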
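After the workshop, run the Spearman correlation analysis between the pre-calculated FAIR scores and the recorded rankings:
python metrics/analyze_outcomes.py \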
--metrics data/metrics/query_metrics.json \
--db data/rankings.db \
--output results/metric_alignment_analysis.csv \
--plots results/correlation_plots

leaflet_design/
├── server.py # FastAPI server
├── retrieval.py # CLIP embedding + retrieval engine (ViT-B-16)
├── requirements.txt
│
├── metrics/
│ ├── fair_calculator.py # FAIR metric
│ └── analyze_outcomes.py # Post-workshop Spearman correlation analysis
│
├── scripts/
│ ├── prepare_annotations.py # CSV → data/annotations.json
│ ├── precalculate_metrics.py # Pre-calculate FAIR before workshops
│ └── test_swap.py # End-to-end smoke test
│
├── tests/
│ └── test_fair_calculator.py
│
├── data/
│ ├── situated-usecase-image-pool-v01/
│ │ ├── images_v01/ # 160 curated images
│ │ └── intervisions_annotations_v01.csv
│ ├── fhibe_sample/images/ # 120 distractor images
│ ├── annotations.json # Generated by prepare_annotations.py
│ ├── queries.json # Workshop prompts
│ ├── desired_distribution.json # Target demographic proportions
│ ├── rankings.db # SQLite: all session data
│ └── metrics/
│ └── query_metrics.json # Pre-calculated FAIR scores
│
└── static/ # Frontend (participant UI, admin, flipbook)
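`metrics/analyze_outcomes.py` is described as a Spearman correlation analysis. Spearman's rho is the Pearson correlation of the two variables' ranks. Tied values receive the average of the ranks they occupy. The stdlib-only sketch below illustrates the statistic. It is not the script's code, and this README does not specify which quantities the script correlates. In practice, `scipy.stats.spearmanr` computes the same coefficient with the same average-rank tie handling.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of ties while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return math.nan  # undefined when either variable is constant
    return cov / (sd_x * sd_y)


print(round(spearman([1, 2, 3, 4], [4, 3, 2, 1]), 6))  # -1.0
```

A rho near +1 means the two orderings agree closely, and a rho near -1 means they are reversed. With only up to 9 ranked images per session, individual coefficients are noisy.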
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.
