https://campaignlab.github.io/2026ElectionGenderAnalysis/story.html
https://campaignlab.github.io/2026ElectionGenderAnalysis
An interactive one-page dashboard analysing the gender breakdown of candidates and elected councillors in the May 2026 England local elections.
- Summary statistics — overall % female candidates vs elected; incumbent retention rate
- Seat changes panel — always-visible panel showing incumbents who stood, re-elected, defeated, and new councillors elected; updates on council or region selection
- Choropleth map — gender balance by local authority, with a bivariate colour scheme that distinguishes high-confidence areas (strong hue) from areas where many candidates' genders are unresolved (washed-out/grey)
- Party breakdown — stacked 100% bar charts for all parties with ≥ 30 candidates, candidates and elected side-by-side. Click any bar to open a party detail panel showing:
  - 6-tile stat grid: total candidates, female candidates, seat-slots contested, elected total, female elected (+ win rate), male elected (+ win rate)
  - Win-rate context sentences: female win rate vs other parties; vs national average; female vs male win rate within the same party
  - When a council is selected on the map: a scrollable table of that party's candidates in the council (name, ward, votes, result, gender, confidence)
- Regional breakdown — same charts by ONS NUTS1 region
- Council table — sortable, filterable table with candidate counts, % female, elected counts, and turnout; unknown-gender warnings surface where data quality is low
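The party panel's win rates are pre-computed at build time. The sketch below shows one way to do that aggregation. It assumes candidate records are dicts with hypothetical keys `party`, `gender` and `elected`. It also assumes a win rate means elected ÷ candidates for that gender within a party. Only the ≥ 30-candidate threshold comes from the description above.

```python
from collections import defaultdict

MIN_CANDIDATES = 30  # parties with fewer candidates are not charted


def party_win_rates(candidates):
    """Per-party counts and gender win rates.

    `candidates`: iterable of dicts with hypothetical keys
    'party', 'gender' ('female', 'male', ...) and 'elected' (bool).
    """
    stats = defaultdict(lambda: {"candidates": 0, "elected": 0,
                                 "female": 0, "female_elected": 0,
                                 "male": 0, "male_elected": 0})
    for c in candidates:
        s = stats[c["party"]]
        s["candidates"] += 1
        s["elected"] += int(c["elected"])
        # Unknown/nonbinary genders count toward the totals only
        if c["gender"] in ("female", "male"):
            s[c["gender"]] += 1
            s[c["gender"] + "_elected"] += int(c["elected"])

    def rate(won, stood):
        # Percentage of candidates elected; None if nobody of that gender stood
        return round(100 * won / stood, 1) if stood else None

    return {
        party: {**s,
                "female_win_rate": rate(s["female_elected"], s["female"]),
                "male_win_rate": rate(s["male_elected"], s["male"])}
        for party, s in stats.items()
        if s["candidates"] >= MIN_CANDIDATES
    }
```

Returning `None` rather than 0 for an empty group keeps "no candidates of that gender" distinct from "candidates stood but none won".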
~71% of candidates in the source data had no recorded gender. Genders were predicted using a four-tier cascade, in which each tier handles only the names left unresolved by the tiers above it:
| Method | Coverage | Notes |
|---|---|---|
| Existing (from source data) | ~29% | Taken as ground truth; normalised to male/female/nonbinary |
| gender_guesser | ~53% | Open-source Python library; uses a compiled international names database; `great_britain` locale; `mostly_*` results accepted at low confidence |
| ONS baby names | ~2% | Fallback to the ONS historical top-100 baby names dataset (1904–2024) when gender_guesser returns `andy` (androgynous) or `unknown`; birth year is used to select the closest decade, so names whose usage has shifted over time (e.g. "Ashley") are resolved for the right era |
| Claude AI | ~11% | Names still unresolved after the above steps were sent to Claude (Anthropic) for prediction; stored with a `claude` method tag and a confidence level |
| Unknown | <0.3% | Primarily non-Western names not covered by any of the above sources |
Predictions are stored in genders.csv (keyed by person_id + surname) and are not written back to the source data.
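The cascade can be sketched as a single function that tries each tier in turn. This is an illustration, not the code in `assign_genders.py`, and several pieces are hypothetical:

- The `guess` callable stands in for a gender_guesser detector.
- `ons_lookup` is a `{year: {lower-case name: gender}}` mapping.
- The confidence labels are invented, apart from the low-confidence `mostly_*` case.
- The Claude tier is represented only by a `pending` marker for names to batch up later.

```python
def closest_decade(birth_year, decades):
    """ONS year nearest to the birth year (ties go to the earlier year)."""
    return min(decades, key=lambda d: (abs(d - birth_year), d))


def predict_gender(first_name, existing=None, guess=None, ons_lookup=None,
                   birth_year=None):
    """Return (gender, method, confidence); method 'pending' = still unresolved."""
    # Tier 1: gender recorded in the source data is taken as ground truth
    if existing:
        return existing.strip().lower(), "existing", "high"

    # Tier 2: gender_guesser-style labels
    label = guess(first_name) if guess else "unknown"
    if label in ("male", "female"):
        return label, "gender_guesser", "high"
    if label in ("mostly_male", "mostly_female"):
        # mostly_* results are accepted, but at low confidence
        return label.split("_", 1)[1], "gender_guesser", "low"

    # Tier 3: ONS baby names, reached only for 'andy' / 'unknown'
    if ons_lookup:
        if birth_year is not None:
            years = [closest_decade(birth_year, ons_lookup)]
        else:
            # No birth year: try the most recent year first
            years = sorted(ons_lookup, reverse=True)
        for year in years:
            g = ons_lookup[year].get(first_name.lower())
            if g:
                return g, "ons", "medium"

    # Tier 4: left for a later batch sent to Claude (no API call here)
    return None, "pending", None
```

With the real library installed, `guess` could be built as `d = gender_guesser.detector.Detector()` followed by `guess = lambda name: d.get_gender(name, "great_britain")`. The detector is case-sensitive by default, so names should be passed capitalised.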
Incumbent status (whether a 2026 candidate was a sitting councillor in the same ward and council) is determined by matching against 2025 sitting councillor data from opencouncildata.co.uk. Matching requires:
- Council name match: prefix/suffix normalisation (strips "London Borough of", "Borough Council", etc.).
- Ward name match: exact first, then fuzzy (`difflib.get_close_matches`, cutoff 0.6) with a first-word constraint to prevent cross-area false positives.
- Full-name fuzzy match: `difflib.SequenceMatcher` ratio ≥ 0.80, to handle middle names and minor spelling differences.
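The three matching rules can be sketched with the standard library's `difflib`. The prefix/suffix lists are illustrative. The first-word constraint is interpreted here as requiring the ward name and its fuzzy match to share an identical first word. The real `identify_incumbents.py` may implement it differently.

```python
import difflib
import re

# Illustrative lists; longer suffixes come first so they are stripped whole
PREFIXES = ("london borough of ", "royal borough of ")
SUFFIXES = (" metropolitan borough council", " borough council",
            " district council", " city council", " council")


def normalise_council(name):
    n = re.sub(r"\s+", " ", name.strip().lower())
    for p in PREFIXES:
        if n.startswith(p):
            n = n[len(p):]
            break
    for s in SUFFIXES:
        if n.endswith(s):
            n = n[: -len(s)]
            break
    return n


def match_ward(ward, known_wards, cutoff=0.6):
    """Exact match first, then a difflib fuzzy match sharing the same first word."""
    by_lower = {w.lower(): w for w in known_wards}
    key = ward.lower()
    if key in by_lower:
        return by_lower[key]
    for cand in difflib.get_close_matches(key, list(by_lower), n=5, cutoff=cutoff):
        # Reject fuzzy hits whose first word differs, e.g. "north x" vs "south x"
        if cand.split()[:1] == key.split()[:1]:
            return by_lower[cand]
    return None


def same_person(name_a, name_b, threshold=0.80):
    """Full-name fuzzy match tolerant of middle names and small typos."""
    ratio = difflib.SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return ratio >= threshold
```

For example, "North Ward" and "South Ward" score 0.8 with `SequenceMatcher`. That is above the 0.6 cutoff, so only the first-word check stops them being paired.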
The `inc` field is omitted from ward JSON when `False` (not an incumbent) to keep file sizes small. Output stats: `inc_total`, `inc_elected`, `inc_defeated`, `new_elected`, `inc_retention_pct`.
Limitation: incumbents who chose not to re-stand cannot be identified — "Defeated" means stood and lost.
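The output stats and the space-saving `inc` convention can be sketched as below. The sketch assumes each candidate is a dict with an `elected` flag and an optional `inc` flag. It also assumes `inc_retention_pct` is re-elected incumbents as a share of incumbents who stood. Because a missing `inc` reads as false, consumers can use `c.get("inc")` without ever needing the omitted key.

```python
def incumbency_stats(candidates):
    """Seat-change counts for one council or region."""
    inc_total = sum(1 for c in candidates if c.get("inc"))
    inc_elected = sum(1 for c in candidates if c.get("inc") and c["elected"])
    new_elected = sum(1 for c in candidates if not c.get("inc") and c["elected"])
    return {
        "inc_total": inc_total,
        "inc_elected": inc_elected,
        # "Defeated" = stood and lost; incumbents who did not re-stand are invisible
        "inc_defeated": inc_total - inc_elected,
        "new_elected": new_elected,
        "inc_retention_pct": round(100 * inc_elected / inc_total, 1) if inc_total else None,
    }


def compact_candidate(c):
    """Drop a False 'inc' flag so ward JSON stays small."""
    return {k: v for k, v in c.items() if not (k == "inc" and v is False)}
```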
```
dc_data.csv                     Source data from Democracy Club
genders.csv                     Predicted genders (generated)
historicalnames2024.xlsx        ONS baby names historical dataset (local, not committed to git)
LAD_MAY_2025_UK_BUC_*.geojson   ONS LAD boundaries (local, not committed to git)
scripts/
  parse_ons_data.py             Parse historicalnames2024.xlsx → scripts/data/ons_lookup.json
  assign_genders.py             Produce genders.csv from dc_data.csv + ons_lookup.json
  identify_incumbents.py        Match 2026 candidates against 2025 sitting councillors
                                → scripts/data/incumbents.json + scripts/data/ward_fuzzy_log.json
  build_data.py                 Aggregate data → docs/data/councils.json + wards/*.json
docs/                           GitHub Pages root
  index.html
  style.css
  app.js
  data/
    councils.json               Pre-aggregated data (generated by build_data.py);
                                includes per-party win rates, seat counts, and
                                comparison stats pre-computed at build time
    LAD_boundaries.geojson      Boundary file (generated by build_data.py)
    wards/
      {slug}.json               One file per council (~156 files); ward-level
                                candidate lists loaded on demand when a council
                                is selected on the map
requirements.txt                Python dependencies
```
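The split between one small `councils.json` and per-council ward files lets the frontend fetch ward data only when a council is clicked. The rough sketch below shows how a build step might emit that layout. The `slugify` rule and the `n_wards` index field are hypothetical. The real `councils.json` carries the pre-aggregated party and seat statistics described above rather than this minimal index.

```python
import json
import re
from pathlib import Path


def slugify(council_name):
    """Hypothetical slug rule: lower-case, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", council_name.lower()).strip("-")


def build_outputs(wards_by_council):
    """Map relative output paths to JSON text: one index plus one file per council."""
    outputs, index = {}, []
    for council, wards in sorted(wards_by_council.items()):
        slug = slugify(council)
        # Compact separators keep the on-demand ward files small
        outputs[f"wards/{slug}.json"] = json.dumps(wards, separators=(",", ":"))
        index.append({"name": council, "slug": slug, "n_wards": len(wards)})
    outputs["councils.json"] = json.dumps(index)
    return outputs


def write_outputs(outputs, out_dir):
    """Write each output under out_dir, creating subdirectories as needed."""
    root = Path(out_dir)
    for rel_path, text in outputs.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
```

Calling `write_outputs(build_outputs(wards_by_council), "docs/data")` would then write `docs/data/councils.json` and one `docs/data/wards/{slug}.json` per council.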
```shell
# 1. Install Python dependencies
py -m pip install -r requirements.txt

# 2. Build the ONS name lookup (requires historicalnames2024.xlsx in project root)
py scripts/parse_ons_data.py

# 3. Predict genders
py scripts/assign_genders.py

# 4. Build the frontend data files
py scripts/build_data.py

# 5. Serve locally
py -m http.server 8080 --directory docs
# → open http://localhost:8080
```

Incumbency data
- opencouncildata.co.uk — 2025 sitting councillor composition, used for incumbency matching.
Election data
- Democracy Club candidate and results data, May 2026. https://democracyclub.org.uk — used under the terms of the Democracy Club open data licence.
Gender prediction
- `gender_guesser` Python library (Carsten Steinkamp, MIT licence). https://github.com/lead-ratings/gender-guesser
- Office for National Statistics, Baby names in England and Wales: historical data, 1904–2024. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalestop100babynameshistoricaldata Licensed under the Open Government Licence v3.0.
- Claude (Anthropic) used for residual name classification not resolved by the automated pipeline.
Boundaries
- Office for National Statistics, Local Authority Districts (May 2025) Boundaries UK BUC. https://geoportal.statistics.gov.uk/datasets/ons::local-authority-districts-may-2025-boundaries-uk-bfe-v2/ Source: Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2026.