Sustainability GLEIF Matching Network

A portfolio-grade entity-resolution and ownership-network pipeline that traces where sustainability pledges end up in the GLEIF graph. It includes a deployed explainer page and auditable matching artifacts.

Open the deployed page at pledges.faizkrisnadi.com.

What This Repo Does

Parses and cleans GLEIF Level 1 and Level 2 entity data.
Builds a sustainability source table from initiative lists and local inputs.
Matches sustainability entities to GLEIF records with review and unmatched buckets.
Constructs ownership-network outputs and sample artifacts for inspection and validation.
Ships a public HTML explainer in sustainability_funnel_v2.html.
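
The three-bucket matching step described above (auto, review, unmatched) can be illustrated with a minimal sketch. It is not the code in src/match_sustainability.py: the similarity measure (difflib from the standard library), the two thresholds, and the function names normalize, best_match, and bucket are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen for illustration only.
AUTO_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.80


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def best_match(source_name: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate with the highest similarity ratio and its score."""
    src = normalize(source_name)
    scored = [(c, SequenceMatcher(None, src, normalize(c)).ratio()) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def bucket(score: float) -> str:
    """Map a similarity score to one of the three matching buckets."""
    if score >= AUTO_THRESHOLD:
        return "auto"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "unmatched"


candidate, score = best_match("ACME Corp.", ["Acme Corp", "Zenith Holdings"])
print(candidate, score, bucket(score))
```

Keeping a middle "review" band, rather than a single cutoff, is what lets borderline matches be audited by hand instead of being silently accepted or dropped.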

Latest Run Evidence

Level 1 rows read/written: 3,219,530 / 3,219,530
Level 2 rows read/written: 634,561 / 634,561
Matching: n_source=14,180, n_auto=7,641, n_review=396, n_unmatched=6,143
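
As a quick consistency check on the matching counts above, the three buckets should sum to n_source, and each can be expressed as a share of the source table. The snippet below only re-does that arithmetic on the reported numbers; it does not read any pipeline output.

```python
# Counts copied from the run evidence above.
n_source = 14180
counts = {"auto": 7641, "review": 396, "unmatched": 6143}

# Every source entity should land in exactly one bucket.
assert sum(counts.values()) == n_source

# Share of the source table in each bucket, as a percentage.
shares = {name: round(100 * n / n_source, 1) for name, n in counts.items()}
print(shares)  # {'auto': 53.9, 'review': 2.8, 'unmatched': 43.3}
```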

Key Files

sustainability_funnel_v2.html
src/cli.py
src/preprocess_gleif.py
src/build_sustainability_source.py
src/match_sustainability.py
src/build_network.py
docs/methodology.md
docs/data_dictionary.md
docs/matching_eval.md
docs/network_sanity.md

Outputs

data/processed/gleif_entities_clean.parquet (not committed)
data/processed/edges.csv (not committed)
data/processed/nodes.csv (not committed)
data/processed/nodes_in_network.csv (not committed)
data/processed/match_crosswalk.csv (not committed)
data/samples/match_crosswalk_sample.csv (committed)
data/samples/edges_sample.csv (committed)
data/samples/nodes_sample.csv (committed)
data/samples/nodes_in_network_sample.csv (committed)
data/samples/run_manifest_sample.json (committed)
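
One basic sanity check on the edge and node outputs listed above is that every edge endpoint also appears in the node table. The sketch below shows the idea on in-memory CSV text. The column names node_id, source, and target are assumptions, not taken from the repo; the actual schema is documented in docs/data_dictionary.md.

```python
import csv
import io


def dangling_endpoints(edges_file, nodes_file,
                       node_key="node_id", src_key="source", dst_key="target"):
    """Return edge endpoints that are missing from the node table.

    The default column names are hypothetical; pass the real ones as needed.
    """
    node_ids = {row[node_key] for row in csv.DictReader(nodes_file)}
    missing = set()
    for row in csv.DictReader(edges_file):
        for key in (src_key, dst_key):
            if row[key] not in node_ids:
                missing.add(row[key])
    return missing


# Toy data: node C is referenced by an edge but absent from the node table.
nodes_csv = io.StringIO("node_id\nA\nB\n")
edges_csv = io.StringIO("source,target\nA,B\nB,C\n")
print(dangling_endpoints(edges_csv, nodes_csv))  # {'C'}
```

To run the same check on the committed samples, the two StringIO objects would be replaced by open file handles on data/samples/edges_sample.csv and data/samples/nodes_sample.csv, with the column names adjusted to match.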

Quickstart

make setup
make inspect
make preprocess-level1
make preprocess-level2
make build-source
make match
make network

Full Pipeline

make run-all

Equivalent CLI commands:

python -m src.cli inspect-inputs
python -m src.cli preprocess-gleif-level1
python -m src.cli preprocess-gleif-level2 --parse-repex
python -m src.cli build-sustainability-source
python -m src.cli match-sustainability
python -m src.cli build-network
python -m src.cli run-all

Quality And Validation

Matching evaluation workflow: tools/sample_matching_eval.py and tools/generate_matching_eval_report.py
Matching quality report: docs/matching_eval.md
Network sanity report: docs/network_sanity.md
Matching precision is computed only after manual labels are added to data/samples/matching_eval_labels.csv; until then, the reports show PENDING MANUAL LABELS.
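
The label-gated precision rule above can be sketched as follows. This is not the logic in tools/generate_matching_eval_report.py: the column name is_correct and the 1/0 label encoding are assumptions made for illustration.

```python
import csv
import io


def precision_from_labels(labels_file, label_key="is_correct"):
    """Return precision over manually judged rows, or a pending marker.

    Assumes a hypothetical label column holding "1" (correct match),
    "0" (incorrect match), or blank (not yet judged).
    """
    labels = [(row.get(label_key) or "").strip() for row in csv.DictReader(labels_file)]
    judged = [value for value in labels if value in {"1", "0"}]
    if not judged:
        return "PENDING MANUAL LABELS"
    return sum(value == "1" for value in judged) / len(judged)


# Three judged rows (two correct) and one unjudged row.
labelled = io.StringIO("match_id,is_correct\n1,1\n2,0\n3,1\n4,\n")
print(precision_from_labels(labelled))

# No judged rows yet: the report stays pending.
unlabelled = io.StringIO("match_id,is_correct\n1,\n2,\n")
print(precision_from_labels(unlabelled))  # PENDING MANUAL LABELS
```

Unjudged rows are excluded from the denominator, so the figure reflects only the matches a reviewer has actually inspected.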

Limitations

Raw GLEIF and sustainability-source files are local and not committed.
Initiative lists may be partial (for example, RE100 first100).
Fuzzy matching produces review and unmatched cases and should not be treated as ground truth without manual validation.
