2026-organism-tractability

This repository is an archived code snapshot accompanying the paper “A tractability atlas for experimental organism selection”.

Paper (DOI): 10.57844/arcadia-h2nn-w619
Paper (publication page): The Stacks publication page
Raw dataset (Zenodo): Zenodo dataset DOI
Interface (searchable table): organism-tractability-data.arcadiascience.com
Visual map: organism-tractability.arcadiascience.com

We plan to refresh the data periodically; this code repo is not intended to be updated.

What this code does

Given a list of organisms, this code fetches tractability features across four domains:

Community
Logistics
Throughput
Tooling

Features are defined in:

src/organism_tractability/db/feature_metadata/feature_metadata.yaml

Setup

This repo uses uv and requires Python >=3.12.

Install uv using the official instructions: https://docs.astral.sh/uv/getting-started/installation/.

For example, on macOS with Homebrew, install uv, then create the environment and activate it:

brew install uv
uv sync
source .venv/bin/activate

The commands above assume a Unix-like shell (bash/zsh). You can also skip activating the virtual environment and instead prefix each command with uv run.

API keys

Populate required keys in .env (see .env.example).

Source ID	Required key(s)
`ncbi`	`NCBI_API_KEY`, `NCBI_API_EMAIL`
`atcc`	`FIRECRAWL_API_KEY`
`exa_answer`	`EXA_API_KEY`
`protocols_io`	`PROTOCOLS_IO_API_CLIENT_ACCESS_TOKEN`
`nih_reporter`	None

Input CSV contract

The features pipeline reads a CSV with these columns:

organism_scientific_name: e.g. Escherichia coli
organism_id: NCBI taxonomy ID (taxid), as an integer, e.g. 562. UniProt uses the same taxonomy IDs, so a UniProt taxonomy ID works as well.

Example input file:

input/example_organisms.csv
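A minimal sketch for writing and sanity-checking an input file with the standard-library csv module. The helper names are hypothetical; the checks (both columns present, organism_id parses as an integer) follow the contract above, and whether the pipeline itself enforces them is not documented here.

```python
import csv
import tempfile
from pathlib import Path

REQUIRED_COLUMNS = ("organism_scientific_name", "organism_id")


def write_input_csv(path: Path, organisms: list[tuple[str, int]]) -> None:
    """Write (scientific name, NCBI taxid) pairs in the input CSV layout."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(REQUIRED_COLUMNS)
        writer.writerows(organisms)


def read_input_csv(path: Path) -> list[tuple[str, int]]:
    """Read the input CSV, failing early on missing columns or non-integer ids."""
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        rows = []
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            name = row["organism_scientific_name"].strip()
            try:
                taxid = int(row["organism_id"])
            except (TypeError, ValueError):  # TypeError covers a short row (None)
                raise ValueError(f"line {line_no}: organism_id must be an integer") from None
            rows.append((name, taxid))
        return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "organisms.csv"
        write_input_csv(path, [("Escherichia coli", 562)])
        print(read_input_csv(path))  # [('Escherichia coli', 562)]
```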

Run: fetch all features for all organisms

python -m organism_tractability.db.cli get-features \
  --input input/example_organisms.csv \
  --output output/features.csv

You can optionally restrict the run to specific sources by repeating -s:

python -m organism_tractability.db.cli get-features \
  --input input/example_organisms.csv \
  --output output/features.csv \
  -s ncbi -s protocols_io

When restricting sources with -s/--source-ids, only credentials for the selected sources are required.

The implementation lives in:

src/organism_tractability/db/features/pipeline.py (FeaturesPipeline.run_csv)
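If you drive the pipeline from another Python script, the command line above can be assembled programmatically. This is a sketch using subprocess from the standard library; build_get_features_cmd and run_get_features are hypothetical helpers, and they use only the flags shown in this README (--input, --output, and a repeated -s).

```python
import subprocess
import sys
from collections.abc import Sequence


def build_get_features_cmd(
    input_csv: str, output_csv: str, source_ids: Sequence[str] = ()
) -> list[str]:
    """Assemble the get-features invocation; empty source_ids means all sources."""
    cmd = [
        sys.executable, "-m", "organism_tractability.db.cli", "get-features",
        "--input", input_csv,
        "--output", output_csv,
    ]
    for source_id in source_ids:
        cmd += ["-s", source_id]  # one -s flag per source, as in the example above
    return cmd


def run_get_features(
    input_csv: str, output_csv: str, source_ids: Sequence[str] = ()
) -> None:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_get_features_cmd(input_csv, output_csv, source_ids), check=True)
```

For example, run_get_features("input/example_organisms.csv", "output/features.csv", ["ncbi", "protocols_io"]) mirrors the restricted command above. Using sys.executable keeps the subprocess in the same environment (e.g. the uv-managed .venv) as the calling script.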

Output CSV contract

The pipeline writes one row per (organism, feature) pair.

Output columns:

organism_id: taxonomy id
feature_id
source_id
fetched_object: JSON string (the raw returned object)

Example output row (illustrative):

organism_id,feature_id,source_id,fetched_object
562,pubmed,ncbi,"{""search_url"":""https://pubmed.ncbi.nlm.nih.gov/?term=%22Escherichia%22%20AND%20%22coli%22&sort=date&ac=yes"",""count"":123456}"
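Because fetched_object is a JSON string embedded in a CSV field, reading the output takes two steps: csv undoes the quoting (note the doubled quotes in the row above), then json decodes the object. A minimal sketch that regroups the long format into one dict per organism; load_features is a hypothetical helper, and since the keys inside each object vary by feature and source, nothing beyond json.loads is assumed about them.

```python
import csv
import json
from collections import defaultdict
from collections.abc import Iterable
from typing import Any


def load_features(lines: Iterable[str]) -> dict[int, dict[tuple[str, str], Any]]:
    """Map organism_id -> {(feature_id, source_id): decoded fetched_object}.

    `lines` can be an open file handle or any iterable of CSV lines.
    """
    by_organism: dict[int, dict[tuple[str, str], Any]] = defaultdict(dict)
    for row in csv.DictReader(lines):
        key = (row["feature_id"], row["source_id"])
        by_organism[int(row["organism_id"])][key] = json.loads(row["fetched_object"])
    return dict(by_organism)


# Usage:
# with open("output/features.csv", newline="") as handle:
#     features = load_features(handle)
# For the example row above, features[562][("pubmed", "ncbi")]["count"] == 123456.
```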

For a committed example output file, see:

output/features.csv

The Zenodo dataset linked above is the canonical reference output for this archived snapshot.

Run sources directly (per organism)

Each source can also be queried directly:

python -m organism_tractability.sources.cli get-ncbi -n "Escherichia coli" -i 562
python -m organism_tractability.sources.cli get-atcc -n "Escherichia coli"
python -m organism_tractability.sources.cli get-nih-reporter -n "Escherichia coli"
python -m organism_tractability.sources.cli get-protocols-io -n "Escherichia coli"
python -m organism_tractability.sources.cli get-exa-answer -n "Escherichia coli"

Sources

Source	Notes	Required API key(s)
NCBI	Entrez E-utilities searches across multiple NCBI databases	`NCBI_API_KEY`, `NCBI_API_EMAIL`
ATCC	Scrapes ATCC search and product pages	`FIRECRAWL_API_KEY`
Exa Answer	Web search plus an LLM answer with citations and a confidence rating. Output is non-deterministic across runs and over time.	`EXA_API_KEY`
protocols.io	Searches public protocols	`PROTOCOLS_IO_API_CLIENT_ACCESS_TOKEN`
NIH RePORTER	Searches NIH-funded projects	None
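For context on the NCBI row, a PubMed hit count like the one in the example output can be requested from the E-utilities esearch endpoint. The sketch below only builds the request URL with the standard library and is not necessarily how this repo's NCBI source constructs its queries. The quoted-and-ANDed term mirrors the illustrative search_url in the output example, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(scientific_name: str) -> str:
    """Quote each word and AND them, as in the illustrative PubMed search_url."""
    return " AND ".join(f'"{word}"' for word in scientific_name.split())


def esearch_count_url(
    scientific_name: str, api_key: str, email: str, db: str = "pubmed"
) -> str:
    params = {
        "db": db,
        "term": organism_term(scientific_name),
        "rettype": "count",  # ask for the hit count only, not the ID list
        "retmode": "json",   # the count is reported under esearchresult.count
        "api_key": api_key,  # e.g. the value of NCBI_API_KEY
        "email": email,      # e.g. the value of NCBI_API_EMAIL
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{ESEARCH_URL}?{urlencode(params, quote_via=quote)}"
```

Fetching the URL (for example with urllib.request.urlopen) and reading esearchresult.count from the JSON body would give the count; network access and NCBI rate limits are outside this sketch.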
