Support AF3 JSON inputs (ligands) in folds; don't treat them as proteins (#41)#45

Merged: DimaMolod merged 1 commit into main from fix/af3-json-ligand-inputs on Jun 3, 2026.


@DimaMolod

Collaborator

Fixes #41.

Problem

Running a mixed fold like <UNIPROTID>+ligand.json:80 (AF2 .pkl protein features + an AF3 ligand.json, AF3 backend) failed: a snakemake -n dry run scheduled download_uniprot for ligand and create_features for ligand.pkl.xz, i.e. the .json token was treated as a protein whose features had to be fetched and built.

The root cause was two layers deep:

  1. The parser's FoldDataset stripped the .json extension during normalization, so ligand.json arrived here as ligand. Fixed in alphapulldown-input-parser#4, "Preserve .json tokens in FoldDataset normalization" (0.5.1).
  2. This workflow then derived its download_uniprot / create_features / inference-input targets without distinguishing JSON inputs from proteins.
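To make the two layers concrete, here is a minimal sketch assuming normalization removes a trailing file extension from each token. `normalize_token_old`, `normalize_token_new` and `naive_feature_targets` are hypothetical names for illustration only, not the parser's or the workflow's actual API.

```python
import os


def normalize_token_old(token):
    """Layer 1 before the fix (illustrative): the extension is dropped, so 'ligand.json' -> 'ligand'."""
    return os.path.splitext(token)[0]


def normalize_token_new(token):
    """Layer 1 after the fix (illustrative): '*.json' tokens survive normalization unchanged."""
    if token.endswith(".json"):
        return token
    return os.path.splitext(token)[0]


def naive_feature_targets(tokens):
    """Layer 2 before the fix: every normalized token is assumed to be a protein."""
    return [f"{normalize_token_old(t)}.pkl.xz" for t in tokens]
```

With both layers unfixed, `naive_feature_targets(["P12345", "ligand.json"])` asks for `ligand.pkl.xz`, which is exactly the spurious create_features target seen in the dry run.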

Standalone AlphaPulldown already supports this (the CLI uses parse_fold; the backend has a golden-tested protein+ligand path), so, as noted in the issue, the gap was only in the Snakemake wrapper.

Changes

workflow/rules/common.smk (pure, unit-tested):

  • is_json_input(name) — detect *.json tokens.
  • split_fold_inputs(fold) — partition a fold into protein chains vs direct AF3 JSON inputs.
  • format_af3_requested_fold(fold) — moved here from the Snakefile and made JSON-aware: proteins → <base>_af3_input.json; *.json passed through unchanged (previously it produced ligand.json_af3_input.json).
  • chain_residue_count reads JSON inputs (ligand-only → 0).
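A minimal sketch of how these helpers could fit together. It assumes tokens are separated by `+`, that anything after the first `:` in a token is an annotation carried through unchanged, and that JSON inputs follow the AlphaFold 3 input schema (a top-level `sequences` list of `protein`/`rna`/`dna`/`ligand` entries, where `id` may be a string or a list of copies). The bodies are illustrative rather than the code in common.smk; `_split_token` and `polymer_residue_count` are hypothetical, and the path-based `chain_residue_count` signature is an assumption.

```python
import json
from pathlib import Path


def _split_token(token):
    """Split 'name:annotation' into ('name', ':annotation'); the annotation may be ''."""
    name, sep, rest = token.partition(":")
    return name, sep + rest


def is_json_input(name):
    """True for direct AF3 JSON inputs such as 'ligand.json' or 'ligand.json:80'."""
    return _split_token(name)[0].endswith(".json")


def split_fold_inputs(fold):
    """Partition a '+'-separated fold into (protein tokens, JSON tokens), keeping order."""
    proteins, json_inputs = [], []
    for token in fold.split("+"):
        (json_inputs if is_json_input(token) else proteins).append(token)
    return proteins, json_inputs


def format_af3_requested_fold(fold):
    """Proteins -> '<base>_af3_input.json' (annotation kept); '*.json' tokens unchanged."""
    out = []
    for token in fold.split("+"):
        if is_json_input(token):
            out.append(token)  # never re-suffix: avoids 'ligand.json_af3_input.json'
        else:
            name, annotation = _split_token(token)
            out.append(f"{name}_af3_input.json{annotation}")
    return "+".join(out)


def polymer_residue_count(af3_input):
    """Residues in protein/RNA/DNA entries of a parsed AF3 input; ligands add 0."""
    total = 0
    for entity in af3_input.get("sequences", []):
        for kind in ("protein", "rna", "dna"):
            if kind in entity:
                ids = entity[kind].get("id")
                copies = len(ids) if isinstance(ids, list) else 1
                total += copies * len(entity[kind]["sequence"])
    return total


def chain_residue_count(json_path):
    """Read an AF3 JSON input from disk and count its polymer residues."""
    return polymer_residue_count(json.loads(Path(json_path).read_text()))
```

For example, `format_af3_requested_fold("P12345+ligand.json:80")` yields `P12345_af3_input.json+ligand.json:80`, and a ligand-only input counts as 0 residues for the length filter.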

workflow/Snakefile:

  • Feature targets (kept_proteins / required_feature_paths), lookup_features, and the length filter now go through split_fold_inputs. JSON inputs are required as the JSON file itself (provided via feature_directory → symlink_features) and are never downloaded or generated.
  • symlink_features made idempotent (skips already-correct links, handles broken links).
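The idempotent symlink behaviour can be sketched with the standard library alone: leave a link alone if it already points at the right file, and replace it if it is broken or points elsewhere. `ensure_symlink` is a hypothetical name; the actual symlink_features rule body may be structured differently.

```python
import os
from pathlib import Path


def ensure_symlink(src, dst):
    """Make dst a symlink to src, reusing a correct link and replacing stale or broken ones.

    Returns True if a new link was created, False if the existing link was already correct.
    """
    src = Path(src).resolve()
    dst = Path(dst)
    if dst.is_symlink():
        # os.readlink returns the stored target even when that target no longer exists.
        current = Path(os.readlink(dst))
        if not current.is_absolute():
            current = dst.parent / current  # relative links are relative to the link's directory
        if current.resolve() == src and dst.exists():
            return False  # already correct: nothing to do
        dst.unlink()  # stale or broken link: remove it before re-linking
    elif dst.exists():
        raise FileExistsError(f"{dst} exists and is not a symlink")
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.symlink_to(src)
    return True
```

Checking `is_symlink()` before `exists()` matters here: `exists()` follows the link, so a broken link reports False and would otherwise be mistaken for a missing file.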

Pin: requires the parser's .json-preserving normalization → alphapulldown-input-parser>=0.5.1 (env, CI, README).

Verification

snakemake -n on P12345+ligand.json:80:

|        | before | after |
|--------|--------|-------|
| jobs   | download_uniprot, create_features, symlink_features, structure_inference | symlink_features, structure_inference |
| ligand | downloaded + create_features ligand.pkl.xz | symlinked ligand.json, used directly |
  • AF2 features + AF3 backend: inference inputs P12345.pkl.xz, ligand.json.
  • Pure AF3: --input P12345_af3_input.json+ligand.json:80 (ligand not double-suffixed).
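The AF2-features row of this check can be reproduced with a small sketch, assuming AF2 feature files are named `<name>.pkl.xz` and any `:…` suffix is dropped when resolving file names. `feature_jobs` and `inference_inputs` are hypothetical helpers, not functions from the Snakefile.

```python
def _name(token):
    """Drop any ':annotation' suffix, e.g. 'ligand.json:80' -> 'ligand.json'."""
    return token.partition(":")[0]


def feature_jobs(fold):
    """Tokens that get download_uniprot/create_features jobs: proteins only."""
    return [_name(t) for t in fold.split("+") if not _name(t).endswith(".json")]


def inference_inputs(fold):
    """AF2 features + AF3 backend: '<name>.pkl.xz' per protein, JSON inputs as-is."""
    return [
        _name(t) if _name(t).endswith(".json") else f"{_name(t)}.pkl.xz"
        for t in fold.split("+")
    ]
```

For `P12345+ligand.json:80` this gives feature jobs only for `P12345` and inference inputs `P12345.pkl.xz` and `ligand.json`, matching the "after" column above.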

Unit tests added for all new helpers; full suite green (25 passed).

Note on ordering

The runtime fix needs parser 0.5.1 on PyPI (KosinskiLab/alphapulldown-input-parser#4). CI here pins >=0.5.1, so it will go green once 0.5.1 is published. The added unit tests cover the workflow helpers directly and pass regardless.

🤖 Generated with Claude Code

A `*.json` token in a fold (e.g. a ligand `ligand.json`) is a direct AlphaFold 3
input supplied via feature_directory. The workflow was treating it as a protein:
it scheduled `download_uniprot` + `create_features` for it and required a
`<name>.pkl`/`<name>_af3_input.json` that never exists, so mixed protein+ligand
folds failed (#41).

Changes:
- common.smk: add `is_json_input`, `split_fold_inputs` (partition a fold into
  protein chains vs direct JSON inputs), and `format_af3_requested_fold` (moved
  here from the Snakefile and made JSON-aware: protein -> `<base>_af3_input.json`,
  `*.json` passed through unchanged). `chain_residue_count` reads JSON inputs.
- Snakefile: derive feature targets, `lookup_features`, and the length filter via
  `split_fold_inputs`, so `*.json` inputs are required as the JSON file itself
  (provided via symlink_features) and never downloaded/generated.
- Requires the parser's `.json`-preserving normalization; pin
  alphapulldown-input-parser>=0.5.1 (env, CI, README).
- Unit tests for the new helpers; symlink_features made idempotent (re-symlink,
  handle broken links).

Verified with `snakemake -n` for both AF2-features+AF3-backend and pure-AF3:
the ligand JSON is symlinked and passed to inference; no download/create jobs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
DimaMolod merged commit e774b0e into main on Jun 3, 2026.
2 of 4 checks passed.
DimaMolod deleted the fix/af3-json-ligand-inputs branch on June 3, 2026, 11:02.
Development: successfully merging this pull request may close the linked issue "Mixing features from AF2 and AF3 (ligands related)".