Support AF3 JSON inputs (ligands) in folds; don't treat them as proteins (#41) #45
Merged
A `*.json` token in a fold (e.g. a ligand `ligand.json`) is a direct AlphaFold 3 input supplied via feature_directory. The workflow was treating it as a protein: it scheduled `download_uniprot` + `create_features` for it and required a `<name>.pkl`/`<name>_af3_input.json` that never exists, so mixed protein+ligand folds failed (#41).

Changes:
- common.smk: add `is_json_input`, `split_fold_inputs` (partition a fold into protein chains vs direct JSON inputs), and `format_af3_requested_fold` (moved here from the Snakefile and made JSON-aware: protein -> `<base>_af3_input.json`, `*.json` passed through unchanged). `chain_residue_count` reads JSON inputs.
- Snakefile: derive feature targets, `lookup_features`, and the length filter via `split_fold_inputs`, so `*.json` inputs are required as the JSON file itself (provided via symlink_features) and never downloaded/generated.
- Requires the parser's `.json`-preserving normalization; pin alphapulldown-input-parser>=0.5.1 (env, CI, README).
- Unit tests for the new helpers; symlink_features made idempotent (re-symlink, handle broken links).

Verified with `snakemake -n` for both AF2-features+AF3-backend and pure-AF3: the ligand JSON is symlinked and passed to inference; no download/create jobs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
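The three helpers named in the commit message can be sketched roughly as below. This is an illustration only, not the code in `common.smk`: it assumes a fold is already available as a plain list of chain-name strings (copy numbers, regions, and any `:80`-style suffix are ignored), and that `<base>` is simply the protein token itself.

```python
from typing import Iterable, List, Tuple


def is_json_input(name: str) -> bool:
    """Return True for tokens naming a direct AF3 JSON input (e.g. ligand.json)."""
    return name.endswith(".json")


def split_fold_inputs(fold: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition a fold into (protein chains, direct JSON inputs), keeping order."""
    proteins: List[str] = []
    json_inputs: List[str] = []
    for name in fold:
        # Assumption: a fold is an iterable of already-parsed token names.
        (json_inputs if is_json_input(name) else proteins).append(name)
    return proteins, json_inputs


def format_af3_requested_fold(fold: Iterable[str]) -> List[str]:
    """Map proteins to <base>_af3_input.json; pass *.json tokens through unchanged."""
    return [
        name if is_json_input(name) else f"{name}_af3_input.json"
        for name in fold
    ]
```

With this shape, only the protein half of the split would ever feed download or feature-generation targets, and a ligand token is never given a second `_af3_input.json` suffix.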
Fixes #41.
Problem
Running a mixed fold like `<UNIPROTID>+ligand.json:80` (AF2 `.pkl` protein features + an AF3 `ligand.json`, AF3 backend) failed: a `snakemake -n` dry-run scheduled `download_uniprot ligand` and `create_features ligand.pkl.xz`, i.e. the `.json` token was treated as a protein to fetch/build features for.

Root cause was two layers deep:
1. `FoldDataset` stripped the `.json` extension during normalization, so `ligand.json` arrived here as `ligand`. Fixed in "Preserve .json tokens in FoldDataset normalization (0.5.1)", alphapulldown-input-parser#4.
2. The workflow then derived `download_uniprot`/`create_features`/inference-input targets without distinguishing JSON inputs from proteins.

Standalone AlphaPulldown already supports this (CLI uses `parse_fold`; backend has a golden-tested protein+ligand path); the gap was only in the Snakemake wrapper, as noted in the issue.

Changes
`workflow/rules/common.smk` (pure, unit-tested):
- `is_json_input(name)`: detect `*.json` tokens.
- `split_fold_inputs(fold)`: partition a fold into protein chains vs direct AF3 JSON inputs.
- `format_af3_requested_fold(fold)`: moved here from the Snakefile and made JSON-aware: proteins → `<base>_af3_input.json`; `*.json` passed through unchanged (previously it produced `ligand.json_af3_input.json`).
- `chain_residue_count` reads JSON inputs (ligand-only → 0).

`workflow/Snakefile`:
- Feature targets (`kept_proteins`/`required_feature_paths`), `lookup_features`, and the length filter now go through `split_fold_inputs`. JSON inputs are required as the JSON file itself (provided via `feature_directory` → `symlink_features`) and are never downloaded or generated.
- `symlink_features` made idempotent (skips already-correct links, handles broken links).

Pin: requires the parser's `.json`-preserving normalization → `alphapulldown-input-parser>=0.5.1` (env, CI, README).

Verification
`snakemake -n` on `P12345+ligand.json:80` (before → after):
- Jobs: `download_uniprot`, `create_features`, `symlink_features`, `structure_inference` → `symlink_features`, `structure_inference`
- Ligand: `create_features ligand.pkl.xz` → `ligand.json`, used directly
- `P12345.pkl.xz`, `ligand.json`.
- `--input P12345_af3_input.json+ligand.json:80` (ligand not double-suffixed).

Unit tests added for all new helpers; full suite green (`25 passed`).

Note on ordering
The runtime fix needs parser 0.5.1 on PyPI (KosinskiLab/alphapulldown-input-parser#4). CI here pins `>=0.5.1`, so it will go green once 0.5.1 is published. The added unit tests cover the workflow helpers directly and pass regardless.

🤖 Generated with Claude Code
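
For reference, the idempotent linking behaviour described for `symlink_features` (skip already-correct links, replace broken or stale ones) could look roughly like the sketch below. `ensure_symlink` is a hypothetical helper name, not the rule's actual code, and it assumes links store absolute targets.

```python
import os
from pathlib import Path


def ensure_symlink(source: Path, link: Path) -> bool:
    """Point `link` at `source`; return True if the link was created or replaced."""
    target = str(source.resolve())
    if link.is_symlink():  # True for broken links as well
        if os.readlink(link) == target:
            return False  # already correct: nothing to do
        link.unlink()  # stale or broken link: drop it and re-create below
    elif link.exists():
        # Assumption: never overwrite a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target)
    return True
```

Calling it twice with the same arguments leaves the filesystem unchanged on the second call, which is what makes a re-run of the rule safe.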