feat: public parse_fold_chains() for pure-syntactic spec parsing #3

Merged: DimaMolod merged 1 commit into main from feat/public-parse-fold-chains on May 26, 2026

@DimaMolod

Summary

Adds parse_fold_chains(spec, protein_delimiter="+") -> List[(name, copies, regions)] — a pure-syntactic parse of a single fold spec into per-chain triples, with no filesystem access and no FeatureIndex lookup.

Why

Downstream tooling (e.g. the length-aware SLURM sizing in AlphaPulldownSnakemake, KosinskiLab/AlphaPulldownSnakemake#44) needs to know the chain composition of a fold spec at workflow-parse time, before create_features has produced the on-disk feature index that expand_fold_specification / parse_fold require. AlphaPulldownSnakemake (APS) was duplicating the same parse logic; this PR lifts it into the canonical parser package.

What it adds

  • New public function parse_fold_chains in alphapulldown_input_parser.parser, re-exported from the package's __init__.
  • Internally delegates to the existing _extract_copy_and_regions + _parse_regions, so the name[:copies][:region...] rules stay in one place.
  • Chain names are returned verbatim (no path / extension stripping) so callers can normalise as they prefer.
  • Examples:
    parse_fold_chains("A+B")             # [('A',1,all), ('B',1,all)]
    parse_fold_chains("A:2:1-100+B")     # [('A',2,1-100), ('B',1,all)]
    parse_fold_chains("/p/protA.json:2") # name kept verbatim, copies=2
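
To make the name[:copies][:region...] rule concrete, here is a minimal, self-contained sketch of a pure-syntactic parse. It is not the package's implementation: the function name sketch_parse_fold_chains is hypothetical, regions are represented as plain (start, end) tuples or the string "all" rather than the package's RegionSelection, and the assumption that a copy count may only appear directly after the name is mine.

```python
import re
from typing import List, Tuple, Union

Region = Tuple[int, int]
Regions = Union[str, List[Region]]  # "all" or a list of (start, end) pairs

_REGION_RE = re.compile(r"^(\d+)-(\d+)$")


def sketch_parse_fold_chains(
    spec: str, protein_delimiter: str = "+"
) -> List[Tuple[str, int, Regions]]:
    """Illustrative stand-in only; no filesystem access, no index lookup."""
    chains: List[Tuple[str, int, Regions]] = []
    for chunk in spec.split(protein_delimiter):
        # The first colon-separated token is the chain name, kept verbatim.
        name, *tokens = chunk.strip().split(":")
        if not name:
            raise ValueError(f"empty chain name in {spec!r}")
        copies = 1
        regions: List[Region] = []
        for position, token in enumerate(tokens):
            match = _REGION_RE.match(token)
            if match:
                regions.append((int(match.group(1)), int(match.group(2))))
            elif position == 0 and token.isdigit():
                # Assumption: a bare integer right after the name is the copy count.
                copies = int(token)
            else:
                raise ValueError(f"unrecognised token {token!r} in {chunk!r}")
        chains.append((name, copies, regions or "all"))
    return chains


print(sketch_parse_fold_chains("A:2:1-100+B"))
```

Because the sketch only splits strings, it can run at workflow-parse time, which is the property the PR relies on.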

Other

  • Version bumped 0.4.0 -> 0.5.0.
  • README extended with the new helper.
  • 26 tests pass (6 new focused on parse_fold_chains).

🤖 Generated with Claude Code

Adds `parse_fold_chains(spec, protein_delimiter="+") -> List[(name, copies,
regions)]`, exported from `alphapulldown_input_parser`. Pure: no filesystem
access, no FeatureIndex lookup — returns chain composition and copy numbers
*before* features exist on disk. Delegates to the existing internal helpers
`_extract_copy_and_regions` + `_parse_regions`, so all parsing rules stay in
one place.

Use case: downstream tooling (e.g. AlphaPulldownSnakemake) needs to know the
chain composition of a fold spec at workflow-parse time to size SLURM
resources or to filter folds by total length — both run before
`create_features` produces the on-disk feature index that
`expand_fold_specification` / `parse_fold` require.

The chain name is returned verbatim (no path/extension stripping); the caller
can normalise if needed. Region tokens are parsed into `RegionSelection`,
honouring the same `name[:copies][:region...]` AlphaPulldown convention.
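
Since names come back verbatim, a caller that wants bare protein identifiers has to normalise them itself. One possible normalisation, shown purely as an illustration (normalise_chain_name is a hypothetical helper, not part of the package), strips the directory and a single trailing extension:

```python
from pathlib import PurePosixPath


def normalise_chain_name(name: str) -> str:
    """Hypothetical caller-side helper: drop directories and one extension."""
    return PurePosixPath(name).stem


print(normalise_chain_name("/p/protA.json"))  # protA
print(normalise_chain_name("A"))              # A
```

Other callers may prefer to keep the full path, which is why the parser leaves the choice to them.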

Version: 0.4.0 -> 0.5.0. README updated. 26 tests pass (6 new).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@DimaMolod DimaMolod merged commit b24c707 into main May 26, 2026
@DimaMolod DimaMolod deleted the feat/public-parse-fold-chains branch May 26, 2026 11:03
DimaMolod added a commit to KosinskiLab/AlphaPulldownSnakemake that referenced this pull request May 26, 2026
Removes the local `parse_fold_chains` duplicate from common.smk and imports
the canonical pure-syntactic parser from `alphapulldown_input_parser`
(KosinskiLab/alphapulldown-input-parser#3, exposed in v0.5.0). A small
adapter still drops the `regions` part of the (name, copies, regions)
triples since the memory/length-filter logic here counts regions
conservatively at full chain length.

- workflow/rules/common.smk: import + thin (name, copies) shim.
- workflow/envs/alphapulldown.yaml: bump alphapulldown-input-parser>=0.5.0.
- .github/workflows/ci.yml: pip install the parser (PyPI 0.5.0 once released,
  with a fallback to the parser PR branch until then).
- test/test_memory_resources.py: drop the local parse_fold_chains test — it
  tested the duplicate; the parser package owns those tests now. 20 tests pass.
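
The thin (name, copies) shim described above could look roughly like the following sketch. The helper names drop_regions and total_residues, and the lengths lookup table, are assumptions for illustration; the actual code in common.smk may differ. Regions are ignored so that every chain counts at full length, matching the conservative sizing the commit message describes.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, int, object]  # (name, copies, regions) as returned by the parser


def drop_regions(triples: Iterable[Triple]) -> List[Tuple[str, int]]:
    """Keep only (name, copies); regions are deliberately discarded."""
    return [(name, copies) for name, copies, _regions in triples]


def total_residues(pairs: Iterable[Tuple[str, int]], lengths: Dict[str, int]) -> int:
    """Conservative size estimate: each copy counts at full chain length."""
    return sum(copies * lengths[name] for name, copies in pairs)


# Hypothetical input: triples as the parser might return them, plus known lengths.
triples = [("A", 2, [(1, 100)]), ("B", 1, "all")]
lengths = {"A": 300, "B": 150}
print(total_residues(drop_regions(triples), lengths))  # 750
```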

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>