refactor(core): introduce staged pipeline #18
Review: refactor/staged-core-pipeline
Files changed: 37 in the PR diff against
Target spec alignment
The branch matches the #8 staged Core target closely:
Findings
No critical, high, or medium issues found.
Verification
Stack / CI notes
Written by an agent on behalf of Willow.
a1f79d2 to f535fcd
a26f136 to 89688a9
Add Attributes sections to all 22 dataclasses in contracts.py documenting every field's type, default, valid values, and scientific meaning. Add docstrings to all type aliases (ProvenanceSource, CodeName, CalcTask, etc.). Add class and per-field docstrings to PseudoMetadata and PseudoPolicy. Refs #19.
Add 14 focused documentation files covering the staged Core pipeline:
- docs/provenance.md: source types, warning propagation, interpretation guide
- docs/conventions.md: units, defaults, heuristics, k-spacing convention, SOC policy
- docs/stages/analyze.md: composition, element classification, symmetry, electronic character, edge cases
- docs/stages/advise.md: decision trees, defaults, hint precedence, validation
- docs/stages/select.md: k-point resolution, pseudo ranking key, cutoff extraction, warnings
- docs/stages/generate.md: QE sections, value sources, SOC flag logic, error conditions
- docs/stages/bundle.md: directory layout, manifest schema, path traversal protection
- docs/serialization.md: to_dict contract, type conversions, example output
- docs/cli.md: complete flag reference, boolean options, output formats
- docs/tutorial.md: start-to-finish Python walkthrough, common patterns
- docs/extension.md: adding generators, codes, advisors, what not to extend
- docs/migration.md: removed paths, renamed types, behavior changes
- docs/decisions.md: design rationale for non-obvious choices
- docs/changelog.md: 0.1.0 refactor entry

Refs #19.
…sh stage
- Add Pipeline dataclass with typed callable fields for each stage
- Add default_pipeline() factory with built-in implementations
- Add pipeline parameter to run_core_job(), recommend(), generate(), write_bundle()
- Extract Kmesh as a first-class stage between Advise and Select
- Move k-point resolution from Select into resolve_kpoints_from_advice()
- Select now receives KPointSelection from Kmesh instead of deriving it
- Add ml_kmesh_advisor(spec) factory for model-backed Kmesh backends
- Add --model, --model-name, --model-version CLI flags
- Fix k-point conflict warning propagation through Kmesh stage
- CoreJobRequest stays data-only; backend choice lives in Pipeline
- Add extensive docs: pipeline.md, backends.md, contracts.md, stages/kmesh.md
- Update all existing docs for Kmesh stage and Pipeline injection
- Add tests for Pipeline composition, Kmesh hint precedence, ML backends, CLI model flag

Closes #20
Pushed
What changed:
Tests: 78 passed, ruff clean, pre-commit clean.
Design note for future work: Kmesh is currently a hardcoded pipeline layer. The next evolution should make advise→select stages individually pluggable so that different prediction backends (k-points, smearing, pseudopotentials, etc.) can be swapped independently. The
Summary
Introduces the staged Goldilocks Core pipeline with composable stage backends and a first-class Kmesh stage, making it the shared execution path for Python callers, the staged CLI, and future HTTP wrappers.
The Core graph is explicit and inspectable:
Every stage backend is swappable through the `Pipeline` object: no registries, no base classes, no string resolution in Core. Backends are plain functions composed into a dataclass. `CoreJobRequest` stays data-only and serializable.

Why
The refactor prevents Python, CLI, and future HTTP entry points from growing separate recommendation logic. It also makes the k-point resolution path a first-class backend seam so ML models can participate in the pipeline without hacks.
Composable Pipeline
`Pipeline` is a dataclass with one callable per stage. `default_pipeline()` returns the built-in composition. Callers customize with `dataclasses.replace()`.

`CoreJobRequest` carries what to compute. `Pipeline` carries how to compute it. This separation keeps the request JSON-safe while letting Python callers swap backends directly.

Kmesh stage
K-point resolution was previously split across Advise (intent) and Select (concrete grid). ML k-point prediction returns a `KPointSelection`, not `KPointAdvice`, so there was no clean seam to plug it in.

The new Kmesh stage sits between Advise and Select:
- `resolve_kpoints_from_advice`: hint precedence (`k_grid > k_spacing > advice spacing`), then conversion to mesh.
- `ml_kmesh_advisor(spec)`: hints still win, then model prediction, provenance `source="model"`.
- `Callable[[Structure, CalculationHints, KPointAdvice], KPointSelection]`.

Select now receives the Kmesh result instead of deriving k-points internally.
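The hint precedence and backend swap described above can be illustrated with a small, self-contained sketch. It is not the project's implementation: every type here is a simplified stand-in, the provenance labels `"user"` and `"heuristic"` are assumptions (only `source="model"` appears in this PR), and the spacing-to-mesh rule `n_i = max(1, ceil(|b_i| / spacing))` is one common convention assumed for illustration. The names `sketch_resolve_kpoints`, `sketch_model_advisor`, and `SketchPipeline` are hypothetical.

```python
from dataclasses import dataclass, replace
from math import ceil
from typing import Callable, Optional, Tuple

Grid = Tuple[int, int, int]


@dataclass(frozen=True)
class Structure:  # stand-in: only the reciprocal lattice vector lengths
    reciprocal_lengths: Tuple[float, float, float]


@dataclass(frozen=True)
class CalculationHints:  # stand-in for user-supplied hints
    k_grid: Optional[Grid] = None
    k_spacing: Optional[float] = None


@dataclass(frozen=True)
class KPointAdvice:  # stand-in: the Advise stage's suggested spacing
    spacing: float


@dataclass(frozen=True)
class KPointSelection:  # stand-in: concrete mesh plus provenance source
    grid: Grid
    source: str


# Same shape as the Kmesh callable signature quoted in this PR.
KMeshAdvisor = Callable[[Structure, CalculationHints, KPointAdvice], KPointSelection]


def spacing_to_grid(structure: Structure, spacing: float) -> Grid:
    """Assumed convention: n_i = max(1, ceil(|b_i| / spacing))."""
    n1, n2, n3 = (max(1, ceil(b / spacing)) for b in structure.reciprocal_lengths)
    return (n1, n2, n3)


def sketch_resolve_kpoints(structure, hints, advice) -> KPointSelection:
    """Precedence: k_grid > k_spacing > advice spacing, then conversion to a mesh."""
    if hints.k_grid is not None:
        return KPointSelection(hints.k_grid, "user")  # label is an assumption
    if hints.k_spacing is not None:
        return KPointSelection(spacing_to_grid(structure, hints.k_spacing), "user")
    return KPointSelection(spacing_to_grid(structure, advice.spacing), "heuristic")


def sketch_model_advisor(predict: Callable[[Structure], Grid]) -> KMeshAdvisor:
    """Build an advisor where hints still win; otherwise the model predicts."""
    def advisor(structure, hints, advice) -> KPointSelection:
        if hints.k_grid is not None or hints.k_spacing is not None:
            return sketch_resolve_kpoints(structure, hints, advice)
        return KPointSelection(predict(structure), "model")
    return advisor


@dataclass(frozen=True)
class SketchPipeline:  # reduced to the single stage this sketch covers
    kmesh: KMeshAdvisor


default = SketchPipeline(kmesh=sketch_resolve_kpoints)
# Swap only the Kmesh backend; the request data is untouched.
custom = replace(default, kmesh=sketch_model_advisor(lambda s: (4, 4, 4)))
```

Because the backend is a plain callable held in a dataclass field, swapping it needs no registry or string lookup, which is the property the `Pipeline` design relies on.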
CLI `--model`
The CLI resolves `--model` into a `ModelSpec` → `ml_kmesh_advisor(spec)` → `Pipeline`. The model path is never on `CoreJobRequest`.

Staged changes
Stage 1: Contracts and provenance
- `Attributes:` docstrings on all 22 dataclasses + 13 type aliases in `contracts.py`.
- `PseudoMetadata` and `PseudoPolicy` use `slots=True`.
- `ProvenanceSource` literal type with per-value documentation.

Stage 2: Pipeline implementation
- `StructureAnalysisRecord` with composition, element classification, symmetry, electronic character, disorder warnings.
- `ParameterAdvice` with provenance-backed k-points, smearing, magnetism, SOC, pseudopotential intent, convergence.
- `SelectionRecord` with deterministic pseudo ranking and cutoff extraction.
- `manifest.json`.
- `goldilocks-core recommend|generate|bundle`.

Stage 3: Composable Pipeline (this PR, layered on stages 1–2)
- `Pipeline` dataclass with `AnalyzeStage`, `AdviseStage`, `KMeshAdvisor`, `SelectStage`, `GenerateStage`, `BundleStage` callable fields.
- `default_pipeline()` factory.
- `pipeline` parameter on `run_core_job()`, `recommend()`, `generate()`, `write_bundle()`.
- `resolve_kpoints_from_advice()` extracted from Select into Kmesh stage.
- `ml_kmesh_advisor(spec)` factory for model-backed Kmesh backends.
- `--model` / `--model-name` / `--model-version` CLI flags.
- `StageRecord` in job results.
- `docs/pipeline.md`, `docs/backends.md`, `docs/contracts.md`, `docs/manifest.md`, `docs/stages/kmesh.md`.

Acceptance criteria
- `run_core_job(request)` with no pipeline argument produces identical output to previous implementation
- `run_core_job(request, pipeline=custom_pipeline)` uses provided backends
- `ml_kmesh_advisor(spec)` produces `source="model"` provenance
- `k_grid > k_spacing > strategy fallback`
- `CoreJobRequest` remains fully data-only and serializable

Open design note
Kmesh is currently a hardcoded layer in the pipeline graph. Longer term, the advise→select stages should also be individually pluggable, not just through `Pipeline` field replacement, but potentially through a more dynamic backend system that supports multiple prediction backends for different areas (k-points, smearing, pseudopotentials, etc.). The current `Pipeline` design is a solid foundation for this: each stage is already a standalone callable with a typed signature. The next evolution would make backend selection more granular and composable without adding registries or string resolution to Core.

Closes #8, #12, #13, #14, #15, #16, #17, #19, #20.