
Pipeline-as-Config: Declarative model dispatch replacing model_type string registry#2115

Draft
justinchuby wants to merge 15 commits into microsoft:main from justinchuby:pipeline-as-config


Conversation

@justinchuby
Contributor

Summary

Introduces a Pipeline-as-Config architecture that enables new model support through JSON configuration instead of C++ source changes.

Motivation

Currently, every new model architecture requires adding a string to the model_type whitelist and possibly new C++ model/state classes. This creates a bottleneck: the ORT GenAI team must make source changes for every new Hugging Face model.

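With Pipeline-as-Config, the runtime becomes a generic pipeline executor. New models are supported by generating ONNX graphs plus a v2 pipeline config, with no C++ changes needed as long as the model can be expressed with the available sessions, flow steps, and dataflow wires.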

Design Principle

"Detect, don't declare" — the runtime infers model capabilities from config structure and ONNX session I/O, not from hardcoded model_type strings.

What This PR Adds

1. Config v2 Schema (pipeline_config_schema.h/cpp)

  • Pipeline config with sessions, flow, dataflow, state sections
  • extends mechanism for preset inheritance
  • Built-in presets (pipeline_presets.h/cpp): autoregressive-decoder, vision-language, encoder-decoder

2. PipelineConfigModel (pipeline_config.h/cpp)

  • Decoder-only: inherits DecoderOnly_Model (zero code duplication)
  • Multi-session: FlowInterpreter orchestrates vision/embedding/decoder sessions
  • Decoder session delegates to DecoderOnly_State via composition

3. v1-to-v2 Translator (v1_translator.h/cpp)

  • Auto-translates existing genai_config.json to v2 pipeline format at load time
  • Full backward compatibility — all existing models continue to work unchanged

4. Config-driven dispatch (model.cpp)

  • Replaces model_type string checks with pipeline config fields
  • position_ids.strategy instead of IsQwen25VL()
  • Encoder session presence instead of model_type=="whisper"
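A minimal sketch of the "detect, don't declare" checks in item 4. It assumes simplified stand-in structs (`SessionConfig`, `PositionIdsConfig`, `StateConfig`, `PipelineConfig`) rather than the PR's actual types, and the helpers `UsesMrope3D` and `IsEncoderDecoder` are hypothetical names. The share-buffer rule follows the code comment quoted in the `generators.cpp` review thread on this page.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the PR's PipelineConfig types.
struct SessionConfig {
  std::string file;
};

struct PositionIdsConfig {
  std::optional<std::string> strategy;  // e.g. "mrope_3d"; unset behaves as "auto"
};

struct StateConfig {
  PositionIdsConfig position_ids;
};

struct PipelineConfig {
  std::map<std::string, SessionConfig> sessions;
  StateConfig state;
};

// Capability is read from a config field, not from a model_type string.
bool UsesMrope3D(const PipelineConfig& config) {
  return config.state.position_ids.strategy.value_or("auto") == "mrope_3d";
}

// The presence of an "encoder" session marks an encoder-decoder pipeline.
bool IsEncoderDecoder(const PipelineConfig& config) {
  return config.sessions.count("encoder") > 0;
}

// Share buffer is enabled only when the option is on AND
// (num_beams == 1 OR the pipeline has an encoder session).
bool IsPastPresentShareBufferEnabled(const PipelineConfig& config,
                                     bool option_enabled, int num_beams) {
  return option_enabled && (num_beams == 1 || IsEncoderDecoder(config));
}
```

Because both checks read only the config, any model that declares an encoder session or sets `position_ids.strategy` gets the behavior, whatever its `model_type`.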

Config Examples

Minimal decoder-only LLM (4 lines):

```json
{"version": 2,
 "pipeline": {"extends": "autoregressive-decoder",
              "sessions": {"decoder": {"file": "model.onnx"}}},
 "tokens": {"eos": [2]}}
```

Testing

  • 78+ tests pass (all existing + new pipeline tests)
  • All existing model types tested via v1-to-v2 translator
  • Backward compatible — v1 configs unchanged

Competitive Advantage

This would make ORT GenAI an inference runtime where adding a new model, including multimodal and multi-session models, requires no code in any language: just ONNX graphs plus a JSON config.

Copilot AI review requested due to automatic review settings May 2, 2026 03:18
justinchuby and others added 12 commits May 2, 2026 03:21
Introduce Pipeline-as-Config: a new model dispatch path that uses
config version instead of model_type string matching. When a config
file has version >= 2, CreateModel() routes to PipelineConfigModel
instead of the string-based dispatch chain.

New files:
- pipeline_config.h: PipelineConfigModel and PipelineConfigState
- pipeline_config.cpp: Implementation reusing existing components
  (DefaultKeyValueCache, DefaultPositionInputs, Logits, etc.)

Modified files:
- config.h: Add version field (default 1 for backward compat)
- config.cpp: Parse top-level 'version' field from JSON
- model.cpp: Add v2 dispatch before existing model_type logic

The PipelineConfigModel supports decoder-only models and produces
identical output to DecoderOnly_Model. All existing v1 configs
continue to route through the existing string dispatch unchanged.

Build: 37 tests passed, 9 skipped (GPU-only)
Net: +150 lines (2 new files, 3 modified)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
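A hedged sketch of the version-gated dispatch this commit describes. `Config` here is a hypothetical two-field stand-in, and `SelectModelClass` returns a label instead of constructing a model so the routing can be checked in isolation. The v1 branch is abbreviated and does not reproduce the real model_type chain.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal Config: only the fields the dispatch needs.
struct Config {
  int version = 1;         // missing "version" in JSON means v1 (backward compat)
  std::string model_type;  // consulted only on the legacy v1 path
};

// Returns a label for the model path that would be taken.
std::string SelectModelClass(const Config& config) {
  if (config.version >= 2) {
    // v2 configs never look at model_type.
    return "PipelineConfigModel";
  }
  // v1 configs fall through to the existing string dispatch (not reproduced here).
  return "legacy:" + config.model_type;
}
```

Placing the version check first means v1 behavior is untouched: a config without a `version` field never reaches the new path.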
Address review findings by inheriting from DecoderOnly_Model instead
of reimplementing State from scratch. This fixes:

1. Missing logits_.Update() in UpdateInputsOutputs
2. Missing sliding window handling
3. Missing chunking support

By inheriting from DecoderOnly_Model, PipelineConfigModel gets all
current and future behavior for free — no code duplication, no drift.
The class becomes a thin dispatch entry point (3 lines of code) that
future PRs will extend for multi-session pipeline support.

Net: -109 lines (14 added, 123 removed)
Build: 37 tests passed, 9 skipped (GPU-only)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Implement PR 2 of Pipeline-as-Config: full v2 schema support with
sessions, flow, dataflow, state, and extends mechanism.

New files:
- pipeline_config_schema.h/.cpp: PipelineConfig struct definitions and
  validation (sessions, flow steps, dataflow wires, state config)
- pipeline_presets.h/.cpp: Built-in presets (autoregressive-decoder,
  vision-language, encoder-decoder) with override/merge semantics
- v1_translator.h/.cpp: Auto-translates legacy v1 configs to
  PipelineConfig so downstream code has a single representation

Modified files:
- config.h: Add PipelineConfig field, include schema header
- config.cpp: Add Pipeline_V2_Element JSON parser for pipeline
  object; resolve presets via extends; auto-translate v1 configs

Tests: 22 new GTest cases covering presets, overrides, validation,
and v1 translation for all model categories.

Signed-off-by: Justin Chu <justinchu@microsoft.com>
1. ApplyOverrides sentinel bug: Changed state.kv_cache.format and
   state.position_ids.strategy from std::string with 'auto' default
   to std::optional<std::string>. This ensures ApplyOverrides only
   copies fields that were explicitly set, not default-initialized.
   Previously, explicitly setting 'auto' to override a non-auto
   preset value would silently fail.

2. Hardcoded dataflow index: v1_translator now searches for the
   vision→embedding wire by session name instead of using
   config.dataflow[0]. Robust against reordering of preset wires.

3. New regression tests:
   - OverrideAutoValueApplies: setting 'auto' overrides non-auto
   - UnsetOverrideDoesNotClobber: unset fields preserve base values

Signed-off-by: Justin Chu <justinchu@microsoft.com>
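The sentinel bug in item 1 is easiest to see in a reduced example. With a plain `std::string` that defaults to `"auto"`, an override layer cannot tell "unset" from "explicitly `auto`", so a check such as `if (override != "auto")` silently drops an explicit `auto`. The sketch below uses a hypothetical flattened `StateOverrides` struct with placeholder values and `std::optional`, so only fields actually present in the override layer are copied onto the preset.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical flattened subset of the state section. std::optional separates
// "not set in this layer" (nullopt) from "explicitly set to 'auto'".
struct StateOverrides {
  std::optional<std::string> kv_cache_format;
  std::optional<std::string> position_ids_strategy;
};

// Copies onto `base` (the resolved preset) only the fields `overrides` set.
void ApplyOverrides(StateOverrides& base, const StateOverrides& overrides) {
  if (overrides.kv_cache_format) base.kv_cache_format = overrides.kv_cache_format;
  if (overrides.position_ids_strategy)
    base.position_ids_strategy = overrides.position_ids_strategy;
}
```

This covers both new regression tests in spirit: an explicit `auto` replaces a non-auto preset value, and an unset field leaves the preset value alone.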
- Add tests for phi4mm (MMM→vision-language) and marian-ssru
  (→encoder-decoder) translator branches
- Add comprehensive LLM/VLM type coverage (all 21 LLM + all 4 VLM)
- Extract PropagateKVCachePatterns() helper to eliminate copy-paste
  between LLM and VLM translator branches

Signed-off-by: Justin Chu <justinchu@microsoft.com>
Implement PR 3 of Pipeline-as-Config: extend PipelineConfigModel to
handle multi-session pipelines (vision + embedding + decoder).

New components:
- FlowInterpreter: thin orchestration layer that partitions flow steps
  into prompt-phase and decode-phase groups, stores intermediate tensors
  between sessions, and wires outputs to downstream inputs based on
  dataflow config.
- PipelineConfigModel: loads named sessions from pipeline_config,
  creates per-session ORT options (graph capture disabled for non-decoder
  sessions), and constructs FlowInterpreter.
- PipelineConfigState: orchestrates multi-session execution. Decoder
  session reuses existing components (KV cache, position inputs, logits)
  for full parity with DecoderOnly_State. Non-decoder sessions are run
  directly with I/O built from extra inputs + wired intermediates.

Flow execution model:
- Prompt phase: prompt_steps (when=prompt/once) then always_steps
- Decode phase: only always_steps
- Prompt-only intermediates are cleared after prompt completes

Dataflow wiring:
- Intermediate tensors from upstream sessions are automatically wired
  to downstream session inputs based on dataflow[] config entries.
- Decoder inputs can be dynamically extended with wired intermediates
  (e.g., inputs_embeds from embedding session).
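A sketch of dataflow wiring. It assumes intermediates are keyed `"session.tensor"` (as in the review snippets on this page) and are passed in by the caller, matching the stateless `GetWiredInputs` design adopted later in this PR. `Wire` and `TensorHandle` are hypothetical stand-ins for a dataflow entry and `OrtValue*`. Skipping wires whose source tensor has not been produced yet is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical dataflow wire: an output of one session feeds an input of another.
struct Wire {
  std::string from_session, from_output;
  std::string to_session, to_input;
};

// Stand-in for OrtValue*: a non-owning handle.
using TensorHandle = const void*;

// Resolves the inputs of `session` from caller-owned intermediates keyed
// "session.tensor". The function itself keeps no mutable state.
std::map<std::string, TensorHandle> GetWiredInputs(
    const std::vector<Wire>& dataflow, const std::string& session,
    const std::map<std::string, TensorHandle>& intermediates) {
  std::map<std::string, TensorHandle> wired;  // input name -> tensor
  for (const auto& w : dataflow) {
    if (w.to_session != session) continue;
    auto it = intermediates.find(w.from_session + "." + w.from_output);
    if (it != intermediates.end()) wired[w.to_input] = it->second;  // else not produced yet
  }
  return wired;
}
```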

Tests: 14 new FlowInterpreter tests covering flow partitioning,
intermediate storage, dataflow wiring, and custom pipeline configs.
All 79 existing tests pass (9 skipped, GPU-only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Clear owned OrtValue unique_ptrs from intermediate_store_ alongside
the non-owning pointers in FlowInterpreter when prompt phase ends.
Without this, prompt-only session outputs (e.g., vision features)
would leak memory after the prompt completes.

Expose prompt_only_sessions() accessor on FlowInterpreter so the
state can identify which intermediates to release.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Replace model_type string dispatch with config-driven fields:

- CreatePositionInputs: check pipeline_config.state.position_ids.strategy
  instead of ModelType::IsQwen25VL(). The v1 translator already sets
  strategy='mrope_3d' for Qwen2.5-VL models, so existing configs work.

- IsPastPresentShareBufferEnabled: check for encoder session in
  pipeline_config.sessions instead of model_type=='whisper'. Any
  encoder-decoder model (not just Whisper) gets the share buffer
  exception for beam search.

- Remove model_type.h include from position_inputs.cpp (no longer
  needed).

This eliminates 2 of the 6 model_type coupling points identified in
the architecture analysis. The remaining checks in model.cpp (factory
dispatch) and generators.cpp (error messages) are preserved for v1
backward compatibility.

Build: 79 tests passed, 9 skipped (GPU-only)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Address review findings: PipelineConfigState::UpdateInputsOutputs was
missing sliding window length clamping, and Run() was missing chunked
context processing — both behaviors present in DecoderOnly_State.

Fixes:
1. UpdateInputsOutputs: add sliding window config checks for
   position_length and kv_cache_length clamping, matching
   DecoderOnly_State::UpdateInputsOutputs exactly.
2. Run: add chunking support for single-session decoder path via
   RunDecoderWithChunking, matching DecoderOnly_State::RunWithChunking.
3. Multi-session path: move UpdateInputsOutputs call after the
   single-session early return for cleaner control flow.

All 79 tests pass (9 skipped, GPU-only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Restructure per review: PipelineConfigModel now inherits from
DecoderOnly_Model (same pattern as PR 1). This gives us:

1. Single-session (decoder-only) configs: CreateState() returns
   DecoderOnly_State directly — zero custom code, full parity with
   sliding window, chunking, graph capture, and all future improvements.

2. Multi-session (VLM/encoder-decoder) configs: PipelineConfigState
   holds a DecoderOnly_State internally and delegates all decoder
   operations to it. FlowInterpreter orchestrates non-decoder sessions
   around the decoder, wiring intermediates via dataflow config.

This eliminates ~80 lines of duplicated decoder logic (UpdateInputsOutputs,
RunDecoderWithChunking, sliding window clamping, graph capture gating)
and ensures the decoder path can never drift from DecoderOnly_State.

Net: pipeline_config.cpp drops from 290 LOC to 220 LOC.
All 79 tests pass (9 skipped, GPU-only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
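The composition pattern from this commit, reduced to a toy hierarchy. `State`, `DecoderState` (standing in for `DecoderOnly_State`), `PipelineState` (standing in for `PipelineConfigState`), and `CreateState` are hypothetical, and each `Run()` returns a trace string instead of logits. The wrapper owns a decoder state and delegates to it rather than duplicating decoder logic, and single-session configs get the decoder state directly. Session order here follows the input list, whereas the PR derives it from the flow config.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal State hierarchy.
struct State {
  virtual ~State() = default;
  virtual std::string Run() = 0;
};

// Stand-in for DecoderOnly_State: the single home of decoder logic.
struct DecoderState : State {
  std::string Run() override { return "decoder"; }
};

// Stand-in for PipelineConfigState: runs non-decoder sessions itself and
// delegates the decoder step to an owned DecoderState.
struct PipelineState : State {
  explicit PipelineState(std::vector<std::string> pre)
      : pre_decoder_sessions_(std::move(pre)),
        decoder_(std::make_unique<DecoderState>()) {}

  std::string Run() override {
    std::string trace;
    for (const auto& name : pre_decoder_sessions_) trace += name + ">";
    return trace + decoder_->Run();  // delegation, not reimplementation
  }

 private:
  std::vector<std::string> pre_decoder_sessions_;
  std::unique_ptr<DecoderState> decoder_;
};

std::unique_ptr<State> CreateState(const std::vector<std::string>& sessions) {
  if (sessions.size() == 1) return std::make_unique<DecoderState>();  // decoder-only
  std::vector<std::string> pre;
  for (const auto& s : sessions)
    if (s != "decoder") pre.push_back(s);
  return std::make_unique<PipelineState>(std::move(pre));
}
```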
…ntermediates

Fix 2 remaining review findings:

1. FlowInterpreter is now fully stateless — no mutable intermediates
   map. GetWiredInputs takes an externally-owned intermediates map as
   a parameter. This prevents clobbering when multiple States share
   a Model (finding #3).

2. NonDecoderSessionIO stores owned std::string copies instead of raw
   c_str() pointers from ExtraInput, preventing dangling pointer bugs
   if the ExtraInput vector is relocated (finding #4).

Intermediate tensor ownership is now fully in PipelineConfigState:
- intermediate_owned_: unique_ptr<OrtValue> map (memory ownership)
- intermediates_: raw OrtValue* map (fast lookup for wiring)
Both are per-State, not per-Model.

Tests updated to match stateless FlowInterpreter API.
All 78 tests pass (9 skipped, GPU-only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
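A sketch of per-State intermediate ownership using the two maps this commit describes: one map owns the tensors, the other holds raw pointers for fast lookups. `Tensor` (standing in for `OrtValue`) and `IntermediateStore` are hypothetical. `ClearSessions` mirrors the prefix-matching loop in the review snippet on `pipeline_config.cpp`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Stand-in for OrtValue.
struct Tensor {
  int id;
};

// Hypothetical per-State store; keys are "session.tensor".
class IntermediateStore {
 public:
  void Store(const std::string& session, const std::string& tensor,
             std::unique_ptr<Tensor> value) {
    const std::string key = session + "." + tensor;
    raw_[key] = value.get();          // non-owning, used for wiring lookups
    owned_[key] = std::move(value);   // owns the memory
  }

  // Frees every intermediate produced by the given sessions, erasing the
  // owning and non-owning entries together so no raw pointer outlives its tensor.
  void ClearSessions(const std::set<std::string>& sessions) {
    for (auto it = raw_.begin(); it != raw_.end();) {
      auto dot = it->first.find('.');
      if (dot != std::string::npos && sessions.count(it->first.substr(0, dot))) {
        owned_.erase(it->first);
        it = raw_.erase(it);
      } else {
        ++it;
      }
    }
  }

  const std::map<std::string, Tensor*>& raw() const { return raw_; }

 private:
  std::map<std::string, std::unique_ptr<Tensor>> owned_;
  std::map<std::string, Tensor*> raw_;
};
```

Keeping both maps inside the State, rather than in the shared Model, is what prevents two States that share one Model from overwriting each other's intermediates.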
When multiple sessions produce outputs with the same tensor name,
GetOutput now returns the last-stored match (most-downstream session
in flow order) rather than the first match. This ensures that e.g.
GetOutput('hidden_states') returns the decoder's output, not the
encoder's, when both exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
justinchuby linked an issue May 2, 2026 that may be closed by this pull request
justinchuby force-pushed the pipeline-as-config branch from e749c21 to ab971aa May 2, 2026 03:23
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a “Pipeline-as-Config” v2 architecture where model dispatch and multi-session orchestration are driven by a declarative pipeline config (with presets + inheritance), and legacy v1 configs are translated into the v2 representation for backward compatibility.

Changes:

  • Add v2 pipeline schema + preset system (extends, sessions/flow/dataflow/state) with validation.
  • Introduce v1→v2 translation and config loading logic to populate Config::pipeline_config for both v1 and v2.
  • Add PipelineConfigModel + FlowInterpreter to execute multi-session pipelines and switch position-id behavior to be config-driven.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| src/pipeline_config_schema.h | Defines v2 PipelineConfig schema (sessions/flow/dataflow/state) and validation API. |
| src/pipeline_config_schema.cpp | Implements ValidatePipelineConfig for internal consistency checks. |
| src/pipeline_presets.h | Declares built-in presets and an overrides-merge API. |
| src/pipeline_presets.cpp | Implements presets (autoregressive-decoder, vision-language, encoder-decoder) and override application. |
| src/v1_translator.h | Declares v1→v2 translation API and documents mapping behavior. |
| src/v1_translator.cpp | Implements v1→v2 translation using ModelType categorization and preset wiring. |
| src/config.h | Adds Config::version and Config::pipeline_config fields. |
| src/config.cpp | Parses version and pipeline JSON; resolves presets/overrides; validates v2; translates v1 into v2. |
| src/models/flow_interpreter.h | Adds stateless flow partitioning and dataflow wiring resolution. |
| src/models/flow_interpreter.cpp | Implements flow partitioning and wired-input resolution from intermediates. |
| src/models/pipeline_config.h | Declares PipelineConfigModel/PipelineConfigState for config-driven execution. |
| src/models/pipeline_config.cpp | Implements multi-session orchestration and intermediate tensor management. |
| src/models/model.cpp | Dispatches v2 configs to PipelineConfigModel (v1 remains legacy string dispatch). |
| src/models/position_inputs.cpp | Switches Qwen 2.5 VL 3D position-id selection to config-driven strategy. |
| src/models/kv_cache.cpp | Updates past/present share-buffer enablement call-site to use the new signature. |
| src/generators.h | Changes IsPastPresentShareBufferEnabled signature to accept Config&. |
| src/generators.cpp | Implements new share-buffer gating using pipeline_config encoder presence. |
| test/pipeline_config_schema_test.cpp | Adds unit tests for presets, overrides, validation, and v1→v2 translation. |
| test/flow_interpreter_test.cpp | Adds unit tests for flow partitioning and dataflow wiring behavior. |

Comment on lines +205 to +219
```cpp
if (is_prompt_) {
  is_prompt_ = false;
  // Clear prompt-only intermediates from both maps
  const auto& prompt_sessions = model_.flow_interpreter_->prompt_only_sessions();
  for (auto it = intermediates_.begin(); it != intermediates_.end();) {
    auto dot = it->first.find('.');
    if (dot != std::string::npos &&
        prompt_sessions.count(it->first.substr(0, dot))) {
      intermediate_owned_.erase(it->first);
      it = intermediates_.erase(it);
    } else {
      ++it;
    }
  }
}
```
Comment on lines +153 to +189
```cpp
void PipelineConfigState::RunFlowStep(
    const PipelineConfig::FlowStep& step,
    int total_length,
    DeviceSpan<int32_t>& next_tokens,
    DeviceSpan<int32_t> next_indices) {
  if (step.run == "decoder") {
    // Wire any intermediate inputs (e.g. inputs_embeds from embedding session)
    // into the decoder state's input bindings before running.
    auto wired = model_.flow_interpreter_->GetWiredInputs(
        "decoder", intermediates_);
    for (const auto& [input_name, value] : wired) {
      bool found = false;
      for (size_t i = 0; i < decoder_state_->input_names_.size(); ++i) {
        if (std::strcmp(decoder_state_->input_names_[i],
                        input_name.c_str()) == 0) {
          decoder_state_->inputs_[i] = value;
          found = true;
          break;
        }
      }
      if (!found) {
        wired_decoder_input_names_.push_back(input_name);
        decoder_state_->input_names_.push_back(
            wired_decoder_input_names_.back().c_str());
        decoder_state_->inputs_.push_back(value);
      }
    }

    // Delegate to DecoderOnly_State::Run which handles everything:
    // KV cache, position inputs, logits, sliding window, chunking,
    // graph capture, run options.
    last_logits_ = decoder_state_->Run(total_length, next_tokens, next_indices);
    return;
  }

  RunNonDecoderSession(step.run);
}
```
Comment thread src/models/pipeline_config.cpp Outdated
Comment on lines +241 to +251
```cpp
  // Search intermediates by tensor name (suffix after "session.").
  // If multiple sessions produce the same tensor name, the last-stored
  // one wins (which is the most-downstream session in flow order).
  OrtValue* result = nullptr;
  for (const auto& [key, value] : intermediates_) {
    auto dot = key.find('.');
    if (dot != std::string::npos && key.substr(dot + 1) == name) {
      result = value;
    }
  }
  if (result) return result;
```
Comment thread src/v1_translator.h Outdated
Comment on lines +30 to +31
```cpp
// Throws std::runtime_error if the model_type is not supported for
// translation (e.g. unknown or highly custom model types).
```
Comment thread src/generators.cpp
Comment on lines 283 to +290
```diff
   guidance_ff_tokens_enabled = enable_ff_tokens;
 }
 
-bool GeneratorParams::IsPastPresentShareBufferEnabled(const std::string& model_type) const {
+bool GeneratorParams::IsPastPresentShareBufferEnabled(const Config& config) const {
   // past_present_share_buffer is only actually enabled when:
   // 1. The config option is set to true, AND
-  // 2. Either num_beams == 1 OR the model is Whisper
+  // 2. Either num_beams == 1 OR the model is encoder-decoder (has encoder session)
+  bool is_encoder_decoder = config.pipeline_config.sessions.count("encoder") > 0;
```
Comment on lines +13 to +21
```cpp
PipelineConfigModel::PipelineConfigModel(
    std::unique_ptr<Config> config, OrtEnv& ort_env)
    : DecoderOnly_Model{std::move(config), ort_env} {
  // DecoderOnly_Model already loaded session_decoder_ and registered it
  // with session_info_. Now load any additional sessions for multi-session
  // pipelines (vision, embedding, encoder, etc.).
  const auto& pipeline_config = config_->pipeline_config;

  for (const auto& [name, session_config] : pipeline_config.sessions) {
```
Comment on lines +163 to +178
```cpp
    for (const auto& [input_name, value] : wired) {
      bool found = false;
      for (size_t i = 0; i < decoder_state_->input_names_.size(); ++i) {
        if (std::strcmp(decoder_state_->input_names_[i],
                        input_name.c_str()) == 0) {
          decoder_state_->inputs_[i] = value;
          found = true;
          break;
        }
      }
      if (!found) {
        wired_decoder_input_names_.push_back(input_name);
        decoder_state_->input_names_.push_back(
            wired_decoder_input_names_.back().c_str());
        decoder_state_->inputs_.push_back(value);
      }
```
justinchuby and others added 3 commits May 2, 2026 03:50
Fix all 7 review findings from Copilot review:

1. Use-after-free in wired decoder inputs: save/restore decoder
   input_names_/inputs_ size around wiring so prompt intermediates
   can be safely freed without leaving dangling pointers.

2. FlowStep.loop ignored: add loop awareness in RunFlowStep with
   TODO for per_image implementation (currently runs once).

3. GetOutput map order vs flow order: add output_by_tensor_name_
   map updated at store time so last-executed session wins.

4. v1 translator doc comment: document fallback to autoregressive-
   decoder preset for unknown model types (matches implementation).

5. IsPastPresentShareBufferEnabled: keep encoder-session check
   (matches original Whisper behavior) with documented assumption
   that encoder-decoder models use cache_indirection for beams.

6. v2 decoder filename sync: after preset resolution, sync
   pipeline_config.sessions["decoder"].file to model.decoder.filename
   so DecoderOnly_Model loads the correct session.

7. c_str() invalidation: change wired_decoder_input_names_ from
   vector to deque for stable string addresses on push_back.

Build: 82 tests passed, 18 skipped (GPU-only)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
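Fixes 1 and 7 both concern how wired tensors are bound into the decoder's parallel name/value arrays. A hedged sketch, assuming a hypothetical `DecoderInputs` struct in place of `decoder_state_->input_names_`/`inputs_` and a hypothetical `WiredDecoderInputs` helper. Names live in a `std::deque` so their `c_str()` pointers stay valid, and the array sizes are saved before wiring and restored after the run. It assumes each `Apply` is paired with exactly one `Restore`. Restoring sizes removes appended inputs only, not values written into pre-existing slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <map>
#include <string>
#include <vector>

using TensorHandle = const void*;  // stand-in for OrtValue*

// Hypothetical stand-in for the decoder state's parallel input arrays.
struct DecoderInputs {
  std::vector<const char*> names;
  std::vector<TensorHandle> values;
};

class WiredDecoderInputs {
 public:
  // Overwrites an existing input with the same name, else appends a new one.
  // Returns the pre-wiring size so the caller can restore it after the run.
  std::size_t Apply(DecoderInputs& d,
                    const std::map<std::string, TensorHandle>& wired) {
    const std::size_t saved = d.names.size();
    for (const auto& [name, value] : wired) {
      bool found = false;
      for (std::size_t i = 0; i < d.names.size(); ++i) {
        if (std::strcmp(d.names[i], name.c_str()) == 0) {
          d.values[i] = value;
          found = true;
          break;
        }
      }
      if (!found) {
        // deque::push_back never relocates existing elements, so earlier
        // c_str() pointers stay valid; a vector reallocation could move them.
        names_.push_back(name);
        d.names.push_back(names_.back().c_str());
        d.values.push_back(value);
      }
    }
    return saved;
  }

  // Drops the appended entries so no pointer to a soon-freed intermediate
  // remains, then releases the owned name strings (safe once unreferenced).
  void Restore(DecoderInputs& d, std::size_t saved) {
    d.names.resize(saved);
    d.values.resize(saved);
    names_.clear();
  }

 private:
  std::deque<std::string> names_;
};
```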
16 new test cases covering:
- Complex multi-session dataflow (5-session chain)
- Diamond dataflow topology (A→B, A→C, B→D, C→D)
- Standalone session with no dataflow wires
- Flow with only once/prompt/always steps
- Large flow (10 sessions in chain)
- Preset override with conflicting roles and state
- Dataflow replacement semantics
- KV cache pattern override propagation
- Duplicate session in flow (valid — runs multiple times)
- Empty loop string validation
- VLM KV cache pattern propagation
- Fara position strategy (mrope_3d)

Total: 44 pipeline tests, 96 overall tests pass.
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Design updates:
- Rename when vocabulary: prompt→init, always→step, add final
- Backward-compat aliases auto-normalized via NormalizePipelineConfig()
- generation_loop as std::optional (autoregressive/single_pass/denoising)
- FlowInterpreter: init_steps_/step_steps_/final_steps_
- Throw clear error for unsupported 'when: final'
- Unrecognized when values produce clear error messages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
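A sketch of the `when` normalization, assuming the legacy aliases map to the new names as listed (`prompt`→`init`, `always`→`step`). Mapping `once` to `init` is an assumption based on its earlier prompt-phase meaning. `NormalizeWhen` and `CheckWhenSupported` are hypothetical names and do not reproduce `NormalizePipelineConfig()`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Maps a flow step's `when` value onto the new vocabulary.
std::string NormalizeWhen(const std::string& when) {
  if (when == "init" || when == "prompt" || when == "once") return "init";  // "once": assumed alias
  if (when == "step" || when == "always") return "step";
  if (when == "final") return "final";  // recognized, but not executable yet
  throw std::invalid_argument("unrecognized flow step 'when' value: '" + when +
                              "' (expected init, step, or final)");
}

// Rejects phases the interpreter cannot run yet, with a clear message.
void CheckWhenSupported(const std::string& normalized_when) {
  if (normalized_when == "final")
    throw std::runtime_error("flow steps with 'when: final' are not supported yet");
}
```

Normalizing once at load time lets the rest of the interpreter handle only `init`, `step`, and `final`, while old configs that use `prompt` or `always` keep working.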
justinchuby force-pushed the pipeline-as-config branch from a8e46c5 to 52dee03 May 2, 2026 04:16
justinchuby marked this pull request as draft May 5, 2026 14:23

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Copilot design of Pipeline-as-Config

2 participants