
model package integration #2136

Open
xiaoyu-work wants to merge 24 commits into main from feature/v4-package-integration


@xiaoyu-work (Contributor) commented May 6, 2026

This pull request adds support for loading and validating v4 model packages in the configuration system, including explicit execution provider (EP) selection and per-component overlays. It introduces new logic for merging and validating configuration data from model packages, ensures robust error handling, and updates the Config structure to support these new features.

Support for v4 Model Packages and Execution Provider Selection:

  • Added new logic to load configuration from v4 model packages, including:
    • Reading and merging base config and per-component overlays.
    • Performing execution provider (EP) selection, either by defaulting to the intersection of supported EPs or using a user-supplied EP.
    • Validating that all referenced components exist in the package and providing clear error messages for invalid references. [1] [2]

Config Structure Enhancements:

  • Extended the Config structure and its sub-structs (Encoder, Decoder, Vision, Speech, Embedding) to include a new component field, which maps each role to a specific package component. [1] [2] [3] [4] [5]
  • Added shared_assets_path to Config, distinguishing between flat-directory and package modes for asset resolution.

JSON Parsing and Overlay Improvements:

  • Introduced a new ParseConfigFromText helper to centralize config parsing from text with improved error diagnostics and overlay support.
  • Updated OverlayConfig to allow applying overlays to existing configs.

Codebase Maintenance:

  • Added necessary includes for new features and data structures, such as <memory>, <unordered_map>, and <unordered_set>. [1] [2] [3]

JSON Deserialization Updates:

  • Updated JSON element handlers to support the new component field for each relevant role during deserialization. [1] [2] [3] [4] [5]

Adds a lightweight in-memory JSON DOM (Document / Object / Array) on top of
the existing streaming parser, plus serializer and RFC 7386 JSON Merge
Patch. This is the foundation for v4 model-package per-variant overlays,
which carry a JSON Merge Patch in consumer_metadata.genai_config_overlay
and need to be merged into the package-shipped base genai_config.json
before the existing streaming-parser-driven Config loader sees it.

The new APIs are additive — every existing JSON::Element-based config
loader is unaffected. The DOM is intentionally minimal (no fancy number
representations, no object-key insertion order, no schema validation);
its only consumers are Config-time overlay merging and the unit tests.
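The merge semantics follow RFC 7386 section 2. A minimal sketch of that algorithm over a hypothetical `Value` type (not the PR's JSON DOM API, whose names and shapes may differ):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal JSON value; not the PR's Document/Object/Array types.
struct Value {
  enum class Kind { Null, Bool, Number, String, Array, Object };
  Kind kind = Kind::Null;
  bool boolean = false;
  double number = 0.0;
  std::string str;
  std::vector<Value> array;
  std::map<std::string, Value> object;  // like JSON::Object, key order is not preserved

  static Value Num(double d) { Value v; v.kind = Kind::Number; v.number = d; return v; }
  static Value Str(std::string s) { Value v; v.kind = Kind::String; v.str = std::move(s); return v; }
  static Value Obj() { Value v; v.kind = Kind::Object; return v; }
};

// RFC 7386 MergePatch(Target, Patch).
Value MergePatch(Value target, const Value& patch) {
  if (patch.kind != Value::Kind::Object) {
    return patch;  // any non-object patch (including an array) replaces the target wholesale
  }
  if (target.kind != Value::Kind::Object) {
    target = Value::Obj();  // a non-object target is replaced by an empty object
  }
  for (const auto& [name, value] : patch.object) {
    if (value.kind == Value::Kind::Null) {
      target.object.erase(name);  // null deletes the member if present
    } else {
      // A missing member default-constructs to Null, i.e. "not an object".
      target.object[name] = MergePatch(target.object[name], value);
    }
  }
  return target;
}
```

Arrays take the first branch, which is why a pipeline array in an overlay replaces the base array instead of being merged element by element.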

Also fixes a pre-existing off-by-one in the streaming parser's
Skip(literal) bounds check that silently dropped a top-level
true / false / null literal when it sat at the very end of the input
buffer. The bug never surfaced in real genai_config.json files because
every keyword there is followed by a comma, brace or bracket, but the
new DOM round-trip tests exercise it directly.
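To illustrate the class of bug, here is a hypothetical cursor-based `Skip` (not the PR's parser code) whose bounds check correctly admits a literal that ends exactly at the end of the buffer:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical scanner; the PR's streaming parser is structured differently.
struct Scanner {
  std::string_view input;
  std::size_t pos = 0;  // invariant: pos <= input.size()

  // Consumes `literal` (e.g. "true") if it appears at the cursor.
  // The check must allow the literal to end exactly at input.size();
  // a strict `pos + literal.size() < input.size()` is the off-by-one that
  // rejects a top-level literal sitting at the very end of the buffer.
  bool Skip(std::string_view literal) {
    if (input.size() - pos < literal.size()) return false;  // not enough bytes left
    if (input.substr(pos, literal.size()) != literal) return false;
    pos += literal.size();
    return true;
  }
};
```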

Tests: test/json_merge_patch_test.cpp covers the full RFC 7386 example
table plus GenAI-relevant shapes (overlay overrides context_length and
I/O names; pipeline arrays replace wholesale; null deletes optional
fields) and a small set of edge cases beyond the RFC.

Two related JSON correctness fixes flagged in PR review:

1. SerializeImpl integral fast-path computed
   `v == static_cast<double>(static_cast<long long>(v))` BEFORE checking
   that v fit in long long. Casting an out-of-range double to long long is
   undefined behavior (e.g. v = 1e19 with LLONG_MAX = 2^63-1). Reorder so
   the bounds check runs first, then test integrality with std::modf — no
   cast required, no UB possible. Use a strict `<` on the upper bound
   because static_cast<double>(LLONG_MAX) rounds up to 2^63 (LLONG_MAX
   itself is not exactly representable in double); admitting v = 2^63
   would still overflow the cast.

2. SerializeImpl streamed non-finite doubles via operator<<, producing
   `nan` / `inf` tokens that are not valid JSON (RFC 8259 §6
   restricts the grammar to the finite reals). NaN / Inf can sneak into
   a Document via direct construction or via strtod parser overflow.
   Throw on non-finite doubles in the serializer, and detect strtod
   overflow on parse so we never admit ±Inf into the DOM in the first
   place. The from_chars branch already throws on result_out_of_range.
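A sketch of both serializer rules plus the parse-side overflow check, written as standalone `SerializeNumber` / `ParseNumber` helpers (hypothetical names; the PR does this inside `SerializeImpl` and the parser, and its `from_chars` branch is not shown):

```cpp
#include <cerrno>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

std::string SerializeNumber(double v) {
  // RFC 8259 has no token for NaN or +/-Inf, so refuse to emit them.
  if (!std::isfinite(v)) {
    throw std::invalid_argument("cannot serialize a non-finite double as JSON");
  }
  // Bounds check BEFORE any cast: converting an out-of-range double to
  // long long is undefined behavior. The upper bound rounds to exactly 2^63,
  // so it must be strict.
  constexpr double kMin = static_cast<double>(std::numeric_limits<long long>::min());  // -2^63
  constexpr double kMax = static_cast<double>(std::numeric_limits<long long>::max());  // 2^63
  double int_part = 0.0;
  if (v >= kMin && v < kMax && std::modf(v, &int_part) == 0.0) {
    return std::to_string(static_cast<long long>(v));  // integral fast path; cast is safe here
  }
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.17g", v);  // 17 significant digits round-trip any double
  return buf;
}

double ParseNumber(const char* text) {
  errno = 0;
  char* end = nullptr;
  double d = std::strtod(text, &end);
  if (end == text) throw std::invalid_argument("not a number");
  // strtod reports overflow as ERANGE with a +/-HUGE_VAL (infinite) result.
  if (errno == ERANGE && std::isinf(d)) throw std::out_of_range("number overflows double");
  return d;
}
```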

Tests added:
- 1e19 round-trips through the precision-17 path (no UB).
- SerializeDocument(Inf|−Inf|NaN) throws.
- ParseDocument(`1e500`) and ParseDocument(`-1e500`) throw.

Adds a GenAI-internal abstraction over the ORT v4 model-package layout
(src/models/model_package.{h,cpp}), with a stub directory-walker
implementation that interprets the on-disk format described in the v4
spec. The abstraction is dead code today; W3 wires Config into it. When
ORT v4 lands, an alternate implementation will delegate to
OrtModelPackageContext while keeping this surface unchanged.

The abstract surface mirrors the v4 C API:

  * Pre-selection traversal: NumComponents/ComponentName,
    NumVariants/VariantName, VariantEpCompatibility (with optional
    device discriminator + free-form compatibility strings),
    EpsCompatibleWith convenience for EP defaulting.
  * SelectComponent takes ModelPackageSelectionOptions (an ordered
    EpSelection list with optional device), implementing the spec's
    captured-EP-priority + metadata-declaration-order tie-break
    algorithm. Empty captured list defaults to [CPUExecutionProvider]
    per spec.
  * ComponentInstance exposes only what the v4 cix handle does:
    VariantFolderPath, FileCount, ConsumerMetadata (raw blob, returned
    verbatim), and ResolveSharedWeight(checksum).
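The selection rule can be sketched as follows. `EpSelection`, `EpCompatibility`, `VariantInfo`, and `SelectVariant` are hypothetical stand-ins, not the PR's types. The device check follows the stricter rule adopted later in review (a device-pinned entry only matches a caller entry pinning the same device); treating an unpinned entry as matching on EP name alone is an assumption:

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

struct EpSelection {
  std::string ep_name;
  std::optional<std::string> device;
};

struct EpCompatibility {
  std::string ep_name;
  std::optional<std::string> device;  // set when the variant is pinned to one device
};

struct VariantInfo {
  std::string name;
  std::vector<EpCompatibility> compatible;
};

bool Matches(const EpCompatibility& c, const EpSelection& s) {
  if (c.ep_name != s.ep_name) return false;
  if (c.device) return s.device && *s.device == *c.device;  // pinned: caller must pin the same device
  return true;  // assumption: unpinned entries match on EP name alone
}

// `variants` must be in metadata.json declaration order; `priority` is the
// captured EP list, best first.
std::string SelectVariant(const std::vector<VariantInfo>& variants,
                          std::vector<EpSelection> priority) {
  if (priority.empty()) priority.push_back({"CPUExecutionProvider", std::nullopt});
  // Outer loop: the highest-priority EP that any variant supports wins.
  // Inner loop: declaration order breaks ties between variants for that EP.
  for (const auto& sel : priority) {
    for (const auto& variant : variants) {
      for (const auto& compat : variant.compatible) {
        if (Matches(compat, sel)) return variant.name;
      }
    }
  }
  throw std::runtime_error("no variant is compatible with the requested EPs");
}
```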

Per-file detail (filename, session_options, provider_options,
shared_files) is intentionally NOT on ComponentInstance — consumers
parse variant.json directly via the standalone ParseVariantManifest
helper, mirroring the v4 contract that ORT does not expose per-file
accessors.

Layout the stub walker recognizes (matches Appendix A of the spec):

  <package>/
  +- manifest.json                           optional
  +- configs/                                shared assets bucket
  +- <component>/
     +- metadata.json                        selection-only metadata
     +- shared_weights/<checksum>/<blob>     per-component shared weights
     +- <variant>/
        +- variant.json                      files[], shared_files,
                                             consumer_metadata

Detection rule (intentionally conservative to keep flat-dir fallback
reliable): treat as v4 iff manifest.json exists OR at least one
non-reserved direct child contains metadata.json. The bare presence
of configs/ is NOT a positive signal.

Variant declaration order is sourced from metadata.json (preserved via
the streaming JSON::Element parser, since JSON::Object = std::map does
not preserve insertion order). Filesystem order is never authoritative.

Path-traversal validation is applied to every package-controlled
fragment (component name, variant name, file filename, shared-weight
checksum) so malformed packages cannot escape the package root.
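One way to implement such a check is to accept a package-controlled name only when it is a single plain path segment. This is a hypothetical sketch; the exact character set the PR rejects is not stated above:

```cpp
#include <string_view>

bool IsSafePathFragment(std::string_view name) {
  if (name.empty() || name == "." || name == "..") return false;
  for (char c : name) {
    // Separators would let the name address another directory; ':' blocks
    // Windows drive-relative paths and alternate data streams; NUL would
    // truncate the path when it reaches a C API.
    if (c == '/' || c == '\\' || c == ':' || c == '\0') return false;
  }
  return true;
}
```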

Tests in test/model_package_test.cpp cover detection, manifest
validation (string-array shape, schema_version, dedup, traversal
rejection, listed-component-missing-metadata), variant ordering,
EP-compatibility traversal with device + compatibility strings,
selection (priority, device-aware match, tie-break, empty-priority
CPU default), ConsumerMetadata round-trip, FileCount, and
ResolveSharedWeight (success, missing, zero-blob, multi-blob,
traversal). ParseVariantManifest is covered for files[] order,
typed session_options preservation, shared_files map, and rejection
of malformed/missing inputs.

Add test/runtime_settings_test.cpp covering the two layer-2 channels that
override genai_config.json values at runtime:

  1. RuntimeSettings::GenerateConfigOverlay() — handle-driven JSON
     emitted at Model::Create. Today only `dawnProcTable` is recognised
     and projected into model.decoder.session_options.provider_options[*]
     .WebGPU.dawnProcTable.
  2. OverlayConfig() / OgaConfigOverlay() — caller-driven JSON exposed
     through C#, Java, Python, and ObjC bindings.
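The overlay projected for a `dawnProcTable` handle could be built as below. `MakeDawnOverlay` is a hypothetical helper; the sketch assumes the handle value arrives pre-formatted as a string needing no JSON escaping and emits it as a JSON string, which the description above does not confirm:

```cpp
#include <optional>
#include <string>

std::string MakeDawnOverlay(const std::optional<std::string>& dawn_proc_table) {
  // No handle set: return an empty overlay so callers can merge unconditionally.
  if (!dawn_proc_table) return "";
  // Path: model.decoder.session_options.provider_options[*].WebGPU.dawnProcTable
  return R"({"model":{"decoder":{"session_options":{"provider_options":[)"
         R"({"WebGPU":{"dawnProcTable":")" + *dawn_proc_table + R"("}}]}}}})";
}
```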

The new tests:

  * Verify GenerateConfigOverlay returns an empty string when no handles
    are set (so callers can feed it unconditionally into the merge).
  * Verify unrelated handles are ignored.
  * Pin the exact JSON path that dawnProcTable lands at — every keyword
    on that path is asserted explicitly so a future refactor can't
    silently break WebGPU on Windows.
  * Verify OverlayConfig applies a search.top_k override and leaves
    unrelated search fields alone, including a re-apply scenario.
  * End-to-end: generate the WebGPU overlay, parse it through
    OverlayConfig on a real Config, and confirm a WebGPU provider_options
    entry with the dawnProcTable option appears with the expected value.

Also adds inline doc comments on RuntimeSettings::GenerateConfigOverlay
and OverlayConfig describing the layer-2 contract and pointing at the
regression suite. No production code changes.

Introduce a new fs::path field, shared_assets_path, next to
Config::config_path that points at the directory holding tokenizer files,
processor JSON, chat template, and other assets that are common across
components (decoder/vision/speech/encoder/...).

In flat-dir mode this field is initialized to config_path in
Config::Config(path, overlay), so behavior is identical. In a future ORT v4
package mode the package loader will set it to <pkg>/configs/.

Rewrite all call sites that load shared assets to use shared_assets_path:
- Tokenizer (model.cpp:308) and constrained_logits_processor (tokenizer.json).
- Vision/multimodal processor configs in gemma/gemma4/mistral3/phi/phi-multimodal/qwen2_5_vl.
- Whisper speech processor config.

Per-component model assets (ONNX, LoRA, custom_ops, external data) and
EP-private paths (OpenVINO cache_dir, RyzenAI/VitisAI model_root) are
intentionally NOT migrated — they are addressed per-component variant folder
and require deeper schema work in a follow-up workstream.

This is preparatory plumbing for the ORT v4 model-package format integration.

Five non-decoder session-creation paths (marian encoder, whisper encoder,
multi_modal vision/speech/embedding) inline the same fallback ternary:

    config_->model.<component>.session_options.has_value()
        ? config_->model.<component>.session_options.value()
        : config_->model.decoder.session_options

Replace it with a single free helper, EffectiveSessionOptions(config,
component_session_options), declared in config.h next to OverlayConfig
and the other layer-2 free helpers.
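A sketch of the helper using stripped-down stand-in structs (the real `Config` carries far more). Both arms of the conditional operator are lvalues of the same type, so the function returns a reference into existing storage rather than a copy:

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins; only the shape relevant to the helper is kept.
struct SessionOptions {
  std::string log_id;
};
struct Decoder { SessionOptions session_options; };
struct ModelSection { Decoder decoder; };
struct Config { ModelSection model; };

// The component's own options when set, otherwise the decoder's options.
const SessionOptions& EffectiveSessionOptions(
    const Config& config,
    const std::optional<SessionOptions>& component_session_options) {
  return component_session_options.has_value() ? *component_session_options
                                               : config.model.decoder.session_options;
}
```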

Behavior is identical to the inlined form (verified by unit test, which
asserts the returned reference aliases the underlying optional or
decoder storage rather than copying), so no existing flat-directory
model changes behavior.

The motivation is layered: today the helper is purely cosmetic, but in
the v4 model-package world the same call sites need a single seam where
the layer-1 (variant.json files[].session_options) baseline gets merged
in front of the today-only layer-2 (genai_config) view. Centralising
the lookup here means the v4 work touches one function, not five
ternaries scattered across three files.

Tests: test/config_helpers_test.cpp covers the three contract bullets
that matter for the call sites — fallback identity to decoder when the
component is nullopt, identity to the component's stored value when set,
and live-aliasing of the decoder fallback (so future refactors don't
accidentally start copying the SessionOptions struct).

Branch `Config::Config(path, overlay)` on whether `path` is a v4 model
package (`ModelPackageContext::Open()` returns non-null). For a flat-dir
model the existing strict-streaming-parser path is unchanged.

For a v4 package:
  1. `shared_assets_path` is set to `<package>/configs/` so all
     subsequent shared-asset loaders (tokenizer, processor configs,
     chat-template) read from the package's configs/ bucket.
  2. EP defaulting (`ComputeEpDefaulting`) intersects every component's
     `EpsCompatibleWith()` set, ordered by the first component's
     first-seen list for determinism. Empty intersection or >1 survivor
     throws with a helpful diagnostic listing the candidate EPs and the
     per-component compatibility matrix; explicit per-component selection
     is reserved for the public-API `ep` argument (W8).
  3. Each component's selected variant is fetched via `SelectComponent`
     and stored in `Config::component_instances` keyed by component name.
     `consumer_metadata.genai_config_overlay` (if present) must be a
     JSON object and is RFC 7386 merged into the base
     `configs/genai_config.json` DOM.
  4. The merged DOM is serialized and run through the same streaming
     parser as the flat-dir path, so all field-validation (unknown keys,
     type mismatches, range checks) still applies. The caller-supplied
     `json_overlay` is applied as the final layer-2 override
     (`OgaConfigOverlay` / `RuntimeSettings::GenerateConfigOverlay`).
  5. `ValidateRoleComponentReferences` walks every
     `model.<role>.component` field and rejects references to component
     names the package didn't ship.
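The defaulting rule in step 2 can be sketched with plain string vectors standing in for each component's `EpsCompatibleWith()` result. `DefaultEp` and its error text are hypothetical:

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

std::string DefaultEp(const std::vector<std::vector<std::string>>& per_component_eps) {
  if (per_component_eps.empty()) throw std::invalid_argument("package has no components");
  // Walk the first component's list so the survivor order is deterministic.
  std::vector<std::string> survivors;
  for (const auto& ep : per_component_eps.front()) {
    bool everywhere = std::all_of(per_component_eps.begin() + 1, per_component_eps.end(),
        [&](const std::vector<std::string>& eps) {
          return std::find(eps.begin(), eps.end(), ep) != eps.end();
        });
    if (everywhere && std::find(survivors.begin(), survivors.end(), ep) == survivors.end())
      survivors.push_back(ep);  // drop duplicates, keep first-seen order
  }
  // Exactly one EP must survive; zero or several is ambiguous.
  if (survivors.size() != 1) {
    std::string msg = survivors.empty() ? "no EP is supported by every component"
                                        : "multiple EPs are supported by every component:";
    for (const auto& ep : survivors) msg += " " + ep;
    throw std::runtime_error(msg);
  }
  return survivors.front();
}
```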

Concrete additions:

  * Adds `std::string component` to all five role sub-structs
    (`Encoder`, `Decoder`, `Vision`, `Speech`, `Embedding`) plus
    matching `OnValue("component", ...)` handlers in their parsers so
    the streaming parser accepts the new key.
  * Adds `Config::model_package` (`shared_ptr<ModelPackageContext>`)
    and `Config::component_instances` (`unordered_map<name,
    shared_ptr<ComponentInstance>>`). `shared_ptr` keeps Config copy-
    constructible, which `OgaCreateModelFromConfig` relies on; the
    package state is read-only after construction so sharing is safe.
  * Refactors `ParseConfig` to expose `ParseConfigFromText`, used by
    both the file-based flat-dir loader and the package-overlay-merge
    path (which feeds an in-memory string).

Adds `test/config_package_test.cpp` covering: detection, per-component
overlay merge, caller layer-2 wins over package overlay, role->component
reference validation, EP defaulting (empty/single/multi intersection),
all five role.component fields parseable, and the flat-dir regression.

Wired so far: nothing else. Subsequent commits (W4-real, W5a/b, W6, W7,
W8, W10) drive `shared_assets_path` and `component_instances` into
the path-rewrite, session-construction, pipeline-runner and public-API
layers.

Adds the EP-injection wiring that connects the v4 package's selected
variant (from W3's component_instances map) to the existing GenAI
session-options dispatch table (in SetProviderSessionOptions).

Three pieces:

1. ComponentInstance::SelectedEp(): new abstract accessor returning the
   canonical ORT EP name (e.g. 'CUDAExecutionProvider') of the variant
   chosen by SelectComponent. StubComponentInstance populates it from
   priority[best_priority].ep_name, the very entry that just won.

2. EpNameToProviderTag() / EnsurePackageProvider() in session_options.h:
   a canonical-EP -> internal-tag map mirroring the dispatch table
   ('CUDAExecutionProvider' -> 'cuda', etc.) plus an idempotent helper
   that rotates the package's tag to the front of providers and
   backfills a matching ProviderOptions entry. CPU/unknown -> no-op.

3. Model::CreateSessionOptions hooks the helper before the
   CreateSessionOptionsFromConfig call when model_package is non-null
   and decoder.component resolves to a known instance. Behaviour for
   flat-dir Configs is unchanged (model_package == null short-circuits
   at the first conjunct).

Why the rotate-to-front semantics: the dispatch loop honours providers
in order and stops at the first one whose AppendExecutionProvider
returns a DeviceInterface, so putting the package's EP first guarantees
it wins p_device_ even if a runtime overlay added a competing entry.

Tests: 8 new EnsurePackageProvider unit tests in
test/config_helpers_test.cpp (mapping, insert-at-front, rotate,
idempotency, CPU no-op, unknown no-op, ProviderOptions backfill); two
additional EXPECT_EQ()s on SelectedEp() in the existing
model_package_test.cpp SelectComponentPicksMatchingVariant case.
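A sketch of the rotate-to-front helper. `ProviderOptions` is a simplified stand-in, and the EP-name table holds only the CUDA mapping quoted above; the real table covers the full dispatch list:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Partial, hypothetical table: CPU and anything unmapped yield "".
std::string EpNameToProviderTag(const std::string& ep_name) {
  if (ep_name == "CUDAExecutionProvider") return "cuda";
  return "";
}

struct ProviderOptions {
  std::string name;
  std::vector<std::pair<std::string, std::string>> options;
};

// Moves the package EP's tag to the front of `providers` (inserting it if
// absent) and backfills a matching ProviderOptions entry. Idempotent.
void EnsurePackageProvider(const std::string& ep_name,
                           std::vector<std::string>& providers,
                           std::vector<ProviderOptions>& provider_options) {
  const std::string tag = EpNameToProviderTag(ep_name);
  if (tag.empty()) return;  // CPU/unknown: no-op
  auto it = std::find(providers.begin(), providers.end(), tag);
  if (it == providers.end()) {
    providers.insert(providers.begin(), tag);
  } else {
    std::rotate(providers.begin(), it, it + 1);  // move *it to the front, shift the rest right
  }
  bool has_options = std::any_of(provider_options.begin(), provider_options.end(),
                                 [&](const ProviderOptions& po) { return po.name == tag; });
  if (!has_options) provider_options.push_back({tag, {}});
}
```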

Plumbs the per-component variant_folder (set by W3 on the package's
ComponentInstance) through all single-file session-creation paths so
ONNX models, LoRA adapters, and custom-ops libraries load from the
selected variant directory instead of a single flat config_path.

Mechanism:

1. Model::AssetFolder(component_name): central resolver. Empty
   component, missing instance, or flat-dir Config -> config_path.
   In package mode with a known component -> the instance's
   VariantFolderPath().

2. Model::CreateSession gains an optional component_name parameter.
   Both the file-system path and the DirGuard.ChangeTo() workaround
   for external-data resolution now use AssetFolder(component_name)
   instead of config_path. Default empty preserves flat-dir behaviour.

3. Model::CreateSessionOptionsFromConfig gains an optional
   component_name parameter. The custom-ops library first-try
   resolution uses AssetFolder(component_name) instead of config_path
   (EP-library and cwd fallback paths unchanged).

Caller migrations (each passes its model.<role>.component):

  decoder_only.cpp / gpt.cpp / decoder_only_pipeline.cpp ->
      decoder.component
  marian.cpp / whisper.cpp -> encoder.component + decoder.component
  multi_modal.cpp -> vision + speech + embedding + decoder components
      (incl. LoRA adapter path resolution at lines 712/716)
  qwen_vl_model.cpp -> vision.component (find_stage lambda)
  Model::CreateSessionOptions -> decoder.component for both the
      primary call and pipeline_model sub-calls

Untouched (out of v4 surface):

  silero_vad.cpp / nemotron_speech.cpp -> default-empty component
      preserves existing flat-dir behaviour.
  EP-private paths (OpenVINO cache_dir, RyzenAI/VitisAI model_root)
      remain on config_path; W6/W7 will revisit when each EP's
      session_options handler gets package context.

No behaviour change for flat-dir Configs: AssetFolder() returns
config_path when component_instances is empty (Config without
manifest.json) or the component name lookup misses.
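The `AssetFolder` fallback rules reduce to a map lookup. `PackageConfig` is a hypothetical stand-in for the relevant `Config` fields, holding a plain path per component instead of a `ComponentInstance`:

```cpp
#include <filesystem>
#include <string>
#include <unordered_map>

namespace fs = std::filesystem;

struct PackageConfig {
  fs::path config_path;
  fs::path shared_assets_path;  // config_path in flat-dir mode, <pkg>/configs in package mode
  std::unordered_map<std::string, fs::path> variant_folders;  // component -> selected variant dir
};

fs::path AssetFolder(const PackageConfig& config, const std::string& component_name) {
  if (component_name.empty()) return config.config_path;  // legacy callers
  auto it = config.variant_folders.find(component_name);
  if (it == config.variant_folders.end()) return config.config_path;  // flat-dir or lookup miss
  return it->second;
}
```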

Wires the v4 model package's shared-weight blob mechanism into the
decoder-only pipeline runner. ONNX models in a multi-file variant
can declare external initializers via variant.json's per-file
shared_files map; the runner reads the referenced blobs into memory
and registers them with the session via ORT's
AddExternalInitializersFromFilesInMemory.

Pieces:

1. OrtSessionOptions::AddExternalInitializersFromFilesInMemory wrapper
   in onnxruntime_api.h / onnxruntime_inline.h, mirroring ORT's
   1.18+ C API. Buffer pointers must outlive session creation; the
   wrapper takes (names, buffers, lengths) parallel arrays and
   delegates lifetime management to the caller.

2. Model::ApplyPackageExternalInitializers(component, filename, so):
   lazy-parses and caches the variant.json for component, looks
   up the matching file entry, ResolveSharedWeight()s each blob,
   reads them into Model-owned buffer storage, and calls the new
   wrapper. Empty/unknown component, file-not-in-manifest, or empty
   shared_files all no-op (flat-dir Configs are unaffected).

3. Model gains two private members for this:
   - variant_manifests_ (per-component VariantManifest cache)
   - external_initializer_buffers_ (shared-blob byte storage that
     stays alive for the Model's lifetime).

4. DecoderOnlyPipelineModel constructor: between selecting the
   stage's OrtSessionOptions and CreateSession, calls
   ApplyPackageExternalInitializers for each stage's filename so
   external initializers are registered before ORT consumes the
   session_options during session creation.
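Because ORT only receives raw pointers, the blob bytes must live in storage that outlives session creation. A hypothetical owner type (standing in for `external_initializer_buffers_` on `Model`) that produces the parallel arrays might look like this; it uses `char` names for simplicity, whereas ORT's C API uses `ORTCHAR_T` for these names (`wchar_t` on Windows):

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

struct ExternalInitializerStore {
  std::vector<std::string> names;
  std::vector<std::vector<char>> buffers;  // owned bytes; keep this object alive past session creation

  void Add(const std::string& name, const std::string& blob_path) {
    std::ifstream in(blob_path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open shared weight: " + blob_path);
    buffers.emplace_back(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    names.push_back(name);
  }

  // Builds (names, buffers, lengths) parallel arrays. Call this only after
  // the last Add(): a later Add() may move `names` and invalidate c_str().
  void Views(std::vector<const char*>& name_ptrs, std::vector<char*>& data_ptrs,
             std::vector<std::size_t>& lengths) {
    name_ptrs.clear();
    data_ptrs.clear();
    lengths.clear();
    for (std::size_t i = 0; i < buffers.size(); ++i) {
      name_ptrs.push_back(names[i].c_str());
      data_ptrs.push_back(buffers[i].data());
      lengths.push_back(buffers[i].size());
    }
  }
};
```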

Out of scope (deferred):

* Per-file session_options/provider_options baseline merge from
  variant.json (spec-defined layer-1 SO/PO). The current decoder
  pipeline still derives SO/PO from genai_config layer-2; the
  file-entry's session_options and provider_options JSON::Object
  fields parsed by W2's ParseVariantManifest are not yet applied.
  W10's end-to-end fixture will demonstrate need or its absence.

* Shared-options multi-stage isolation: when multiple pipeline
  stages share the main session_options (genai_config did not
  define per-stage session_options) and declare distinct
  shared_files, registrations accumulate on the shared object. In
  practice v4 packages with shared_files declare per-stage
  session_options. Hardening by cloning to per-stage on demand can
  follow if a real package surfaces the case.

… sessions

Multi-modal models in v4 package mode can have their vision, speech,
and embedding components on different EPs from the decoder. The
decoder is the model's primary session and the only one that should
set `p_device_`.

* For each non-decoder role (vision/speech/embedding), in package
  mode with a non-empty `model.<role>.component`, ensure the role's
  `Config::SessionOptions` slot exists and inject the captured EP
  via `EnsurePackageProvider` (mirrors the decoder logic in W5b).

* Pass `is_primary_session_options=false` for those role sessions
  in package mode so their DeviceInterface doesn't conflict with
  the decoder's. Flat-dir mode keeps the legacy `true` behavior
  (all roles fall back to `decoder.session_options`).

No-op for flat-dir Configs and for empty role components in package mode.

Add an explicit-EP entry point to GenAI's public surface so users can
select the execution provider for v4 model packages without re-packaging
or relying on per-component metadata defaulting.

C API (NULL or empty `ep` falls back to defaulting; existing entry
points unchanged):

* `OgaCreateConfigWithEp(path, ep, &config)`
* `OgaCreateModelWithEp(path, ep, &model)`
* `OgaCreateModelWithRuntimeSettingsAndEp(path, settings, ep, &model)`

Internal plumbing:

* `Config::Config(path, overlay, user_ep)` 3-arg overload; the 2-arg
  overload delegates with empty `user_ep`.

* `LoadFromPackage` and `ComputeEpDefaulting` accept `user_ep`;
  when non-empty, defaulting is bypassed and the user's choice becomes
  the captured EP. Per-component `SelectComponent` failures now list
  the component's compatible EPs to make typos and unsupported-EP
  cases easy to debug.

* `Generators::CreateModel(env, path, settings, user_ep)` plumbs the
  argument into `Config`.

* Flat-directory (legacy) mode rejects a non-empty `user_ep` with a
  clear diagnostic pointing at the existing `OgaConfigClearProviders`
  / `OgaConfigAppendProvider` channel.

Language bindings:

* C++ wrapper (`ort_genai.h`): new `OgaConfig::Create(path, ep)`,
  `OgaModel::Create(path, ep)`,
  `OgaModel::Create(path, settings, ep)` overloads.

* Python: `og.Config(path, ep=...)` / `og.Model(path, ep=...)` via a
  new pybind11 init taking the optional EP string.

* C#: new `Config(string, string)` and `Model(string, string)` ctors
  plus matching `DllImport` declarations.

* Java: `Config(String, String)` / `Model(String, String)` ctors
  with JNI implementations that handle a null jstring without invoking
  `GetStringUTFChars` on it (undefined behaviour).

* Objective-C: `initWithPath:ep:error:` initializers on `OGAConfig`
  and `OGAModel`; `nil` ep falls back to defaulting.

Selected EP propagates into the effective provider list via the W5b
`EnsurePackageProvider` hook, so package variant selection and the
ORT session's provider list cannot diverge under user-supplied `ep`.

Updates the W3 multi-EP defaulting diagnostic to point at this new
argument as the resolution channel.

Adds four targeted tests for the W8 `ep` plumbing:

* `UserEpBypassesDefaultingInPackage`: a multi-EP package that would
  fail defaulting now loads cleanly when the user names an EP.

* `UserEpThatNoComponentSupportsThrowsWithDiagnostic`: the
  per-component diagnostic lists the component's compatible EPs.

* `EmptyUserEpFallsBackToDefaulting`: an empty string is equivalent
  to the legacy 2-arg constructor.

* `UserEpInFlatDirThrows`: flat-dir mode rejects a non-empty `ep`
  with a message pointing at the legacy provider-list channel.

JSON merge patch (RFC 7386) and EP defaulting itself are already
covered by the existing `json_merge_patch_test.cpp` (31 tests) and
the multi/empty/single-EP cases earlier in this file.
@xiaoyu-work changed the title from "v4 model package integration (umbrella draft, W3 landed)" to "model package integration" May 6, 2026
@xiaoyu-work marked this pull request as ready for review May 6, 2026 22:55
@xiaoyu-work requested a review from a team as a code owner May 6, 2026 22:55
Copilot AI review requested due to automatic review settings May 6, 2026 22:55
Removes `W3:` / `W5a-soft:` / `W5b` / `W6` / `W7` / `W8:` references from source-tree and test comments. These were internal sequencing labels and don't belong in shipping code; the surrounding prose already explains the v4-package context.

Copilot AI left a comment


Pull request overview

This pull request adds first-class support for loading v4 model packages through Generators::Config, including package detection, per-component overlay merging (RFC 7386 merge patch), and an explicit “execution provider (EP)” selection API surface that’s plumbed through the C API and all language bindings. It also refactors asset resolution so shared assets (tokenizer/processor configs/templates) are resolved via Config::shared_assets_path, while per-component ONNX/assets resolve via a component-aware asset folder.

Changes:

  • Add v4 model-package abstraction (ModelPackageContext), variant selection/defaulting, and config overlay merge pipeline (base configs/genai_config.json + per-component consumer_metadata.genai_config_overlay + caller overlay).
  • Introduce a lightweight JSON DOM with RFC 7386 merge-patch support, plus improved parse diagnostics and number handling.
  • Extend public APIs/bindings with an optional ep argument and update model/session creation to be package-aware (asset folders, providers injection, etc.), with extensive new unit tests.

Reviewed changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/runtime_settings_test.cpp | Adds regression tests for RuntimeSettings-generated JSON overlays and OverlayConfig behavior. |
| test/model_package_test.cpp | Adds unit tests for v4 package detection, manifest/metadata parsing, variant traversal, EP selection, consumer metadata, and shared weights. |
| test/json_merge_patch_test.cpp | Adds tests for JSON DOM parse/serialize + RFC 7386 merge patch semantics and edge cases. |
| test/config_package_test.cpp | Adds end-to-end tests for Config’s v4 package loading, overlay merge, EP defaulting/explicit EP, and role->component validation. |
| test/config_helpers_test.cpp | Adds tests for EffectiveSessionOptions and EnsurePackageProvider/EpNameToProviderTag helpers. |
| src/runtime_settings.h | Documents runtime-only handle projection into layer-2 config overlays. |
| src/python/python.cpp | Adds ep overloads/args for Python Config and Model constructors. |
| src/ort_genai.h | Adds C++ wrapper overloads for creating Config/Model with explicit EP (and with runtime settings + EP). |
| src/ort_genai_c.h | Adds new C API entrypoints: OgaCreateConfigWithEp, OgaCreateModelWithEp, OgaCreateModelWithRuntimeSettingsAndEp. |
| src/ort_genai_c.cpp | Implements the new C API entrypoints and plumbs ep through to Generators::Config / CreateModel. |
| src/objectivec/oga_model.mm | Adds Objective-C initializer to create a model with an explicit EP. |
| src/objectivec/oga_config.mm | Adds Objective-C initializer to create a config with an explicit EP. |
| src/objectivec/include/ort_genai_objc.h | Documents Objective-C EP overloads for Config/Model. |
| src/models/whisper.cpp | Switches to EffectiveSessionOptions and component-aware session creation/options for encoder/decoder. |
| src/models/whisper_processor.cpp | Resolves processor config via shared_assets_path (package-aware). |
| src/models/session_options.h | Declares EpNameToProviderTag + EnsurePackageProvider helpers for package EP injection. |
| src/models/session_options.cpp | Implements EpNameToProviderTag + EnsurePackageProvider (provider/provider_options consistency). |
| src/models/qwen2_5_vl_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/qwen_vl_model.cpp | Resolves pipeline stage assets via component-aware asset folder; passes component into session-options creation. |
| src/models/phi_multimodal_processor.cpp | Resolves vision/audio processor configs via shared_assets_path. |
| src/models/phi_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/onnxruntime_inline.h | Adds wrapper for ORT API AddExternalInitializersFromFilesInMemory. |
| src/models/onnxruntime_api.h | Declares OrtSessionOptions::AddExternalInitializersFromFilesInMemory. |
| src/models/multi_modal.cpp | Makes secondary sessions package-aware (EP injection, non-primary semantics, component-aware asset resolution). |
| src/models/model.h | Adds component-aware AssetFolder, package external initializer plumbing + caches. |
| src/models/model.cpp | Routes shared assets to shared_assets_path, adds package EP injection, component-aware session creation, and shared-weight external initializer support. |
| src/models/model_package.h | Introduces the v4 model-package abstraction, selection options, and variant.json parsing interface. |
| src/models/model_package.cpp | Implements a stub filesystem-backed v4 package loader, parsers, and selection logic. |
| src/models/mistral3_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/marian.cpp | Switches to EffectiveSessionOptions and component-aware session creation/options for encoder/decoder. |
| src/models/gpt.cpp | Uses component-aware session creation for decoder. |
| src/models/gemma4_multimodal_processor.cpp | Resolves vision/audio processor configs via shared_assets_path. |
| src/models/gemma_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/decoder_only.cpp | Uses component-aware session creation for decoder. |
| src/models/decoder_only_pipeline.cpp | Adds package external initializer registration for pipeline stages; makes stage path resolution component-aware. |
| src/json.h | Adds JSON DOM types and declares ParseDocument/SerializeDocument/MergePatch. |
| src/json.cpp | Implements JSON DOM, serializer, RFC 7386 MergePatch, and improves numeric parsing/overflow handling. |
| src/java/src/main/native/ai_onnxruntime_genai_Model.cpp | Adds JNI binding for creating a Model with explicit EP (null-safe). |
| src/java/src/main/native/ai_onnxruntime_genai_Config.cpp | Adds JNI binding for creating a Config with explicit EP (null-safe). |
| src/java/src/main/java/ai/onnxruntime/genai/Model.java | Adds Java constructor overload for explicit EP selection. |
| src/java/src/main/java/ai/onnxruntime/genai/Config.java | Adds Java constructor overload for explicit EP selection. |
| src/generators.h | Extends CreateModel signature to accept optional user_ep. |
| src/csharp/NativeMethods.cs | Adds P/Invoke declarations for OgaCreateConfigWithEp / OgaCreateModelWithEp. |
| src/csharp/Model.cs | Adds C# Model(string path, string ep) overload. |
| src/csharp/Config.cs | Adds C# Config(string path, string ep) overload. |
| src/constrained_logits_processor.cpp | Resolves tokenizer path via shared_assets_path for package compatibility. |
| src/config.h | Extends Config with role component fields, shared_assets_path, package state, and EffectiveSessionOptions declaration. |
| src/config.cpp | Implements v4 package load/merge/defaulting/validation path; adds ParseConfigFromText and package-aware config construction. |
Comments suppressed due to low confidence (1)

src/objectivec/include/ort_genai_objc.h:84

  • There’s a stray * line (and dangling doc text) after the new initWithPath:ep:error: declaration for OGAConfig (lines 81-83). It’s outside of a comment block, so this header won’t compile. Remove those lines or wrap them back into a proper /** ... */ comment (likely the doc for clearProvidersWithError).

* `SelectComponent`: a device-pinned `ep_compatibility` entry now
  requires the caller's priority entry to also pin the same device.
  Previously a device-less caller silently matched a device-pinned
  variant (so it could pick an NPU-only build for a CPU caller).
  Extended the existing OpenVINO GPU/NPU test with a negative case.

* `Model::CreateSession`: now invokes
  `ApplyPackageExternalInitializers` itself before `OrtSession::Create`
  (a no-op in flat-dir mode or when no shared_files are declared). This
  makes the registration apply to every session (decoder, encoder,
  vision, speech, embedding, and recreated pipeline stages), not just
  the initial DecoderOnlyPipelineModel constructor calls.

* `IntermediatePipelineState::Run`: lazy session recreation now
  routes through `Model::CreateSession`, so it inherits the same
  external-initializer registration as the initial creation.

* `Config` flat-dir + non-empty user_ep error message: dropped the
  `directories containing manifest.json` qualification. Manifest-less
  v4 packages are also detected via a component directory's
  `metadata.json`, so the old wording was misleading.

* Removed the now-redundant `ApplyPackageExternalInitializers` call
  from the `DecoderOnlyPipelineModel` constructor, so registration now
  happens on a single path.
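A minimal sketch of the tightened `SelectComponent` device-pinning rule described above. `EpEntry`, `Matches`, and `SelectVariant` are hypothetical names, not the PR's actual types. The sketch also assumes that a variant entry with no device pinned still matches any caller entry for the same EP, which the commit message does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical shape of one EP entry (caller priority list or variant ep_compatibility).
struct EpEntry {
  std::string ep;                     // e.g. "OpenVINOExecutionProvider"
  std::optional<std::string> device;  // e.g. "NPU"; std::nullopt = no device pinned
};

// A device-pinned variant entry matches only a caller entry pinned to the same device.
// Assumption: a device-less variant entry matches any caller entry with the same EP.
bool Matches(const EpEntry& caller, const EpEntry& variant) {
  if (caller.ep != variant.ep) return false;
  if (variant.device.has_value()) {
    return caller.device.has_value() && *caller.device == *variant.device;
  }
  return true;
}

// Walks the caller's priority list in order and returns the index of the first
// compatible variant, or -1 if no variant is compatible.
int SelectVariant(const std::vector<EpEntry>& priority, const std::vector<EpEntry>& variants) {
  for (const EpEntry& wanted : priority) {
    for (std::size_t i = 0; i < variants.size(); ++i) {
      if (Matches(wanted, variants[i])) return static_cast<int>(i);
    }
  }
  return -1;
}
```

With variants pinned to NPU and GPU, a caller asking for OpenVINO without a device now gets no match instead of silently landing on the NPU build.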

Real-world v4 model packages from current producer toolchains ship a
richer manifest than the canonical
`{ "schema_version": 1, "components": ["decoder"] }`, for example:

  {
    "schema_version": "1.0",
    "components": [{"name": "decoder", "metadata": "decoder/metadata.json"}]
  }

Make the parser more lenient:

* schema_version: the spec-canonical type is now a string ("1" or "1.0").
  Numeric values are coerced to their integer-string form so that older
  producers writing 1 still load. Strings other than "1" / "1.0"
  remain rejected, as do non-integer numbers.

* components entries: accept either a bare string (canonical) or an
  object with a name field (extra metadata / description / version
  fields are ignored; the on-disk layout stays conventional).
  Anything else throws.
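The schema_version rule can be sketched as follows. `JsonScalar` and `NormalizeSchemaVersion` are hypothetical names (the PR's own JSON DOM in src/json.h is not shown here); the sketch only assumes that a parsed scalar is either a string or a double.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-in for a parsed JSON scalar (string or number).
using JsonScalar = std::variant<std::string, double>;

// Returns the normalized schema_version string, or throws if it is unsupported.
std::string NormalizeSchemaVersion(const JsonScalar& value) {
  std::string version;
  if (const auto* text = std::get_if<std::string>(&value)) {
    version = *text;  // canonical form: a string
  } else {
    const double number = std::get<double>(value);
    // Only integral numbers in a sane range are coerced; 1.5, NaN, etc. are rejected.
    if (!std::isfinite(number) || std::trunc(number) != number || std::fabs(number) > 1e15) {
      throw std::runtime_error("schema_version must be a string or an integer");
    }
    version = std::to_string(static_cast<long long>(number));  // 1 -> "1"
  }
  if (version != "1" && version != "1.0") {
    throw std::runtime_error("unsupported schema_version: " + version);
  }
  return version;
}
```

Coercing numbers first and validating afterwards keeps a single accept-list ("1" / "1.0") for both input types.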

Tests: flipped the legacy-rejection test to positive coverage of
object-form components, and added 4 schema_version cases
(string "1.0" and number 1 accepted; string "2" rejected; number 99
still rejected).

Adds ManifestRealWorldProducerShape, mirroring the manifest currently
produced for phi-4-mini-reasoning.v4.ortpackage (Olive / Foundry-Local
toolchain). It combines a string schema_version, object-form components,
extra top-level fields, and a metadata.json that carries schema_version /
component_name fields alongside the variants map.

This test would have failed against the parser before the previous
commit's compatibility shim. It is kept as a regression test so that
future parser tightening stays compatible with packages already in the wild.

Reverts the earlier object-form tolerance for `components` entries.
The spec is clear that components are bare strings; supporting an
alternate object-with-name shape just because some producers emit
it would lock in two-format ambiguity in the consumer indefinitely.
The producer side is expected to align with the spec.

The string `schema_version` tolerance and number-to-string coercion
are kept (those are pure widening of accepted types for the same
field, not a structurally different shape).

Error message for a bad `components` entry now points the producer
at the spec format with a concrete example.
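A minimal sketch of the stricter `components` check after this revert. `ComponentEntry` and `ParseComponents` are hypothetical names, and the error text is illustrative only (the PR's actual wording may differ): any entry that is not a bare string is rejected with a pointer to the spec form.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical JSON shapes for a components entry: a bare string or an object.
using JsonObject = std::map<std::string, std::string>;
using ComponentEntry = std::variant<std::string, JsonObject>;

// Accepts only the spec form: every entry must be a bare component name.
std::vector<std::string> ParseComponents(const std::vector<ComponentEntry>& entries) {
  std::vector<std::string> names;
  for (std::size_t i = 0; i < entries.size(); ++i) {
    const auto* name = std::get_if<std::string>(&entries[i]);
    if (name == nullptr) {
      // Point the producer at the expected format with a concrete example.
      throw std::runtime_error("manifest.json: components[" + std::to_string(i) +
                               "] must be a string, e.g. \"components\": [\"decoder\"]");
    }
    names.push_back(*name);
  }
  return names;
}
```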
…an.h

Three independent build breakages on the v4-package branch, all caught
by the post-merge CI on Linux GCC + Android Clang:

1. `Model::AssetFolder` was declared returning `std::filesystem::path`
   in `model.h` but defined returning `fs::path` (the project's custom
   class in `src/filesystem.h`). MSVC happened to accept the mismatch
   in some configs but GCC/Clang correctly reject it as
   "out-of-line definition differs from declaration", and downstream
   `path / fs::path(...)` callsites get an "invalid operands" error
   because the two types aren't interoperable. Align the declaration
   with the definition: `fs::path`.

2. `fs::path::operator/(const path&)` was non-const while its
   `operator/(const std::string&)` sibling was const. This made
   `const fs::path asset = ...; asset / fs::path("x");` fail with
   "passing const fs::path as 'this' discards qualifiers". Make the
   overload const and rename the parameter to avoid shadowing.

3. `model_package.h` was including `<span>` directly. The Android NDK
   build runs in `USE_CXX17` mode where `std::span` is provided by the
   project's polyfill in `src/span.h` (under `namespace std`). Other
   files reach `std::span` via that header (e.g. `generators.h`);
   model_package.h needs to do the same so the public
   `VariantEpCompatibility` signature compiles on Android.
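Fix 2 can be illustrated with a stripped-down path class. `demo_fs::path` is a hypothetical stand-in for the project's `fs::path` in src/filesystem.h: both `operator/` overloads are const-qualified, so they can be called on a `const path`, and the parameter has a distinct name (`other`) so it cannot shadow the class name.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace demo_fs {  // hypothetical stand-in for the project's fs::path

class path {
 public:
  path() = default;
  explicit path(std::string p) : path_(std::move(p)) {}

  // Both overloads are const, so `const path asset = ...; asset / ...` compiles.
  path operator/(const std::string& name) const { return path(path_ + "/" + name); }
  // Named `other` rather than `path`, so the parameter does not shadow the class name.
  path operator/(const path& other) const { return *this / other.path_; }

  const std::string& string() const { return path_; }

 private:
  std::string path_;  // simplified: joins with "/" and does no normalization
};

}  // namespace demo_fs
```

Dropping `const` from the second overload reproduces the "discards qualifiers" error for `asset / demo_fs::path("x")` when `asset` is const.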

The doc comment block before -clearProvidersWithError: was missing its
opening /** marker, leaving stray ' *' lines that clang-format flagged
as ill-formed Objective-C. Restore the canonical /** ... */ form so
lint-cpp passes.

Config's two-arg ctor takes 'const fs::path&' (the project's custom
filesystem::path class). Apple Clang correctly rejects passing
std::filesystem::path because that requires two user-defined conversions
in one implicit sequence. Convert through .string() so we end up with
the type Config actually expects.

This was masked on Linux/Windows builds until commit 1fdf7b8 fixed the
AssetFolder type mismatch in src/, which let test/ compilation proceed
far enough to surface it.

Mirrored the change in both test files that define the helper.
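The conversion rule behind this fix can be reproduced in isolation. `CustomPath` and `ConfigLike` are hypothetical stand-ins for the project's `fs::path` and `Config`; the sketch assumes the custom path type is implicitly constructible from `std::string`.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical stand-in for a project path class that is implicitly
// constructible from std::string (one user-defined conversion).
struct CustomPath {
  CustomPath(const std::string& p) : value(p) {}
  std::string value;
};

// Hypothetical stand-in for a Config-like class taking the custom path type.
struct ConfigLike {
  explicit ConfigLike(const CustomPath& dir) : model_dir(dir.value) {}
  std::string model_dir;
};

ConfigLike MakeConfig(const std::filesystem::path& model_dir) {
  // `return ConfigLike{model_dir};` is ill-formed: it would need
  // std::filesystem::path -> std::string -> CustomPath, i.e. two user-defined
  // conversions in one implicit sequence, and C++ allows at most one.
  // .string() performs the first step explicitly, leaving one implicit conversion.
  return ConfigLike{model_dir.string()};
}
```

`.string()` also returns `std::string` on Windows, where the implicit `std::filesystem::path` conversion would yield `std::wstring` instead.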
The JSON streaming parser (src/json.cpp Parse_Value) seeds the root
element by calling OnObject / OnArray / OnValue with an empty name when
the document's top-level value is an object/array/scalar. Subsequent
keys inside that root object then arrive as named OnObject / OnValue
calls on whatever element OnObject("") returns.

MetadataRoot_Element only handled name == "variants" and threw
unknown_value_error for everything else, including the empty-name root
seed. As a result every metadata.json parse failed at line 1 index 1
with "Unknown value \"\"" before any real key was inspected.

Return *this for the empty-name seed so the existing OnObject("variants")
handler dispatches normally on the root object's first real key. Other
unknown root keys still throw, preserving strict schema enforcement.
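The fix can be sketched with a simplified callback interface. `Element`, `VariantsElement`, and `MetadataRoot` are hypothetical names modelled on the description above, and only `OnObject` is shown (the real parser also dispatches `OnArray` / `OnValue`).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical base element: unknown keys throw by default.
struct Element {
  virtual ~Element() = default;
  virtual Element& OnObject(std::string_view name) {
    throw std::runtime_error("Unknown value \"" + std::string(name) + "\"");
  }
};

struct VariantsElement : Element { /* would parse the variants map */ };

struct MetadataRoot : Element {
  VariantsElement variants;

  Element& OnObject(std::string_view name) override {
    if (name.empty()) return *this;           // root seed from the parser: stay on this element
    if (name == "variants") return variants;  // dispatch the real root key
    return Element::OnObject(name);           // any other root key still throws
  }
};
```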

This is the runtime regression that surfaced once the Linux ARM64 build
got past the AssetFolder / span / operator/ compile fixes and actually
ran unit_tests; all ConfigPackageTest and ModelPackageContextTest cases
that exercise metadata.json parsing now pass.

The new test files (config_helpers_test, config_package_test,
json_merge_patch_test, model_package_test, runtime_settings_test) are
the first unit tests in this repo to depend on internal Generators::*
C++ symbols compiled into onnxruntime-genai.dll. All previously
existing tests in the Generators::test namespace are header-only,
template-only, or pure static_assert harnesses, so the question of
exposing internal symbols never arose.

On Linux/macOS the default ELF/Mach-O visibility makes those symbols
linkable without ceremony. On Windows MSVC, however, only symbols
explicitly marked __declspec(dllexport) appear in the import library;
internal C++ helpers like Generators::Config's new (fs::path,
string_view[, string_view]) constructors, JSON::ParseDocument,
ModelPackageContext::Open, OverlayConfig, EnsurePackageProvider,
RuntimeSettings::GenerateConfigOverlay, etc. were therefore unresolved
at unit_tests link time:

  config_helpers_test.obj : error LNK2019: unresolved external symbol
  Generators::Config::Config(fs::path const &, string_view)
  ...
  unit_tests.exe : fatal error LNK1120: 12 unresolved externals

Turn on CMake's WINDOWS_EXPORT_ALL_SYMBOLS for the onnxruntime-genai
target on Windows. CMake auto-generates a .def file from the object
files for non-templated, non-inline symbols that are not already
explicitly marked __declspec(dllexport). The OGA_* C API entry points
keep their explicit dllexport (so the public DLL contract is unchanged)
while the import library now also resolves the internal C++ helpers
the test binary needs.

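The public C API is unchanged. The DLL's export table does grow, since
the internal C++ symbols are now exported alongside the OGA_* entry
points, but no existing export is removed or altered.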