model package integration#2136
Open
xiaoyu-work wants to merge 24 commits into
Adds a lightweight in-memory JSON DOM (Document / Object / Array) on top of the existing streaming parser, plus serializer and RFC 7386 JSON Merge Patch. This is the foundation for v4 model-package per-variant overlays, which carry a JSON Merge Patch in consumer_metadata.genai_config_overlay and need to be merged into the package-shipped base genai_config.json before the existing streaming-parser-driven Config loader sees it.
The new APIs are additive: every existing JSON::Element-based config loader is unaffected. The DOM is intentionally minimal (no fancy number representations, no object-key insertion order, no schema validation); its only consumers are Config-time overlay merging and the unit tests.
Also fixes a pre-existing off-by-one in the streaming parser's Skip(literal) bounds check that silently dropped a top-level true / false / null literal when it sat at the very end of the input buffer. The bug never surfaced in real genai_config.json files because every keyword there is followed by a comma, brace or bracket, but the new DOM round-trip tests exercise it directly.
Tests: test/json_merge_patch_test.cpp covers the full RFC 7386 example table plus GenAI-relevant shapes (overlay overrides context_length and I/O names; pipeline arrays replace wholesale; null deletes optional fields) and a small set of edge cases beyond the RFC.
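For readers unfamiliar with RFC 7386, the merge rule can be sketched as follows. This is a minimal illustration, not the PR's JSON::Document code: the `Node` type, the `Make*` helpers, and the restriction to null / number / object values are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal DOM (null, number, object only); not the PR's types.
struct Node {
  enum class Kind { Null, Number, Object };
  Kind kind = Kind::Null;
  double number = 0.0;
  std::map<std::string, std::shared_ptr<Node>> members;  // used when kind == Object
};
using NodePtr = std::shared_ptr<Node>;

NodePtr MakeNull() { return std::make_shared<Node>(); }

NodePtr MakeNumber(double v) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Kind::Number;
  n->number = v;
  return n;
}

NodePtr MakeObject() {
  auto n = std::make_shared<Node>();
  n->kind = Node::Kind::Object;
  return n;
}

// RFC 7386 MergePatch(Target, Patch): a non-object patch replaces the target;
// an object patch is merged member by member, and a null member deletes the key.
NodePtr MergePatch(const NodePtr& target, const NodePtr& patch) {
  assert(patch != nullptr);
  if (patch->kind != Node::Kind::Object) return patch;

  // Start from a shallow copy of the target object, or from an empty object.
  NodePtr result = MakeObject();
  if (target && target->kind == Node::Kind::Object) result->members = target->members;

  for (const auto& [key, value] : patch->members) {
    if (value->kind == Node::Kind::Null) {
      result->members.erase(key);
    } else {
      auto it = result->members.find(key);
      NodePtr existing = (it == result->members.end()) ? nullptr : it->second;
      result->members[key] = MergePatch(existing, value);
    }
  }
  return result;
}
```

The sketch returns a new tree and leaves the base untouched, which is one possible design; an in-place merge would follow the same rule.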
Two related JSON correctness fixes flagged in PR review:
1. SerializeImpl integral fast-path computed `v == static_cast<double>(static_cast<long long>(v))` BEFORE checking that v fit in long long. Casting an out-of-range double to long long is undefined behavior (e.g. v = 1e19 with LLONG_MAX = 2^63-1). Reorder so the bounds check runs first, then test integrality with std::modf: no cast required, no UB possible. Use a strict `<` on the upper bound because static_cast<double>(LLONG_MAX) rounds up to 2^63 (LLONG_MAX itself is not exactly representable in double); admitting v = 2^63 would still overflow the cast.
2. SerializeImpl streamed non-finite doubles via operator<<, producing `nan` / `inf` tokens that are not valid JSON (RFC 8259 §6 restricts the grammar to the finite reals). NaN / Inf can sneak into a Document via direct construction or via strtod parser overflow. Throw on non-finite doubles in the serializer, and detect strtod overflow on parse so we never admit ±Inf into the DOM in the first place. The from_chars branch already throws on result_out_of_range.
Tests added:
- 1e19 round-trips through the precision-17 path (no UB).
- SerializeDocument(Inf|−Inf|NaN) throws.
- ParseDocument(`1e500`) and ParseDocument(`-1e500`) throw.
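The reordered check described in item 1 can be sketched as below. The function names are hypothetical and the sketch assumes the usual 64-bit long long with round-to-nearest conversion; it is not the actual SerializeImpl code.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// True when `v` can be cast to long long without undefined behaviour and has no
// fractional part. The range test runs first; std::modf needs no integer cast.
bool FitsLongLongExactly(double v) {
  if (!std::isfinite(v)) return false;
  // static_cast<double>(LLONG_MAX) rounds up to 2^63, so the upper bound is strict.
  // LLONG_MIN is exactly -2^63 and representable, so the lower bound is inclusive.
  const double upper = static_cast<double>(LLONG_MAX);
  const double lower = static_cast<double>(LLONG_MIN);
  if (!(v >= lower && v < upper)) return false;
  double integral_part = 0.0;
  return std::modf(v, &integral_part) == 0.0;
}

// The cast is only reached after the guard has passed.
long long ToLongLong(double v) {
  assert(FitsLongLongExactly(v));
  return static_cast<long long>(v);
}
```

A serializer would take the integer path only when `FitsLongLongExactly` returns true and fall back to the floating-point path (or throw for non-finite values) otherwise.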
Adds a GenAI-internal abstraction over the ORT v4 model-package layout
(src/models/model_package.{h,cpp}), with a stub directory-walker
implementation that interprets the on-disk format described in the v4
spec. The abstraction is dead code today; W3 wires Config into it. When
ORT v4 lands, an alternate implementation will delegate to
OrtModelPackageContext while keeping this surface unchanged.
The abstract surface mirrors the v4 C API:
* Pre-selection traversal: NumComponents/ComponentName,
NumVariants/VariantName, VariantEpCompatibility (with optional
device discriminator + free-form compatibility strings),
EpsCompatibleWith convenience for EP defaulting.
* SelectComponent takes ModelPackageSelectionOptions (an ordered
EpSelection list with optional device), implementing the spec's
captured-EP-priority + metadata-declaration-order tie-break
algorithm. Empty captured list defaults to [CPUExecutionProvider]
per spec.
* ComponentInstance exposes only what the v4 cix handle does:
VariantFolderPath, FileCount, ConsumerMetadata (raw blob, returned
verbatim), and ResolveSharedWeight(checksum).
Per-file detail (filename, session_options, provider_options,
shared_files) is intentionally NOT on ComponentInstance — consumers
parse variant.json directly via the standalone ParseVariantManifest
helper, mirroring the v4 contract that ORT does not expose per-file
accessors.
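The captured-EP-priority plus declaration-order tie-break described for SelectComponent can be sketched roughly as follows. `Variant` and `SelectVariant` are hypothetical names, and the device discriminator is deliberately left out; the real signature takes ModelPackageSelectionOptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one variant, listed in metadata.json declaration order.
struct Variant {
  std::string name;
  std::vector<std::string> compatible_eps;
};

// Returns the index of the winning variant, or nullopt when nothing matches.
std::optional<std::size_t> SelectVariant(const std::vector<Variant>& variants,
                                         std::vector<std::string> priority) {
  // An empty captured list defaults to CPU, as the commit message states.
  if (priority.empty()) priority = {"CPUExecutionProvider"};
  assert(!priority.empty());

  for (const auto& ep : priority) {                        // caller priority wins first
    for (std::size_t i = 0; i < variants.size(); ++i) {    // declaration order breaks ties
      for (const auto& compatible : variants[i].compatible_eps) {
        if (compatible == ep) return i;
      }
    }
  }
  return std::nullopt;
}
```

With two CPU-compatible variants, the one declared first in metadata.json wins, regardless of filesystem order.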
Layout the stub walker recognizes (matches Appendix A of the spec):
  <package>/
  +- manifest.json                          optional
  +- configs/                               shared assets bucket
  +- <component>/
     +- metadata.json                       selection-only metadata
     +- shared_weights/<checksum>/<blob>    per-component shared weights
     +- <variant>/
        +- variant.json                     files[], shared_files,
                                            consumer_metadata
Detection rule (intentionally conservative to keep flat-dir fallback
reliable): treat as v4 iff manifest.json exists OR at least one
non-reserved direct child contains metadata.json. The bare presence
of configs/ is NOT a positive signal.
Variant declaration order is sourced from metadata.json (preserved via
the streaming JSON::Element parser, since JSON::Object = std::map does
not preserve insertion order). Filesystem order is never authoritative.
Path-traversal validation is applied to every package-controlled
fragment (component name, variant name, file filename, shared-weight
checksum) so malformed packages cannot escape the package root.
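A fragment check of the kind described might look like the sketch below. The exact rule set is an assumption (single path component, no separators, no colon, no embedded NUL); the PR text does not list the real rules, and `IsSafeFragment` is a hypothetical name.

```cpp
#include <string_view>

// Accepts only a single, plain path component. Assumed rules, not the PR's list.
bool IsSafeFragment(std::string_view fragment) {
  if (fragment.empty() || fragment == "." || fragment == "..") return false;
  // Reject both separator styles, drive/stream colons, and embedded NUL bytes.
  constexpr std::string_view kForbidden("/\\:\0", 4);
  return fragment.find_first_of(kForbidden) == std::string_view::npos;
}
```

Rejecting separators outright is stricter than normalizing the joined path and comparing it against the package root, but it needs no filesystem access and is easy to unit test.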
Tests in test/model_package_test.cpp cover detection, manifest
validation (string-array shape, schema_version, dedup, traversal
rejection, listed-component-missing-metadata), variant ordering,
EP-compatibility traversal with device + compatibility strings,
selection (priority, device-aware match, tie-break, empty-priority
CPU default), ConsumerMetadata round-trip, FileCount, and
ResolveSharedWeight (success, missing, zero-blob, multi-blob,
traversal). ParseVariantManifest is covered for files[] order,
typed session_options preservation, shared_files map, and rejection
of malformed/missing inputs.
Add test/runtime_settings_test.cpp covering the two layer-2 channels that
override genai_config.json values at runtime:
1. RuntimeSettings::GenerateConfigOverlay() — handle-driven JSON
emitted at Model::Create. Today only `dawnProcTable` is recognised
and projected into model.decoder.session_options.provider_options[*]
.WebGPU.dawnProcTable.
2. OverlayConfig() / OgaConfigOverlay() — caller-driven JSON exposed
through C#, Java, Python, and ObjC bindings.
The new tests:
* Verify GenerateConfigOverlay returns an empty string when no handles
are set (so callers can feed it unconditionally into the merge).
* Verify unrelated handles are ignored.
* Pin the exact JSON path that dawnProcTable lands at — every keyword
on that path is asserted explicitly so a future refactor can't
silently break WebGPU on Windows.
* Verify OverlayConfig applies a search.top_k override and leaves
unrelated search fields alone, including a re-apply scenario.
* End-to-end: generate the WebGPU overlay, parse it through
OverlayConfig on a real Config, and confirm a WebGPU provider_options
entry with the dawnProcTable option appears with the expected value.
Also adds inline doc comments on RuntimeSettings::GenerateConfigOverlay
and OverlayConfig describing the layer-2 contract and pointing at the
regression suite. No production code changes.
Introduce a new fs::path field, shared_assets_path, next to Config::config_path to address the directory holding tokenizer files, processor JSON, chat template, and other assets that are common across components (decoder/vision/speech/encoder/...). In flat-dir mode this field is initialized to config_path in Config::Config(path, overlay), so behavior is identical. In a future ORT v4 package mode the package loader will set it to <pkg>/configs/.
Rewrite all call sites that load shared assets to use shared_assets_path:
- Tokenizer (model.cpp:308) and constrained_logits_processor (tokenizer.json).
- Vision/multimodal processor configs in gemma/gemma4/mistral3/phi/phi-multimodal/qwen2_5_vl.
- Whisper speech processor config.
Per-component model assets (ONNX, LoRA, custom_ops, external data) and EP-private paths (OpenVINO cache_dir, RyzenAI/VitisAI model_root) are intentionally NOT migrated: they are addressed by the per-component variant folder and require deeper schema work in a follow-up workstream.
This is preparatory plumbing for the ORT v4 model-package format integration.
Five non-decoder session-creation paths (marian encoder, whisper encoder,
multi_modal vision/speech/embedding) inline the same fallback ternary:
config_->model.<component>.session_options.has_value()
? config_->model.<component>.session_options.value()
: config_->model.decoder.session_options
Replace it with a single free helper, EffectiveSessionOptions(config,
component_session_options), declared in config.h next to OverlayConfig
and the other layer-2 free helpers.
Behavior is identical to the inlined form (verified by unit test, which
asserts the returned reference aliases the underlying optional or
decoder storage rather than copying). Behavior change is zero on any
existing flat-directory model.
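A sketch of the helper's shape, with stand-in `Config` and `SessionOptions` types that only model the fields needed here (the real structs are much larger):

```cpp
#include <optional>
#include <string>

// Stand-in types for illustration only.
struct SessionOptions {
  std::string log_id;
};

struct Config {
  struct Model {
    struct Decoder {
      SessionOptions session_options;
    } decoder;
    struct Vision {
      std::optional<SessionOptions> session_options;
    } vision;
  } model;
};

// Returns a reference to the component's options when set, otherwise to the
// decoder's options. Both branches are lvalues of the same type, so no copy is made.
const SessionOptions& EffectiveSessionOptions(
    const Config& config, const std::optional<SessionOptions>& component_session_options) {
  return component_session_options.has_value() ? component_session_options.value()
                                               : config.model.decoder.session_options;
}
```

Because the result aliases caller-owned storage, the optional passed in must outlive the returned reference; passing a temporary optional would dangle.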
The motivation is layered: today the helper is purely cosmetic, but in
the v4 model-package world the same call sites need a single seam where
the layer-1 (variant.json files[].session_options) baseline gets merged
in front of the today-only layer-2 (genai_config) view. Centralising
the lookup here means the v4 work touches one function, not five
ternaries scattered across three files.
Tests: test/config_helpers_test.cpp covers the three contract bullets
that matter for the call sites — fallback identity to decoder when the
component is nullopt, identity to the component's stored value when set,
and live-aliasing of the decoder fallback (so future refactors don't
accidentally start copying the SessionOptions struct).
Branch `Config::Config(path, overlay)` on whether `path` is a v4 model
package (`ModelPackageContext::Open()` returns non-null). For a flat-dir
model the existing strict-streaming-parser path is unchanged.
For a v4 package:
1. `shared_assets_path` is set to `<package>/configs/` so all
subsequent shared-asset loaders (tokenizer, processor configs,
chat-template) read from the package's configs/ bucket.
2. EP defaulting (`ComputeEpDefaulting`) intersects every component's
`EpsCompatibleWith()` set, ordered by the first component's
first-seen list for determinism. Empty intersection or >1 survivor
throws with a helpful diagnostic listing the candidate EPs and the
per-component compatibility matrix; explicit per-component selection
is reserved for the public-API `ep` argument (W8).
3. Each component's selected variant is fetched via `SelectComponent`
and stored in `Config::component_instances` keyed by component name.
`consumer_metadata.genai_config_overlay` (if present) must be a
JSON object and is RFC 7386 merged into the base
`configs/genai_config.json` DOM.
4. The merged DOM is serialized and run through the same streaming
parser as the flat-dir path, so all field-validation (unknown keys,
type mismatches, range checks) still applies. The caller-supplied
`json_overlay` is applied as the final layer-2 override
(`OgaConfigOverlay` / `RuntimeSettings::GenerateConfigOverlay`).
5. `ValidateRoleComponentReferences` walks every
`model.<role>.component` field and rejects references to component
names the package didn't ship.
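Step 2 (EP defaulting) can be illustrated with the sketch below. It takes plain string lists in place of each component's `EpsCompatibleWith()` result, and the error text is invented for the example; only the intersect-then-require-exactly-one behaviour comes from the description above.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// components[i] holds the EPs that component i is compatible with (a stand-in
// for EpsCompatibleWith()). Returns the single EP shared by every component.
std::string ComputeEpDefaulting(const std::vector<std::vector<std::string>>& components) {
  if (components.empty()) throw std::runtime_error("package has no components");

  // Walk the first component's list so the survivor order is deterministic.
  std::vector<std::string> survivors;
  for (const auto& ep : components.front()) {
    if (std::find(survivors.begin(), survivors.end(), ep) != survivors.end()) continue;
    const bool everywhere =
        std::all_of(components.begin(), components.end(), [&](const auto& eps) {
          return std::find(eps.begin(), eps.end(), ep) != eps.end();
        });
    if (everywhere) survivors.push_back(ep);
  }

  if (survivors.size() != 1) {
    std::string message = survivors.empty() ? "no EP is compatible with every component"
                                            : "multiple candidate EPs:";
    for (const auto& ep : survivors) message += " " + ep;
    throw std::runtime_error(message);
  }
  return survivors.front();
}
```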
Concrete additions:
* Adds `std::string component` to all five role sub-structs
(`Encoder`, `Decoder`, `Vision`, `Speech`, `Embedding`) plus
matching `OnValue("component", ...)` handlers in their parsers so
the streaming parser accepts the new key.
* Adds `Config::model_package` (`shared_ptr<ModelPackageContext>`)
and `Config::component_instances` (`unordered_map<name,
shared_ptr<ComponentInstance>>`). `shared_ptr` keeps Config copy-
constructible, which `OgaCreateModelFromConfig` relies on; the
package state is read-only after construction so sharing is safe.
* Refactors `ParseConfig` to expose `ParseConfigFromText`, used by
both the file-based flat-dir loader and the package-overlay-merge
path (which feeds an in-memory string).
Adds `test/config_package_test.cpp` covering: detection, per-component
overlay merge, caller layer-2 wins over package overlay, role->component
reference validation, EP defaulting (empty/single/multi intersection),
all five role.component fields parseable, and the flat-dir regression.
Wired so far: nothing else. Subsequent commits (W4-real, W5a/b, W6, W7,
W8, W10) drive `shared_assets_path` and `component_instances` into
the path-rewrite, session-construction, pipeline-runner and public-API
layers.
This was referenced May 6, 2026
4d4d367 to
2ae4225
Adds the EP-injection wiring that connects the v4 package's selected
variant (from W3's component_instances map) to the existing GenAI
session-options dispatch table (in SetProviderSessionOptions).
Three pieces:
1. ComponentInstance::SelectedEp() — new abstract accessor returning the
canonical ORT EP name (e.g. 'CUDAExecutionProvider') of the variant
chosen by SelectComponent. StubComponentInstance populates it from
priority[best_priority].ep_name, the very entry that just won.
2. EpNameToProviderTag() / EnsurePackageProvider() in session_options.h
— canonical-EP -> internal-tag map mirroring the dispatch table
('CUDAExecutionProvider' -> 'cuda', etc.) plus an idempotent helper
that rotates the package's tag to the front of providers and
backfills a matching ProviderOptions entry. CPU/unknown -> no-op.
3. Model::CreateSessionOptions hooks the helper before the
CreateSessionOptionsFromConfig call when model_package is non-null
and decoder.component resolves to a known instance. Behaviour for
flat-dir Configs is unchanged (model_package == null short-circuits
at the first conjunct).
Why the rotate-to-front semantics: the dispatch loop honours providers
in order and stops at the first one whose AppendExecutionProvider
returns a DeviceInterface — putting the package's EP first guarantees
it wins p_device_ even if a runtime overlay added a competing entry.
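The rotate-to-front behaviour can be sketched as below. The provider list is modelled as a plain vector of tags, the ProviderOptions backfill is omitted, and only the CUDA mapping (the one named in the text) is included; everything else about the real helpers is assumed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Partial canonical-EP -> internal-tag map. Only the CUDA pair appears in the
// commit message; CPU and unknown names map to the empty tag.
std::string EpNameToProviderTag(const std::string& ep_name) {
  if (ep_name == "CUDAExecutionProvider") return "cuda";
  return "";
}

// Idempotently puts the package's provider tag at the front of `providers`.
void EnsurePackageProvider(std::vector<std::string>& providers, const std::string& ep_name) {
  const std::string tag = EpNameToProviderTag(ep_name);
  if (tag.empty()) return;  // CPU or unknown EP: no-op

  auto it = std::find(providers.begin(), providers.end(), tag);
  if (it == providers.end()) {
    providers.insert(providers.begin(), tag);
  } else {
    // Moves the existing entry to the front and keeps the others in order.
    std::rotate(providers.begin(), it, it + 1);
  }
}
```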
Tests: 8 new EnsurePackageProvider unit tests in
test/config_helpers_test.cpp (mapping, insert-at-front, rotate,
idempotency, CPU no-op, unknown no-op, ProviderOptions backfill); two
additional EXPECT_EQ()s on SelectedEp() in the existing
model_package_test.cpp SelectComponentPicksMatchingVariant case.
Plumbs the per-component variant_folder (set by W3 on the package's
ComponentInstance) through all single-file session-creation paths so
ONNX models, LoRA adapters, and custom-ops libraries load from the
selected variant directory instead of a single flat config_path.
Mechanism:
1. Model::AssetFolder(component_name) — central resolver. Empty
component, missing instance, or flat-dir Config -> config_path.
In package mode with a known component -> the instance's
VariantFolderPath().
2. Model::CreateSession gains an optional component_name parameter.
Both the file-system path and the DirGuard.ChangeTo() workaround
for external-data resolution now use AssetFolder(component_name)
instead of config_path. Default empty preserves flat-dir behaviour.
3. Model::CreateSessionOptionsFromConfig gains an optional
component_name parameter. The custom-ops library first-try
resolution uses AssetFolder(component_name) instead of config_path
(EP-library and cwd fallback paths unchanged).
Caller migrations (each passes its model.<role>.component):
decoder_only.cpp / gpt.cpp / decoder_only_pipeline.cpp ->
decoder.component
marian.cpp / whisper.cpp -> encoder.component + decoder.component
multi_modal.cpp -> vision + speech + embedding + decoder components
(incl. LoRA adapter path resolution at lines 712/716)
qwen_vl_model.cpp -> vision.component (find_stage lambda)
Model::CreateSessionOptions -> decoder.component for both the
primary call and pipeline_model sub-calls
Untouched (out of v4 surface):
silero_vad.cpp / nemotron_speech.cpp -> default-empty component
preserves existing flat-dir behaviour.
EP-private paths (OpenVINO cache_dir, RyzenAI/VitisAI model_root)
remain on config_path; W6/W7 will revisit when each EP's
session_options handler gets package context.
No behaviour change for flat-dir Configs: AssetFolder() returns
config_path when component_instances is empty (Config without
manifest.json) or the component name lookup misses.
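The resolver's fallback rule is small enough to sketch directly. This version uses std::filesystem::path and a plain map of variant folders as stand-ins; the project itself uses its own fs::path class and ComponentInstance objects.

```cpp
#include <filesystem>
#include <string>
#include <unordered_map>

// Stand-in Config: `variant_folders` is empty for a flat-dir model.
struct Config {
  std::filesystem::path config_path;
  std::unordered_map<std::string, std::filesystem::path> variant_folders;
};

// Empty component, unknown component, or flat-dir Config -> config_path.
// Known component in package mode -> that component's variant folder.
std::filesystem::path AssetFolder(const Config& config, const std::string& component_name) {
  if (component_name.empty()) return config.config_path;
  auto it = config.variant_folders.find(component_name);
  return it == config.variant_folders.end() ? config.config_path : it->second;
}
```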
Wires the v4 model package's shared-weight blob mechanism into the
decoder-only pipeline runner. ONNX models in a multi-file variant
can declare external initializers via variant.json's per-file
shared_files map; the runner reads the referenced blobs into memory
and registers them with the session via ORT's
AddExternalInitializersFromFilesInMemory.
Pieces:
1. OrtSessionOptions::AddExternalInitializersFromFilesInMemory wrapper
in onnxruntime_api.h / onnxruntime_inline.h, mirroring ORT's
1.18+ C API. Buffer pointers must outlive session creation; the
wrapper takes (names, buffers, lengths) parallel arrays and
delegates lifetime management to the caller.
2. Model::ApplyPackageExternalInitializers(component, filename, so):
lazy-parses and caches the variant.json for component, looks
up the matching file entry, ResolveSharedWeight()s each blob,
reads them into Model-owned buffer storage, and calls the new
wrapper. Empty/unknown component, file-not-in-manifest, or empty
shared_files all no-op (flat-dir Configs are unaffected).
3. Model gains two private members for this:
- variant_manifests_ (per-component VariantManifest cache)
- external_initializer_buffers_ (shared-blob byte storage that
stays alive for the Model's lifetime).
4. DecoderOnlyPipelineModel constructor: between selecting the
stage's OrtSessionOptions and CreateSession, calls
ApplyPackageExternalInitializers for each stage's filename so
external initializers are registered before ORT consumes the
session_options during session creation.
Out of scope (deferred):
* Per-file session_options/provider_options baseline merge from
variant.json (spec-defined layer-1 SO/PO). The current decoder
pipeline still derives SO/PO from genai_config layer-2; the file-
entry's session_options and provider_options JSON::Object
fields parsed by W2's ParseVariantManifest are not yet applied.
W10's end-to-end fixture will demonstrate need or its absence.
* Shared-options multi-stage isolation: when multiple pipeline
stages share the main session_options (genai_config did not
define per-stage session_options) and declare distinct
shared_files, registrations accumulate on the shared object. In
practice v4 packages with shared_files declare per-stage
session_options. Hardening by cloning to per-stage on demand can
follow if a real package surfaces the case.
… sessions
Multi-modal models in v4 package mode can have their vision, speech, and embedding components on different EPs from the decoder. The decoder is the model's primary session and the only one that should set `p_device_`.
* For each non-decoder role (vision/speech/embedding), in package mode with a non-empty `model.<role>.component`, ensure the role's `Config::SessionOptions` slot exists and inject the captured EP via `EnsurePackageProvider` (mirrors the decoder logic in W5b).
* Pass `is_primary_session_options=false` for those role sessions in package mode so their DeviceInterface doesn't conflict with the decoder's. Flat-dir mode keeps the legacy `true` behavior (all roles fall back to `decoder.session_options`).
No-op for flat-dir Configs and for empty role components in package mode.
Add an explicit-EP entry point to GenAI's public surface so users can select the execution provider for v4 model packages without re-packaging or relying on per-component metadata defaulting.
C API (NULL or empty `ep` falls back to defaulting; existing entry points unchanged):
* `OgaCreateConfigWithEp(path, ep, &config)`
* `OgaCreateModelWithEp(path, ep, &model)`
* `OgaCreateModelWithRuntimeSettingsAndEp(path, settings, ep, &model)`
Internal plumbing:
* `Config::Config(path, overlay, user_ep)` 3-arg overload; the 2-arg overload delegates with empty `user_ep`.
* `LoadFromPackage` and `ComputeEpDefaulting` accept `user_ep`; when non-empty, defaulting is bypassed and the user's choice becomes the captured EP. Per-component `SelectComponent` failures now list the component's compatible EPs to make typos and unsupported-EP cases easy to debug.
* `Generators::CreateModel(env, path, settings, user_ep)` plumbs the argument into `Config`.
* Flat-directory (legacy) mode rejects a non-empty `user_ep` with a clear diagnostic pointing at the existing `OgaConfigClearProviders` / `OgaConfigAppendProvider` channel.
Language bindings:
* C++ wrapper (`ort_genai.h`): new `OgaConfig::Create(path, ep)`, `OgaModel::Create(path, ep)`, `OgaModel::Create(path, settings, ep)` overloads.
* Python: `og.Config(path, ep=...)` / `og.Model(path, ep=...)` via a new pybind11 init taking the optional EP string.
* C#: new `Config(string, string)` and `Model(string, string)` ctors plus matching `DllImport` declarations.
* Java: `Config(String, String)` / `Model(String, String)` ctors with JNI implementations that handle a null jstring without invoking `GetStringUTFChars` on it (undefined behaviour).
* Objective-C: `initWithPath:ep:error:` initializers on `OGAConfig` and `OGAModel`; `nil` ep falls back to defaulting.
Selected EP propagates into the effective provider list via the W5b `EnsurePackageProvider` hook, so package variant selection and the ORT session's provider list cannot diverge under user-supplied `ep`. Updates the W3 multi-EP defaulting diagnostic to point at this new argument as the resolution channel.
Adds four targeted tests for the W8 `ep` plumbing:
* `UserEpBypassesDefaultingInPackage`: multi-EP package that would fail defaulting now loads cleanly when the user names an EP.
* `UserEpThatNoComponentSupportsThrowsWithDiagnostic`: the per-component diagnostic lists the component's compatible EPs.
* `EmptyUserEpFallsBackToDefaulting`: an empty string is equivalent to the legacy 2-arg constructor.
* `UserEpInFlatDirThrows`: flat-dir mode rejects a non-empty `ep` with a message pointing at the legacy provider-list channel.
JSON merge_patch (RFC 7386) and EP defaulting without a user-supplied `ep` are already covered by the existing `json_merge_patch_test.cpp` (31 tests) and the multi/empty/single-EP cases earlier in this file.
Removes `W3:` / `W5a-soft:` / `W5b` / `W6` / `W7` / `W8:` references from source-tree and test comments. These were internal sequencing labels and don't belong in shipping code; the surrounding prose already explains the v4-package context.
Pull request overview
This pull request adds first-class support for loading v4 model packages through Generators::Config, including package detection, per-component overlay merging (RFC 7386 merge patch), and an explicit “execution provider (EP)” selection API surface that’s plumbed through the C API and all language bindings. It also refactors asset resolution so shared assets (tokenizer/processor configs/templates) are resolved via Config::shared_assets_path, while per-component ONNX/assets resolve via a component-aware asset folder.
Changes:
- Add v4 model-package abstraction (`ModelPackageContext`), variant selection/defaulting, and config overlay merge pipeline (base `configs/genai_config.json` + per-component `consumer_metadata.genai_config_overlay` + caller overlay).
- Introduce a lightweight JSON DOM with RFC 7386 merge-patch support, plus improved parse diagnostics and number handling.
- Extend public APIs/bindings with an optional `ep` argument and update model/session creation to be package-aware (asset folders, providers injection, etc.), with extensive new unit tests.
Reviewed changes
Copilot reviewed 48 out of 48 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/runtime_settings_test.cpp | Adds regression tests for RuntimeSettings-generated JSON overlays and OverlayConfig behavior. |
| test/model_package_test.cpp | Adds unit tests for v4 package detection, manifest/metadata parsing, variant traversal, EP selection, consumer metadata, and shared weights. |
| test/json_merge_patch_test.cpp | Adds tests for JSON DOM parse/serialize + RFC 7386 merge patch semantics and edge cases. |
| test/config_package_test.cpp | Adds end-to-end tests for Config’s v4 package loading, overlay merge, EP defaulting/explicit EP, and role->component validation. |
| test/config_helpers_test.cpp | Adds tests for EffectiveSessionOptions and EnsurePackageProvider/EpNameToProviderTag helpers. |
| src/runtime_settings.h | Documents runtime-only handle projection into layer-2 config overlays. |
| src/python/python.cpp | Adds ep overloads/args for Python Config and Model constructors. |
| src/ort_genai.h | Adds C++ wrapper overloads for creating Config/Model with explicit EP (and with runtime settings + EP). |
| src/ort_genai_c.h | Adds new C API entrypoints: OgaCreateConfigWithEp, OgaCreateModelWithEp, OgaCreateModelWithRuntimeSettingsAndEp. |
| src/ort_genai_c.cpp | Implements the new C API entrypoints and plumbs ep through to Generators::Config / CreateModel. |
| src/objectivec/oga_model.mm | Adds Objective-C initializer to create a model with an explicit EP. |
| src/objectivec/oga_config.mm | Adds Objective-C initializer to create a config with an explicit EP. |
| src/objectivec/include/ort_genai_objc.h | Documents Objective-C EP overloads for Config/Model. |
| src/models/whisper.cpp | Switches to EffectiveSessionOptions and component-aware session creation/options for encoder/decoder. |
| src/models/whisper_processor.cpp | Resolves processor config via shared_assets_path (package-aware). |
| src/models/session_options.h | Declares EpNameToProviderTag + EnsurePackageProvider helpers for package EP injection. |
| src/models/session_options.cpp | Implements EpNameToProviderTag + EnsurePackageProvider (provider/provider_options consistency). |
| src/models/qwen2_5_vl_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/qwen_vl_model.cpp | Resolves pipeline stage assets via component-aware asset folder; passes component into session-options creation. |
| src/models/phi_multimodal_processor.cpp | Resolves vision/audio processor configs via shared_assets_path. |
| src/models/phi_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/onnxruntime_inline.h | Adds wrapper for ORT API AddExternalInitializersFromFilesInMemory. |
| src/models/onnxruntime_api.h | Declares OrtSessionOptions::AddExternalInitializersFromFilesInMemory. |
| src/models/multi_modal.cpp | Makes secondary sessions package-aware (EP injection, non-primary semantics, component-aware asset resolution). |
| src/models/model.h | Adds component-aware AssetFolder, package external initializer plumbing + caches. |
| src/models/model.cpp | Routes shared assets to shared_assets_path, adds package EP injection, component-aware session creation, and shared-weight external initializer support. |
| src/models/model_package.h | Introduces the v4 model-package abstraction, selection options, and variant.json parsing interface. |
| src/models/model_package.cpp | Implements a stub filesystem-backed v4 package loader, parsers, and selection logic. |
| src/models/mistral3_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/marian.cpp | Switches to EffectiveSessionOptions and component-aware session creation/options for encoder/decoder. |
| src/models/gpt.cpp | Uses component-aware session creation for decoder. |
| src/models/gemma4_multimodal_processor.cpp | Resolves vision/audio processor configs via shared_assets_path. |
| src/models/gemma_image_processor.cpp | Resolves vision processor config via shared_assets_path. |
| src/models/decoder_only.cpp | Uses component-aware session creation for decoder. |
| src/models/decoder_only_pipeline.cpp | Adds package external initializer registration for pipeline stages; makes stage path resolution component-aware. |
| src/json.h | Adds JSON DOM types and declares ParseDocument/SerializeDocument/MergePatch. |
| src/json.cpp | Implements JSON DOM, serializer, RFC 7386 MergePatch, and improves numeric parsing/overflow handling. |
| src/java/src/main/native/ai_onnxruntime_genai_Model.cpp | Adds JNI binding for creating a Model with explicit EP (null-safe). |
| src/java/src/main/native/ai_onnxruntime_genai_Config.cpp | Adds JNI binding for creating a Config with explicit EP (null-safe). |
| src/java/src/main/java/ai/onnxruntime/genai/Model.java | Adds Java constructor overload for explicit EP selection. |
| src/java/src/main/java/ai/onnxruntime/genai/Config.java | Adds Java constructor overload for explicit EP selection. |
| src/generators.h | Extends CreateModel signature to accept optional user_ep. |
| src/csharp/NativeMethods.cs | Adds P/Invoke declarations for OgaCreateConfigWithEp / OgaCreateModelWithEp. |
| src/csharp/Model.cs | Adds C# Model(string path, string ep) overload. |
| src/csharp/Config.cs | Adds C# Config(string path, string ep) overload. |
| src/constrained_logits_processor.cpp | Resolves tokenizer path via shared_assets_path for package compatibility. |
| src/config.h | Extends Config with role component fields, shared_assets_path, package state, and EffectiveSessionOptions declaration. |
| src/config.cpp | Implements v4 package load/merge/defaulting/validation path; adds ParseConfigFromText and package-aware config construction. |
Comments suppressed due to low confidence (1)
src/objectivec/include/ort_genai_objc.h:84
- There’s a stray `*` line (and dangling doc text) after the new `initWithPath:ep:error:` declaration for OGAConfig (lines 81-83). It’s outside of a comment block, so this header won’t compile. Remove those lines or wrap them back into a proper `/** ... */` comment (likely the doc for clearProvidersWithError).
* `SelectComponent`: a device-pinned `ep_compatibility` entry now requires the caller's priority entry to also pin the same device. Previously a device-less caller silently matched a device-pinned variant (could pick an NPU-only build for a CPU caller). Extended the existing OpenVINO GPU/NPU test with a negative case.
* `Model::CreateSession`: now invokes `ApplyPackageExternalInitializers` itself before `OrtSession::Create` (no-op in flat-dir / when no shared_files declared). This makes the registration apply to every session (decoder, encoder, vision, speech, embedding, and recreated pipeline stages), not just the initial DecoderOnlyPipelineModel constructor calls.
* `IntermediatePipelineState::Run`: lazy session recreation now routes through `Model::CreateSession` so it inherits the same external-initializer registration as the initial creation.
* `Config` flat-dir + non-empty user_ep error message: dropped the `directories containing manifest.json` qualification. Manifest-less v4 packages are detected via component-dir `metadata.json` too, so the old wording was misleading.
* Removed the now-redundant `ApplyPackageExternalInitializers` call from the `DecoderOnlyPipelineModel` constructor (one path now).
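The tightened device rule from the first bullet can be expressed as a small predicate. The struct and function names are hypothetical, and the behaviour for a device-less variant entry (it matches any caller entry with the same EP) is an assumption, since the text only specifies the device-pinned case.

```cpp
#include <optional>
#include <string>

// Hypothetical shapes for one caller priority entry and one variant entry.
struct EpSelection {
  std::string ep;
  std::optional<std::string> device;
};

struct EpCompatibility {
  std::string ep;
  std::optional<std::string> device;
};

bool Matches(const EpSelection& caller, const EpCompatibility& variant) {
  if (caller.ep != variant.ep) return false;
  // Assumption: a variant entry without a device matches any caller device.
  if (!variant.device.has_value()) return true;
  // A device-pinned variant requires the caller to pin the same device.
  return caller.device.has_value() && *caller.device == *variant.device;
}
```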
Real-world v4 model packages from current producer toolchains ship a
richer manifest than the canonical
{ "schema_version": 1, "components": ["decoder"] }:
{
"schema_version": "1.0",
"components": [{"name": "decoder", "metadata": "decoder/metadata.json"}]
}
Make the parser more forgiving:
* schema_version: spec-canonical type is now a string ("1" or "1.0").
Numeric values are coerced to their integer-string form so historic
producers writing 1 still load. Strings other than "1" / "1.0"
remain rejected; non-integer numbers also remain rejected.
* components entries: accept either a bare string (canonical) or an
object with a name field (extra metadata / description / version
fields are ignored - on-disk layout stays conventional).
Anything else throws.
Tests: flipped the legacy-rejection test to positive coverage of
object-form components, and added 4 schema_version cases
(string "1.0", number 1, unsupported "2", unchanged number 99).
Adds ManifestRealWorldProducerShape mirroring the manifest currently produced for phi-4-mini-reasoning.v4.ortpackage (Olive / Foundry-Local toolchain). Combines string schema_version, object-form components, extra top-level fields, and a metadata.json carrying schema_version / component_name siblings to the variants map. This test would have failed against the parser before the previous commit's compatibility shim. Keeping it as a regression so future parser tightening stays compatible with packages already in the wild.
Reverts the earlier object-form tolerance for `components` entries. The spec is clear that components are bare strings; supporting an alternate object-with-name shape just because some producers emit it would lock in two-format ambiguity in the consumer indefinitely. The producer side is expected to align with the spec. The string `schema_version` tolerance and number-to-string coercion are kept (those are pure widening of accepted types for the same field, not a structurally different shape). Error message for a bad `components` entry now points the producer at the spec format with a concrete example.
…an.h
Three independent build breakages on the v4-package branch, all caught
by the post-merge CI on Linux GCC + Android Clang:
1. `Model::AssetFolder` was declared returning `std::filesystem::path`
in `model.h` but defined returning `fs::path` (the project's custom
class in `src/filesystem.h`). MSVC happened to accept the mismatch
in some configs but GCC/Clang correctly reject it as
"out-of-line definition differs from declaration", and downstream
`path / fs::path(...)` callsites get an "invalid operands" error
because the two types aren't interoperable. Align the declaration
with the definition: `fs::path`.
2. `fs::path::operator/(const path&)` was non-const while its
`operator/(const std::string&)` sibling was const. This made
`const fs::path asset = ...; asset / fs::path("x");` fail with
"passing const fs::path as 'this' discards qualifiers". Make the
overload const and rename the parameter to avoid shadowing.
3. `model_package.h` was including `<span>` directly. The Android NDK
build runs in `USE_CXX17` mode where `std::span` is provided by the
project's polyfill in `src/span.h` (under `namespace std`). Other
files reach `std::span` via that header (e.g. `generators.h`);
model_package.h needs to do the same so the public
`VariantEpCompatibility` signature compiles on Android.
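The const-correctness issue in item 2 can be reproduced with a simplified stand-in for the project's path class. `demo::path` below is hypothetical and far smaller than the real class in `src/filesystem.h`; it only shows why both `operator/` overloads need to be `const`.

```cpp
#include <string>
#include <utility>

namespace demo {
// Simplified stand-in for the project's custom path class (an assumption,
// not the real implementation).
class path {
 public:
  path() = default;
  explicit path(std::string p) : path_(std::move(p)) {}

  // Both overloads are const, so they can be called on a const path.
  path operator/(const std::string& rhs) const { return path(Join(path_, rhs)); }
  path operator/(const path& rhs) const { return path(Join(path_, rhs.path_)); }

  const std::string& string() const { return path_; }

 private:
  // Joins two segments with a single '/' separator.
  static std::string Join(const std::string& a, const std::string& b) {
    if (a.empty()) return b;
    if (a.back() == '/') return a + b;
    return a + "/" + b;
  }
  std::string path_;
};
}  // namespace demo
```

If the `const` qualifier were dropped from the `operator/(const path&)` overload, an expression such as `asset / demo::path("x")` on a `const demo::path asset` would fail to compile with the "discards qualifiers" diagnostic quoted above.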
The doc comment block before -clearProvidersWithError: was missing its opening /** marker, leaving stray ' *' lines that clang-format flagged as ill-formed Objective-C. Restore the canonical /** ... */ form so lint-cpp passes.
Config's two-arg ctor takes 'const fs::path&' (the project's custom filesystem::path class). Apple Clang correctly rejects passing std::filesystem::path because that's two user-defined conversions in one implicit sequence. Convert through .string() so we end up with the type Config actually expects. Was masked on Linux/Windows builds until commit 1fdf7b8 fixed the AssetFolder type mismatch in src/, which let test/ compilation start running (and surface this). Mirror change in both helper-defining test files.
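The conversion problem can be illustrated with stand-in types. Everything in `namespace demo` is hypothetical: it assumes the custom path class is constructible from `std::string`, and it reduces the Config constructor to a single argument for brevity.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace demo {
// Stand-in for the project's path class; assumed constructible from std::string.
struct path {
  path(const std::string& s) : value(s) {}
  std::string value;
};

// Stand-in for Config; the real constructor takes more arguments.
struct Config {
  explicit Config(const path& p) : root(p.value) {}
  std::string root;
};
}  // namespace demo

// std::filesystem::path -> string -> demo::path would chain two user-defined
// conversions, which an implicit conversion sequence does not allow.
static_assert(!std::is_convertible_v<std::filesystem::path, demo::path>);
static_assert(std::is_convertible_v<std::string, demo::path>);

// Converting through .string() leaves a single user-defined conversion.
inline demo::Config MakeConfig(const std::filesystem::path& dir) {
  return demo::Config(demo::path(dir.string()));
}
```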
The JSON streaming parser (src/json.cpp Parse_Value) seeds the root
element by calling OnObject / OnArray / OnValue with an empty name when
the document's top-level value is an object/array/scalar. Subsequent
keys inside that root object then arrive as named OnObject / OnValue
calls on whatever element OnObject("") returns.
MetadataRoot_Element only handled name == "variants" and threw
unknown_value_error for everything else, including the empty-name root
seed. As a result every metadata.json parse failed at line 1 index 1
with "Unknown value \"\"" before any real key was inspected.
Return *this for the empty-name seed so the existing OnObject("variants")
handler dispatches normally on the root object's first real key. Other
unknown root keys still throw, preserving strict schema enforcement.
This is the runtime regression that surfaced once the Linux ARM64 build
got past the AssetFolder / span / operator/ compile fixes and actually
ran unit_tests; all ConfigPackageTest and ModelPackageContextTest cases
that exercise metadata.json parsing now go through.
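The root-seed handling can be sketched with a reduced element interface. The types below are hypothetical simplifications; the real `JSON::Element` and `MetadataRoot_Element` have more callbacks and a different error type.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical, reduced element interface: unknown names throw by default.
struct Element {
  virtual ~Element() = default;
  virtual Element& OnObject(std::string_view name) {
    throw std::runtime_error("Unknown value \"" + std::string(name) + "\"");
  }
};

// Placeholder for the "variants" map handler; accepts any nested object here.
struct VariantsElement : Element {
  Element& OnObject(std::string_view) override { return *this; }
};

struct MetadataRootElement : Element {
  VariantsElement variants;

  Element& OnObject(std::string_view name) override {
    if (name.empty()) return *this;            // root seed: the top-level object
    if (name == "variants") return variants;   // the one known root key
    return Element::OnObject(name);            // other root keys still throw
  }
};
```

The parser's first call, `OnObject("")`, now hands back the root element itself, so the following `OnObject("variants")` call reaches the existing handler instead of failing on the empty name.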
The new test files (config_helpers_test, config_package_test, json_merge_patch_test, model_package_test, runtime_settings_test) are the first unit tests in this repo to depend on internal Generators::* C++ symbols compiled into onnxruntime-genai.dll. All previously existing tests in the Generators::test namespace are header-only, template-only, or pure static_assert harnesses, so the question of exposing internal symbols never arose.
On Linux/macOS the default ELF/Mach-O visibility makes those symbols linkable without ceremony. On Windows MSVC, however, only symbols explicitly marked __declspec(dllexport) appear in the import library; internal C++ helpers like Generators::Config's new (fs::path, string_view[, string_view]) constructors, JSON::ParseDocument, ModelPackageContext::Open, OverlayConfig, EnsurePackageProvider, RuntimeSettings::GenerateConfigOverlay, etc. were therefore unresolved at unit_tests link time:
  config_helpers_test.obj : error LNK2019: unresolved external symbol Generators::Config::Config(fs::path const &, string_view)
  ...
  unit_tests.exe : fatal error LNK1120: 12 unresolved externals
Turn on CMake's WINDOWS_EXPORT_ALL_SYMBOLS for the onnxruntime-genai target on Windows. CMake auto-generates a .def file from the object files for non-templated, non-inline symbols that are not already explicitly marked __declspec(dllexport). The OGA_* C API entry points keep their explicit dllexport (so the public DLL contract is unchanged) while the import library now also resolves the internal C++ helpers the test binary needs. Affects only the developer-facing import library; runtime DLL contents are unchanged.
This pull request adds support for loading and validating v4 model packages in the configuration system, including explicit execution provider (EP) selection and per-component overlays. It introduces new logic for merging and validating configuration data from model packages, ensures robust error handling, and updates the `Config` structure to support these new features.
Support for v4 Model Packages and Execution Provider Selection:
Config Structure Enhancements:
* The `Config` structure and its sub-structs (`Encoder`, `Decoder`, `Vision`, `Speech`, `Embedding`) include a new `component` field, which maps each role to a specific package component.
* `shared_assets_path` is added to `Config`, distinguishing between flat-directory and package modes for asset resolution.
JSON Parsing and Overlay Improvements:
* `ParseConfigFromText` helper to centralize config parsing from text with improved error diagnostics and overlay support.
* `OverlayConfig` to allow applying overlays to existing configs.
Codebase Maintenance:
* Includes for `<memory>`, `<unordered_map>`, and `<unordered_set>`.
JSON Deserialization Updates:
* Handling of the `component` field for each relevant role during deserialization.