
Commit 9239963

More complete DMR support (#103)
The `dmr` provider now supports:

- Proper context length setup using `max_tokens`
- `temperature`, `top_p`, `frequency_penalty`, `presence_penalty` all get mapped into the proper runtime flags based on the engine in use (for now, only `llama.cpp` mappings)
- Raw runtime flags to pass to the inference engine in use, via `provider_opts:runtime_flags`

Configuration example supported with these changes:

```yaml
models:
  root:
    provider: dmr
    model: ai/qwen3:14B-Q6_K
    max_tokens: 32768
    temperature: 0.7
    top_p: 0.95
    frequency_penalty: 0.2
    presence_penalty: 0.1
    provider_opts:
      runtime_flags: |
        --batch-size 1024
        --ubatch-size 512
```

closes #71

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
1 parent 2d87ce9 commit 9239963

6 files changed: 484 additions & 18 deletions


README.md

Lines changed: 16 additions & 0 deletions
@@ -183,6 +183,22 @@ models:

You'll find a curated list of agent examples, spread across three categories, [Basic](https://github.com/docker/cagent/tree/main/examples#basic-configurations), [Advanced](https://github.com/docker/cagent/tree/main/examples#advanced-configurations) and [multi-agents](https://github.com/docker/cagent/tree/main/examples#multi-agent-configurations), in the `/examples/` directory.

### DMR provider options

When using the `dmr` provider, you can use the `provider_opts` key to pass DMR runtime-specific (e.g. llama.cpp) options:
```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    provider_opts:
      runtime_flags: ["--ngl=33", "--repeat-penalty=1.2", ...] # or a comma/space-separated string
```
The default `base_url` `cagent` will use for the `dmr` provider is `http://localhost:12434/engines/llama.cpp/v1`. DMR itself might need to be enabled via [Docker Desktop's settings](https://docs.docker.com/ai/model-runner/get-started/#enable-dmr-in-docker-desktop) on macOS and Windows, and via the command line on [Docker CE on Linux](https://docs.docker.com/ai/model-runner/get-started/#enable-dmr-in-docker-engine).
## Quickly generate agents and agent teams with `cagent new`

Using the command `cagent new` you can quickly generate agents or multi-agent teams using a single prompt!

docs/USAGE.md

Lines changed: 98 additions & 0 deletions
@@ -165,6 +165,104 @@ models:

#### DMR (Docker Model Runner) provider usage

If `base_url` is omitted, `cagent` will use `http://localhost:12434/engines/llama.cpp/v1` by default.

You can pass DMR runtime (e.g. llama.cpp) options using:

```yaml
models:
  provider: dmr
  provider_opts:
    runtime_flags:
```
The context length is taken from `max_tokens` at the model level:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    # base_url: omitted -> auto-discovery via Docker Model plugin
    provider_opts:
      runtime_flags: ["--ngl=33", "--top-p=0.9"]
```
`runtime_flags` also accepts a single string with comma or space separation:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    provider_opts:
      runtime_flags: "--ngl=33 --top-p=0.9"
```
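The comma-or-space splitting described above can be sketched in Go (a minimal illustration; `splitRuntimeFlags` is a hypothetical helper name, not cagent's actual parser):

```go
package main

import (
	"fmt"
	"strings"
)

// splitRuntimeFlags turns a single runtime_flags string into a flag
// list, treating commas and any whitespace (including the newlines of
// the multiline YAML form) as separators. Empty fields are dropped.
func splitRuntimeFlags(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return r == ',' || r == ' ' || r == '\n' || r == '\t'
	})
}

func main() {
	// All three forms yield the same flag list.
	fmt.Println(splitRuntimeFlags("--ngl=33 --top-p=0.9"))  // [--ngl=33 --top-p=0.9]
	fmt.Println(splitRuntimeFlags("--ngl=33,--top-p=0.9")) // [--ngl=33 --top-p=0.9]
	fmt.Println(splitRuntimeFlags("--ngl=33\n--top-p=0.9\n"))
}
```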
Explicit `base_url` example with a multiline `runtime_flags` string:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    base_url: http://127.0.0.1:12434/engines/llama.cpp/v1
    provider_opts:
      runtime_flags: |
        --ngl=33
        --top-p=0.9
```
Requirements and notes:

- The Docker Model plugin must be available for auto-configure/auto-discovery
  - Verify with: `docker model status --json`
  - Configuration is best-effort; failures fall back to the default base URL
- `provider_opts` is currently scoped to the `dmr` provider only
- `runtime_flags` are passed after `--` to the inference runtime (e.g. llama.cpp)
Parameter mapping and precedence (DMR):

- `ModelConfig` fields are translated into engine-specific runtime flags. For example, with the `llama.cpp` backend:
  - `temperature` → `--temp`
  - `top_p` → `--top-p`
  - `frequency_penalty` → `--frequency-penalty`
  - `presence_penalty` → `--presence-penalty`
  - ...
- `provider_opts.runtime_flags` always takes priority over derived flags on conflict. When a conflict is detected, `cagent` logs a warning indicating the overridden flag. `max_tokens` is the only exception for now.
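The precedence rule can be sketched as follows (a hypothetical `mergeFlags` helper, not cagent's implementation; for brevity it assumes the `--flag=value` form, while the docs show separate-argument flags too):

```go
package main

import (
	"fmt"
	"strings"
)

// mergeFlags keeps flags derived from ModelConfig fields only when the
// user's runtime_flags don't already set the same flag; user flags win
// on conflict, and a warning names the overridden flag.
func mergeFlags(derived, user []string) []string {
	// "--temp=0.7" identifies flag "--temp".
	flagName := func(f string) string { return strings.SplitN(f, "=", 2)[0] }

	userNames := map[string]bool{}
	for _, f := range user {
		if strings.HasPrefix(f, "--") {
			userNames[flagName(f)] = true
		}
	}

	var out []string
	for _, f := range derived {
		if userNames[flagName(f)] {
			fmt.Printf("warning: runtime_flags overrides derived flag %s\n", flagName(f))
			continue
		}
		out = append(out, f)
	}
	return append(out, user...)
}

func main() {
	derived := []string{"--temp=0.5", "--top-p=0.9"} // from temperature / top_p
	user := []string{"--temp=0.7", "--threads=8"}    // provider_opts.runtime_flags
	fmt.Println(mergeFlags(derived, user)) // [--top-p=0.9 --temp=0.7 --threads=8]
}
```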
Examples:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    temperature: 0.5 # derives --temp 0.5
    top_p: 0.9 # derives --top-p 0.9
    max_tokens: 8192 # sets --context-size=8192
    provider_opts:
      runtime_flags: ["--temp", "0.7", "--threads", "8"] # overrides derived --temp, sets --threads
```
```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    provider_opts:
      runtime_flags: "--ngl=33 --repeat-penalty=1.2" # a string is accepted as well
```
Troubleshooting:

- Plugin not found: `cagent` will log a debug message and use the default base URL
- Endpoint empty in status: ensure the Model Runner is running, or set `base_url` manually
- Flag parsing: if using a single string, quote it properly in YAML; you can also use a list
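The best-effort discovery and fallback described above might look roughly like this. This is a hypothetical sketch: the JSON shape of `docker model status --json`, including the `endpoint` field probed here, is an assumption, not taken from cagent's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

const defaultDMRBaseURL = "http://localhost:12434/engines/llama.cpp/v1"

// discoverBaseURL tries the Docker Model plugin and falls back to the
// default base URL whenever anything goes wrong (plugin missing,
// unparsable output, or no usable endpoint in the status).
func discoverBaseURL() string {
	out, err := exec.Command("docker", "model", "status", "--json").Output()
	if err != nil {
		return defaultDMRBaseURL // plugin not found or not running
	}
	var status map[string]any
	if err := json.Unmarshal(out, &status); err != nil {
		return defaultDMRBaseURL
	}
	// Assumed field name: treat a missing or empty endpoint as
	// "use the default".
	if ep, ok := status["endpoint"].(string); ok && ep != "" {
		return ep
	}
	return defaultDMRBaseURL
}

func main() {
	fmt.Println(discoverBaseURL())
}
```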
### Alloy models

"Alloy models" essentially means using more than one model in the same chat context. Not at the same time, but "randomly" throughout the conversation to try to take advantage of the strong points of each model.

pkg/config/config.go

Lines changed: 5 additions & 3 deletions
```diff
@@ -130,9 +130,11 @@ func migrateToLatestConfig(c any) v1.Config {
 }

 func validateConfig(cfg *v1.Config) error {
-	for _, model := range cfg.Models {
-		if model.ParallelToolCalls == nil {
-			model.ParallelToolCalls = boolPtr(true)
+	for name := range cfg.Models {
+		if cfg.Models[name].ParallelToolCalls == nil {
+			m := cfg.Models[name]
+			m.ParallelToolCalls = boolPtr(true)
+			cfg.Models[name] = m
 		}
 	}
```
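The rewritten loop matters because ranging over a Go map of struct values yields copies: assuming `cfg.Models` stores `ModelConfig` by value (as this diff suggests), the old code mutated a copy that was immediately discarded. A standalone sketch of both versions, with toy types standing in for cagent's:

```go
package main

import "fmt"

// Toy stand-ins for v1.ModelConfig and cfg.Models.
type ModelConfig struct {
	ParallelToolCalls *bool
}

func boolPtr(b bool) *bool { return &b }

func main() {
	models := map[string]ModelConfig{"root": {}}

	// Broken (the old code's shape): m is a copy of the map value,
	// so the assignment never reaches the map.
	for _, m := range models {
		if m.ParallelToolCalls == nil {
			m.ParallelToolCalls = boolPtr(true)
		}
	}
	fmt.Println(models["root"].ParallelToolCalls == nil) // true: map untouched

	// Fixed (the pattern from the diff): read, modify, write back.
	for name := range models {
		if models[name].ParallelToolCalls == nil {
			m := models[name]
			m.ParallelToolCalls = boolPtr(true)
			models[name] = m
		}
	}
	fmt.Println(*models["root"].ParallelToolCalls) // true
}
```

The write-back is required because map index expressions of struct type are not addressable in Go, so `models[name].ParallelToolCalls = …` would not even compile.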

pkg/config/v1/types.go

Lines changed: 2 additions & 0 deletions
```diff
@@ -114,6 +114,8 @@ type ModelConfig struct {
 	ParallelToolCalls *bool             `json:"parallel_tool_calls,omitempty" yaml:"parallel_tool_calls,omitempty"`
 	Env               map[string]string `json:"env,omitempty" yaml:"env,omitempty"`
 	TokenKey          string            `json:"token_key,omitempty" yaml:"token_key,omitempty"`
+	// ProviderOpts allows provider-specific options. Currently used for "dmr" provider only.
+	ProviderOpts map[string]any `json:"provider_opts,omitempty" yaml:"provider_opts,omitempty"`
 }

 // Config represents the entire configuration file
```
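Because `ProviderOpts` is a `map[string]any`, YAML decoding can deliver `runtime_flags` either as a plain string or as a `[]any` of strings. A hedged sketch of handling both shapes (the `runtimeFlags` helper is illustrative, not cagent's code):

```go
package main

import (
	"fmt"
	"strings"
)

// runtimeFlags extracts "runtime_flags" from a ProviderOpts-style map,
// accepting either a single string (commas normalized to spaces, then
// split on whitespace) or a decoded YAML list of strings.
func runtimeFlags(opts map[string]any) []string {
	switch v := opts["runtime_flags"].(type) {
	case string:
		return strings.Fields(strings.ReplaceAll(v, ",", " "))
	case []any:
		flags := make([]string, 0, len(v))
		for _, item := range v {
			if s, ok := item.(string); ok {
				flags = append(flags, s)
			}
		}
		return flags
	default:
		return nil // key absent or an unsupported type
	}
}

func main() {
	// Both spellings produce the same flag list.
	fmt.Println(runtimeFlags(map[string]any{"runtime_flags": "--ngl=33 --top-p=0.9"}))
	fmt.Println(runtimeFlags(map[string]any{"runtime_flags": []any{"--ngl=33", "--top-p=0.9"}}))
}
```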
