
Commit 9239963

More complete DMR support (#103)
The `dmr` provider now supports:

- Proper context length setup using `max_tokens`
- `temperature`, `top_p`, `frequency_penalty`, `presence_penalty` all get mapped into the proper runtime flags based on the engine in use (for now, only `llama.cpp` mappings)
- Raw runtime flags to pass to the inference engine in use, via `provider_opts:runtime_flags`

Configuration example supported with these changes:

```yaml
models:
  root:
    provider: dmr
    model: ai/qwen3:14B-Q6_K
    max_tokens: 32768
    temperature: 0.7
    top_p: 0.95
    frequency_penalty: 0.2
    presence_penalty: 0.1
    provider_opts:
      runtime_flags: |
        --batch-size 1024
        --ubatch-size 512
```

closes #71

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
1 parent 2d87ce9 commit 9239963

6 files changed: 484 additions & 18 deletions


README.md

Lines changed: 16 additions & 0 deletions
@@ -183,6 +183,22 @@ models:

You'll find a curated list of agent examples, spread across three categories, [Basic](https://github.com/docker/cagent/tree/main/examples#basic-configurations), [Advanced](https://github.com/docker/cagent/tree/main/examples#advanced-configurations) and [multi-agents](https://github.com/docker/cagent/tree/main/examples#multi-agent-configurations), in the `/examples/` directory.

### DMR provider options

When using the `dmr` provider, you can use the `provider_opts` key to pass DMR runtime-specific (e.g. llama.cpp) options:
```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    provider_opts:
      runtime_flags: ["--ngl=33", "--repeat-penalty=1.2", ...] # or a comma/space-separated string
```
The default `base_url` `cagent` will use for the `dmr` provider is `http://localhost:12434/engines/llama.cpp/v1`. DMR itself might need to be enabled via [Docker Desktop's settings](https://docs.docker.com/ai/model-runner/get-started/#enable-dmr-in-docker-desktop) on macOS and Windows, and via the command line on [Docker CE on Linux](https://docs.docker.com/ai/model-runner/get-started/#enable-dmr-in-docker-engine).
## Quickly generate agents and agent teams with `cagent new`

Using the command `cagent new` you can quickly generate agents or multi-agent teams using a single prompt!

docs/USAGE.md

Lines changed: 98 additions & 0 deletions
@@ -165,6 +165,104 @@ models:

#### DMR (Docker Model Runner) provider usage

If `base_url` is omitted, `cagent` will use `http://localhost:12434/engines/llama.cpp/v1` by default.

You can pass DMR runtime (e.g. llama.cpp) options using:

```yaml
models:
  provider: dmr
  provider_opts:
    runtime_flags:
```
The context length is taken from `max_tokens` at the model level:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    # base_url: omitted -> auto-discovery via Docker Model plugin
    provider_opts:
      runtime_flags: ["--ngl=33", "--top-p=0.9"]
```
`runtime_flags` also accepts a single string with comma or space separation:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    provider_opts:
      runtime_flags: "--ngl=33 --top-p=0.9"
```
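The comma-or-space splitting described above can be sketched in Go (a minimal illustration; `splitRuntimeFlags` is a hypothetical helper name, not cagent's actual parser):

```go
package main

import (
	"fmt"
	"strings"
)

// splitRuntimeFlags turns a single runtime_flags string into a flag
// list, treating commas and any whitespace (including the newlines of
// the multiline YAML form) as separators. Empty fields are dropped.
func splitRuntimeFlags(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return r == ',' || r == ' ' || r == '\n' || r == '\t'
	})
}

func main() {
	// All three forms yield the same flag list.
	fmt.Println(splitRuntimeFlags("--ngl=33 --top-p=0.9"))  // [--ngl=33 --top-p=0.9]
	fmt.Println(splitRuntimeFlags("--ngl=33,--top-p=0.9")) // [--ngl=33 --top-p=0.9]
	fmt.Println(splitRuntimeFlags("--ngl=33\n--top-p=0.9\n"))
}
```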
Explicit `base_url` example with a multiline `runtime_flags` string:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    base_url: http://127.0.0.1:12434/engines/llama.cpp/v1
    provider_opts:
      runtime_flags: |
        --ngl=33
        --top-p=0.9
```
Requirements and notes:

- The Docker Model plugin must be available for auto-configure/auto-discovery
  - Verify with: `docker model status --json`
  - Configuration is best-effort; failures fall back to the default base URL
- `provider_opts` is currently scoped to the `dmr` provider only
- `runtime_flags` are passed after `--` to the inference runtime (e.g. llama.cpp)
Parameter mapping and precedence (DMR):

- `ModelConfig` fields are translated into engine-specific runtime flags. For example, with the `llama.cpp` backend:
  - `temperature` → `--temp`
  - `top_p` → `--top-p`
  - `frequency_penalty` → `--frequency-penalty`
  - `presence_penalty` → `--presence-penalty`
  - ...
- `provider_opts.runtime_flags` always takes priority over derived flags on conflict. When a conflict is detected, `cagent` logs a warning indicating the overridden flag. `max_tokens` is the only exception for now.
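The precedence rule can be sketched as follows (a hypothetical `mergeFlags` helper, not cagent's implementation; for brevity it assumes the `--flag=value` form, while the docs show separate-argument flags too):

```go
package main

import (
	"fmt"
	"strings"
)

// mergeFlags keeps flags derived from ModelConfig fields only when the
// user's runtime_flags don't already set the same flag; user flags win
// on conflict, and a warning names the overridden flag.
func mergeFlags(derived, user []string) []string {
	// "--temp=0.7" identifies flag "--temp".
	flagName := func(f string) string { return strings.SplitN(f, "=", 2)[0] }

	userNames := map[string]bool{}
	for _, f := range user {
		if strings.HasPrefix(f, "--") {
			userNames[flagName(f)] = true
		}
	}

	var out []string
	for _, f := range derived {
		if userNames[flagName(f)] {
			fmt.Printf("warning: runtime_flags overrides derived flag %s\n", flagName(f))
			continue
		}
		out = append(out, f)
	}
	return append(out, user...)
}

func main() {
	derived := []string{"--temp=0.5", "--top-p=0.9"} // from temperature / top_p
	user := []string{"--temp=0.7", "--threads=8"}    // provider_opts.runtime_flags
	fmt.Println(mergeFlags(derived, user)) // [--top-p=0.9 --temp=0.7 --threads=8]
}
```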
Examples:

```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    temperature: 0.5 # derives --temp 0.5
    top_p: 0.9 # derives --top-p 0.9
    max_tokens: 8192 # sets --context-size=8192
    provider_opts:
      runtime_flags: ["--temp", "0.7", "--threads", "8"] # overrides derived --temp, sets --threads
```
```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
    provider_opts:
      runtime_flags: "--ngl=33 --repeat-penalty=1.2" # a string is accepted as well
```
Troubleshooting:

- Plugin not found: `cagent` will log a debug message and use the default base URL
- Endpoint empty in status: ensure the Model Runner is running, or set `base_url` manually
- Flag parsing: if using a single string, quote it properly in YAML; you can also use a list
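The best-effort discovery and fallback described above might look roughly like this. This is a hypothetical sketch: the JSON shape of `docker model status --json`, including the `endpoint` field probed here, is an assumption, not taken from cagent's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

const defaultDMRBaseURL = "http://localhost:12434/engines/llama.cpp/v1"

// discoverBaseURL tries the Docker Model plugin and falls back to the
// default base URL whenever anything goes wrong (plugin missing,
// unparsable output, or no usable endpoint in the status).
func discoverBaseURL() string {
	out, err := exec.Command("docker", "model", "status", "--json").Output()
	if err != nil {
		return defaultDMRBaseURL // plugin not found or not running
	}
	var status map[string]any
	if err := json.Unmarshal(out, &status); err != nil {
		return defaultDMRBaseURL
	}
	// Assumed field name: treat a missing or empty endpoint as
	// "use the default".
	if ep, ok := status["endpoint"].(string); ok && ep != "" {
		return ep
	}
	return defaultDMRBaseURL
}

func main() {
	fmt.Println(discoverBaseURL())
}
```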
### Alloy models

"Alloy models" essentially means using more than one model in the same chat context. Not at the same time, but "randomly" throughout the conversation to try to take advantage of the strong points of each model.

pkg/config/config.go

Lines changed: 5 additions & 3 deletions
```diff
@@ -130,9 +130,11 @@ func migrateToLatestConfig(c any) v1.Config {
 }

 func validateConfig(cfg *v1.Config) error {
-	for _, model := range cfg.Models {
-		if model.ParallelToolCalls == nil {
-			model.ParallelToolCalls = boolPtr(true)
+	for name := range cfg.Models {
+		if cfg.Models[name].ParallelToolCalls == nil {
+			m := cfg.Models[name]
+			m.ParallelToolCalls = boolPtr(true)
+			cfg.Models[name] = m
 		}
 	}
```
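The rewritten loop matters because ranging over a Go map of struct values yields copies: assuming `cfg.Models` stores `ModelConfig` by value (as this diff suggests), the old code mutated a copy that was immediately discarded. A standalone sketch of both versions, with toy types standing in for cagent's:

```go
package main

import "fmt"

// Toy stand-ins for v1.ModelConfig and cfg.Models.
type ModelConfig struct {
	ParallelToolCalls *bool
}

func boolPtr(b bool) *bool { return &b }

func main() {
	models := map[string]ModelConfig{"root": {}}

	// Broken (the old code's shape): m is a copy of the map value,
	// so the assignment never reaches the map.
	for _, m := range models {
		if m.ParallelToolCalls == nil {
			m.ParallelToolCalls = boolPtr(true)
		}
	}
	fmt.Println(models["root"].ParallelToolCalls == nil) // true: map untouched

	// Fixed (the pattern from the diff): read, modify, write back.
	for name := range models {
		if models[name].ParallelToolCalls == nil {
			m := models[name]
			m.ParallelToolCalls = boolPtr(true)
			models[name] = m
		}
	}
	fmt.Println(*models["root"].ParallelToolCalls) // true
}
```

The write-back is required because map index expressions of struct type are not addressable in Go, so `models[name].ParallelToolCalls = …` would not even compile.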

pkg/config/v1/types.go

Lines changed: 2 additions & 0 deletions
```diff
@@ -114,6 +114,8 @@ type ModelConfig struct {
 	ParallelToolCalls *bool             `json:"parallel_tool_calls,omitempty" yaml:"parallel_tool_calls,omitempty"`
 	Env               map[string]string `json:"env,omitempty" yaml:"env,omitempty"`
 	TokenKey          string            `json:"token_key,omitempty" yaml:"token_key,omitempty"`
+	// ProviderOpts allows provider-specific options. Currently used for "dmr" provider only.
+	ProviderOpts map[string]any `json:"provider_opts,omitempty" yaml:"provider_opts,omitempty"`
 }

 // Config represents the entire configuration file
```
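Because `ProviderOpts` is a `map[string]any`, YAML decoding can deliver `runtime_flags` either as a plain string or as a `[]any` of strings. A hedged sketch of handling both shapes (the `runtimeFlags` helper is illustrative, not cagent's code):

```go
package main

import (
	"fmt"
	"strings"
)

// runtimeFlags extracts "runtime_flags" from a ProviderOpts-style map,
// accepting either a single string (commas normalized to spaces, then
// split on whitespace) or a decoded YAML list of strings.
func runtimeFlags(opts map[string]any) []string {
	switch v := opts["runtime_flags"].(type) {
	case string:
		return strings.Fields(strings.ReplaceAll(v, ",", " "))
	case []any:
		flags := make([]string, 0, len(v))
		for _, item := range v {
			if s, ok := item.(string); ok {
				flags = append(flags, s)
			}
		}
		return flags
	default:
		return nil // key absent or an unsupported type
	}
}

func main() {
	// Both spellings produce the same flag list.
	fmt.Println(runtimeFlags(map[string]any{"runtime_flags": "--ngl=33 --top-p=0.9"}))
	fmt.Println(runtimeFlags(map[string]any{"runtime_flags": []any{"--ngl=33", "--top-p=0.9"}}))
}
```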
