Backend selection

stable-diffusion.cpp has two backend assignments:

  • --backend selects the runtime backend used to execute model graphs.
  • --params-backend selects where model parameters are kept.

If --params-backend is not set, each module's parameters are kept on that module's runtime backend.

Syntax

A backend assignment can be a single backend name:

sd-cli -m model.safetensors -p "a cat" --backend cpu

This applies to every module that does not have a more specific assignment.

Assignments can also target individual modules:

sd-cli -m model.safetensors -p "a cat" --backend te=cpu,vae=cuda0,diffusion=vulkan0

The same syntax is used for parameter placement:

sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu

--params-backend also accepts the special value disk:

sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk

--max-vram can target resolved backend/device names:

sd-cli -m model.safetensors -p "a cat" --backend diffusion=cuda0,vae=vulkan0 --max-vram cuda0=6,vulkan0=2

The budget applies to every module running on that backend.

Module names are case-insensitive. Hyphens and underscores in module names are ignored, so clip_vision, clip-vision, and clipvision are equivalent.
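
The normalization rule above can be illustrated with a short Python sketch. This is not the project's code; normalize_module_name is a hypothetical helper name used only for illustration.

```python
def normalize_module_name(name: str) -> str:
    """Lowercase the name and drop hyphens and underscores."""
    return name.lower().replace("-", "").replace("_", "")


# All three spellings collapse to the same key.
for spelling in ("clip_vision", "clip-vision", "ClipVision"):
    print(spelling, "->", normalize_module_name(spelling))
```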

all=, default=, and *= can be used to set the default backend inside a mixed assignment:

sd-cli -m model.safetensors -p "a cat" --backend all=cuda0,te=cpu
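
As a rough illustration of the syntax, the Python sketch below splits an assignment string into a default backend and per-module overrides. It is a simplified model, not the parser used by stable-diffusion.cpp: parse_assignment is a hypothetical name, and the choices to skip empty entries and to let a later entry replace an earlier one for the same key are assumptions. Module aliases are not resolved here.

```python
DEFAULT_KEYS = {"all", "default", "*"}


def parse_assignment(spec: str):
    """Split 'cuda0' or 'all=cuda0,te=cpu' into (default, per_module)."""
    default = None
    per_module = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # assumption: empty entries are skipped
        if "=" not in entry:
            default = entry  # a bare backend name sets the default
            continue
        key, backend = entry.split("=", 1)
        # Module names are case-insensitive; hyphens and underscores are ignored.
        key = key.strip().lower().replace("-", "").replace("_", "")
        if key in DEFAULT_KEYS:
            default = backend.strip()
        else:
            per_module[key] = backend.strip()  # assumption: last entry wins
    return default, per_module


print(parse_assignment("all=cuda0,te=cpu"))
```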

Modules

| Module | Purpose | Accepted names |
| --- | --- | --- |
| diffusion | UNet, DiT, MMDiT, Flux, Wan, Qwen Image, and other diffusion models | diffusion, model, unet, dit |
| te | Text encoders and conditioners | te, clip, text, textencoder, textencoders, conditioner, cond, llm, t5, t5xxl |
| clip_vision | CLIP vision encoder | clip_vision, clipvision, clip-vision, vision |
| vae | VAE and TAE | vae, firststage, autoencoder, tae |
| controlnet | ControlNet | controlnet, control |
| photomaker | PhotoMaker ID encoder and PhotoMaker LoRA | photomaker, photomakerid, pmid, photo |
| upscaler | ESRGAN upscaler | upscaler, esrgan, hires |

te is the preferred module name for text encoders. clip is kept as an accepted alias because many existing commands and model names use CLIP terminology.
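
The alias table can be modeled as a lookup from normalized alias to canonical module name. The Python sketch below copies the accepted names from the table; canonical_module is a hypothetical helper, and raising an error for an unknown name is an assumption about how such input might be handled.

```python
MODULE_ALIASES = {
    "diffusion": ["diffusion", "model", "unet", "dit"],
    "te": ["te", "clip", "text", "textencoder", "textencoders",
           "conditioner", "cond", "llm", "t5", "t5xxl"],
    "clip_vision": ["clip_vision", "clipvision", "clip-vision", "vision"],
    "vae": ["vae", "firststage", "autoencoder", "tae"],
    "controlnet": ["controlnet", "control"],
    "photomaker": ["photomaker", "photomakerid", "pmid", "photo"],
    "upscaler": ["upscaler", "esrgan", "hires"],
}


def _normalize(name: str) -> str:
    # Case-insensitive; hyphens and underscores are ignored.
    return name.lower().replace("-", "").replace("_", "")


# Reverse index: normalized alias -> canonical module name.
ALIAS_TO_MODULE = {
    _normalize(alias): module
    for module, aliases in MODULE_ALIASES.items()
    for alias in aliases
}


def canonical_module(name: str) -> str:
    key = _normalize(name)
    if key not in ALIAS_TO_MODULE:
        # Assumption: unknown names are rejected rather than ignored.
        raise ValueError(f"unknown module name: {name!r}")
    return ALIAS_TO_MODULE[key]


print(canonical_module("CLIP"), canonical_module("clip-vision"))
```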

Backend names

Backend names are resolved against the GGML backend device list. Matching is case-insensitive and accepts either an exact device name or a unique prefix of one. Common values include:

  • cpu
  • cuda0
  • vulkan0
  • metal

The special values auto, default, and an empty backend name select the default backend. The default preference is GPU, then integrated GPU, then CPU.

The special value gpu selects the first GPU backend, falling back to the first integrated GPU backend.

The special value disk is accepted only by --params-backend. --backend disk is invalid because disk is a parameter residency mode, not a runtime compute backend.
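
The resolution rules in this section can be summarized in a Python sketch. It works on a hypothetical device list of (name, kind) pairs instead of the real GGML device list, and resolve_backend is an illustrative name. The order in which special values are checked and the errors raised for missing or ambiguous names are assumptions.

```python
def _first_of_kind(devices, kinds):
    """Return the first device whose kind matches, trying kinds in order."""
    for kind in kinds:
        for dev, dev_kind in devices:
            if dev_kind == kind:
                return dev
    raise ValueError("no suitable backend device found")


def resolve_backend(name, devices, allow_disk=False):
    """devices: list of (device_name, kind), kind in {'gpu', 'igpu', 'cpu'}."""
    key = name.strip().lower()
    if key == "disk":
        if not allow_disk:
            raise ValueError("disk is only valid for --params-backend")
        return "disk"
    if key in ("", "auto", "default"):
        # Default preference: GPU, then integrated GPU, then CPU.
        return _first_of_kind(devices, ("gpu", "igpu", "cpu"))
    if key == "gpu":
        # First GPU, falling back to the first integrated GPU.
        return _first_of_kind(devices, ("gpu", "igpu"))
    exact = [dev for dev, _ in devices if dev.lower() == key]
    if exact:
        return exact[0]
    matches = [dev for dev, _ in devices if dev.lower().startswith(key)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"no backend device matches {name!r}")
    raise ValueError(f"ambiguous backend prefix {name!r}: {matches}")


# Hypothetical machine with two discrete GPUs, one integrated GPU, and a CPU.
devices = [("CUDA0", "gpu"), ("CUDA1", "gpu"), ("Vulkan0", "igpu"), ("CPU", "cpu")]
print(resolve_backend("cuda0", devices), resolve_backend("vulk", devices))
```

In this model, a prefix such as cuda is rejected on the example machine because it matches both CUDA0 and CUDA1.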

Runtime backend vs. parameter backend

The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.

For example:

sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu

This runs all modules on cuda0, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.

With disk as the parameter backend:

sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk

This runs all modules on cuda0, reloads parameters from the model file as needed, and releases those parameter buffers after use.

disk is never selected implicitly. If --params-backend is not set, parameters use the runtime backend.
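
The fallback rule can be expressed as a small Python sketch. Assignments are modeled as dictionaries with an optional "*" default entry; place is a hypothetical helper, and falling back to the runtime backend when --params-backend is set but has no entry for a module is an assumption.

```python
def place(module, backend, params_backend=None):
    """Return (runtime_backend, params_backend) for one module.

    backend / params_backend: dicts of module -> backend name,
    with an optional '*' entry acting as the default.
    """
    runtime = backend.get(module, backend.get("*", "auto"))
    if params_backend is None:
        # --params-backend not set: parameters follow the runtime backend.
        return runtime, runtime
    params = params_backend.get(module, params_backend.get("*"))
    if params is None:
        # Assumption: modules without a parameter entry also follow the runtime backend.
        params = runtime
    return runtime, params


print(place("diffusion", {"*": "cuda0"}))
print(place("diffusion", {"*": "cuda0"}, {"*": "cpu"}))
print(place("te", {"*": "cuda0"}, {"te": "disk"}))
```

Note that disk only ever appears in the result when it was written explicitly in the parameter assignment.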

Per-module assignments can be mixed:

sd-cli -m model.safetensors -p "a cat" --backend diffusion=cuda0,te=cpu,vae=cpu --params-backend diffusion=cuda0,te=cpu,vae=cpu

This runs the text encoder and VAE on CPU and the diffusion model on cuda0, with each module's parameters kept on the same device the module runs on.

Backend sharing and lifetime

Backends are managed by SDBackendManager.

Within one manager, backend instances are cached by resolved backend device name. If multiple modules request the same backend, they share the same ggml_backend_t.

For example:

--backend te=cpu,vae=cpu

uses one shared CPU backend for both te and vae runtime execution.

Runtime and parameter assignments also share the same backend cache. If --backend diffusion=cuda0 and --params-backend diffusion=cuda0 resolve to the same device, both use the same backend instance.

--params-backend disk does not create a separate backend instance. Parameters are loaded lazily using the module runtime backend.

SDBackendManager owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
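
The sharing and ownership rules can be pictured with a toy cache in Python. This is only a model of the behavior described above, not SDBackendManager itself; BackendCache, FakeBackend, and the factory argument are hypothetical names.

```python
class FakeBackend:
    """Stand-in for a backend handle such as ggml_backend_t."""

    def __init__(self, name):
        self.name = name


class BackendCache:
    """Caches one backend instance per resolved device name."""

    def __init__(self, factory):
        self._factory = factory  # hypothetical: creates a backend for a device name
        self._instances = {}

    def get(self, device_name):
        # Runtime and parameter requests go through the same cache.
        if device_name not in self._instances:
            self._instances[device_name] = self._factory(device_name)
        return self._instances[device_name]

    def close(self):
        # The cache owns the instances; callers only hold non-owning references.
        self._instances.clear()


cache = BackendCache(FakeBackend)
te_backend = cache.get("CPU")
vae_backend = cache.get("CPU")
print(te_backend is vae_backend)  # both modules share one CPU backend
cache.close()
```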

Compatibility flags

The example CLI/server still accepts these older CPU placement flags as compatibility aliases:

  • --clip-on-cpu
  • --vae-on-cpu
  • --control-net-cpu
  • --offload-to-cpu

--clip-on-cpu, --vae-on-cpu, and --control-net-cpu are deprecated. The example argument layer translates them by prepending te=cpu, vae=cpu, and controlnet=cpu, respectively, to --backend before creating the context.

--offload-to-cpu prepends a CPU default to the parameter assignment in the caller before creating the context:

--params-backend '*=cpu'

Because this default is inserted first, later explicit --params-backend entries can still override it. For example, --offload-to-cpu --params-backend te=disk keeps non-TE parameters on CPU and reloads TE parameters from disk.
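
The translation of the compatibility flags can be sketched in Python as plain string building. apply_compat_flags is a hypothetical function, not part of the example CLI, and joining the prepended entries to the user's assignment with a comma is an assumption based on the assignment syntax.

```python
def _prepend(prefix, spec):
    """Join prepended entries and the user's assignment with commas."""
    parts = list(prefix)
    if spec:
        parts.append(spec)
    return ",".join(parts)


def apply_compat_flags(backend="", params_backend="", *, clip_on_cpu=False,
                       vae_on_cpu=False, control_net_cpu=False,
                       offload_to_cpu=False):
    """Return the effective (--backend, --params-backend) strings."""
    backend_prefix = []
    if clip_on_cpu:
        backend_prefix.append("te=cpu")
    if vae_on_cpu:
        backend_prefix.append("vae=cpu")
    if control_net_cpu:
        backend_prefix.append("controlnet=cpu")
    # --offload-to-cpu inserts a CPU default ahead of any explicit entries.
    params_prefix = ["*=cpu"] if offload_to_cpu else []
    return _prepend(backend_prefix, backend), _prepend(params_prefix, params_backend)


print(apply_compat_flags(params_backend="te=disk", offload_to_cpu=True))
```

In this sketch the explicit te=disk entry ends up after the prepended *=cpu default, which matches the override order described above.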

Library callers should set backend and params_backend directly. The old CPU/offload fields are no longer part of the C API. Explicit --backend and --params-backend assignments are preferred for new commands.