[Bug] regression: Vulkan memory management error after master-691-563137a #1659

@wbruna

Description

Git commit

563137a (master-691-563137a)

Operating System & Version

Debian 13, radv 25.2.6

GGML backends

Vulkan

Command-line arguments used

./sd-cli --backend Vulkan1 --diffusion-model z_image_turbo_bf16.safetensors --llm Qwen3-4B-UD-Q4_K_XL.gguf --vae ./ae_bf16.safetensors -p flower --cfg-scale 1 --steps 4 --offload-to-cpu --mmap

Steps to reproduce

Starting with release master-691-563137a, the command line above (standard Z-Image Turbo and Flux.1 VAE bf16 weights, Qwen3-4B quant from Unsloth) fails on Vulkan. The same parameters and models work fine on the previous commit, master-690-3a54597.

What you expected to happen

Offloading should work as it did before. This is the output from master-690-3a54597:

[INFO ] model_loader.cpp:913  - memory-mapped 606 tensors in 3 files (13856.51 MB), taking 0.00s
  |====================>                             | 453/1095 - 17.32MB/s
  |======================================>           | 851/1095 - 208.93MB/s
  |==================================================| 1095/1095 - 239.24MB/s
[INFO ] model_loader.cpp:1167 - loading tensors completed, taking 1.67s (read: 0.16s, memcpy: 0.00s, convert: 0.22s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:1149 - total params memory size = 1583.87MB (VRAM 0.00MB, RAM 1583.87MB): text_encoders 1483.75MB(RAM), diffusion_model 7.30MB(RAM), vae 92.82MB(RAM), controlnet 0.00MB(N/A), extensions 0.00MB(N/A)
[INFO ] stable-diffusion.cpp:1254 - running in FLOW mode
[INFO ] stable-diffusion.cpp:4407 - generate_image 512x512
[INFO ] denoiser.hpp:579  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3461 - sampling using Euler method
[INFO ] ggml_extend.hpp:2158 - qwen3 offload params (3602.16 MB, 398 tensors) to runtime backend (Vulkan1), taking 5.53s
[INFO ] stable-diffusion.cpp:4164 - get_learned_condition completed, taking 5.99s
[INFO ] stable-diffusion.cpp:4441 - generating image: 1/1 - seed 42
[INFO ] ggml_extend.hpp:2158 - z_image offload params (11743.02 MB, 453 tensors) to runtime backend (Vulkan1), taking 11.90s
  |==================================================| 4/4 - 6.39s/it
[INFO ] stable-diffusion.cpp:4473 - sampling completed, taking 25.76s
[INFO ] stable-diffusion.cpp:4491 - generating 1 latent images completed, taking 25.76s

Peak VRAM usage is around 11G (16G card).

What actually happened

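A crash that looks like an out-of-memory condition: the text encoder's embedding tensor is rejected as too large for the params buffer, and an assertion then fails: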

[INFO ] stable-diffusion.cpp:520  - Weight type stat:                      f32: 145  |    q4_K: 154  |    q5_K: 30   |    q6_K: 49   |  iq4_xs: 20   |    bf16: 697  
[INFO ] stable-diffusion.cpp:521  - Conditioner weight type stat:          f32: 145  |    q4_K: 154  |    q5_K: 30   |    q6_K: 49   |  iq4_xs: 20   
[INFO ] stable-diffusion.cpp:522  - Diffusion model weight type stat:     bf16: 453  
[INFO ] stable-diffusion.cpp:523  - VAE weight type stat:                 bf16: 244  
[INFO ] stable-diffusion.cpp:930  - using VAE for encoding / decoding
[INFO ] auto_encoder_kl.hpp:525  - vae decoder: ch = 128
[INFO ] stable-diffusion.cpp:1151 - total params memory size = 15439.76MB (VRAM 0.00MB, RAM 15439.76MB): text_encoders 3602.16MB(RAM), diffusion_model 11743.02MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(N/A), extensions 0.00MB(N/A)
[INFO ] stable-diffusion.cpp:1251 - running in FLOW mode
[INFO ] stable-diffusion.cpp:4364 - generate_image 512x512
[INFO ] denoiser.hpp:579  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3417 - sampling using Euler method
[ERROR] model_manager.cpp:581  - model manager tensor 'text_encoders.llm.model.embed_tokens.weight' is too large for params buffer: 1555824640 > 1073741824
[ERROR] ggml_extend.hpp:1893 - qwen3 prepare graph weights failed
src/conditioning/conditioner.hpp:1719: GGML_ASSERT(!hidden_states.empty()) failed
[New LWP 1908342]
[New LWP 1908341]
[New LWP 1908340]
[New LWP 1908339]
[New LWP 1908334]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56     ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56      in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1  0x00007fdc9d49b668 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:49
warning: 49     ./nptl/cancellation.c: No such file or directory
#2  0x00007fdc9d49b6ad in __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
75      in ./nptl/cancellation.c
#3  0x00007fdc9d5067c7 in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30     ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4  0x0000559eb1beb82b in ggml_print_backtrace ()
#5  0x0000559eb1beb97e in ggml_abort ()
#6  0x0000559eb0fd30db in LLMEmbedder::encode_prompt(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<int, int> const&, int, int, std::vector<std::pair<int, sd::Tensor<float> >, std::allocator<std::pair<int, sd::Tensor<float> > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int, bool, int) [clone .isra.0] ()
#7  0x0000559eb0fd3b8e in LLMEmbedder::get_learned_condition(int, ConditionerParams const&) ()
#8  0x0000559eb0e47d2e in generate_image ()
#9  0x0000559eb0d04006 in main ()
[Inferior 1 (process 1908333) detached]

master-691-563137a works with ROCm on the same card.
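
The two numbers in the model_manager.cpp error can be checked by hand. The sketch below is only arithmetic on the values from the log, not code from stable-diffusion.cpp. The embedding shape (vocabulary 151936, hidden size 2560) is my assumption about Qwen3-4B and does not appear in the log; if it is right, the rejected tensor is being sized at 4 bytes per element, and the limit it is compared against is exactly 1 GiB.

```python
# Values copied from the error line:
#   "... is too large for params buffer: 1555824640 > 1073741824"
tensor_bytes = 1555824640
buffer_limit = 1073741824

MIB = 1 << 20
GIB = 1 << 30

# The limit is exactly 1 GiB.
limit_is_one_gib = buffer_limit == GIB

# Tensor size in MiB, for comparison with the MB figures printed in the logs.
tensor_mib = tensor_bytes / MIB

# ASSUMPTION (not in the log): Qwen3-4B embed_tokens shape.
ASSUMED_VOCAB = 151936
ASSUMED_HIDDEN = 2560
elements = ASSUMED_VOCAB * ASSUMED_HIDDEN

# Bytes per element implied by the assumed shape.
bytes_per_element = tensor_bytes / elements

print(f"limit is 1 GiB:    {limit_is_one_gib}")
print(f"tensor size:       {tensor_mib:.2f} MiB")
print(f"overshoot:         {(tensor_bytes - buffer_limit) / MIB:.2f} MiB")
print(f"bytes per element: {bytes_per_element}")
```

Under that assumption the tensor works out to 1483.75 MiB at 4 bytes per element, i.e. an f32-sized embedding table, which on its own is larger than a 1 GiB buffer. This does not say where the 1 GiB limit comes from on the Vulkan path; it only shows that a single tensor of this size cannot fit under it.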

Logs / error messages / stack trace

No response

Additional context / environment details

No response

Labels

bug (Something isn't working)