[Bug] regression: Vulkan memory management error after master-691-563137a #1659

### Git commit

563137a5926ac9455420240c2a7d0f3f15eb9bd0 (master-691-563137a)

### Operating System & Version

Debian 13, radv 25.2.6

### GGML backends

Vulkan

### Command-line arguments used

./sd-cli --backend Vulkan1 --diffusion-model z_image_turbo_bf16.safetensors --llm Qwen3-4B-UD-Q4_K_XL.gguf --vae ./ae_bf16.safetensors -p flower --cfg-scale 1 --steps 4 --offload-to-cpu --mmap

### Steps to reproduce

Starting with release `master-691-563137a`, the command line above (standard Z-Image Turbo and Flux.1 VAE bf16 weights, plus the Qwen3-4B quant from Unsloth) fails on Vulkan. The same parameters and models work fine on the previous release, `master-690-3a54597`.
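
To compare the two releases side by side, the same arguments can be run against both binaries. This is only a sketch: the `builds/...` directory layout and the helper names `build_command` and `run_builds` are hypothetical, and only the argument list is taken from this report.

```python
import subprocess
from pathlib import Path

# Arguments copied from the command line in this report.
SD_ARGS = [
    "--backend", "Vulkan1",
    "--diffusion-model", "z_image_turbo_bf16.safetensors",
    "--llm", "Qwen3-4B-UD-Q4_K_XL.gguf",
    "--vae", "./ae_bf16.safetensors",
    "-p", "flower",
    "--cfg-scale", "1",
    "--steps", "4",
    "--offload-to-cpu",
    "--mmap",
]

# Hypothetical layout: one directory per unpacked release, each with an sd-cli binary.
BUILD_DIRS = {
    "master-690-3a54597": Path("builds/master-690-3a54597"),
    "master-691-563137a": Path("builds/master-691-563137a"),
}


def build_command(build_dir):
    """Return the argv list for the sd-cli binary inside one build directory."""
    return [str(Path(build_dir) / "sd-cli"), *SD_ARGS]


def run_builds(build_dirs=BUILD_DIRS, runner=subprocess.run):
    """Run the same command with each build and map build name -> exit code."""
    results = {}
    for name, build_dir in build_dirs.items():
        completed = runner(build_command(build_dir))
        results[name] = completed.returncode
    return results
```

Calling `run_builds()` from the directory that holds the model files returns one exit code per release; a non-zero code for only the newer release would match the regression described here.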


### What you expected to happen

Offloading works as before. This is the output from `master-690-3a54597`:

```
[INFO ] model_loader.cpp:913  - memory-mapped 606 tensors in 3 files (13856.51 MB), taking 0.00s
  |====================>                             | 453/1095 - 17.32MB/s
  |======================================>           | 851/1095 - 208.93MB/s
  |==================================================| 1095/1095 - 239.24MB/s
[INFO ] model_loader.cpp:1167 - loading tensors completed, taking 1.67s (read: 0.16s, memcpy: 0.00s, convert: 0.22s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:1149 - total params memory size = 1583.87MB (VRAM 0.00MB, RAM 1583.87MB): text_encoders 1483.75MB(RAM), diffusion_model 7.30MB(RAM), vae 92.82MB(RAM), controlnet 0.00MB(N/A), extensions 0.00MB(N/A)
[INFO ] stable-diffusion.cpp:1254 - running in FLOW mode
[INFO ] stable-diffusion.cpp:4407 - generate_image 512x512
[INFO ] denoiser.hpp:579  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3461 - sampling using Euler method
[INFO ] ggml_extend.hpp:2158 - qwen3 offload params (3602.16 MB, 398 tensors) to runtime backend (Vulkan1), taking 5.53s
[INFO ] stable-diffusion.cpp:4164 - get_learned_condition completed, taking 5.99s
[INFO ] stable-diffusion.cpp:4441 - generating image: 1/1 - seed 42
[INFO ] ggml_extend.hpp:2158 - z_image offload params (11743.02 MB, 453 tensors) to runtime backend (Vulkan1), taking 11.90s
  |==================================================| 4/4 - 6.39s/it
[INFO ] stable-diffusion.cpp:4473 - sampling completed, taking 25.76s
[INFO ] stable-diffusion.cpp:4491 - generating 1 latent images completed, taking 25.76s
```

Peak VRAM usage is around 11 GB on a 16 GB card.


### What actually happened

An out-of-memory crash:

```
[INFO ] stable-diffusion.cpp:520  - Weight type stat:                      f32: 145  |    q4_K: 154  |    q5_K: 30   |    q6_K: 49   |  iq4_xs: 20   |    bf16: 697  
[INFO ] stable-diffusion.cpp:521  - Conditioner weight type stat:          f32: 145  |    q4_K: 154  |    q5_K: 30   |    q6_K: 49   |  iq4_xs: 20   
[INFO ] stable-diffusion.cpp:522  - Diffusion model weight type stat:     bf16: 453  
[INFO ] stable-diffusion.cpp:523  - VAE weight type stat:                 bf16: 244  
[INFO ] stable-diffusion.cpp:930  - using VAE for encoding / decoding
[INFO ] auto_encoder_kl.hpp:525  - vae decoder: ch = 128
[INFO ] stable-diffusion.cpp:1151 - total params memory size = 15439.76MB (VRAM 0.00MB, RAM 15439.76MB): text_encoders 3602.16MB(RAM), diffusion_model 11743.02MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(N/A), extensions 0.00MB(N/A)
[INFO ] stable-diffusion.cpp:1251 - running in FLOW mode
[INFO ] stable-diffusion.cpp:4364 - generate_image 512x512
[INFO ] denoiser.hpp:579  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3417 - sampling using Euler method
[ERROR] model_manager.cpp:581  - model manager tensor 'text_encoders.llm.model.embed_tokens.weight' is too large for params buffer: 1555824640 > 1073741824
[ERROR] ggml_extend.hpp:1893 - qwen3 prepare graph weights failed
src/conditioning/conditioner.hpp:1719: GGML_ASSERT(!hidden_states.empty()) failed
[New LWP 1908342]
[New LWP 1908341]
[New LWP 1908340]
[New LWP 1908339]
[New LWP 1908334]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56     ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56      in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1  0x00007fdc9d49b668 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:49
warning: 49     ./nptl/cancellation.c: No such file or directory
#2  0x00007fdc9d49b6ad in __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
75      in ./nptl/cancellation.c
#3  0x00007fdc9d5067c7 in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30     ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4  0x0000559eb1beb82b in ggml_print_backtrace ()
#5  0x0000559eb1beb97e in ggml_abort ()
#6  0x0000559eb0fd30db in LLMEmbedder::encode_prompt(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<int, int> const&, int, int, std::vector<std::pair<int, sd::Tensor<float> >, std::allocator<std::pair<int, sd::Tensor<float> > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int, bool, int) [clone .isra.0] ()
#7  0x0000559eb0fd3b8e in LLMEmbedder::get_learned_condition(int, ConditionerParams const&) ()
#8  0x0000559eb0e47d2e in generate_image ()
#9  0x0000559eb0d04006 in main ()
[Inferior 1 (process 1908333) detached]
```
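
For scale, the two byte counts in the `model_manager.cpp` error can be converted to MiB. The sketch below only does that arithmetic. The vocabulary and hidden sizes are assumptions about Qwen3-4B that are not stated in this report; if they hold, the tensor size is consistent with the embedding table being held as 32-bit floats.

```python
MIB = 1024 * 1024

# Both byte counts are copied from the model_manager.cpp error line.
tensor_bytes = 1555824640
buffer_limit_bytes = 1073741824


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB


print(f"tensor: {to_mib(tensor_bytes):.2f} MiB")        # tensor: 1483.75 MiB
print(f"limit:  {to_mib(buffer_limit_bytes):.2f} MiB")  # limit:  1024.00 MiB
print(f"excess: {to_mib(tensor_bytes - buffer_limit_bytes):.2f} MiB")  # excess: 459.75 MiB

# Assumption (not from the report): Qwen3-4B uses a 151936-token vocabulary
# and a hidden size of 2560. With 4 bytes per element (f32) this gives
# exactly the tensor size reported in the error.
vocab_size, hidden_size, bytes_per_f32 = 151936, 2560, 4
print(vocab_size * hidden_size * bytes_per_f32 == tensor_bytes)  # True
```

The limit is exactly 1 GiB, and 1483.75 MB is also the `text_encoders` figure printed by the working `master-690-3a54597` run.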

`master-691-563137a` works with ROCm on the same card. 


### Logs / error messages / stack trace

_No response_

### Additional context / environment details

_No response_
