support kv quant/offload #1035
Conversation
Code Review
This pull request introduces a comprehensive KV cache management system for autoregressive transformer inference, supporting rolling windows, quantization (int8/fp8), and CPU offloading via asynchronous CUDA streams. Key additions include a new KVCacheManager, specialized cache pools (e.g., OffloadQuantRollingKVCachePool), and Triton kernels for efficient quantization and rescaling. The feedback identifies critical issues regarding device-agnostic code, specifically hardcoded device strings and global capability checks that could lead to runtime errors in multi-GPU environments. Additionally, there are performance concerns regarding synchronous CPU-GPU transfers caused by calling .item() on GPU tensors within the inference loop.
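The two issues the review raises can be illustrated together in a short sketch. This is not code from the PR: `advance_cache_position` and the variable names are hypothetical, chosen only to show the pattern of (a) deriving the device from an existing tensor instead of hardcoding a device string, and (b) keeping loop state on-device so no `.item()` call forces a synchronous CPU-GPU transfer inside the decode loop.

```python
import torch

def advance_cache_position(pos: torch.Tensor, window: int) -> torch.Tensor:
    """Advance a rolling-window write position without a host sync.

    `pos` stays a 0-dim tensor on whatever device the cache lives on,
    so no `.item()` (a blocking device-to-host copy) runs per step.
    """
    # On-device modular arithmetic; equivalent to (pos + 1) % window.
    return torch.remainder(pos + 1, window)

# Device-agnostic setup: take the device from an existing cache tensor
# rather than hardcoding "cuda" / "cuda:0", so the code also works on
# cpu or a non-default GPU in a multi-GPU setup.
kv_block = torch.zeros(4, 8)  # stand-in for one cache block
pos = torch.zeros((), dtype=torch.long, device=kv_block.device)

for _ in range(10):  # stand-in for the decode loop
    pos = advance_cache_position(pos, window=4)

# Read back once, after the hot loop, if a Python int is truly needed.
print(int(pos))  # -> 2
```

The same reasoning applies to capability checks: querying `torch.cuda.get_device_capability(tensor.device)` per device, rather than a single global check at import time, avoids picking the wrong kernel path (e.g. fp8 vs int8) on heterogeneous multi-GPU machines.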
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
No description provided.