Conversation
Code Review
This pull request refactors the audio server to support data and tensor parallelism, moving embedding cache management from the models to background worker threads in the RPC server. It also adds CLI configuration for audio resources and performance testing utilities. Feedback focuses on addressing potential resource leaks in the cleanup process, handling silent failures in background threads, removing redundant loops, improving concurrency when processing multiple data-parallel groups, and adding error handling for process initialization timeouts.
```python
def clean_up(self):
    for model_rpc in self.model_rpcs:
        model_rpc.rpc_server_process.kill()
    for model_rpc in self.model_rpcs:
        model_rpc.rpc_server_process.join()
    return
```
The clean_up method is now empty, but it should terminate the model RPC processes started in wait_to_model_ready. Additionally, the start_model_process function in model_infer/__init__.py no longer returns the process object, making it difficult to track and kill these processes. This could lead to resource leaks or zombie processes when the server is stopped.
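A minimal sketch of one way to restore process tracking (the class names and `start_model_process` signature here are illustrative, not the PR's actual API): have `start_model_process` return the `Process` object, keep it on the RPC handle, and have `clean_up` signal every process before joining any of them.

```python
import multiprocessing as mp
import time


def _rpc_server_loop():
    # Placeholder for the real RPC serve loop.
    while True:
        time.sleep(1)


def start_model_process():
    # Return the process object so the caller can track and stop it later.
    proc = mp.Process(target=_rpc_server_loop, daemon=True)
    proc.start()
    return proc


class ModelRpcClient:
    def __init__(self):
        self.rpc_server_process = start_model_process()


class AudioServer:
    def __init__(self, world_size):
        self.model_rpcs = [ModelRpcClient() for _ in range(world_size)]

    def clean_up(self):
        # Signal all processes first, then join, so shutdown is not serialized
        # behind one slow child.
        for model_rpc in self.model_rpcs:
            model_rpc.rpc_server_process.terminate()
        for model_rpc in self.model_rpcs:
            model_rpc.rpc_server_process.join()
```

Joining after terminating (rather than relying on daemon flags alone) avoids leaving zombie processes behind when the server stops.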
```python
except Exception as e:
    logger.exception(str(e))
    raise e
```
Exceptions in the _infer_worker or _store_worker threads will cause the threads to terminate. Since these are critical background workers, their failure will cause the audio server to stop functioning silently. Consider adding a mechanism to detect thread failure and either restart them or shut down the process gracefully.
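One possible mitigation, sketched below (the `supervised` helper, `fatal_error` event, and the bounded-restart policy are illustrative, not part of this PR): wrap each worker target so crashes are logged and retried a bounded number of times, and raise a process-level flag when the worker is permanently dead so the main loop can shut down instead of running on silently.

```python
import logging
import threading

logger = logging.getLogger(__name__)
fatal_error = threading.Event()


def supervised(target, *args, restarts=1):
    """Run a worker target in a thread, restarting it on failure.

    After `restarts` failed restarts, set `fatal_error` so the main
    process can detect the dead worker and exit gracefully.
    """

    def runner():
        for _ in range(restarts + 1):
            try:
                target(*args)
                return  # worker exited cleanly
            except Exception:
                logger.exception("worker %s crashed", getattr(target, "__name__", target))
        fatal_error.set()

    t = threading.Thread(target=runner, daemon=True)
    t.start()
    return t
```

The main loop would then periodically check `fatal_error.is_set()` (or wait on it) and trigger shutdown, rather than continuing with `_infer_worker` or `_store_worker` silently gone.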
```python
all_embeds = []
for i in range(len(audio_items)):
    cur_embed = per_audio_features[i]
    cpu_embed_cache_client.copy_to_cache(
        embed_tensor=cur_embed, start_index_in_cache=item.start_index_in_embed_cache
    )
    assert (
        item.token_num == cur_embed.shape[0]
    ), f"audio token num not match {item.token_num} vs {cur_embed.shape[0]} "
    ids_to_set.append(uid)
    if ids_to_set:
        self.cache_client.root.set_items_embed(ids=ids_to_set)
        torch.cuda.current_stream().synchronize()
    all_embeds.append(cur_embed)
return all_embeds, audio_items
```
The loop to build all_embeds is redundant because per_audio_features is already a list containing the embeddings for each item in audio_items in the same order.
Suggested change:

```diff
-all_embeds = []
-for i in range(len(audio_items)):
-    cur_embed = per_audio_features[i]
-    cpu_embed_cache_client.copy_to_cache(
-        embed_tensor=cur_embed, start_index_in_cache=item.start_index_in_embed_cache
-    )
-    assert (
-        item.token_num == cur_embed.shape[0]
-    ), f"audio token num not match {item.token_num} vs {cur_embed.shape[0]} "
-    ids_to_set.append(uid)
-    if ids_to_set:
-        self.cache_client.root.set_items_embed(ids=ids_to_set)
-        torch.cuda.current_stream().synchronize()
-    all_embeds.append(cur_embed)
-return all_embeds, audio_items
+return per_audio_features, audio_items
```
```python
for dp_index in range(self.audio_dp):
    _audios = dp_to_handle_audios[dp_index]
    if _audios:
        await asyncio.to_thread(_audios[-1][1].wait)
```
The current implementation waits for the completion of each DP group sequentially. If multiple DP groups are processing audio items from the same request, they should be waited on concurrently to improve performance.
Suggested change:

```diff
-for dp_index in range(self.audio_dp):
-    _audios = dp_to_handle_audios[dp_index]
-    if _audios:
-        await asyncio.to_thread(_audios[-1][1].wait)
+wait_tasks = []
+for dp_index in range(self.audio_dp):
+    _audios = dp_to_handle_audios[dp_index]
+    if _audios:
+        wait_tasks.append(asyncio.to_thread(_audios[-1][1].wait))
+if wait_tasks:
+    await asyncio.gather(*wait_tasks)
```
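A toy demonstration of why gathering the waits matters (the three `threading.Event` objects here stand in for the per-DP-group completion events; timings are illustrative): when the waits run through `asyncio.gather`, the total wall time is roughly the slowest group rather than the sum of all groups.

```python
import asyncio
import threading
import time


async def main():
    # Simulate three DP groups whose completion events are set by workers
    # after ~0.2s each.
    events = [threading.Event() for _ in range(3)]
    for ev in events:
        threading.Timer(0.2, ev.set).start()

    start = time.perf_counter()
    # Concurrent waits: total elapsed is ~0.2s. Awaiting each
    # asyncio.to_thread(ev.wait) sequentially would take ~0.6s here
    # if the events were set one after another.
    await asyncio.gather(*(asyncio.to_thread(ev.wait) for ev in events))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
```

In this toy case all events fire together, so sequential waits would also finish quickly; the benefit appears when groups finish at different times, since `gather` bounds the wait by the slowest group only.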
```python
await asyncio.to_thread(success_event.wait, timeout=40)
assert proc.is_alive()
```
The return value of success_event.wait is not checked. If the process fails to start within the 40-second timeout, the code will proceed to attempt a connection, which will likely fail or hang. It's better to explicitly check for the timeout.
Suggested change:

```diff
-await asyncio.to_thread(success_event.wait, timeout=40)
-assert proc.is_alive()
+if not await asyncio.to_thread(success_event.wait, timeout=40):
+    proc.terminate()
+    raise RuntimeError("Audio model inference process failed to start within timeout")
```
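The key point is that `Event.wait(timeout=...)` returns `False` on timeout rather than raising. A synchronous sketch of the same startup pattern (the `start_with_timeout` helper and its worker protocol are illustrative, not this PR's code):

```python
import multiprocessing as mp
import time


def start_with_timeout(target, timeout=40.0):
    """Start a worker process and wait for it to signal readiness.

    wait() returns False on timeout, so a stuck worker is surfaced
    immediately instead of hanging on a later connection attempt.
    """
    success_event = mp.Event()
    proc = mp.Process(target=target, args=(success_event,), daemon=True)
    proc.start()
    if not success_event.wait(timeout=timeout):
        proc.terminate()
        proc.join()
        raise RuntimeError("worker process failed to start within timeout")
    return proc
```

In the async server, the same check wraps the `asyncio.to_thread(success_event.wait, timeout=40)` call, as the suggestion above shows.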