Skip to content

add http sse cosyvocie tts api and omni createItem function#203

Closed
songguocola wants to merge 2 commits into
dashscope:mainfrom
songguocola:dev/omni_0331
Closed

add http sse cosyvocie tts api and omni createItem function#203
songguocola wants to merge 2 commits into
dashscope:mainfrom
songguocola:dev/omni_0331

Conversation

@songguocola
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces HTTP-based Text-to-Speech (TTS) synthesis support using Server-Sent Events (SSE), including new synthesizer implementations, data models, and usage examples. It also adds tool-calling functionality to the Omni Realtime service and implements connection timeouts for WebSocket-based clients. Review feedback highlights a high-severity issue where manual audio accumulation conflicts with the SDK's output mode, potentially corrupting audio streams. Additionally, the reviewer recommends addressing memory leak risks from ThreadLocal usage, using more specific exception types for timeouts, and adding null validation for tool-call items.

.protocol(Protocol.HTTP)
.httpMethod(HttpMethod.POST)
.streamingMode(StreamingMode.OUT)
.outputMode(OutputMode.ACCUMULATE)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Setting .outputMode(OutputMode.ACCUMULATE) while also manually accumulating audio data in a ByteArrayOutputStream (lines 82 and 165) may lead to incorrect results. In the DashScope SDK, OutputMode.ACCUMULATE typically causes each event to contain the full accumulated output so far. If the underlying API follows this behavior, manual accumulation via audioBuffer.write() will result in duplicated data and a corrupted audio stream. If you need to process chunks (deltas) for real-time playback, consider using the default output mode. If you need the full result at the end, rely on the SDK's accumulation instead of a manual buffer.

@Slf4j
public class HttpSpeechSynthesizer {

private final ThreadLocal<ByteBuffer> accumulatedAudioData = new ThreadLocal<>();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of ThreadLocal<ByteBuffer> to store accumulated audio data poses a significant memory leak risk, especially in environments using thread pools (e.g., web servers). Audio data can be large, and these buffers will remain attached to the threads until the next synthesis call occurs on the same thread. Additionally, since getAccumulatedAudioData() is public, it may lead to confusion if called from a different thread than the one that performed the synthesis, as it would return null. Since callAndReturnAudio already returns the ByteBuffer and streamCall uses a callback, this per-thread state is largely redundant. Consider removing it and providing the final result via the callback or letting the user handle accumulation.

connectLatch.set(new CountDownLatch(1));
connectLatch.get().await();
if (!connectLatch.get().await(60, TimeUnit.SECONDS)) {
throw new RuntimeException("Connection timed out after 60 seconds");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Throwing a generic RuntimeException for a connection timeout is not ideal for a library. It is better to throw a more specific exception, such as a custom ApiException or a standard java.util.concurrent.TimeoutException, to allow callers to handle this specific failure case more effectively.

*
* @param item item pass to server
*/
public void createItem(JsonObject item) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The item parameter should be validated for nullity before processing. Passing a null item to the protocol event could lead to unexpected server-side errors or NullPointerException during serialization.

  public void createItem(JsonObject item) {
    if (item == null) {
      throw new IllegalArgumentException("item cannot be null");
    }
    checkStatus();

connectLatch.set(new CountDownLatch(1));
connectLatch.get().await();
if (!connectLatch.get().await(60, TimeUnit.SECONDS)) {
throw new RuntimeException("Connection timed out after 60 seconds");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Throwing a generic RuntimeException for a connection timeout is not ideal for a library. It is better to throw a more specific exception, such as a custom ApiException or a standard java.util.concurrent.TimeoutException, to allow callers to handle this specific failure case more effectively.

@songguocola songguocola deleted the dev/omni_0331 branch April 14, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant