add http sse cosyvocie tts api and omni createItem function#203
add http sse cosyvocie tts api and omni createItem function#203songguocola wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces HTTP-based Text-to-Speech (TTS) synthesis support using Server-Sent Events (SSE), including new synthesizer implementations, data models, and usage examples. It also adds tool-calling functionality to the Omni Realtime service and implements connection timeouts for WebSocket-based clients. Review feedback highlights a high-severity issue where manual audio accumulation conflicts with the SDK's output mode, potentially corrupting audio streams. Additionally, the reviewer recommends addressing memory leak risks from ThreadLocal usage, using more specific exception types for timeouts, and adding null validation for tool-call items.
| .protocol(Protocol.HTTP) | ||
| .httpMethod(HttpMethod.POST) | ||
| .streamingMode(StreamingMode.OUT) | ||
| .outputMode(OutputMode.ACCUMULATE) |
There was a problem hiding this comment.
Setting .outputMode(OutputMode.ACCUMULATE) while also manually accumulating audio data in a ByteArrayOutputStream (lines 82 and 165) may lead to incorrect results. In the DashScope SDK, OutputMode.ACCUMULATE typically causes each event to contain the full accumulated output so far. If the underlying API follows this behavior, manual accumulation via audioBuffer.write() will result in duplicated data and a corrupted audio stream. If you need to process chunks (deltas) for real-time playback, consider using the default output mode. If you need the full result at the end, rely on the SDK's accumulation instead of a manual buffer.
| @Slf4j | ||
| public class HttpSpeechSynthesizer { | ||
|
|
||
| private final ThreadLocal<ByteBuffer> accumulatedAudioData = new ThreadLocal<>(); |
There was a problem hiding this comment.
The use of ThreadLocal<ByteBuffer> to store accumulated audio data poses a significant memory leak risk, especially in environments using thread pools (e.g., web servers). Audio data can be large, and these buffers will remain attached to the threads until the next synthesis call occurs on the same thread. Additionally, since getAccumulatedAudioData() is public, it may lead to confusion if called from a different thread than the one that performed the synthesis, as it would return null. Since callAndReturnAudio already returns the ByteBuffer and streamCall uses a callback, this per-thread state is largely redundant. Consider removing it and providing the final result via the callback or letting the user handle accumulation.
| connectLatch.set(new CountDownLatch(1)); | ||
| connectLatch.get().await(); | ||
| if (!connectLatch.get().await(60, TimeUnit.SECONDS)) { | ||
| throw new RuntimeException("Connection timed out after 60 seconds"); |
There was a problem hiding this comment.
| * | ||
| * @param item item pass to server | ||
| */ | ||
| public void createItem(JsonObject item) { |
There was a problem hiding this comment.
The item parameter should be validated for nullity before processing. Passing a null item to the protocol event could lead to unexpected server-side errors or NullPointerException during serialization.
public void createItem(JsonObject item) {
if (item == null) {
throw new IllegalArgumentException("item cannot be null");
}
checkStatus();| connectLatch.set(new CountDownLatch(1)); | ||
| connectLatch.get().await(); | ||
| if (!connectLatch.get().await(60, TimeUnit.SECONDS)) { | ||
| throw new RuntimeException("Connection timed out after 60 seconds"); |
There was a problem hiding this comment.
No description provided.