add http sse cosyvocie tts api and omni createItem function#202
add http sse cosyvocie tts api and omni createItem function#202songguocola wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces HTTP-based text-to-speech synthesis support through the new HttpSpeechSynthesizer class and provides comprehensive usage examples for both HTTP TTS and Qwen3 Omni tool calling. Key updates include the addition of metadata classes for audio results, enhanced WebSocket connection management with explicit timeouts, and the implementation of conversation item creation for real-time interactions. Review feedback highlights several critical improvement areas: addressing thread-safety concerns in the synthesizer, correcting a type mismatch for a protocol constant, and properly handling connection timeouts by checking await return values. Furthermore, recommendations were provided to refine exception handling to maintain specific error types and to refactor redundant logic for parsing usage statistics.
| * @author songsong.sss | ||
| */ | ||
| @Slf4j | ||
| public class HttpSpeechSynthesizer { |
There was a problem hiding this comment.
The HttpSpeechSynthesizer class is not thread-safe. It maintains state in fields like accumulatedAudioData and modifies the shared serviceOption object (e.g., calling setIsSSE) during method calls. If multiple threads use the same instance concurrently, it will lead to race conditions and unpredictable behavior. Consider making the class stateless by moving request-specific data to local variables or using a per-request configuration.
| public static final String PROTOCOL_EVENT_TYPE_CREATE_RESPONSE = "response.create"; | ||
| public static final String PROTOCOL_EVENT_TYPE_CANCEL_RESPONSE = "response.cancel"; | ||
| public static final String PROTOCOL_EVENT_TYPE_FINISH_SESSION = "session.finish"; | ||
| public static final Object PROTOCOL_EVENT_TYPE_ITEM_CREATE = "conversation.item.create"; |
There was a problem hiding this comment.
The constant PROTOCOL_EVENT_TYPE_ITEM_CREATE is declared as Object, while all other protocol constants in this class are String. This inconsistency can lead to type mismatches and is likely a typo.
| public static final Object PROTOCOL_EVENT_TYPE_ITEM_CREATE = "conversation.item.create"; | |
| public static final String PROTOCOL_EVENT_TYPE_ITEM_CREATE = "conversation.item.create"; |
| websocktetClient = client.newWebSocket(request, this); | ||
| connectLatch.set(new CountDownLatch(1)); | ||
| connectLatch.get().await(); | ||
| connectLatch.get().await(60, java.util.concurrent.TimeUnit.SECONDS); |
There was a problem hiding this comment.
The return value of await is not checked. If the connection attempt times out after 60 seconds, the method will proceed as if it succeeded, which can lead to subsequent operations failing silently or throwing unexpected errors. You should verify if the connection was successful.
if (!connectLatch.get().await(60, java.util.concurrent.TimeUnit.SECONDS)) {
throw new RuntimeException("Connection timed out after 60 seconds");
}| websocktetClient = client.newWebSocket(request, this); | ||
| connectLatch.set(new CountDownLatch(1)); | ||
| connectLatch.get().await(); | ||
| connectLatch.get().await(60, TimeUnit.SECONDS); |
There was a problem hiding this comment.
The result of the await call is ignored. If the connection fails to establish within the 60-second timeout, the client will proceed in an uninitialized state. It is safer to check the result and throw an exception on timeout.
if (!connectLatch.get().await(60, TimeUnit.SECONDS)) {
throw new RuntimeException("Connection timed out after 60 seconds");
}| } catch (Exception e) { | ||
| log.error("Speech synthesis failed", e); | ||
| throw new ApiException(e); | ||
| } |
There was a problem hiding this comment.
Catching a broad Exception and wrapping it in a new ApiException loses the specific type of checked exceptions like NoApiKeyException or InputRequiredException that are declared in the method signature. These should be caught specifically and rethrown to preserve the API contract.
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
throw e;
} catch (Exception e) {
log.error("Speech synthesis failed", e);
throw new ApiException(e);
}| if (dashScopeResult.getUsage() != null) { | ||
| try { | ||
| SpeechSynthesisUsage usage = | ||
| JsonUtils.fromJsonObject( | ||
| dashScopeResult.getUsage().getAsJsonObject(), SpeechSynthesisUsage.class); | ||
| result.setUsage(usage); | ||
| } catch (Exception e) { | ||
| log.debug("Failed to parse usage information", e); | ||
| } | ||
| } |
No description provided.