# Kortexa TTS Server
Small OpenAI-compatible text-to-speech server for macOS Apple Silicon using MLX-Audio.
## Primary status
- Primary backend: `mlx-audio`
- Primary model repo: `mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-bf16`
- Public model id: `qwen3-tts-customvoice-1.7b`
- Linux/CUDA parity is still in development
## Public HTTP interface
- `GET /health`
- `GET /v1/models`
- `GET /v1/voices`
- `POST /v1/voices/reload`
- `POST /v1/audio/speech`
## `GET /health`
Returns:
- `status`
- `ready`
- `backend`
- `platform.system`
- `platform.machine`
- `model.id`
- `model.repo`
- `sample_rate`
- `voice_count`
- `default_voice`
- `load_error`
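
A client can check this payload before sending synthesis requests. The sketch below is one possible way to do that; it assumes `ready` is a boolean and `load_error` is either null or a string, neither of which is stated above. `summarize_health` is a hypothetical helper name.

```python
from typing import Any


def summarize_health(payload: dict[str, Any]) -> str:
    """Turn a parsed /health response body into a one-line status string.

    Assumption: `ready` is a boolean and `load_error` is None or a string.
    """
    if payload.get("load_error"):
        return f"not ready: {payload['load_error']}"
    if not payload.get("ready"):
        return "not ready"
    voices = payload.get("voice_count")
    default = payload.get("default_voice")
    return f"ready: {voices} voices, default {default!r}"
```
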
## `GET /v1/models`
OpenAI-style model discovery.
Returns the single configured public model id.
## `GET /v1/voices`
Custom voice discovery endpoint.
Top-level fields:
- `object`
- `default_voice`
- `data`
Each voice item includes:
- `id`
- `object`
- `name`
- `model`
- `default`
- `languages`
Use `id` in `POST /v1/audio/speech`.
Voice ids are lowercase stable identifiers and are accepted case-insensitively on input.
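
As a rough illustration, a client could pick a voice id from a parsed `GET /v1/voices` response like this. `resolve_voice` is a hypothetical helper, and the client-side lowercasing is only a convenience, since the server already accepts ids case-insensitively.

```python
from typing import Any, Optional


def resolve_voice(voices_response: dict[str, Any], requested: Optional[str] = None) -> str:
    """Return a voice id taken from a parsed /v1/voices response body.

    Falls back to `default_voice` when no voice is requested and raises
    ValueError when the requested voice is not listed in `data`.
    """
    if requested is None:
        return voices_response["default_voice"]
    wanted = requested.strip().lower()  # listed ids are lowercase
    known = {item["id"] for item in voices_response["data"]}
    if wanted not in known:
        raise ValueError(f"unknown voice {requested!r}; available: {sorted(known)}")
    return wanted
```
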
## `POST /v1/audio/speech`
Request JSON:
- `model` string, required
- `input` string, required, max 4096 chars
- `voice` string or object `{ "id": "voice_id" }`, required
- `instructions` string, optional
- `response_format` string, optional
- `speed` number, optional, range `0.25` to `4.0`
- `stream_format` string, optional, `audio` or `sse`
Supported `response_format` values:
- `mp3`
- `wav`
- `flac`
- `pcm`
- `aac`
- `opus`
Notes:
- Non-streaming default `response_format` is `mp3`
- Streaming default `response_format` is `pcm`
- Streaming currently supports `response_format="pcm"` only
- Blank `input` is rejected with `400`
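
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library; `build_speech_request` is a hypothetical helper, the base URL is supplied by the caller because the server's host and port are not given here, and treating whitespace-only text as blank is an assumption.

```python
import json
import urllib.request
from typing import Optional, Union

SUPPORTED_FORMATS = {"mp3", "wav", "flac", "pcm", "aac", "opus"}
MAX_INPUT_CHARS = 4096


def build_speech_request(
    base_url: str,
    text: str,
    voice: Union[str, dict],
    model: str = "qwen3-tts-customvoice-1.7b",
    instructions: Optional[str] = None,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
    stream_format: Optional[str] = None,
) -> urllib.request.Request:
    """Validate the documented constraints and build (not send) the request."""
    # Assumption: whitespace-only input counts as blank.
    if not text.strip():
        raise ValueError("input must not be blank")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format is not None and response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if stream_format is not None:
        if stream_format not in {"audio", "sse"}:
            raise ValueError(f"unsupported stream_format: {stream_format!r}")
        # Streaming currently supports pcm only (and defaults to it).
        if response_format not in (None, "pcm"):
            raise ValueError("streaming requires response_format='pcm'")

    body = {"model": model, "input": text, "voice": voice}
    optional = {
        "instructions": instructions,
        "response_format": response_format,
        "speed": speed,
        "stream_format": stream_format,
    }
    # Only send optional fields that were set, so server defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})

    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`, and the response body read as audio bytes in the requested format.
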
## Streaming behavior
`stream_format="audio"`:
- HTTP chunked audio bytes
- requires `response_format="pcm"`, the only format streaming currently supports

`stream_format="sse"`:
- served as `text/event-stream`
- emits `audio.chunk` messages carrying base64-encoded PCM
- ends with an `audio.done` message
- each message is also sent as a named SSE event (`audio.chunk` or `audio.done`)
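
To illustrate the SSE mode, here is a small sketch that decodes PCM chunks from the lines of an event stream. The exact message payload is not specified above, so this assumes each `data:` line holds a JSON object with the base64 audio in a field named `audio`; that field name and the helper `iter_pcm_chunks` are hypothetical, and the field is configurable for that reason.

```python
import base64
import json
from typing import Iterable, Iterator, List, Optional


def iter_pcm_chunks(lines: Iterable[str], audio_field: str = "audio") -> Iterator[bytes]:
    """Yield decoded PCM bytes for each `audio.chunk` event; stop at `audio.done`.

    Assumption: the data payload is JSON and `audio_field` holds base64 PCM.
    """
    event: Optional[str] = None
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "":
            # A blank line terminates one SSE message.
            if event == "audio.done":
                return
            if event == "audio.chunk" and data_parts:
                payload = json.loads("\n".join(data_parts))
                yield base64.b64decode(payload[audio_field])
            event, data_parts = None, []
```

The function takes any iterable of text lines, so it can be fed decoded lines from an HTTP response or a list of strings.
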
## `POST /v1/voices/reload`
Re-scans the `voices/` directory and loads new custom voices without a restart.
Returns `voice_count` and `custom_count`.
## Voice Designer
Standalone tool for creating custom TTS voices using the VoiceDesign model.
Components:
- `scripts/voice_designer.py` — FastAPI server on port 4010, loads VoiceDesign model
- `client/` — Vite React app for auditioning and saving voices
- `voices/` — saved `.wav` files, loaded by main server as additional voices
Start both: `./design.sh`
Voice Designer API (port 4010):
- `POST /generate` — `{ instruct, text }` → audio sample
- `POST /save` — `{ name, audio_b64 }` → saves to `voices/{name}.wav`
- `GET /voices` — list saved voices
- `GET /voices/{name}/audio` — serve saved voice audio
- `DELETE /voices/{name}` — delete a voice
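
As an example of the `POST /save` body shape, the sketch below encodes audio bytes into the `{ name, audio_b64 }` JSON listed above. It assumes `audio_b64` is standard base64 of the audio bytes. The name check is a client-side precaution chosen for this example (the name becomes part of a file path), not a documented server rule, and `build_save_payload` is a hypothetical helper.

```python
import base64
import json
import re


def build_save_payload(name: str, audio_bytes: bytes) -> bytes:
    """Encode a JSON body of the form {"name": ..., "audio_b64": ...}."""
    # Client-side precaution (an assumption): keep names filesystem-friendly,
    # since the voice is saved as voices/{name}.wav.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"unsafe voice name: {name!r}")
    body = {
        "name": name,
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return json.dumps(body).encode("utf-8")
```
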
Custom voices are stored as `.wav` files. At synthesis time, the speaker embedding is extracted from the saved audio (about 50 ms) and injected into the generation pipeline. Custom voice names are case-insensitive in the API.
Each voice item in `GET /v1/voices` also includes a `custom` boolean field that is `true` for custom voices and `false` for built-in ones.
## Setup notes
- Run `./setup.sh`
- On macOS Apple Silicon, it installs MLX-Audio from GitHub and `ffmpeg`
- On Linux, it installs the CUDA-side Python dependencies and `ffmpeg`, but the new API path is not ready there yet
- OpenAPI is available at `/openapi.json` and `/docs`
## Smoke test
- `node tests/test.js --list-voices`
- `node tests/test.js --voice aiden --format wav`
- `node tests/test.js --voice aiden --stream --out tests/output/stream.pcm`
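
The streamed `.pcm` output has no header, so most players cannot open it directly. One way to inspect it is to wrap it in a WAV container, as sketched below with the standard `wave` module. The sample layout is not specified here: this assumes 16-bit mono PCM, and the sample rate should come from the `sample_rate` field of `GET /health`. `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Assumption: 16-bit (2-byte) mono samples unless told otherwise.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buffer.getvalue()
```

For instance, the bytes of `tests/output/stream.pcm` can be read, passed through `pcm_to_wav` with the server's reported sample rate, and written back out as a `.wav` file.
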