Speech-to-text and audio intelligence SDK for Node.js, Deno, and Bun. Supports pre-recorded transcription, real-time streaming, and audio analysis features.
Install:

```bash
npm install assemblyai
```

Quickstart:

```ts
import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
});

const transcript = await client.transcripts.transcribe({
  audio: "https://example.com/audio.mp3",
  speech_models: ["universal-3-pro", "universal-2"],
  speaker_labels: true,
});

console.log(transcript.text);

for (const utterance of transcript.utterances) {
  console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
}
```

Client setup:

```ts
const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
});
```

Core methods:

- `client.transcripts.transcribe(params)`: transcribe and poll until complete
- `client.transcripts.submit(params)`: submit without waiting (fire-and-forget)
- `client.transcripts.get(id)`: retrieve a transcript by ID
- `client.transcripts.list()`: list transcripts with pagination
- `client.transcripts.delete(id)`: delete a transcript
- `client.streaming.transcriber(params)`: create a real-time streaming session
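The submit/get pair implies a polling loop when you manage job state yourself (for example, from a queue worker). A minimal sketch of that loop as a generic helper; `pollUntil` is hypothetical, not part of the SDK:

```typescript
// Hypothetical helper: call fetchFn repeatedly until isDone returns true,
// waiting intervalMs between attempts. Could pair
// client.transcripts.submit() with client.transcripts.get().
async function pollUntil<T>(
  fetchFn: () => Promise<T>,
  isDone: (value: T) => boolean,
  intervalMs = 3000,
): Promise<T> {
  let value = await fetchFn();
  while (!isDone(value)) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    value = await fetchFn();
  }
  return value;
}
```

Usage might look like `await pollUntil(() => client.transcripts.get(id), (t) => t.status === "completed" || t.status === "error")`; the exact `status` values are an assumption about the transcript object, not something stated above.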
Transcribe a local file:

```ts
const transcript = await client.transcripts.transcribe({
  audio: "./recording.mp3",
});
```

With multiple features:
```ts
const transcript = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_models: ["universal-3-pro", "universal-2"],
  speaker_labels: true,
  sentiment_analysis: true,
  entity_detection: true,
  auto_chapters: true,
  language_detection: true,
});
```

Streaming:
```ts
const transcriber = client.streaming.transcriber({
  speechModel: "u3-rt-pro",
  sampleRate: 16_000,
});

transcriber.on("turn", (turn) => {
  console.log(turn.text);
});

await transcriber.connect();

// Send audio chunks: transcriber.sendAudio(chunk)

await transcriber.close();
```

Subtitles:
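`sendAudio(chunk)` expects raw audio matching the configured `sampleRate`. When capturing from Web Audio, samples arrive as Float32; a common preprocessing step converts them to signed 16-bit PCM. A sketch, assuming the stream wants 16-bit mono PCM (that format is our assumption, not stated above):

```typescript
// Convert Float32 samples in [-1, 1] (e.g. from a Web Audio
// AudioWorklet) to signed 16-bit PCM for a streaming socket.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

The resulting `out.buffer` is what you would pass along as an audio chunk.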
```ts
const srt = await client.transcripts.subtitles(id, "srt");
const vtt = await client.transcripts.subtitles(id, "vtt");
```

Notes:

- `.transcribe()` polls until complete; use `.submit()` for fire-and-forget
- `speech_models` takes an array with fallback ordering: `["universal-3-pro", "universal-2"]`
- Streaming uses `u3-rt-pro` as the speech model
- Never expose API keys client-side; use temporary auth tokens for browser streaming
- Node >= 18 required
- Only runtime dependency: `ws` (WebSocket library)
- Multi-runtime support: works in Node.js, Deno, Bun, Cloudflare Workers, and browsers
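The fallback ordering that `speech_models` provides server-side can be mirrored for any client-side operation. A hypothetical generic helper, not part of the SDK:

```typescript
// Try each option in order; return the first successful result
// (mirrors the fallback semantics of the speech_models array).
async function firstSuccessful<O, T>(
  options: O[],
  attempt: (option: O) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const option of options) {
    try {
      return await attempt(option);
    } catch (err) {
      lastError = err; // remember the failure and try the next option
    }
  }
  throw lastError;
}
```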
Resources:

- Full documentation
- API reference
- llms-full.txt (TypeScript-filtered docs for LLMs)