Commit cfed655

agalyanmann, Bearclaw, and claude authored
Add CLAUDE.md for Claude Code context (#187)
Co-authored-by: Bearclaw <bearclaw@assemblyai.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9b28d9d commit cfed655

2 files changed

Lines changed: 120 additions & 0 deletions

CLAUDE.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# AssemblyAI Python SDK

Speech-to-text and audio intelligence SDK. Supports pre-recorded transcription, real-time streaming, and audio analysis features.

## Quick start

```bash
pip install -U assemblyai
```

```python
import os
import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcript = aai.Transcriber().transcribe(
    "https://example.com/audio.mp3",
    config=aai.TranscriptionConfig(
        speech_models=["universal-3-pro", "universal-2"],
        speaker_labels=True,
    ),
)

print(transcript.text)
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```

## Auth

Set the `ASSEMBLYAI_API_KEY` environment variable, or assign the key in code:

```python
aai.settings.api_key = "your-key"
```
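
When the key is read from the environment, a missing variable surfaces as a bare `KeyError`. A minimal sketch of a more descriptive lookup follows. `load_api_key` is a hypothetical helper, not part of the SDK, and it uses only the standard library.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the AssemblyAI key from `env`, or raise a descriptive error.

    Hypothetical helper for illustration; not part of the SDK.
    """
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set or is empty")
    return key
```

The returned string can then be assigned to `aai.settings.api_key` in the same way as the literal key above.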

## Key classes

- `aai.Transcriber`: Transcribe files, URLs, or streams. Methods: `transcribe()`, `transcribe_async()`, `submit()`, `list_transcripts()`
- `aai.TranscriptionConfig`: All transcription options: `speech_models`, `speaker_labels`, `sentiment_analysis`, `entity_detection`, `auto_chapters`, `content_safety`, `language_detection`, `summarization`, `word_boost`, `disfluencies`
- `aai.Transcript`: Result object with `.text`, `.status`, `.utterances`, `.words`, `.chapters`, `.entities`, `.sentiment_analysis`. Methods: `get_sentences()`, `get_paragraphs()`, `export_subtitles_srt()`, `export_subtitles_vtt()`
- `assemblyai.streaming.v3.StreamingClient`: Real-time streaming with event-based API

## Common patterns

**Transcribe a local file:**
```python
transcript = aai.Transcriber().transcribe("./recording.mp3")
```

**With multiple features:**
```python
audio_url = "https://example.com/audio.mp3"  # placeholder URL, as in the quick start
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    speaker_labels=True,
    sentiment_analysis=True,
    entity_detection=True,
    auto_chapters=True,
    language_detection=True,
)
transcript = aai.Transcriber().transcribe(audio_url, config=config)
```

**PII redaction** (uses setter, not constructor):
```python
config = aai.TranscriptionConfig()
config.set_redact_pii(
    policies=[aai.PIIRedactionPolicy.email_address, aai.PIIRedactionPolicy.phone_number],
    substitution=aai.PIISubstitutionPolicy.hash,
)
```

**Streaming v3:**
```python
import os

from assemblyai.streaming.v3 import (
    StreamingClient, StreamingClientOptions,
    StreamingParameters, StreamingEvents,
)

client = StreamingClient(StreamingClientOptions(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    api_host="streaming.assemblyai.com",
))
# Handlers are called with (client, event); a Turn event carries its text in `.transcript`.
client.on(StreamingEvents.Turn, lambda _client, turn: print(turn.transcript))
client.connect(StreamingParameters(
    sample_rate=16000,
    speech_model="u3-rt-pro",
))
```
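
The snippet above opens a session but does not show audio being prepared for it. As a rough sketch, the hypothetical helper below splits raw PCM into fixed-duration chunks. It assumes 16-bit mono PCM at the same `sample_rate` passed to `StreamingParameters`, and the 50 ms chunk length is an arbitrary illustrative choice; neither is stated in this file. How the chunks are handed to the client is deliberately left out.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16000,
    chunk_ms: int = 50,
    bytes_per_sample: int = 2,  # assumption: 16-bit mono PCM
) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`, each covering `chunk_ms` of audio.

    Hypothetical helper for illustration; not part of the SDK.
    The final chunk may be shorter than the others.
    """
    chunk_size = sample_rate * chunk_ms // 1000 * bytes_per_sample
    if chunk_size <= 0:
        raise ValueError("sample_rate and chunk_ms must give a positive chunk size")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


# One second of silence at 16 kHz, 16-bit mono, is 32000 bytes.
one_second = bytes(32000)
chunks = list(pcm_chunks(one_second))
```

At 16 kHz and 2 bytes per sample, a 50 ms chunk is 1600 bytes, so one second of audio yields 20 chunks.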

**Retrieve existing transcript:**
```python
transcript = aai.Transcript.get_by_id("transcript-id")
```
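
A transcript obtained this way may have failed, in which case `.text` is `None`. The sketch below shows one way to turn that into an explicit error. `text_or_raise` is a hypothetical helper, `Status` is a local stand-in for `aai.TranscriptStatus` so the block runs without the SDK, and the `.error` attribute read here is an assumption about the transcript object.

```python
from enum import Enum
from types import SimpleNamespace


class Status(str, Enum):
    """Local stand-in for aai.TranscriptStatus (assumption: same member names)."""
    completed = "completed"
    error = "error"


def text_or_raise(transcript, error_status=Status.error) -> str:
    """Return transcript.text, raising instead of silently yielding None."""
    if transcript.status == error_status:
        # Assumption: `.error` carries the failure message, if any.
        detail = getattr(transcript, "error", None) or "unknown error"
        raise RuntimeError(f"Transcription failed: {detail}")
    return transcript.text


# Stand-in objects shaped like a finished and a failed transcript.
ok = SimpleNamespace(status=Status.completed, text="hello", error=None)
failed = SimpleNamespace(status=Status.error, text=None, error="bad audio")
```

With the real SDK, the same helper would be called with `error_status=aai.TranscriptStatus.error`.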

## Important gotchas

- **Always check status**: `if transcript.status == aai.TranscriptStatus.error`. Accessing `.text` on a failed transcript returns `None` rather than raising an exception
- **`speech_models` takes a list** with fallback ordering: `["universal-3-pro", "universal-2"]`
- **PII redaction uses `set_redact_pii()`**, not a constructor parameter
- **Streaming v3 is a separate module**: `assemblyai.streaming.v3`, not the legacy `RealtimeTranscriber`
- **Microphone streaming needs extras**: `pip install "assemblyai[extras]"` for `pyaudio`
- **`transcribe_async()` returns a `concurrent.futures.Future`**, not an asyncio coroutine
- **Timestamps are in milliseconds** throughout the SDK
- **Minimum Python**: 3.8
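
Because timestamps are in milliseconds, they usually need converting before being shown to a person. A small sketch using only the standard library follows; `format_ms` is a hypothetical helper name, not an SDK function.

```python
def format_ms(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if ms < 0:
        raise ValueError("timestamp must be non-negative")
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


print(format_ms(83450))  # 00:01:23.450
```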

## Dependencies

`httpx`, `pydantic`, `typing-extensions`, `websockets`. Optional: `pyaudio` via `[extras]`.

## Docs

- [Full documentation](https://www.assemblyai.com/docs)
- [API reference](https://www.assemblyai.com/docs/api-reference)
- [llms-full.txt](https://www.assemblyai.com/docs/llms-full.txt?lang=python) (Python-filtered docs for LLMs)

README.md

Lines changed: 4 additions & 0 deletions
@@ -928,3 +928,7 @@ transcript_group = aai.TranscriptGroup.get_by_ids(["<TRANSCRIPT_ID_1>", "<TRANSC
Both `Transcript.get_by_id` and `TranscriptGroup.get_by_ids` have asynchronous counterparts, `Transcript.get_by_id_async` and `TranscriptGroup.get_by_ids_async`, respectively. These functions immediately return a `Future` object, rather than blocking until the transcript(s) are retrieved.

See the above section on [Synchronous vs Asynchronous](#synchronous-vs-asynchronous) for more information.
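
Assuming the `Future` here is a `concurrent.futures.Future`, as CLAUDE.md states for `transcribe_async()`, it can be consumed either by blocking on it or by wrapping it for use inside an asyncio event loop. The sketch below illustrates both with the standard library only. `fake_get_by_id_async` is a hypothetical stand-in that returns a `Future` from a thread pool, so the block runs without the SDK or network access.

```python
import asyncio
from concurrent.futures import Future, ThreadPoolExecutor


def fake_get_by_id_async(executor: ThreadPoolExecutor, transcript_id: str) -> Future:
    """Stand-in for a Future-returning SDK call (hypothetical, no network)."""
    return executor.submit(lambda: f"transcript for {transcript_id}")


def wait_blocking(future: Future, timeout: float = 5.0) -> str:
    """Synchronous code: block on the Future, with a timeout."""
    return future.result(timeout=timeout)


async def wait_in_event_loop(future: Future) -> str:
    """asyncio code: wrap the Future so it can be awaited without blocking the loop."""
    return await asyncio.wrap_future(future)


with ThreadPoolExecutor(max_workers=1) as pool:
    sync_result = wait_blocking(fake_get_by_id_async(pool, "abc"))
    async_result = asyncio.run(wait_in_event_loop(fake_get_by_id_async(pool, "abc")))
```

Awaiting the raw `Future` directly would fail, which is why the asyncio path goes through `asyncio.wrap_future`.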

## Claude Code

This repository includes a [`CLAUDE.md`](CLAUDE.md) file that provides context to [Claude Code](https://docs.anthropic.com/en/docs/claude-code) about this SDK: key classes, common patterns, and gotchas. Claude Code reads this file automatically when you open the repo, which helps it give better assistance.
