Integration: Haystack Audio Transcription Pipeline
What this should show
A Python example demonstrating how to use Deepgram as a custom Haystack 2.x component for audio transcription in a RAG pipeline. The example should:
- Create a custom Haystack @component that accepts audio file paths or URLs
- Transcribe audio via Deepgram Pre-recorded STT (Nova-3)
- Output Haystack Document objects with transcript text + metadata (speaker labels, timestamps, confidence)
- Include a retrieval demo using an in-memory document store
- Support both single-file and batch audio transcription
Credentials likely needed
- DEEPGRAM_API_KEY
Original request:
What to build
A working example demonstrating Deepgram as a Haystack component for audio transcription in a RAG pipeline — loading audio files, transcribing with Deepgram STT, and feeding transcripts into a Haystack retrieval pipeline.
Why this matters
Haystack (by deepset) is a leading enterprise NLP/RAG framework used by teams building production search and retrieval systems. Developers building audio-aware RAG pipelines need a reference integration showing how to use Deepgram as an audio ingestion source. There is currently no example of Deepgram + Haystack working together, despite Haystack's growing adoption for enterprise AI.
Suggested scope
- Language: Python
- Framework: Haystack 2.x (haystack-ai package)
- Deepgram APIs: Pre-recorded STT (Nova-3)
- What it does: Custom Haystack @component that accepts audio file paths or URLs, transcribes via Deepgram, and outputs Haystack Document objects with transcript text + metadata (speaker labels, timestamps, confidence)
- Includes: Pipeline YAML config, example audio file, retrieval demo with an in-memory document store
- Complexity: Medium — single Python file + pipeline config
Acceptance criteria
Raised by the DX intelligence system.