A simple proof-of-concept (PoC) application that records audio from the microphone in a web front-end, sends it to a FastAPI backend, and transcribes it with OpenAI's Whisper model.

Features:
- Record audio directly from the browser
- Send recorded audio to a FastAPI backend
- Use OpenAI Whisper for speech-to-text transcription
- Display the transcribed text in the UI

Tech stack:
- Frontend: JavaScript (Vanilla JS) + HTML + CSS
- Backend: Python + FastAPI
- Transcription Model: OpenAI Whisper
- Audio Processing: ffmpeg
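
To make the flow above concrete, here is a minimal sketch of what the backend side could look like. It is not the repository's actual `app.py`: the factory `create_app`, the helper `build_response`, the default model name `"base"`, and the permissive CORS settings are all assumptions made for illustration. Only the `/transcribe` route and the response fields come from this README.

```python
import os
import tempfile


def build_response(text: str) -> dict:
    """Shape the JSON body documented in this README (helper name is hypothetical)."""
    return {"message": "Success", "transcription": text.strip()}


def create_app(model_name: str = "base"):
    """Build a FastAPI app exposing POST /transcribe (illustrative sketch only)."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import whisper
    from fastapi import FastAPI, File, UploadFile
    from fastapi.middleware.cors import CORSMiddleware

    app = FastAPI()
    # Assumption: the page is served from another origin (e.g. Live Server),
    # so cross-origin requests are allowed. Tighten this outside a PoC.
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"],
    )
    model = whisper.load_model(model_name)  # loads the model weights once

    @app.post("/transcribe")
    def transcribe(file: UploadFile = File(...)):
        # Whisper reads audio through ffmpeg, so the upload is written to disk first.
        suffix = os.path.splitext(file.filename or "")[1] or ".webm"
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            tmp.write(file.file.read())
            path = tmp.name
        try:
            result = model.transcribe(path)
        finally:
            os.remove(path)
        return build_response(result["text"])

    return app
```

With this layout, a module-level `app = create_app()` would be needed for `fastapi dev app.py` to find the application. The handler is a plain `def` so FastAPI runs the blocking transcription in its thread pool rather than on the event loop.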
Make sure you have Python 3.8+ installed.

Clone the repository:
git clone https://github.com/abonvalle/PoC-OpenAI-Whisper.git
cd PoC-OpenAI-Whisper

Create and activate a virtual environment:
python -m venv .poc_openai_whisper_env
source .poc_openai_whisper_env/bin/activate # On Windows: .poc_openai_whisper_env\Scripts\activate

Install the Python dependencies:
pip install -r requirements.txt

Install ffmpeg:
Linux (Ubuntu/Debian):
sudo apt install ffmpeg
macOS:
brew install ffmpeg
Windows (Chocolatey):
choco install ffmpeg

Start the backend:
fastapi dev app.py --port 5000
The API should now be running at: http://localhost:5000
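
If the server fails to start or transcription errors out, the prerequisites listed above are worth checking first. The following is a small, optional preflight sketch using only the standard library; the function names are hypothetical and the script is not part of the repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this README


def python_ok(version=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def ffmpeg_path():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def preflight() -> list:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not python_ok():
        problems.append("Python 3.8+ is required")
    if ffmpeg_path() is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


# Example usage:
#   for problem in preflight():
#       print("Missing prerequisite:", problem)
```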

To run the front-end, serve the project with the VS Code Live Server extension (https://marketplace.visualstudio.com/items?itemName=ritwickdey.LiveServer), which is the recommended option, or open index.html directly in a browser.

POST /transcribe
- Content-Type: multipart/form-data
- Body:
  - file: The recorded audio file (WebM/WAV)

Response:
{
  "message": "Success",
  "transcription": "Hello, this is a test transcription."
}

Project structure:
/PoC-OpenAI-Whisper
│── index.html # Main UI
│── script.js # Handles recording & API calls
│── app.py # FastAPI application
└── requirements.txt # Python dependencies
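
The `POST /transcribe` endpoint can also be exercised without the browser. Below is a minimal client sketch that uses only the Python standard library; the helper names `encode_multipart` and `transcribe_file`, the default `audio/webm` content type, and the sample file name are assumptions for illustration. The URL, the `file` field, and the `transcription` key follow the API description in this README.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "audio/webm"):
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str,
                    url: str = "http://localhost:5000/transcribe") -> str:
    """Upload an audio file to the running API and return the transcribed text."""
    with open(path, "rb") as fh:
        body, ctype = encode_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["transcription"]


# Example usage (requires the backend to be running; file name is hypothetical):
#   print(transcribe_file("sample.webm"))
```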