- A `POST /transcribe` endpoint (`multipart/form-data`, field `audio`).
- Uses open-source `whisper.cpp` for transcription (spawned via Node.js `child_process`).
- Converts any accepted input to 16 kHz mono WAV using `ffmpeg`.
- Temporary upload files are cleaned up after each request.
- Optional ngrok tunnel to expose the local server over HTTPS.
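The conversion and transcription steps above come down to two child processes. The sketch below shows plausible argument lists for them. The helper names `buildFfmpegArgs` and `buildWhisperArgs` are hypothetical, and the exact flags the server passes are an assumption. The `ffmpeg` options (`-ar`, `-ac`, `-c:a pcm_s16le`) and the `whisper-cli` options (`-m` model, `-f` input file, `-nt` no timestamps) are standard for those tools.

```typescript
// Hypothetical helpers: the project's actual flags and module layout are not documented here.

/** ffmpeg arguments: decode any supported input and write 16 kHz, mono, 16-bit PCM WAV. */
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y",                // overwrite the output file if it already exists
    "-i", inputPath,     // source file (mp3, m4a, webm, ...)
    "-ar", "16000",      // resample to 16 kHz, the rate Whisper models expect
    "-ac", "1",          // downmix to a single (mono) channel
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM inside the WAV container
    outputPath,
  ];
}

/** whisper-cli arguments: model file, input WAV, plain text (no timestamps) on stdout. */
function buildWhisperArgs(modelPath: string, wavPath: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-nt"];
}
```

In a server these arrays would be passed to `child_process.spawn("ffmpeg", ...)` and then `spawn(whisperBinary, ...)`, with the transcript assembled from the second process's stdout. Passing an argument array rather than a shell string keeps uploaded file names from being interpreted by a shell.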
- Node.js 18+
- Git
- CMake and a C/C++ toolchain
  - Windows: install CMake and MSVC (Visual Studio Build Tools), or use MinGW
- `ffmpeg` available on `PATH` (for audio conversion)
- Windows (Chocolatey):

  ```powershell
  choco install ffmpeg
  ```
```sh
# from the project root
npm install
# clone & build whisper.cpp and download the base English model
npm run setup:whisper
```

Manual alternative:
```sh
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
sh ./models/download-ggml-model.sh base.en
cmake -B build
cmake --build build -j --config Release
```

Start the server:

```sh
# from project root
npm start
# or, for auto-reload
npm run dev
```

You should see:
```text
[server] Listening on http://localhost:3000
[server] POST /transcribe with multipart/form-data field "audio"
```

Accepted types: mp3, wav, m4a, ogg, webm, mp4.
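A minimal sketch of an extension check for the accepted types listed above. The name `isAcceptedAudio` is hypothetical, and whether the server checks the file extension, the MIME type, or both is not stated in this README; this version only looks at the extension, case-insensitively.

```typescript
// Hypothetical validation helper; the server's real check may also inspect MIME types.
const ACCEPTED_EXTENSIONS: ReadonlyArray<string> = ["mp3", "wav", "m4a", "ogg", "webm", "mp4"];

function isAcceptedAudio(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  // No dot, or a trailing dot, means there is no extension to check.
  if (dot < 0 || dot === filename.length - 1) return false;
  const ext = filename.slice(dot + 1).toLowerCase();
  return ACCEPTED_EXTENSIONS.indexOf(ext) !== -1;
}
```

Rejecting unsupported files before calling `ffmpeg` lets the server return a clear client error instead of a conversion failure.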
Test with curl:

```sh
curl -X POST -F "audio=@path/to/audio.mp3" http://localhost:3000/transcribe
```

PowerShell:

```powershell
curl.exe -X POST http://localhost:3000/transcribe -F "audio=@C:\path\to\audio.mp3"
```

Example response:

```json
{ "text": "hello world", "elapsedMs": 1234 }
```

To expose the server with ngrok, open a second terminal while the server is running.
```sh
# one-time: set your authtoken (or put it in .env as NGROK_AUTHTOKEN)
ngrok config add-authtoken <YOUR_TOKEN>
# start a tunnel to PORT (defaults to 3000)
npm run ngrok
```

You’ll see something like:

```text
[ngrok] Public URL: https://abcd-12-34-56-78.ngrok-free.app
[ngrok] Example: https://abcd-12-34-56-78.ngrok-free.app/transcribe
```
Use that HTTPS URL in your mobile app or any external client.
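For an external client, two small pieces are useful: building the endpoint URL from whatever base URL ngrok printed, and validating the JSON body before using it. The sketch below assumes the response shape shown in the example response (`text`, `elapsedMs`); the names `transcribeUrl`, `parseTranscribeResponse`, and `TranscribeResponse` are hypothetical.

```typescript
// Hypothetical client-side helpers based on the documented example response.
interface TranscribeResponse {
  text: string;
  elapsedMs: number;
}

/** Appends /transcribe to a local or ngrok base URL, tolerating trailing slashes. */
function transcribeUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/transcribe";
}

/** Checks that a parsed JSON body has the expected fields before the app trusts it. */
function parseTranscribeResponse(body: unknown): TranscribeResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("response is not a JSON object");
  }
  const record = body as { [key: string]: unknown };
  if (typeof record["text"] !== "string" || typeof record["elapsedMs"] !== "number") {
    throw new Error("response is missing 'text' or 'elapsedMs'");
  }
  return { text: record["text"] as string, elapsedMs: record["elapsedMs"] as number };
}
```

A caller would typically build a `FormData` body with the file in the `audio` field, `POST` it with `fetch` to `transcribeUrl(publicUrl)`, and pass `await res.json()` through `parseTranscribeResponse`.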
Copy `.env.example` to `.env` (or create it manually) and adjust as needed:

- `PORT`: server port (default: `3000`)
- `WHISPER_BINARY_PATH`: absolute or relative path to `whisper-cli` (default: `./whisper.cpp/build/bin/[Release/]whisper-cli[.exe]`; the bracketed segments depend on the build: `Release/` appears with multi-config generators such as Visual Studio, `.exe` on Windows)
- `WHISPER_MODEL`: path to the model file (default: `./whisper.cpp/models/ggml-base.en.bin`)
- `NGROK_AUTHTOKEN`: ngrok auth token (optional if already configured globally)
- `NGROK_REGION`: ngrok region (e.g., `us`, `eu`)
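The defaults above can be resolved in one place. This is a minimal sketch, assuming the server reads `process.env` and picks the first existing `whisper-cli` candidate; `loadConfig`, `whisperBinaryCandidates`, and `ServerConfig` are hypothetical names, and the project's own config code may differ. The `platform` argument takes Node's `process.platform` values (for example `"win32"`).

```typescript
// Hypothetical config loader mirroring the documented defaults.
interface ServerConfig {
  port: number;
  whisperBinaryCandidates: string[];
  whisperModel: string;
  ngrokAuthtoken?: string;
  ngrokRegion?: string;
}

type Env = { [name: string]: string | undefined };

function whisperBinaryCandidates(platform: string): string[] {
  const exe = platform === "win32" ? "whisper-cli.exe" : "whisper-cli";
  // Multi-config builds add a Release/ directory; single-config builds do not.
  return ["./whisper.cpp/build/bin/Release/" + exe, "./whisper.cpp/build/bin/" + exe];
}

function loadConfig(env: Env, platform: string): ServerConfig {
  const rawPort = env["PORT"];
  const port = rawPort !== undefined && rawPort !== "" ? Number(rawPort) : 3000;
  // Math.floor(NaN) !== NaN, so non-numeric values are rejected here too.
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error("PORT must be an integer between 1 and 65535, got " + rawPort);
  }
  const binary = env["WHISPER_BINARY_PATH"];
  return {
    port: port,
    // An explicit path overrides the platform-dependent defaults.
    whisperBinaryCandidates: binary ? [binary] : whisperBinaryCandidates(platform),
    whisperModel: env["WHISPER_MODEL"] || "./whisper.cpp/models/ggml-base.en.bin",
    ngrokAuthtoken: env["NGROK_AUTHTOKEN"] || undefined,
    ngrokRegion: env["NGROK_REGION"] || undefined,
  };
}
```

In the server this would be called as `loadConfig(process.env, process.platform)`, and the first candidate that exists on disk would be used as the binary.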
Notes:

- The server captures the transcription from `whisper.cpp` stdout; it does not persist transcript files.
- Uploads are stored temporarily in `uploads/` and cleaned up after each request.
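The per-request cleanup is essentially a `try`/`finally` around the work. The sketch below is shown synchronously for brevity; an async handler would `await` inside the same structure. `withTempFiles` is a hypothetical name, and the `remove` callback stands in for a real delete such as `fs.promises.unlink`, which is an assumption about how the server deletes files.

```typescript
// Hypothetical sketch of the cleanup guarantee; the real server's file handling is not shown here.
function withTempFiles<T>(
  paths: string[],
  work: () => T,
  remove: (path: string) => void,
): T {
  try {
    return work();
  } finally {
    // Runs whether work() returned or threw, so temp files never outlive the request.
    for (const p of paths) {
      try {
        remove(p);
      } catch (_err) {
        // A failed delete (e.g. the WAV was never created) must not mask the request's result or error.
      }
    }
  }
}
```

Both the original upload and the converted WAV would be listed in `paths`, so a failed `ffmpeg` or `whisper-cli` run still leaves `uploads/` clean.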
- `ffmpeg` not found: ensure `ffmpeg -version` works in your terminal and that `ffmpeg` is on `PATH`.
- whisper binary not found: build it with `npm run setup:whisper`, or set `WHISPER_BINARY_PATH` to your `whisper-cli`.
- Model not found: ensure `./whisper.cpp/models/ggml-base.en.bin` exists, or set `WHISPER_MODEL`.
- ngrok not authorized: run `ngrok config add-authtoken <YOUR_TOKEN>` or set `NGROK_AUTHTOKEN`.
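The first three problems above can be detected before the server accepts requests. This is a minimal sketch of such a startup check; `preflight` and `PreflightInput` are hypothetical, and the `exists` callback and `ffmpegOnPath` flag stand in for real filesystem and `PATH` lookups. ngrok authorization is left out because it can only be confirmed by starting a tunnel.

```typescript
// Hypothetical startup check mapping missing dependencies to the troubleshooting hints.
interface PreflightInput {
  ffmpegOnPath: boolean;
  whisperBinary: string;
  whisperModel: string;
  exists: (path: string) => boolean;
}

function preflight(input: PreflightInput): string[] {
  const problems: string[] = [];
  if (!input.ffmpegOnPath) {
    problems.push("ffmpeg not found: ensure `ffmpeg -version` works and ffmpeg is on PATH.");
  }
  if (!input.exists(input.whisperBinary)) {
    problems.push("whisper binary not found: run `npm run setup:whisper` or set WHISPER_BINARY_PATH.");
  }
  if (!input.exists(input.whisperModel)) {
    problems.push("model not found: download ggml-base.en.bin or set WHISPER_MODEL.");
  }
  return problems; // an empty list means the server can start
}
```

Failing fast at startup with these messages is usually clearer than surfacing the same errors on the first `/transcribe` request.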
- `npm start`: start the server.
- `npm run dev`: start with auto-reload (nodemon).
- `npm run setup:whisper`: clone/build `whisper.cpp` and download the base model.
- `npm run ngrok`: start an ngrok tunnel and print the public HTTPS URL (CLI-based).