asimbhdr96/node-whispercpp-stt-server

Local STT server with whisper.cpp + Express (+ ngrok)

What this provides

  • A POST /transcribe endpoint that accepts multipart/form-data with the audio file in a field named audio.
  • Uses open-source whisper.cpp for transcription (spawned via Node.js child_process).
  • Converts every accepted input to 16 kHz mono WAV with ffmpeg before transcription.
  • Temporary upload files are cleaned up after each request.
  • Optional ngrok tunnel to expose the local server over HTTPS.
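The convert-then-transcribe flow described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function names (ffmpegArgs, whisperArgs, run, transcribe) are hypothetical, and the whisper-cli flags used (-m, -f, and -nt for no timestamps) are an assumption about how the binary is invoked.

```javascript
const { spawn } = require('node:child_process');

// ffmpeg arguments for 16 kHz, mono, 16-bit PCM WAV output (-y overwrites the target).
function ffmpegArgs(inputPath, wavPath) {
  return ['-y', '-i', inputPath, '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', wavPath];
}

// whisper-cli arguments: model file, input WAV, and no timestamps in the output.
// Assumption: the real server may pass different or additional flags.
function whisperArgs(modelPath, wavPath) {
  return ['-m', modelPath, '-f', wavPath, '-nt'];
}

// Run a command and resolve with its stdout; reject on spawn failure or non-zero exit.
function run(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`${command} exited with code ${code}: ${stderr}`));
    });
  });
}

// Convert the upload to WAV, then read the transcript from whisper-cli's stdout.
async function transcribe(inputPath, wavPath, whisperBinary, modelPath) {
  await run('ffmpeg', ffmpegArgs(inputPath, wavPath));
  const output = await run(whisperBinary, whisperArgs(modelPath, wavPath));
  return output.trim();
}
```

Passing arguments as an array to spawn avoids shell quoting problems with uploaded file names.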

Requirements

  • Node.js 18+
  • Git
  • CMake and a C/C++ toolchain
    • Windows: install CMake + MSVC (Visual Studio Build Tools) or use MinGW
  • ffmpeg available on PATH (for audio conversion)
    • Windows (Chocolatey): choco install ffmpeg

Install & setup

# from the project root
npm install

# clone & build whisper.cpp and download the base English model
npm run setup:whisper

Manual alternative:

git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
sh ./models/download-ggml-model.sh base.en
cmake -B build
cmake --build build -j --config Release

Start the server

# from project root
npm start
# or, for auto-reload
npm run dev

You should see:

[server] Listening on http://localhost:3000
[server] POST /transcribe with multipart/form-data field "audio"

Test the API

Accepted types: mp3, wav, m4a, ogg, webm, mp4.

curl -X POST -F "audio=@path/to/audio.mp3" http://localhost:3000/transcribe

PowerShell:

curl.exe -X POST http://localhost:3000/transcribe -F "audio=@C:\path\to\audio.mp3"

Example response:

{ "text": "hello world", "elapsedMs": 1234 }
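The same request can be made from Node.js 18+, which ships fetch, FormData, and Blob as globals. The sketch below is illustrative only: isAcceptedAudio, buildForm, and transcribeFile are hypothetical helper names, and the extension check is a client-side convenience that assumes the accepted types listed above map to file extensions (the server may validate differently, for example by MIME type).

```javascript
const path = require('node:path');
const { readFile } = require('node:fs/promises');

// Extensions taken from the accepted types listed in this README.
const ACCEPTED_EXTENSIONS = new Set(['.mp3', '.wav', '.m4a', '.ogg', '.webm', '.mp4']);

function isAcceptedAudio(filename) {
  return ACCEPTED_EXTENSIONS.has(path.extname(filename).toLowerCase());
}

// Build the multipart body with the file in the "audio" field.
function buildForm(buffer, filename) {
  const form = new FormData();
  form.append('audio', new Blob([buffer]), filename);
  return form;
}

// Upload a local file and return the parsed JSON response.
async function transcribeFile(filePath, baseUrl = 'http://localhost:3000') {
  const filename = path.basename(filePath);
  if (!isAcceptedAudio(filename)) {
    throw new Error(`Unsupported audio type: ${filename}`);
  }
  const form = buildForm(await readFile(filePath), filename);
  const response = await fetch(`${baseUrl}/transcribe`, { method: 'POST', body: form });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json(); // expected shape per the example above: { text, elapsedMs }
}
```

To target the ngrok tunnel instead of localhost, pass the public HTTPS URL as baseUrl.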

Expose with ngrok (HTTPS)

Open a second terminal while the server is running.

# one-time: set your authtoken (or put it in .env as NGROK_AUTHTOKEN)
ngrok config add-authtoken <YOUR_TOKEN>

# start a tunnel to PORT (defaults to 3000)
npm run ngrok

You’ll see something like:

[ngrok] Public URL: https://abcd-12-34-56-78.ngrok-free.app
[ngrok] Example: https://abcd-12-34-56-78.ngrok-free.app/transcribe

Use that HTTPS URL in your mobile app or any external client.
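If a client needs the public URL programmatically, one option is the ngrok agent's local API, which by default listens on http://127.0.0.1:4040 and lists active tunnels at /api/tunnels. The sketch below assumes that default address; pickHttpsUrl and getPublicUrl are hypothetical names, and this is not necessarily how npm run ngrok obtains the URL.

```javascript
// Return the first HTTPS public URL from an /api/tunnels response, or null if none.
function pickHttpsUrl(payload) {
  const tunnels = Array.isArray(payload?.tunnels) ? payload.tunnels : [];
  const match = tunnels.find(
    (tunnel) => typeof tunnel.public_url === 'string' && tunnel.public_url.startsWith('https://')
  );
  return match ? match.public_url : null;
}

// Query the local ngrok agent (assumed default address) for the tunnel list.
async function getPublicUrl(apiBase = 'http://127.0.0.1:4040') {
  const response = await fetch(`${apiBase}/api/tunnels`);
  if (!response.ok) {
    throw new Error(`ngrok API returned status ${response.status}`);
  }
  return pickHttpsUrl(await response.json());
}
```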

Configuration (.env)

Copy .env.example to .env (or create manually) and adjust as needed:

  • PORT: server port (default: 3000)
  • WHISPER_BINARY_PATH: absolute or relative path to whisper-cli (default: ./whisper.cpp/build/bin/[Release/]whisper-cli[.exe])
  • WHISPER_MODEL: path to the model file (default: ./whisper.cpp/models/ggml-base.en.bin)
  • NGROK_AUTHTOKEN: ngrok auth token (optional if already configured globally)
  • NGROK_REGION: ngrok region (e.g., us, eu)
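As an illustration of how these variables and their defaults might be resolved, here is a small sketch. The names loadConfig and defaultBinaryCandidates are hypothetical, and the candidate order (Release subfolder first, then the plain bin folder) is an assumption based on the bracketed default path above.

```javascript
const path = require('node:path');

// Possible default locations for whisper-cli, depending on platform and build layout.
function defaultBinaryCandidates(platform = process.platform) {
  const exe = platform === 'win32' ? 'whisper-cli.exe' : 'whisper-cli';
  const binDir = path.join('whisper.cpp', 'build', 'bin');
  return [path.join(binDir, 'Release', exe), path.join(binDir, exe)];
}

// Read settings from an environment object, falling back to the documented defaults.
function loadConfig(env = process.env, platform = process.platform) {
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // An explicit path wins; otherwise the caller tries each candidate in order.
    whisperBinaryPath: env.WHISPER_BINARY_PATH || null,
    whisperBinaryCandidates: defaultBinaryCandidates(platform),
    whisperModel: env.WHISPER_MODEL || './whisper.cpp/models/ggml-base.en.bin',
    ngrokAuthtoken: env.NGROK_AUTHTOKEN || null,
    ngrokRegion: env.NGROK_REGION || null,
  };
}
```

Taking env and platform as parameters keeps the function easy to exercise without touching the real environment.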

Notes:

  • The server captures transcription from whisper.cpp stdout; it does not persist transcript files.
  • Uploads are stored temporarily in uploads/ and cleaned up after each request.
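A common way to guarantee that cleanup happens even when conversion or transcription fails is a try/finally wrapper. The sketch below shows the idea; withCleanup is a hypothetical helper, not a function from this repository.

```javascript
const fs = require('node:fs/promises');

// Run `work`, then delete every path in `paths` whether it succeeded or threw.
// `force: true` makes removal a no-op for files that were never created.
async function withCleanup(paths, work) {
  try {
    return await work();
  } finally {
    await Promise.all(paths.map((filePath) => fs.rm(filePath, { force: true })));
  }
}
```

A request handler could wrap its processing as withCleanup([uploadPath, wavPath], () => ...), so both the original upload and the converted WAV are removed once the response is ready.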

Troubleshooting

  • ffmpeg not found: ensure ffmpeg -version works in your terminal and that it’s on PATH.
  • whisper binary not found: build via npm run setup:whisper, or set WHISPER_BINARY_PATH to your whisper-cli.
  • model not found: ensure ./whisper.cpp/models/ggml-base.en.bin exists or set WHISPER_MODEL.
  • ngrok not authorized: run ngrok config add-authtoken <YOUR_TOKEN> or set NGROK_AUTHTOKEN.
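The first three checks can be automated with a small preflight script. This is a sketch under assumptions: preflight and commandWorks are hypothetical names, and the paths passed in would come from your own configuration.

```javascript
const fs = require('node:fs');
const { spawnSync } = require('node:child_process');

// True if the command can be started and exits with status 0.
function commandWorks(command, args) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  return !result.error && result.status === 0;
}

// Return a list of human-readable problems; an empty list means everything was found.
// `checks` lets callers substitute the probes, which is mainly useful for testing.
function preflight({ whisperBinary, modelPath }, checks = {}) {
  const hasCommand = checks.hasCommand ?? commandWorks;
  const fileExists = checks.fileExists ?? fs.existsSync;
  const problems = [];
  if (!hasCommand('ffmpeg', ['-version'])) {
    problems.push('ffmpeg not found on PATH');
  }
  if (!fileExists(whisperBinary)) {
    problems.push(`whisper binary not found: ${whisperBinary}`);
  }
  if (!fileExists(modelPath)) {
    problems.push(`model not found: ${modelPath}`);
  }
  return problems;
}
```

For example, preflight({ whisperBinary: './whisper.cpp/build/bin/whisper-cli', modelPath: './whisper.cpp/models/ggml-base.en.bin' }) returns an array of messages that can be printed before the server starts.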

Scripts

  • npm start: start the server.
  • npm run dev: start with auto-reload (nodemon).
  • npm run setup:whisper: clone/build whisper.cpp and download the base model.
  • npm run ngrok: start an ngrok tunnel and print the public HTTPS URL (CLI-based).

About

Speech-to-text HTTP API powered by whisper.cpp (via Node/Express), with ffmpeg conversion and ngrok tunneling.
