Commit 25a6ed9

odrec committed
update README to better reflect current state
1 parent d9964a8 commit 25a6ed9

1 file changed

Lines changed: 45 additions & 40 deletions

readme.md

@@ -1,87 +1,92 @@
-# Whisperx API Wrapper
+# WhisperX API with Asynchronous Transcription
 
-An API Wrapper for [Whisperx Library](https://github.com/m-bain/whisperX)
+A FastAPI-based web API for asynchronous audio and video transcription using [WhisperX](https://github.com/m-bain/whisperX).
 
 ## Overview
 
-This is a FastAPI application that provides an endpoint for video/audio transcription using the `whisperx` command. The application supports multiple audio and video formats. It performs the transcription, alignment, and diarization of the uploaded media files.
+This project provides an API to upload media files and receive transcriptions, including alignment and speaker diarization. It leverages Celery task queues and RabbitMQ to handle transcription jobs asynchronously, allowing the API to remain responsive while processing resource-intensive tasks in the background.
 
 ## Features
 
-- User Authentication with JWT
+- Asynchronous transcription processing with Celery
+- RabbitMQ message broker integration
 - Support for multiple audio and video formats
-- Diarization support
+- Speaker diarization support
 - Customizable language and model settings
+- Built-in logging
+- Job status tracking via API
 
 ## Requirements
 
-- whisperx
 - Python 3.8+
+- [WhisperX](https://github.com/m-bain/whisperX)
 - FastAPI
 - ffmpeg
-- SQLite
-- pyjwt
-- dotenv
+- SQLite (for internal use, not user management)
+- python-dotenv
+- Celery
+- RabbitMQ server
 
-Follow the instructions on how to install Whisperx [in the official repository](https://github.com/m-bain/whisperX#3-install-this-repo)
-You can install these dependencies using the `requirements.txt` file:
+### Installing dependencies
+
+Follow the WhisperX installation instructions: [WhisperX repo](https://github.com/m-bain/whisperX#3-install-this-repo)
+
+Then install Python dependencies:
 
 ```bash
 pip install -r requirements.txt
 ```
 
+### RabbitMQ installation
+
+RabbitMQ is required as the message broker for Celery. On Ubuntu, install it via:
+
+```bash
+sudo apt-get update
+sudo apt-get install rabbitmq-server -y
+sudo systemctl enable --now rabbitmq-server
+```
+
+Ensure RabbitMQ is running before starting the application.
+
 ## Environment Variables
 
-Create a `.env` file in your root directory and add the following variables:
+Create a `.env` file in your project root with:
 
 ```env
-SECRET_KEY=your_secret_key
-MASTER_KEY=your_master_key
 HUGGING_FACE_TOKEN=your_hugging_face_token
 API_PORT=11300
 ```
 
-## Database Setup
-
-SQLite is used for storing user information. The database is created automatically when the application runs.
-
 ## Running the Application
 
-Run the application using:
+### 1. Start the FastAPI server
 
 ```bash
-python api_whisperx.py
+python start.py
 ```
 
-Replace `main` with the name of your Python file if it's not `main.py`.
+This launches the API server (default on port 11300).
 
-## API Endpoints
 
-### POST `/auth`
+## API Endpoints
 
-Authenticate a user and return a JWT token.
+### POST `/jobs`
 
-- `username`: The username of the user.
-- `password`: The password of the user.
+Submit a new transcription job with an uploaded media file.
 
-### POST `/create_user`
+### GET `/jobs`
 
-Create a new user.
+List all submitted transcription jobs.
 
-- `username`: Desired username.
-- `password`: Desired password.
-- `master_key`: Master key for authorized user creation.
+### GET `/jobs/{task_id}`
 
-### POST `/whisperx-transcribe/`
+Get the status and result of a specific transcription job.
 
-Transcribe an uploaded audio or video file.
+## Logging
 
-- `file`: The audio or video file to transcribe.
-- `lang`: Language for transcription (default is "pt").
-- `model`: Model to use for transcription (default is "large-v2").
-- `min_speakers`: Minimum number of speakers for diarization (default is 1).
-- `max_speakers`: Maximum number of speakers for diarization (default is 2).
+The application logs key events and errors during API requests and background task processing.
 
-## Logging
+## Summary
 
-The application has built-in logging that informs about the steps being performed and any errors that occur.
+This project provides a scalable, asynchronous API for audio/video transcription using WhisperX, with support for speaker diarization and job status tracking.
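
The new `/jobs` endpoints in this README revision imply a submit-then-poll workflow: a client posts a file, receives a task ID, and checks `GET /jobs/{task_id}` until the job finishes. The README does not document the response format, so the following is only a minimal client-side sketch under stated assumptions: that the endpoint returns JSON, that the JSON has a `status` field, and that terminal values match Celery's built-in `SUCCESS` and `FAILURE` state names. The helper names `fetch_job` and `wait_for_job` are hypothetical and are not part of the project.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Tuple

# Port taken from the README's .env example (API_PORT=11300); host is assumed.
BASE_URL = "http://localhost:11300"


def fetch_job(task_id: str, base_url: str = BASE_URL) -> Dict:
    """GET /jobs/{task_id}. Assumes the body is a JSON object."""
    with urllib.request.urlopen(f"{base_url}/jobs/{task_id}") as resp:
        return json.load(resp)


def wait_for_job(
    task_id: str,
    fetch: Callable[[str], Dict] = fetch_job,
    interval: float = 2.0,
    timeout: float = 600.0,
    # Assumed terminal values; these are Celery's state names, but the API
    # may expose different ones.
    done_states: Tuple[str, ...] = ("SUCCESS", "FAILURE"),
) -> Dict:
    """Poll a job until it reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(task_id)
        # "status" is an assumed field name, not confirmed by the README.
        if job.get("status") in done_states:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {task_id} did not finish in {timeout}s")
        time.sleep(interval)
```

Passing the fetch function as a parameter keeps the polling loop independent of the HTTP layer, so it can be exercised with a stub before the real response schema is known. Against a running server, a call would look like `wait_for_job(task_id)` with the ID returned by `POST /jobs`.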
