|
1 | | -# Whisperx API Wrapper |
| 1 | +# WhisperX API with Asynchronous Transcription |
2 | 2 |
|
3 | | -An API Wrapper for [Whisperx Library](https://github.com/m-bain/whisperX) |
| 3 | +A FastAPI-based web API for asynchronous audio and video transcription using [WhisperX](https://github.com/m-bain/whisperX). |
4 | 4 |
|
5 | 5 | ## Overview |
6 | 6 |
|
7 | | -This is a FastAPI application that provides an endpoint for video/audio transcription using the `whisperx` command. The application supports multiple audio and video formats. It performs the transcription, alignment, and diarization of the uploaded media files. |
| 7 | +This project provides an API to upload media files and receive transcriptions, including alignment and speaker diarization. It leverages Celery task queues and RabbitMQ to handle transcription jobs asynchronously, allowing the API to remain responsive while processing resource-intensive tasks in the background. |
8 | 8 |
|
9 | 9 | ## Features |
10 | 10 |
|
11 | | -- User Authentication with JWT |
| 11 | +- Asynchronous transcription processing with Celery |
| 12 | +- RabbitMQ message broker integration |
12 | 13 | - Support for multiple audio and video formats |
13 | | -- Diarization support |
| 14 | +- Speaker diarization support |
14 | 15 | - Customizable language and model settings |
| 16 | +- Built-in logging |
| 17 | +- Job status tracking via API |
15 | 18 |
|
16 | 19 | ## Requirements |
17 | 20 |
|
18 | | -- whisperx |
19 | 21 | - Python 3.8+ |
| 22 | +- [WhisperX](https://github.com/m-bain/whisperX) |
20 | 23 | - FastAPI |
21 | 24 | - ffmpeg |
22 | | -- SQLite |
23 | | -- pyjwt |
24 | | -- dotenv |
| 25 | +- SQLite (for internal use, not user management) |
| 26 | +- python-dotenv |
| 27 | +- Celery |
| 28 | +- RabbitMQ server |
25 | 29 |
|
26 | | -Follow the instructions on how to install Whisperx [in the official repository](https://github.com/m-bain/whisperX#3-install-this-repo) |
27 | | -You can install these dependencies using the `requirements.txt` file: |
| 30 | +### Installing dependencies |
| 31 | + |
| 32 | +Follow the WhisperX installation instructions: [WhisperX repo](https://github.com/m-bain/whisperX#3-install-this-repo) |
| 33 | + |
| 34 | +Then install Python dependencies: |
28 | 35 |
|
29 | 36 | ```bash |
30 | 37 | pip install -r requirements.txt |
31 | 38 | ``` |
32 | 39 |
|
| 40 | +### RabbitMQ installation |
| 41 | + |
| 42 | +RabbitMQ is required as the message broker for Celery. On Ubuntu, install it via: |
| 43 | + |
| 44 | +```bash |
| 45 | +sudo apt-get update |
| 46 | +sudo apt-get install rabbitmq-server -y |
| 47 | +sudo systemctl enable --now rabbitmq-server |
| 48 | +``` |
| 49 | + |
| 50 | +Ensure RabbitMQ is running before starting the application. |
| 51 | + |
33 | 52 | ## Environment Variables |
34 | 53 |
|
35 | | -Create a `.env` file in your root directory and add the following variables: |
| 54 | +Create a `.env` file in your project root with: |
36 | 55 |
|
37 | 56 | ```env |
38 | | -SECRET_KEY=your_secret_key |
39 | | -MASTER_KEY=your_master_key |
40 | 57 | HUGGING_FACE_TOKEN=your_hugging_face_token |
41 | 58 | API_PORT=11300 |
42 | 59 | ``` |
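
As a minimal sketch of how these two variables might be consumed, the snippet below reads them from the process environment. It assumes the `.env` file has already been loaded (for example with python-dotenv's `load_dotenv()`); the helper name `load_settings` and the returned dictionary keys are hypothetical and are not taken from the project's code. The fallback port mirrors the default of 11300 stated in this README.

```python
import os


def load_settings(env=None):
    """Return the settings named in the .env example as a plain dict.

    `env` is any mapping of variable names to strings; it defaults to
    os.environ. This helper is illustrative, not part of the project.
    """
    env = os.environ if env is None else env

    token = env.get("HUGGING_FACE_TOKEN")
    if not token:
        # Fail early: the README lists this token as a required variable.
        raise RuntimeError("HUGGING_FACE_TOKEN is not set")

    # Fall back to the documented default port when API_PORT is absent.
    port = int(env.get("API_PORT", "11300"))
    return {"hugging_face_token": token, "api_port": port}
```

Passing the mapping explicitly keeps the helper easy to exercise without touching the real environment, for example `load_settings({"HUGGING_FACE_TOKEN": "your_hugging_face_token"})`.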
43 | 60 |
|
44 | | -## Database Setup |
45 | | - |
46 | | -SQLite is used for storing user information. The database is created automatically when the application runs. |
47 | | - |
48 | 61 | ## Running the Application |
49 | 62 |
|
50 | | -Run the application using: |
| 63 | +### 1. Start the FastAPI server |
51 | 64 |
|
52 | 65 | ```bash |
53 | | -python api_whisperx.py |
| 66 | +python start.py |
54 | 67 | ``` |
55 | 68 |
|
56 | | -Replace `main` with the name of your Python file if it's not `main.py`. |
| 69 | +This launches the API server, which listens on port 11300 by default. |
57 | 70 |
|
58 | | -## API Endpoints |
59 | 71 |
|
60 | | -### POST `/auth` |
| 72 | +## API Endpoints |
61 | 73 |
|
62 | | -Authenticate a user and return a JWT token. |
| 74 | +### POST `/jobs` |
63 | 75 |
|
64 | | -- `username`: The username of the user. |
65 | | -- `password`: The password of the user. |
| 76 | +Submit a new transcription job with an uploaded media file. |
66 | 77 |
|
67 | | -### POST `/create_user` |
| 78 | +### GET `/jobs` |
68 | 79 |
|
69 | | -Create a new user. |
| 80 | +List all submitted transcription jobs. |
70 | 81 |
|
71 | | -- `username`: Desired username. |
72 | | -- `password`: Desired password. |
73 | | -- `master_key`: Master key for authorized user creation. |
| 82 | +### GET `/jobs/{task_id}` |
74 | 83 |
|
75 | | -### POST `/whisperx-transcribe/` |
| 84 | +Get the status and result of a specific transcription job. |
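
Because jobs run in the background, a client typically polls `GET /jobs/{task_id}` until the job finishes. The sketch below shows one way to do that with the Python standard library. The README does not document the response format, so several details are assumptions: that the body is JSON, that it carries a `status` field, and that `SUCCESS` and `FAILURE` (Celery's terminal state names) mark completion. The helper names `fetch_job` and `wait_for_job` and the `localhost` base URL are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the server from this README, running locally on its default port.
BASE_URL = "http://localhost:11300"


def fetch_job(task_id, base_url=BASE_URL):
    """GET /jobs/{task_id} and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(f"{base_url}/jobs/{task_id}") as resp:
        return json.load(resp)


def wait_for_job(task_id, fetch=fetch_job, interval=2.0, timeout=600.0,
                 done_states=("SUCCESS", "FAILURE")):
    """Poll a job until its status is terminal or the timeout expires.

    `fetch` is injectable so the loop can be exercised without a server.
    The "status" key and the state names are assumptions, not documented API.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(task_id)
        if job.get("status") in done_states:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {task_id} did not finish in {timeout}s")
        time.sleep(interval)
```

If the real API uses different field or state names, only the `"status"` lookup and the `done_states` argument need to change.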
76 | 85 |
|
77 | | -Transcribe an uploaded audio or video file. |
| 86 | +## Logging |
78 | 87 |
|
79 | | -- `file`: The audio or video file to transcribe. |
80 | | -- `lang`: Language for transcription (default is "pt"). |
81 | | -- `model`: Model to use for transcription (default is "large-v2"). |
82 | | -- `min_speakers`: Minimum number of speakers for diarization (default is 1). |
83 | | -- `max_speakers`: Maximum number of speakers for diarization (default is 2). |
| 88 | +The application logs key events and errors during API requests and background task processing. |
84 | 89 |
|
85 | | -## Logging |
| 90 | +## Summary |
86 | 91 |
|
87 | | -The application has built-in logging that informs about the steps being performed and any errors that occur. |
| 92 | +This project provides a scalable, asynchronous API for audio/video transcription using WhisperX, with support for speaker diarization and job status tracking. |