Local Transcribe

Important

Incubation Project: This project is in active development and a work in progress.

Local Transcribe is an application designed to simplify the transcription and minuting of meetings in the public sector. Built with modern web technologies and AI-powered transcription and summarisation services, Local Transcribe transforms how government organisations handle meeting documentation by automating the conversion of audio recordings into structured, professional minutes.

Key Features

AI-Powered Transcription: Local Transcribe integrates with multiple transcription services including Azure Speech-to-Text and AWS Transcribe, automatically selecting the most appropriate service based on audio duration and quality. The system handles various audio formats and automatically converts them to optimize transcription accuracy.

Professional Meeting Templates: The application includes specialized templates tailored for different types of government meetings, including Cabinet meetings, planning committees, care assessments, and general-purpose meetings. Each template follows specific formatting standards and style guides required for official documentation.

Intelligent Minute Generation: Beyond simple transcription, Local Transcribe uses AI to structure conversations into professional minute formats, applying proper grammar, tense conversion, and formatting rules specific to government documentation standards.

Multi-Format Audio Support: Upload recordings in various formats - the system automatically handles conversion and optimization for the best transcription results. Support for mono and multi-channel audio ensures compatibility with different recording setups.

Data Retention: Configurable data retention policies ensure compliance with government data handling requirements, with special provisions for different departments' retention policies.

Real-Time Processing: Asynchronous processing architecture ensures efficient handling of large audio files, with job status tracking and progress monitoring throughout the transcription and minute generation process.

Local Transcribe streamlines the traditionally time-intensive process of creating meeting minutes, allowing public sector organizations to focus on decision-making rather than documentation overhead.

Development

Run the app locally

Install Docker.
Make a copy of the .env.example file and name it .env.
Run docker compose up --build.

This will build and run 5 containers:

Frontend app hosted at http://localhost:3000
Backend API available at http://localhost:8080
Worker service, which processes transcriptions and has no public-facing URL
Postgres database available at localhost:5432
ElasticMQ to simulate AWS SQS
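
Once the containers are up, a quick reachability check can confirm that the published ports respond. The following is a minimal sketch using only the Python standard library; the `SERVICES` mapping restates the ports listed above, and the helper name `port_is_open` is illustrative rather than part of the project.

```python
import socket

# Ports restated from the list above. The worker has no public port and the
# ElasticMQ port is not given in this section, so neither is checked.
SERVICES = {
    "frontend": ("localhost", 3000),
    "backend": ("localhost", 8080),
    "postgres": ("localhost", 5432),
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if port_is_open(host, port) else "not reachable"
        print(f"{name:<10} {host}:{port} {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not prove the service behind it is healthy.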

LLM and Transcription Services

If you want to run these services locally, see LOCAL_SETUP.md and follow the instructions there.

If you have access to a supported LLM and Transcription provider, you will need to fill in the associated .env variables and configure common/settings.py accordingly. For example, to use transcription and LLM services via Azure APIM, update the following values:

In `.env`

Transcription: AZURE_SPEECH_KEY, AZURE_SPEECH_REGION
LLM: AZURE_APIM_URL, AZURE_APIM_API_VERSION, AZURE_APIM_ACCESS_TOKEN, and AZURE_APIM_SUBSCRIPTION_KEY.

Note:

These APIM values can be found on the Azure APIM Portal, including:
- AZURE_APIM_URL in the format https://{{host}}.gov.uk/{{product_name}}/
- AZURE_APIM_API_VERSION in the format yyyy-mm-dd
The AZURE_APIM_ACCESS_TOKEN is short-lived and must be regenerated every 2 hours.
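
Because a malformed URL or an expired token tends to surface only as a failed request, it can help to sanity-check these values before starting the containers. The sketch below is an assumption-laden illustration, not part of the project: `parse_env`, `check_apim_settings` and `token_expired` are hypothetical helpers, the `.env` parser handles only plain `KEY=VALUE` lines, and the checks encode nothing beyond the formats and the 2-hour lifetime described above.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

TOKEN_LIFETIME = timedelta(hours=2)  # lifetime stated in the note above

REQUIRED_KEYS = [
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
    "AZURE_APIM_URL",
    "AZURE_APIM_API_VERSION",
    "AZURE_APIM_ACCESS_TOKEN",
    "AZURE_APIM_SUBSCRIPTION_KEY",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_apim_settings(env: dict) -> list:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not env.get(key)]
    url = env.get("AZURE_APIM_URL", "")
    if url and not (url.startswith("https://") and url.endswith("/")):
        problems.append("AZURE_APIM_URL should start with https:// and end with /")
    version = env.get("AZURE_APIM_API_VERSION", "")
    if version:
        well_formed = re.fullmatch(r"\d{4}-\d{2}-\d{2}", version) is not None
        if well_formed:
            try:
                datetime.strptime(version, "%Y-%m-%d")
            except ValueError:
                well_formed = False
        if not well_formed:
            problems.append("AZURE_APIM_API_VERSION should be a date in yyyy-mm-dd form")
    return problems


def token_expired(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the token is at least TOKEN_LIFETIME old."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at >= TOKEN_LIFETIME


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        for problem in check_apim_settings(parse_env(handle.read())):
            print(problem)
```

The token itself is opaque to this sketch, so `token_expired` relies on the caller recording when the token was generated.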

In `common/settings.py`:

Update FAST_LLM_PROVIDER, FAST_LLM_MODEL_NAME, BEST_LLM_PROVIDER, and BEST_LLM_MODEL_NAME correspondingly.

This should be sufficient for local development. Keys related to 'AWS', 'Google cloud', and 'other' (Sentry/Posthog) are not required. After updating .env, restart the Docker containers to apply the changes.

Set up your development environment:

We use dev containers to emulate the cloud environment in which Local Transcribe is usually deployed.

Running docker compose up --watch will sync local file changes to the docker containers and restart them as appropriate. Note that docker compose down will revert the containers to their base state. See this issue

To configure the environment locally instead:

Backend

Install Poetry.
In the root directory, run poetry install.
If using VS Code, open the command palette (Command+Shift+P, or Ctrl+Shift+P on Windows and Linux), choose 'Python: Select Interpreter' and select the 'minute-xxxxxxxxxx' virtual environment Poetry has just created.

Frontend

Install node.
In the /frontend directory, run npm install.

Notes

User authentication and authorisation are turned off for local development. A 'dummy_user' is created, for which every request is authorised.

Project structure

`frontend/`

The frontend uses Next.js. Calls to the API are made from the client side and proxied to the API using Next's middleware. All API-calling code is auto-generated by Hey API; the config for this can be found in frontend/openapi-ts.config.ts. It uses the API running locally to get the openapi.json, so to regenerate the types, start the app with Docker Compose and then run npm run openapi-ts in frontend/.

`backend/`

The backend uses FastAPI and is responsible for making initial database writes and sending long-running processes to a queue (typically SQS).

`worker/`

The worker reads from the queue, executes transcription, file conversion and LLM calls, and updates the database with the results.
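
The division of labour between backend and worker can be illustrated with an in-memory sketch. This is not the project's code: a dict stands in for the database, `queue.Queue` stands in for SQS or ElasticMQ, and `submit_job`, `transcribe`, `worker_loop` and `JobStatus` are hypothetical names chosen for the example.

```python
import queue
import threading
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    COMPLETED = "completed"
    FAILED = "failed"


# Stand-ins: a dict plays the database and queue.Queue plays the message queue.
database = {}
job_queue = queue.Queue()


def submit_job(job_id: str, audio_path: str) -> None:
    """Backend role: make the initial database write, then enqueue the slow work."""
    database[job_id] = {"audio_path": audio_path, "status": JobStatus.QUEUED, "result": None}
    job_queue.put(job_id)


def transcribe(audio_path: str) -> str:
    """Placeholder for the real transcription call."""
    return f"transcript of {audio_path}"


def worker_loop() -> None:
    """Worker role: take jobs off the queue and write the results back."""
    while True:
        job_id = job_queue.get()
        if job_id is None:  # sentinel used only to stop this sketch
            break
        record = database[job_id]
        try:
            record["result"] = transcribe(record["audio_path"])
            record["status"] = JobStatus.COMPLETED
        except Exception:
            record["status"] = JobStatus.FAILED


if __name__ == "__main__":
    thread = threading.Thread(target=worker_loop)
    thread.start()
    submit_job("job-1", "meeting.wav")
    job_queue.put(None)
    thread.join()
    print(database["job-1"]["status"].value)
```

The point of the pattern is that `submit_job` returns immediately, so an API request never waits on transcription; clients read the job's status from the database instead.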

Deployment

Architecture diagram

Local Transcribe was developed to run on AWS and/or Azure, with abstractions available for message queues and cloud storage.

Database Schema

Sentry setup (optional)

To set up Sentry for telemetry, create an account at sentry.io.

Navigate to the projects page
Click Create project
Select FastAPI as the project type
Click Create
On the following page, in the Configure SDK section, copy the value for dsn=. Keep this value secret.
Navigate to the SSM parameter store entry for your deployed application
Replace the SENTRY_DSN value with the value you copied

Posthog setup (optional)

To set up Posthog for UX tracking, feature flags, etc., create an account at eu.posthog.com.

Create a project and obtain an API key (it should start with phc_)
Set the POSTHOG_API_KEY value in your .env to that key

Testing

To run unit tests:

make test

For transcription service evaluation, see evals/transcription/README.md.

Testing paid APIs and LLM prompt evaluations

A special set of tests is available to evaluate paid calls to LLM providers. Since we don't want to run these all the time, they are enabled only when the following is set in your .env file:

ALLOW_TESTS_TO_ACCESS_PAID_APIS=1
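
For new tests that call paid services, one way to respect this flag is to skip unless it is set. The sketch below uses the standard library's `unittest` and is only an illustration of the gating idea; how the project's own test suite reads the flag is not shown here, `paid_apis_enabled` and `PaidLLMTests` are hypothetical names, and the sketch assumes the values from .env have been loaded into the process environment.

```python
import os
import unittest


def paid_apis_enabled(environ=os.environ) -> bool:
    """True only when the flag is set to exactly "1"."""
    return environ.get("ALLOW_TESTS_TO_ACCESS_PAID_APIS") == "1"


@unittest.skipUnless(paid_apis_enabled(), "set ALLOW_TESTS_TO_ACCESS_PAID_APIS=1 to run")
class PaidLLMTests(unittest.TestCase):
    def test_flag_is_set(self):
        # A real test would call the LLM provider here; this body is a placeholder.
        self.assertTrue(paid_apis_enabled())
```

The decorator is evaluated when the module is imported, so the flag must be in the environment before the test run starts.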

In order to run some tests, you will need some preprocessed transcript .json files. These should be located in the top level .data dir in the repo. Within this directory, different subdirectories are routed to different tests (see test_queues_e2e.py for an example).

Adding custom templates

You can add your own templates by implementing either the SimpleTemplate or SectionTemplate protocol (see here). Put them in the templates directory, and they will be discovered automatically when the backend starts.
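
To show what implementing a protocol and being discovered can look like in Python, here is a minimal sketch. It does not reproduce the project's SimpleTemplate or SectionTemplate definitions: `ExampleTemplate`, its two methods, `BriefingTemplate` and `discover_templates` are all hypothetical, so consult the real protocols for the members a template must provide.

```python
import inspect
import types
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExampleTemplate(Protocol):
    """Hypothetical stand-in; the project's protocols may require other members."""

    def display_name(self) -> str: ...

    def build_prompt(self, transcript: str) -> str: ...


class BriefingTemplate:
    """Satisfies ExampleTemplate structurally; no inheritance is needed."""

    def display_name(self) -> str:
        return "Briefing"

    def build_prompt(self, transcript: str) -> str:
        return f"Write formal minutes in the past tense for:\n{transcript}"


def discover_templates(module: types.ModuleType) -> list:
    """Return the classes in module that provide every method ExampleTemplate names."""
    found = []
    for _, cls in inspect.getmembers(module, inspect.isclass):
        if cls is not ExampleTemplate and issubclass(cls, ExampleTemplate):
            found.append(cls)
    return found
```

Because protocols are checked structurally, a template class only needs the right methods. A runtime check like this confirms that the methods exist, not that their signatures match, which is one reason to run a type checker over new templates as well.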

Type Checking

poetry install --with dev

poetry run mypy .
# check entire project

poetry run mypy path/to/file.py
# check a specific file

mypy analyses type hints to catch type-related bugs before runtime. Run it before committing changes; further validation occurs during the CI/CD process.
