> **Important**
> Incubation Project: This project is in active development and a work in progress.
Local Transcribe is an application that is designed to simplify the transcription and minuting of meetings in the public sector. Built with modern web technologies and AI-powered transcription and summarisation services, Local Transcribe transforms how government organisations handle meeting documentation by automating the conversion of audio recordings into structured, professional minutes.
AI-Powered Transcription: Local Transcribe integrates with multiple transcription services including Azure Speech-to-Text and AWS Transcribe, automatically selecting the most appropriate service based on audio duration and quality. The system handles various audio formats and automatically converts them to optimize transcription accuracy.
Professional Meeting Templates: The application includes specialized templates tailored for different types of government meetings, including Cabinet meetings, planning committees, care assessments, and general-purpose meetings. Each template follows specific formatting standards and style guides required for official documentation.
Intelligent Minute Generation: Beyond simple transcription, Local Transcribe uses AI to structure conversations into professional minute formats, applying proper grammar, tense conversion, and formatting rules specific to government documentation standards.
Multi-Format Audio Support: Upload recordings in various formats - the system automatically handles conversion and optimization for the best transcription results. Support for mono and multi-channel audio ensures compatibility with different recording setups.
Data Retention: Configurable data retention policies ensure compliance with government data handling requirements, with special provisions for different departments' retention policies.
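One way a configurable retention policy like the one described above can be expressed is as a per-department lookup with a default; this is an illustrative sketch only, and the dictionary keys and day counts are assumptions, not values from the codebase.

```python
from datetime import date, timedelta

# Hypothetical per-department retention periods (in days); the real
# configuration keys and defaults are assumptions for illustration.
RETENTION_DAYS = {"default": 90, "social_care": 365}

def deletion_date(created: date, department: str) -> date:
    """Return the date a recording becomes eligible for deletion."""
    days = RETENTION_DAYS.get(department, RETENTION_DAYS["default"])
    return created + timedelta(days=days)
```

Departments without a special provision simply fall through to the default period.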
Real-Time Processing: Asynchronous processing architecture ensures efficient handling of large audio files, with job status tracking and progress monitoring throughout the transcription and minute generation process.
Local Transcribe streamlines the traditionally time-intensive process of creating meeting minutes, allowing public sector organizations to focus on decision-making rather than documentation overhead.
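The asynchronous pipeline described above can be pictured as a small state machine that tracks each job from upload to finished minutes. The status names and transitions below are illustrative assumptions, not the states used in this repository.

```python
from enum import Enum

# Hypothetical job lifecycle for an audio upload; the actual status names
# used by Local Transcribe are assumptions for illustration.
class JobStatus(Enum):
    QUEUED = "queued"
    TRANSCRIBING = "transcribing"
    GENERATING_MINUTES = "generating_minutes"
    COMPLETE = "complete"
    FAILED = "failed"

VALID_TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.TRANSCRIBING, JobStatus.FAILED},
    JobStatus.TRANSCRIBING: {JobStatus.GENERATING_MINUTES, JobStatus.FAILED},
    JobStatus.GENERATING_MINUTES: {JobStatus.COMPLETE, JobStatus.FAILED},
    JobStatus.COMPLETE: set(),
    JobStatus.FAILED: set(),
}

def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Move a job to a new status, rejecting invalid transitions."""
    if new not in VALID_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current} to {new}")
    return new
```

Tracking status this way is what lets the frontend poll for progress while the worker churns through long transcriptions.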
- Install Docker.
- Make a copy of the `.env.example` file and name it `.env`.
- Run `docker compose up --build`.
This will build and run 5 containers:
- Frontend app hosted at http://localhost:3000
- Backend API available at http://localhost:8080
- Worker service, which processes transcriptions and does not have a public-facing URL
- Postgres database available at localhost:5432
- ElasticMQ to simulate AWS SQS
If you want to run these services locally, see LOCAL_SETUP.md and follow the instructions there.
If you have access to a supported LLM and Transcription provider, you will need to fill in the associated `.env` variables and configure `common/settings.py` accordingly. For example, to use transcription and LLM services via Azure APIM, update the following values:
- Transcription: `AZURE_SPEECH_KEY`, `AZURE_SPEECH_REGION`
- LLM: `AZURE_APIM_URL`, `AZURE_APIM_API_VERSION`, `AZURE_APIM_ACCESS_TOKEN`, and `AZURE_APIM_SUBSCRIPTION_KEY`
Note:
- These APIM values can be found on the Azure APIM Portal, including:
  - `AZURE_APIM_URL` in the format `https://{{host}}.gov.uk/{{product_name}}/`
  - `AZURE_APIM_API_VERSION` in the format `yyyy-mm-dd`
- The `AZURE_APIM_ACCESS_TOKEN` is short-lived and so must be regenerated every 2 hours.
- Update `FAST_LLM_PROVIDER`, `FAST_LLM_MODEL_NAME`, `BEST_LLM_PROVIDER`, and `BEST_LLM_MODEL_NAME` accordingly.
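To show how the APIM settings above fit together, here is a hedged sketch of building a request URL and headers from them. The URL path segments and header names follow common Azure APIM conventions and are assumptions, not taken from this repo's code.

```python
# Illustrative only: how AZURE_APIM_URL, AZURE_APIM_API_VERSION,
# AZURE_APIM_ACCESS_TOKEN and AZURE_APIM_SUBSCRIPTION_KEY might combine.
# The path and header names are assumptions based on typical APIM setups.
def apim_chat_url(apim_url: str, deployment: str, api_version: str) -> str:
    """Build a chat-completions URL from the APIM base URL."""
    base = apim_url.rstrip("/")
    return f"{base}/deployments/{deployment}/chat/completions?api-version={api_version}"

def apim_headers(access_token: str, subscription_key: str) -> dict:
    """Auth headers; the token is the short-lived AZURE_APIM_ACCESS_TOKEN."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
```

Because the access token expires every 2 hours, anything built on this would need to refresh the `Authorization` header regularly.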
This should be sufficient for local development. Keys related to 'AWS', 'Google Cloud', and 'other' (Sentry/PostHog) are not required. After updating `.env`, restart the Docker containers to apply changes.
We use dev containers to emulate the cloud environment in which Local Transcribe is usually deployed.
Running `docker compose up --watch` will sync local file changes to the Docker containers and restart them as appropriate. Note that `docker compose down` will revert the containers to their base state. See this issue.
To instead configure the environment locally:
- Install Poetry.
- In the root directory, run
poetry install. - If using VS Code, open the command palette (
Command+Shift+P), click 'Python: Select Interpreter' and select the 'minute-xxxxxxxxxx' env file Poetry has just created.
- Install node.
- In the
/frontenddirectory, runnpm install.
- User authentication and authorisation are turned off for local development; a 'dummy_user' is created for which every request is authorised.
The frontend uses Next.js. Calls to the API are made from the client side and proxied to the API using Next's middleware. All API-calling code is auto-generated by Hey API; the config for this can be found in `frontend/openapi-ts.config.ts`. It uses the API running locally to get the `openapi.json`, so to regenerate the types, run Docker Compose and then run `npm run openapi-ts` in `frontend/`.
The backend uses FastAPI and is responsible for making initial database writes and sending long-running processes to a queue (typically SQS).
The worker reads from the queue, executes transcription/file conversion/LLM calls, and updates the database with the results.
Local Transcribe was developed to run on AWS and/or Azure, with abstractions available for message queues and cloud storage.
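A message-queue abstraction like the one mentioned above typically pairs a small protocol with interchangeable backends (SQS in the cloud, something in-memory or ElasticMQ locally). The class and method names below are illustrative assumptions, not this repo's actual interfaces.

```python
from typing import Optional, Protocol

# Hypothetical queue abstraction; the real interface in the codebase may differ.
class MessageQueue(Protocol):
    def send(self, body: str) -> None: ...
    def receive(self) -> Optional[str]: ...

class InMemoryQueue:
    """Local stand-in for SQS/ElasticMQ, handy in tests."""
    def __init__(self) -> None:
        self._messages: list = []

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        # FIFO: return the oldest message, or None if the queue is empty
        return self._messages.pop(0) if self._messages else None
```

Because the backend and worker only depend on the protocol, swapping SQS for ElasticMQ (or an in-memory queue in tests) requires no changes to either service.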
To set up Sentry for telemetry, create an account at sentry.io.
- Navigate to the projects page
- Click `Create project`
- Select `FASTAPI` as the project type
- Click create
- On the following page, in the Configure SDK section, copy the value for `dsn=` (KEEP THIS SECRET)
- Navigate to the SSM parameter store entry for your deployed application
- Replace the `SENTRY_DSN` value with the value you copied
To set up PostHog for UX tracking, feature flags, etc., create an account at eu.posthog.com.
- Create a project and obtain an API key (it should start with `phc_`)
- Set the `POSTHOG_API_KEY` value in your `.env`
To run unit tests, run `make test`.

For transcription service evaluation, see `evals/transcription/README.md`.
A special set of tests is available to evaluate paid calls to LLM providers. Since we don't want to run these all the time, they are enabled by setting `ALLOW_TESTS_TO_ACCESS_PAID_APIS=1` in your `.env` file.
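A common way to honour a flag like this is a small helper that tests consult before touching a paid API; the helper name below is an illustration, not code from this repo, which may gate its tests differently.

```python
import os

# Hedged sketch: gate expensive tests on the env var described above.
# The helper name is hypothetical; the repo's actual gating may differ.
def paid_api_tests_enabled(environ=os.environ) -> bool:
    """True only when ALLOW_TESTS_TO_ACCESS_PAID_APIS=1 is set."""
    return environ.get("ALLOW_TESTS_TO_ACCESS_PAID_APIS") == "1"

# With pytest, this could back a skip marker, e.g.:
# requires_paid_apis = pytest.mark.skipif(
#     not paid_api_tests_enabled(), reason="paid API tests disabled")
```

Tests decorated this way are skipped by default and only run when the flag is deliberately set in `.env`.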
In order to run some tests, you will need some preprocessed transcript .json files. These should be located in the top-level `.data` dir in the repo. Within this directory, different subdirectories are routed to different tests (see `test_queues_e2e.py` for an example).
You can add your own templates by implementing either the `SimpleTemplate` or `SectionTemplate` protocols (see here). Simply put them in the templates directory, and they will automatically be discovered when the backend starts.
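To give a feel for what implementing such a protocol might look like, here is a hedged sketch: the attribute and method names below are guesses at the protocol's shape, and the real `SimpleTemplate` definition in the repo may differ, so check it before writing a template.

```python
from typing import Protocol, runtime_checkable

# Hypothetical shape of the SimpleTemplate protocol; the real protocol lives
# in the repo and may have different members.
@runtime_checkable
class SimpleTemplate(Protocol):
    name: str
    def render(self, transcript: str) -> str: ...

class StandupTemplate:
    """Example template that would be auto-discovered on backend start."""
    name = "standup"

    def render(self, transcript: str) -> str:
        # Wrap the transcript in a minimal minutes layout
        return f"# Standup minutes\n\n{transcript}"
```

Because discovery is automatic, dropping a class like `StandupTemplate` into the templates directory is all that's needed; no registration code is required.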
```shell
poetry install --with dev
poetry run mypy .                  # check entire project
poetry run mypy path/to/file.py    # check a specific file
```

mypy analyses type hints to catch type-related bugs before runtime. Run it before committing changes (further validation occurs during the CI/CD process).
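As an illustration of the kind of bug mypy catches before runtime (this example is not code from the repo): an `Optional[int]` return type forces callers to handle `None` explicitly, and mypy flags any code path that forgets to.

```python
from typing import Optional

def parse_duration(value: str) -> Optional[int]:
    """Return the duration in seconds, or None if the input is not numeric."""
    return int(value) if value.isdigit() else None

def total_seconds(values: list) -> int:
    # mypy would reject summing parse_duration results directly, because
    # some may be None; filtering them out first satisfies the checker.
    return sum(d for d in (parse_duration(v) for v in values) if d is not None)
```

Had `total_seconds` summed the raw results, `poetry run mypy .` would report an error at type-check time rather than letting it surface as a `TypeError` in production.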

