Add NB-Whisper Norwegian transcription models #55

Open
androa wants to merge 1 commit into roblibob:main from androa:add-nb-whisper-models

Conversation

androa commented Mar 12, 2026

Contributor

Summary

Adds NB-Whisper models (Small, Medium, Large) to the transcription model catalog for improved Norwegian speech recognition.

NB-Whisper is developed by NB AI-Lab (National Library of Norway) and fine-tuned on 66,000 hours of Norwegian speech data. The models use the standard GGML format, so no infrastructure changes are needed: the existing download, validation, and whisper.cpp transcription pipelines work as-is.
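
For readers unfamiliar with the validation step, here is a minimal sketch of how a downloaded model file could be checked against a pinned SHA256 checksum. It assumes Apple's CryptoKit is available; the function names `sha256Hex` and `matchesPinnedChecksum` are hypothetical and are not taken from this project, whose actual validation code is not shown in this PR.

```swift
import Foundation
import CryptoKit

/// Returns the lowercase hex SHA256 digest of `data`.
/// Hypothetical helper for illustration; not the project's own API.
func sha256Hex(of data: Data) -> String {
    // SHA256.hash(data:) yields a digest that iterates as bytes.
    SHA256.hash(data: data)
        .map { String(format: "%02x", $0) }
        .joined()
}

/// True when the computed digest equals the pinned checksum.
/// The comparison is case-insensitive on the expected value.
func matchesPinnedChecksum(_ data: Data, expected: String) -> Bool {
    sha256Hex(of: data) == expected.lowercased()
}
```

This version hashes data already held in memory. For files in the gigabyte range, such as the Large model, feeding chunks to an incremental `SHA256` hasher via `update(data:)` and `finalize()` would likely be preferable.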

Models added

| Model | Size | Parameters |
|---|---|---|
| NB-Whisper Small | ~465 MB | 244M |
| NB-Whisper Medium | ~1.4 GB | 769M |
| NB-Whisper Large | ~2.9 GB | 1550M |

Changes

  • TranscriptionModels.swift: Added 3 NB-Whisper model entries with pinned HuggingFace download URLs, SHA256 checksums, and file sizes. Models are always visible in settings (no feature flag).
  • TranscriptionModelCatalogTests.swift (new): Tests for catalog presence, unique IDs/filenames, metadata validity, and lookup.
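
The catalog checks listed above (unique IDs and filenames, metadata validity, lookup) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `ModelEntry` type, its field names, the model IDs, and the filenames are hypothetical, the URLs and checksums are placeholders rather than the real pinned values, and the byte sizes are rounded from the approximate sizes in the table.

```swift
// Hypothetical catalog entry; the real type in TranscriptionModels.swift may differ.
struct ModelEntry {
    let id: String
    let filename: String
    let downloadURL: String
    let sha256: String
    let sizeBytes: Int64
}

// Placeholder only: NOT a real checksum.
let placeholderChecksum = String(repeating: "0", count: 64)

// Placeholder URLs and rounded sizes; IDs and filenames are assumptions.
let catalog = [
    ModelEntry(id: "nb-whisper-small", filename: "nb-whisper-small.bin",
               downloadURL: "https://example.invalid/nb-whisper-small.bin",
               sha256: placeholderChecksum, sizeBytes: 465_000_000),
    ModelEntry(id: "nb-whisper-medium", filename: "nb-whisper-medium.bin",
               downloadURL: "https://example.invalid/nb-whisper-medium.bin",
               sha256: placeholderChecksum, sizeBytes: 1_400_000_000),
    ModelEntry(id: "nb-whisper-large", filename: "nb-whisper-large.bin",
               downloadURL: "https://example.invalid/nb-whisper-large.bin",
               sha256: placeholderChecksum, sizeBytes: 2_900_000_000),
]

/// Returns a list of problems found; an empty list means the catalog passed.
func validate(_ entries: [ModelEntry]) -> [String] {
    var problems: [String] = []
    // IDs and filenames must each be unique across the catalog.
    if Set(entries.map { $0.id }).count != entries.count {
        problems.append("duplicate id")
    }
    if Set(entries.map { $0.filename }).count != entries.count {
        problems.append("duplicate filename")
    }
    let hexDigits = Set("0123456789abcdef")
    for entry in entries {
        if !entry.downloadURL.hasPrefix("https://") {
            problems.append("\(entry.id): URL is not https")
        }
        // A SHA256 checksum is 64 lowercase hex characters.
        if entry.sha256.count != 64 || !entry.sha256.allSatisfy({ hexDigits.contains($0) }) {
            problems.append("\(entry.id): malformed checksum")
        }
        if entry.sizeBytes <= 0 {
            problems.append("\(entry.id): non-positive size")
        }
    }
    return problems
}

/// Looks up an entry by ID, returning nil when it is absent.
func lookup(_ id: String, in entries: [ModelEntry]) -> ModelEntry? {
    entries.first { $0.id == id }
}
```

Returning a list of problems rather than a single Boolean makes a failing test report every broken entry at once, which may be convenient as the catalog grows.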

Usage

Select an NB-Whisper model in Settings → Transcription Model, and set the language to Norwegian for best results.

Add NB-Whisper Small, Medium, and Large models to the transcription
model catalog. These are Norwegian-tuned Whisper models by NB AI-Lab
(National Library of Norway), trained on 66,000 hours of Norwegian
speech data.

Models use the standard GGML format and are downloaded from HuggingFace
with pinned SHA256 checksums. They are always visible in settings (no
feature flag).

- NB-Whisper Small (~465 MB): fast with good Norwegian accuracy
- NB-Whisper Medium (~1.4 GB): better accuracy, larger download
- NB-Whisper Large (~2.9 GB): best accuracy for Norwegian speech

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

roblibob commented Apr 4, 2026

Owner

Hi Androa. This is a bit too specific to merge into main; however, I'm working on giving users more control over adding their own ASR models.

2 participants