Python Thai Automatic Speech Recognition
PyThaiASR is a Python package for Automatic Speech Recognition with a focus on the Thai language. It provides an offline Thai automatic speech recognition model.
License: Apache-2.0 License
Google Colab: Link Google colab
Model homepage: https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th
pip install pythaiasr

For Wav2Vec2 with language model: if you want to use a wannaphong/wav2vec2-large-xlsr-53-th-cv8-* model with a language model, you need to install with the following steps:
pip install pythaiasr[lm]
pip install https://github.com/kpu/kenlm/archive/refs/heads/master.zip

For live audio streaming: if you want to stream live audio from a microphone/soundcard, you need to install PyAudio:
pip install pythaiasr[stream]

from pythaiasr import asr
file = "a.wav"
print(asr(file))

Stream audio directly from your microphone/soundcard:
from pythaiasr import stream_asr
# Stream audio and print transcriptions in real-time
for transcription in stream_asr(chunk_duration=5.0):
    print(transcription)
# Press Ctrl+C to stop

asr(data: str, model: str = _model_name, lm: bool = False, device: str = None, sampling_rate: int = 16_000)

- data: path to a sound file or a NumPy array of the audio
- model: the ASR model name
- lm: use a language model (not supported with the airesearch/wav2vec2-large-xlsr-53-th model)
- device: the device to run inference on (e.g. "cpu" or "cuda")
- sampling_rate: the sample rate of the audio
- return: Thai text from the ASR model
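Because data may be a NumPy array, audio already held in memory can be transcribed without writing a file first. A minimal sketch, assuming the default 16 kHz rate; the array here is synthetic silence for illustration, and the actual asr call is shown commented out because it requires the model to be downloaded:

```python
import numpy as np

# Assumption for illustration: one second of silence at the default 16 kHz rate.
sampling_rate = 16_000
audio = np.zeros(sampling_rate, dtype=np.float32)  # mono, float32 samples

# With pythaiasr installed, the array can be passed directly:
# from pythaiasr import asr
# text = asr(audio, sampling_rate=sampling_rate)

print(audio.shape, audio.dtype)
```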
stream_asr(model: str = _model_name, lm: bool = False, device: str = None, chunk_duration: float = 5.0, sampling_rate: int = 16_000)

- model: the ASR model name
- lm: use a language model (not supported with the airesearch/wav2vec2-large-xlsr-53-th model)
- device: the device to run inference on (e.g. "cpu" or "cuda")
- chunk_duration: Duration of each audio chunk in seconds (default: 5.0)
- sampling_rate: The sample rate (default: 16000)
- yield: Thai text transcription from each audio chunk
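Since stream_asr is a generator that runs until interrupted, a typical pattern is to collect transcriptions and stop cleanly on Ctrl+C. The sketch below uses a stand-in generator in place of stream_asr so it runs without a microphone; the yielded strings are made-up examples:

```python
def fake_stream_asr(chunk_duration=5.0):
    # Stand-in for pythaiasr.stream_asr: yields one transcription per audio chunk.
    for text in ["สวัสดี", "ครับ"]:
        yield text

results = []
try:
    # With pythaiasr installed, replace fake_stream_asr with stream_asr.
    for transcription in fake_stream_asr(chunk_duration=5.0):
        results.append(transcription)
except KeyboardInterrupt:
    pass  # Ctrl+C ends the stream; partial results are kept.

print(results)
```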
Options for model
- airesearch/wav2vec2-large-xlsr-53-th (default) - AI RESEARCH - PyThaiNLP model
- wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm - Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer)
- wannaphong/wav2vec2-large-xlsr-53-th-cv8-deepcut - Thai Wav2Vec2 with CommonVoice V8 (deepcut tokenizer)
- biodatlab/whisper-small-th-combined - Thai Whisper small model
- biodatlab/whisper-th-medium-combined - Thai Whisper medium model
- biodatlab/whisper-th-large-combined - Thai Whisper large model
You can read more about the models in the list below:
- airesearch/wav2vec2-large-xlsr-53-th - AI RESEARCH - PyThaiNLP model
- wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm - Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer) + language model
- wannaphong/wav2vec2-large-xlsr-53-th-cv8-deepcut - Thai Wav2Vec2 with CommonVoice V8 (deepcut tokenizer) + language model
- biodatlab/whisper-small-th-combined - Thai Whisper small model
- biodatlab/whisper-th-medium-combined - Thai Whisper medium model
- biodatlab/whisper-th-large-combined - Thai Whisper large model
To use this inside Docker, do the following:

docker build -t <Your Tag name> .
docker run --entrypoint /bin/bash -it <Your Tag name>

You will then get access to an interactive shell environment where you can use Python with all packages installed.