cikay/awesome-kurdish-tech

Awesome Kurdish Tech

A curated list of Kurdish language AI models, datasets and packages

AI Models

Language Identification

Speech

Text To Speech (TTS)

Forced Alignment

  • MahmoudAshraf/mms-300m-1130-forced-aligner — Forced aligner based on Meta MMS that produces word-level timestamps and helps split long audio when preparing TTS and ASR datasets. It supports Kurdish through MMS language codes such as ckb (Central Kurdish, Sorani) and, for Kurmanji, kmr-script_latin, kmr-script_arabic, and kmr-script_cyrillic.
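
To illustrate how word-level timestamps help split long audio, here is a minimal sketch that groups aligned words into chunks no longer than a chosen duration. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) and the function name `split_by_words` are assumptions made for this example, not the aligner's actual output format or API.

```python
from typing import Dict, List, Union

# Hypothetical word record: {"text": str, "start": float, "end": float}
Word = Dict[str, Union[str, float]]


def split_by_words(words: List[Word], max_seconds: float = 10.0) -> List[Word]:
    """Group consecutive words into chunks that span at most max_seconds.

    Cuts are only made at word boundaries, so no word is split in half.
    A single word longer than max_seconds becomes its own chunk.
    """
    chunks: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Start a new chunk if adding this word would exceed the limit.
        if current and word["end"] - current[0]["start"] > max_seconds:
            chunks.append(current)
            current = []
        current.append(word)
    if current:
        chunks.append(current)

    # Summarize each chunk as one segment with its own start, end, and text.
    return [
        {
            "start": chunk[0]["start"],
            "end": chunk[-1]["end"],
            "text": " ".join(str(w["text"]) for w in chunk),
        }
        for chunk in chunks
    ]
```

Each returned segment carries the start and end times needed to cut the audio file, plus the matching transcript text for a TTS or ASR training pair.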

Datasets

Text datasets

Speech datasets

  • azadiya-welat-kurdish-kurmanji-voice — Paired Kurmanji audio-text corpus of 15,284 segments (25.9 hours) of 16 kHz mono WAV audio from Azadiya Welat news readings, aligned to Latin-script transcripts for TTS and ASR fine-tuning.

Packages

Data Collectors

  • kurdish_scrapy — Scrapy-based crawler that collects Kurdish text from websites, extracts article content with Trafilatura, and filters pages by language with FastText. Targets include kmr_Latn, ckb_Arab, and diq_Latn, and it can filter for any other language as well.
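
The language-filtering step described above can be sketched as follows. This is not kurdish_scrapy's actual code: `filter_by_language`, the `predict` callable, and the confidence threshold are hypothetical names chosen for illustration. The predictor is assumed to return a fastText-style label such as `__label__kmr_Latn` together with a score.

```python
from typing import Callable, Iterable, Iterator, Set, Tuple

# Hypothetical predictor type: text -> (label, confidence), e.g. a thin
# wrapper around a fastText language-identification model.
Predictor = Callable[[str], Tuple[str, float]]


def filter_by_language(
    texts: Iterable[str],
    predict: Predictor,
    allowed: Set[str],
    min_confidence: float = 0.5,
) -> Iterator[str]:
    """Yield only the texts whose predicted language is in `allowed`."""
    for text in texts:
        # Collapse whitespace so the predictor sees one line of input.
        cleaned = " ".join(text.split())
        if not cleaned:
            continue
        label, score = predict(cleaned)
        # Strip the fastText-style prefix: "__label__kmr_Latn" -> "kmr_Latn".
        language = label.replace("__label__", "", 1)
        if language in allowed and score >= min_confidence:
            yield text
```

Passing the predictor in as an argument keeps the filter independent of any particular model file, so it can be exercised with a stub before a real language-identification model is plugged in.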

Text Preprocessing

Contributing

PRs are welcome. If you add a link, please include a short, concrete description and keep the format:

- [name](link) — one-line description
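
Before opening a PR, a new entry can be checked against this format with a small script. The sketch below is only an illustration and is not part of the repository: the pattern `ENTRY_RE` and the function `check_entry` are hypothetical names, and the regular expression is one reasonable reading of the format line above.

```python
import re
from typing import Dict, Optional

# "- [name](link) — one-line description", using the same dash as the README.
ENTRY_RE = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<link>\S+)\) — (?P<description>\S.*)$"
)


def check_entry(line: str) -> Optional[Dict[str, str]]:
    """Return the parsed parts of a list entry, or None if it is malformed."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None
```

A result of `None` signals that the line is missing the bracketed name, the link, the separator, or the description.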
