openlanguagedata/awesome-new-languages-in-machine-translation

Awesome New Languages in Machine Translation

This is a list of initiatives for adding new languages to open-source machine translation models (such as NLLB).

It also highlights some notable projects that improve translation quality for low-resource languages that are already supported.

The first part of the document lists individual languages in alphabetical order of their English names.

The second part of the document lists multilingual initiatives.

Any new additions are welcome (in the form of pull requests or issues)!

Single-language projects

Ainu

Amis

Aromanian

Awajun

Bambara

Buryat

Circassian (Kabardian)

Chechen

Emakhuwa

Erzya

Additionally, see TartuNLP.

Fula

Ge’ez

Hill Mari

See TartuNLP

Interslavic

Karakalpak

Komi

See TartuNLP

Kpelle

Ladin

Lezgian

Language codes: lez (ISO 639-3), lezg1247 (Glottolog)

Livonian

See TartuNLP

Livvi Karelian

See TartuNLP

Mansi

Mari (Meadow)

See TartuNLP

Moksha

See TartuNLP

Ngambay

Qarachay Malqar

Tamazight

Tyvan

Udmurt

See TartuNLP

Wu Chinese

Zarma

Multilingual projects

Finno-Ugric languages (TartuNLP)

Multiple Finno-Ugric languages (including Komi, Udmurt, Hill and Meadow Mari, Erzya, Livonian, Mansi, Moksha and Livvi Karelian)

Indigenous languages of the Americas (AmericasNLP Shared Tasks)

Indigenous languages of the Americas, including Ashaninka, Aymara, Bribri, Chatino, Guarani, Hñähñu, Nahuatl, Quechua, Raramuri, Shipibo-Konibo, and Wixarika from the AmericasNLP MT shared task, plus Wayuunaiki, Arhuaco, Inga, and Nasa.

Multiple Iberian languages

Aragonese, Aranese, Asturian, Valencian

Multiple Ethiopian languages

Namely: Afar, Afaan Oromo, Awngi, Amharic, Basketo, Dawuro, Dashenech, Geez, Gamo, Gofa, Gurage, Hadiya, Kafa, Korate, Majang, Male, Murule, Nuer, Shakicho, Sidama, Somali, Tigrinya, Wolaytta.

Hundreds of diverse languages (Apertium)

Apertium is a rule-based machine translation platform.

Currently, it has linguistic tools (such as dictionaries and morphological parsers) for a huge number of languages, but only a few of them (51 language pairs) have been developed to a state considered stable enough for a public translation service.
