Guidelines for editing dictionary

neverRare edited this page Jan 21, 2026 · 145 revisions

⚠️ Warning: the dictionary is still in flux. This document might be outdated.

ilo Token makes use of a dictionary. It is like a typical dictionary, but it is made for computers to read, although it should still be fairly human-readable. The dictionary defines word-to-word or word-to-phrase translations of each word.

ilo Token has two dictionaries: the global dictionary and the custom dictionary.

The global dictionary contains definitions that everyone will see. It lives within the source code and may be edited by anyone through GitHub. ko Koko will accept submissions if you don't feel like using GitHub.

The custom dictionary lives within the website and is different for everyone. Its main purpose is to let you customize the dictionary and test it without bothering to use GitHub. The custom dictionary can also be used to extend ilo Token with more words, but this comes with limitations.

Note to readers

The author of this document wants it to be as approachable and accessible as possible, with as little technicality as possible. If something is confusing, please report it.

Target vocabulary and supported words

Word support is tiered:

  • From Linku core to common: These are fully supported words. The parser and translator will be modified to accommodate these words.
  • From Linku uncommon to obscure: These are merely accepted words. They are not actively maintained by ko Koko and rely solely on contributors. A word can be added if it happens to function like one of the fully supported words. However, if it needs special attention (e.g. it's a particle), it may be better to exclude the word than to have a broken definition. If a major change to the dictionary requires changing every definition, ko Koko may disable these words with the expectation that a contributor will fix and re-enable them.

When a word becomes a sandbox word, it is immediately removed. However, its information is retained in case it returns.
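
If it helps to see the tiers as a lookup table, here is a minimal Python sketch. It is only an illustration, not how ilo Token works internally: the function name `support_tier` is made up, and it assumes the Linku usage categories are exactly core, common, uncommon, obscure, and sandbox.

```python
# Hypothetical lookup table: Linku usage category -> support level in ilo Token.
# The category names are assumed; the levels come from the tiers described above.
SUPPORT_BY_CATEGORY = {
    "core": "fully supported",
    "common": "fully supported",
    "uncommon": "accepted",
    "obscure": "accepted",
    "sandbox": "removed",
}


def support_tier(category):
    """Return the support level for a Linku usage category."""
    try:
        return SUPPORT_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown usage category: {category!r}")


print(support_tier("common"))
print(support_tier("obscure"))
```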

Using the custom dictionary

In order to use the custom dictionary, you need to be familiar with the syntax of the dictionary. You can read the syntax guidelines further below; make use of the table of contents found in the sidebar. There are also quick custom dictionary setups.

The interface should be fairly straightforward. At the top of the modal window is the Import Word text box. It is used to import existing definitions from the global dictionary so you can then modify them.

At the center is the custom dictionary. The custom dictionary inherits from the global dictionary so you don't have to import every word.

The bottom shows how many errors there are in the custom dictionary. Click it to reveal the error messages so you can fix them. You can click an error message to jump to where the error occurred.

If you think your custom dictionary setup is better than the global dictionary, please say so, so that it can be added to the global dictionary.

Code structure

Please take a look at the global dictionary first to get a sense of how it is written down. You'll see that it is defined like the following:

word:
    definitions;
    definitions;

another word:
    definitions;
    definitions;

The syntax is important; please don't forget the semicolon.

Sometimes, words are considered synonyms, like "ale" and "ali". In these cases, they are merged:

ale, ali:
    definitions;
    definitions;

Each definition may contain any of these: word units, tags, and placeholders. Consider the following:

seli:
    burn(v) [object];

burn is the word unit, (v) is the tag, and [object] is the placeholder. Word units and tags always come together; the tag represents what kind of word it is, usually its part of speech. Placeholders represent a place that ilo Token may fill in, although they are mainly used to keep the definitions as unambiguous as possible.

A word unit may span multiple words:

jan:
    human being(n);

Sometimes, word units are separated by a forward slash /. Its function depends on the tag, but it tends to be used for defining different inflections.

ona:
    they/them(personal pronoun third plural);
    it/it(personal pronoun third singular);

A tag may contain more information, which is sometimes required depending on the tag.

pan:
    baked(adj qualifier) goods(n plural);

A definition may have multiple word units, tags, and placeholders, forming a phrase.

olin:
    have(v) strong(adj opinion) emotional(adj opinion) bond(n singular) with(prep) [object];

None of this syntax is free-form; it must follow certain patterns. ilo Token isn't going to magically understand it all. Definitions may need to be rewritten or simplified in order to fit within the limitations.

"kokosila", for example, has to be written like the following; there is no "in an environment where Toki Pona is more appropriate" part.

kokosila:
    speak(v) a(d article singular) non-Toki Pona(adj qualifier) language(n singular);

The patterns are explained further below.

Adding comments

A comment is a way to tell computers "just ignore what I've written here". In the dictionary, it is denoted by the hash sign #. Whatever follows the # is ignored. This is useful for disabling pieces of code as well as writing notes meant for contributors instead of computers.

You may find a couple of comments in the code.

Escaping

If a word contains a special symbol, wrap the symbol in backticks `. ilo Token will not include the backticks in the output.

pu:
    interact(v) with(prep) the(d article) book(n singular) titled(adj) Toki Pona`:` The Language of Good(n proper);

In case you're wondering, you can escape a backtick itself by wrapping it in backticks: ```.

The following symbols need to be escaped. Not all are currently in use, but they are reserved in case the syntax changes: #, (, ), *, +, /, :, ;, <, =, >, @, [, \, ], ^, `, {, |, }, ~.
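
If you are preparing many definitions at once, the escaping rule can be applied mechanically. The following Python sketch is only an illustration under assumptions: `escape_word_unit` is a made-up helper that is not part of ilo Token, and it assumes each special symbol is wrapped on its own, as in the "pu" example above.

```python
# Symbols listed above as needing escaping (some are only reserved).
SPECIAL_SYMBOLS = set("#()*+/:;<=>@[\\]^`{|}~")


def escape_word_unit(text):
    """Wrap every special symbol in backticks; leave other characters alone.

    Hypothetical helper: each symbol is wrapped individually.
    """
    return "".join(f"`{ch}`" if ch in SPECIAL_SYMBOLS else ch for ch in text)


# The colon is the only special symbol in this title.
print(escape_word_unit("Toki Pona: The Language of Good"))
```

The result of the call above can be pasted into a definition as a word unit, followed by its tag.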

Defining nouns

Use the tag (n) to define nouns. With some exceptions, you may also use this for pronouns, since pronouns tend to act like nouns.

kasi:
    plant(n);

You may add determiners and adjectives before it.

sewi:
    highest(adj origin) part(n);

Adjectives before nouns may not be compounded. Simply removing the word "and" is a good workaround.

palisa:
    # bad
    long(adj size) and(c) hard(adj material) thing(n);

    # good
    long(adj size) hard(adj material) thing(n);

You may add an adjective and a proper noun after the noun.

pu:
    the(d article) book(n) titled(adj) Toki Pona`:` The Language of Good(n proper);

ilo Token will automatically apply declension, e.g. singular and plural forms. If you wish to force a noun to be singular only or plural only, add singular or plural to the tag.

telo:
    liquid(n singular);

mani:
    savings(n plural);

In some cases, automatic declension can fail; this tends to happen with pronouns. In these cases, use a slash / and manually define the singular and plural forms, or limit the noun to singular only or plural only as explained above.

ni:
    this/these(n);
    that/those(n);

seme:
    what/what(n);
    which/which(n);

A note about gerunds

If you use the gerund form of a verb, please label it as such. Gerunds are filtered out when they would be used as part of a predicate after "is", so that continuous tenses, e.g. "is searching", are not displayed. ilo Token currently prefers only simple tenses. If it ever adds more tenses, it will use the verb definitions instead.

alasa:
    searching(n gerund);

Defining personal pronouns

Because personal pronouns have different forms when used as subject or object, they are defined separately from nouns. Use the tag (personal pronoun) to define them.

There is no automatic declension. Use slashes / and define them as follows: singular subject, singular object, plural subject, and then plural object.

You also need to specify whether it is first, second, or third person.

mi:
    I/me/we/us(personal pronoun first);

Sometimes, pronouns only have a singular form or a plural form. In these cases, include singular or plural in the tag. You'll only need to write the subject and object form.

ona:
    they/them(personal pronoun third plural);
    it/it(personal pronoun third singular);

Just an amusing side note: to ilo Token, it is it/it not it/its.

Remember to only consider the grammatical number and not the semantic number: they/them, while it can refer to a single person, is always grammatically plural, as it is always followed by "are" when used as a subject.

Remember to define possessives as well, these are determiners.
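
As a recap of the slash order in this section, here is a small Python sketch that labels each form of a personal pronoun definition. It is an illustration only: `parse_pronoun_forms` is a made-up name, and this is not how ilo Token reads the dictionary.

```python
def parse_pronoun_forms(forms, number=None):
    """Label the slash-separated forms of a personal pronoun definition.

    Hypothetical helper. With no number keyword, four forms are expected:
    singular subject, singular object, plural subject, plural object.
    With "singular" or "plural", only subject and object are expected.
    """
    parts = forms.split("/")
    if number is None:
        labels = [
            "singular subject",
            "singular object",
            "plural subject",
            "plural object",
        ]
    elif number in ("singular", "plural"):
        labels = [f"{number} subject", f"{number} object"]
    else:
        raise ValueError(f"unknown number keyword: {number!r}")
    if len(parts) != len(labels):
        raise ValueError(f"expected {len(labels)} forms, got {len(parts)}")
    return dict(zip(labels, parts))


print(parse_pronoun_forms("I/me/we/us"))
print(parse_pronoun_forms("they/them", "plural"))
```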

Defining adjectives

Use the (adj) tag to define adjectives. You'll need to classify what kind of adjective it is; this is needed for reordering chains of adjectives. Apparently, it's "Big Red Balloon" and not "Red Big Balloon".

pona:
    good(adj opinion);

Here are the classifications for adjectives, listed in the order they appear from left to right. They are based on the list found on Wikipedia.

  • opinion
  • size
  • physical quality – Particularly a visible quality e.g. flat, circular
  • age
  • color
  • origin – Where it comes from or where it is located e.g. "nearby object"
  • material – Including the property of the material e.g. "hard object"
  • qualifier – Particularly a modifier of compound nouns e.g. "transgender person"

These are just rough categories to aid in sorting adjectives and are not set in stone. If new categories are needed, please say so.

Some adjectives may belong in two or more categories. In these cases, test it out. Here's an example: the "land" in "land animal" can be origin or qualifier. Try it with another adjective whose category sits between origin and qualifier, say "hard", which is material. Then compare "hard land animal" with "land hard animal". The former feels less awkward, so the "land" in "land animal" is a qualifier.
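
The test above can also be tried out mechanically. Here is a minimal Python sketch that sorts a chain of adjectives by the category order listed in this section. It is only an illustration, not ilo Token's actual reordering code, and the name `order_adjectives` is made up.

```python
# Category order from the list above, leftmost first.
ADJECTIVE_ORDER = [
    "opinion",
    "size",
    "physical quality",
    "age",
    "color",
    "origin",
    "material",
    "qualifier",
]


def order_adjectives(adjectives):
    """Sort (word, category) pairs by category order.

    Hypothetical helper. The sort is stable, so adjectives of the same
    category keep the order they were given in.
    """
    return sorted(adjectives, key=lambda pair: ADJECTIVE_ORDER.index(pair[1]))


# "land" classified as qualifier ends up after "hard": "hard land animal".
chain = order_adjectives([("land", "qualifier"), ("hard", "material")])
print(" ".join(word for word, _ in chain), "animal")

# "land" classified as origin ends up before "hard": "land hard animal".
chain = order_adjectives([("hard", "material"), ("land", "origin")])
print(" ".join(word for word, _ in chain), "animal")
```

Whichever output sounds less awkward tells you which category to pick.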

Adjectives may be preceded by an adverb.

jelo:
    lime(adv) yellow(adj color);

Adjectives may be compounded using and(c). This form is currently limited: there can't be adverbs, there can't be more than 2 adjectives, and there can't be conjunctions other than "and". If these limitations need to be lifted, please say so.

linja:
    long(adj size) and(c) flexible(adj material);

ilo Token may remove the word "and" when translating: "moku linja" becomes "long flexible food".

If the adjective has the same form as the continuous form of a verb, please label it as "gerund-like", for the same reason noun definitions have gerund labeling.

alasa:
    searching(adj qualifier gerund-like);

Defining determiners

Use the tag (d). You'll need to specify its classification:

ale, ali:
    every(d distributive);

Here are the classifications of determiners:

  • article e.g. "the", "a", and "an"
  • demonstrative e.g. "that balloon"
  • distributive e.g. "every balloon" or "each balloon"
  • interrogative e.g. "which balloon"
  • possessive e.g. "my balloon"
  • quantifier e.g. "few balloons" or "many balloons"
  • negative e.g. "no balloon"
  • numeral e.g. "1 balloon" – Use a numeral definition instead

Sometimes, a determiner limits what grammatical number the noun can be. In these cases, specify it inside the tag as well, using the keyword singular or plural after the determiner classification.

ale, ali:
    all(d distributive plural);

The determiner "all" forces the noun to be plural e.g. "all apples".

Remember to only consider the grammatical number:

ala:
    zero(d quantifier plural);

To explain with an example noun: "zero apples" refers to no apples at all, yet it is grammatically plural in form.

Take note that "zero" is only used as an example here; don't actually use it. "zero" is better defined as 0(num) using a numeral definition.

Sometimes, the determiner itself has singular and plural forms. In these cases, use a slash /. There is no automatic declension for this.

ni:
    this/these(d demonstrative);
    that/those(d demonstrative);

Defining numerals

Numerals are technically a kind of determiner or noun. But since numbers in Toki Pona have interesting grammatical functions, numerals are defined separately. Remember that these are for exact numbers, like actual integers. For words describing a rough number, e.g. "few" or "many", use a determiner instead.

Use the tag (num). Use Arabic numerals instead of spelled-out English words: use 5, not five.

luka:
    5(num);

Defining verbs

Use the tag (v) to define verbs. Verbs may be followed by particles, in which case the two are treated as a single word.

awen:
    continue(v);

open:
    turn on(v) [object];

Verb definitions may come with direct or indirect objects.

musi:
    have(v) fun(n singular);

ku:
    interact(v) with(prep) the(d article) Toki Pona(adj qualifier) Dictionary(n singular);

Use the placeholder [object] for transitive verbs. You may use prepositions.

ante:
    change(v) [object];

lukin, oko:
    look(v) at(prep) [object];

Verb definitions are also used for preverbs. ilo Token supports translating preverbs into linking verbs, catenative verbs, and modal verbs. Make sure to use the [predicate] placeholder.

awen:
    remain(v linking) [predicate];

alasa:
    try to(v) [predicate];

ken:
    can(v modal) [predicate];

These preverbial definitions are also used for non-preverbs: "mi alasa" will translate to "I try to".

Automatic conjugation may fail. In these cases, provide all the needed conjugations in the following order: present plural or infinitive, present singular, and finally past.

mu:
    hiss/hisses/hissed(v);

Defining adverbs

Defining adverbs is as easy as it gets. Use the tag (adv).

pona:
    nicely(adv);

Mark "not" as negative. When it is encountered with a verb, ilo Token will perform special hardcoded verb negation, e.g. "I do not continue".

ala:
    not(adv negative);

Don't add so(adv) to the word "a", this is hardcoded instead. (adv) definitions are for content words.

Defining fillers

These are for words "a", "n" and other similar words. Use the tag (f).

Fillers are permitted to be elongated. You'll have to provide elongations of different lengths in a strict pattern: only one letter can be repeated, and each form must be exactly one letter longer than the previous one.

a:
    ah/aah/aaah(f);

n:
    hm/hmm/hmmm(f);

You may provide just 2 forms, but sticking to 3 is recommended.

a:
    ah/aah(f);

You may also provide no elongation at all; such definitions won't be used when "a" or "n" is elongated.

a:
    ah(f);
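
If you want to double-check a list of filler forms against the elongation pattern before adding it, the rule can be sketched in Python as below. This is one reading of the rule, not ilo Token's actual validation, and the function names are made up.

```python
from itertools import groupby


def letter_runs(word):
    """Split a word into runs of repeated letters: "aah" -> [("a", 2), ("h", 1)]."""
    return [(letter, len(list(group))) for letter, group in groupby(word)]


def is_valid_elongation(forms):
    """Check that each form repeats the same single letter once more than the last.

    Hypothetical checker. A list with fewer than two forms has no
    elongation to check, so it passes.
    """
    runs = [letter_runs(form) for form in forms]
    growing_run = None  # index of the one run that is allowed to grow
    for previous, current in zip(runs, runs[1:]):
        # The letters themselves must stay the same; only run lengths may differ.
        if [letter for letter, _ in previous] != [letter for letter, _ in current]:
            return False
        changed = [
            index
            for index, (old, new) in enumerate(zip(previous, current))
            if old[1] != new[1]
        ]
        if len(changed) != 1:
            return False
        index = changed[0]
        if current[index][1] - previous[index][1] != 1:
            return False
        if growing_run is None:
            growing_run = index
        elif growing_run != index:
            return False
    return True


print(is_valid_elongation(["ah", "aah", "aaah"]))
print(is_valid_elongation(["ah", "aaah"]))
```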

Translating "a a a" to "hahaha" is hardcoded in the code. You don't need to define them.

See also interjection.

Defining interjections

Defining interjections is as easy as it gets. Use the tag (i).

mu:
    bark(i);

Interjection definitions are only used when the Toki Pona word is used alone or with "a" in the sentence.

Don't use interjection for particles "a" and "n", use filler instead. (i) definitions are for content words.

Defining prepositions

These are for Toki Pona prepositions. Toki Pona prepositions happen to be translatable into English prepositions. Use the tag (prep). The placeholder [indirect object] is required.

lon:
    in(prep) [indirect object];

A bit of laziness on the developer's part: you may define an adjective-preposition phrase, as well as a nested preposition, as a single preposition.

sama:
    similar to(prep) [indirect object];

kepeken:
    by means of(prep) [indirect object];

Defining noun-preposition phrase

For example "kili lili" can mean "part of fruit". You may define this kind of definition like the following.

lili:
    part(n) of(prep) [headword];

Defining particle definition

These are for Toki Pona particles. The functionality of particles is hardcoded and cannot be customized with the dictionary alone. These definitions are only used in dictionary mode, when ilo Token is queried with a single word. Use the closest English word that the particle can translate to. Use the tag (particle def).

anu:
    or(particle def);

You may instead describe how the word is used. Wrap the description in square brackets []; you'll have to wrap each bracket in backticks ` too, because square brackets are special characters used for placeholders.

a:
    `[`placed after something for emphasis or emotion`]`(particle def);

Removing words using custom dictionary

You can "delete" words so that ilo Token no longer recognizes them. This is done by adding the definition head but leaving the body blank. Add a comment so you know it's intentional.

kokosila:
    # (deleted)

Particles are hardcoded and therefore cannot be completely removed.

Definition order

Order matters: ilo Token will usually, although not always, use the first definition and output it first. So please order the definitions from most likely to least likely.

Avoiding calques and confusing definitions

ilo Token borrows definitions from lipu Linku, which itself avoids calques. However, English words with multiple meanings that could be confused with one another should be avoided. For example, "cool" is simultaneously a word for "lete" and for "epiku", which have different meanings, so the word "cool" should be avoided.

Shrinking down definition number

ilo Token shows many outputs, and they may be very numerous. To counteract this, please reduce the number of definitions where possible. Try to use words with broad meanings that align well with the Toki Pona word.

Using lipu Linku

Using lipu Linku as a reference is recommended. lipu Linku is very high quality. You may borrow definitions from it. You may deviate from lipu Linku if needed. Consider contributing to lipu Linku as well.

Using another dictionary

lipu Linku is the assumed source. However, if you're basing a definition on another dictionary, attribute it and mention its license within the definition itself. It needs to be an open-source license under which anyone is free to modify the content!

a:
    # definitions here

    # Taken from lipu Linku
    # sona Linku is dual licensed under...

This is just an example; lipu Linku doesn't need to be attributed, as it is already attributed at the top of the dictionary.

It is possible to release ilo Token's dictionary with different licenses for each definition. However, please avoid this. If the license is compatible with the license of ilo Token's dictionary, adopt it instead.