# Botok – Python Tibetan Tokenizer

[Latest release](https://github.com/OpenPecha/botok/releases) · [Documentation status](https://botok.readthedocs.io/en/latest/?badge=latest) · [Coverage](https://coveralls.io/github/OpenPecha/botok?branch=master) · [Code style: black](https://black.readthedocs.io/en/stable/) · [License](https://github.com/OpenPecha/botok/blob/master/LICENSE) · [GitHub repository](https://github.com/OpenPecha/Botok)

## Project Description

Botok is a Python library for tokenizing Tibetan text. It segments text into words and can attach optional attributes to each token, such as its lemma, part-of-speech (POS) tag, and cleaned form. The library handles several input formats, supports custom dialects, and offers multiple tokenization modes, which makes it a versatile tool for Tibetan natural language processing (NLP).
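
Tibetan writing marks syllable boundaries with the tsheg (་, U+0F0B) but leaves word boundaries unmarked, so a word tokenizer has to decide which runs of syllables belong together. The sketch below illustrates that idea with greedy longest match against a word list, one common dictionary-based approach. It is a hypothetical, simplified example, not Botok's implementation: the helper names `split_syllables` and `longest_match` and the two-entry lexicon are made up for illustration, and punctuation, affixes, and token attributes are ignored.

```python
# Illustrative sketch only: this is NOT Botok's segmentation algorithm.
from typing import List, Set

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg): "་"


def split_syllables(text: str) -> List[str]:
    """Split Tibetan text into syllables on the tsheg, dropping empty pieces."""
    return [syl for syl in text.split(TSHEG) if syl]


def longest_match(syllables: List[str], lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right longest match over a list of syllables.

    At each position, try the longest candidate first (up to ``max_len``
    syllables) and keep the first one found in ``lexicon``; if none match,
    emit the single syllable as its own token.
    """
    words: List[str] = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = TSHEG.join(syllables[i : i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real tokenizer relies on a large dictionary.
    lexicon = {"བཀྲ་ཤིས", "བདེ་ལེགས"}
    print(longest_match(split_syllables("བཀྲ་ཤིས་བདེ་ལེགས"), lexicon))
```

With that two-entry lexicon, the four syllables of བཀྲ་ཤིས་བདེ་ལེགས come out as two words; adding the whole phrase to the lexicon makes longest match keep it as a single token. Real segmentation also has to resolve ambiguity and unknown words, which this sketch does not attempt.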

## Who This Project Is For

This project is for developers and researchers who process Tibetan text and need accurate word segmentation and tokenization.

## Project Dependencies

Before using Botok, ensure you have:

* Python 3.7+
* pip (Python package installer)

## Instructions for Installing Botok

### Install Botok

Install Botok with either of the following methods.

1. **Install via pip:**

   ```bash
   pip install botok
   ```

2. **Install from source:**

   ```bash
   git clone https://github.com/OpenPecha/Botok.git
   cd Botok
   pip install -e .
   ```

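The check below is a minimal, standard-library-only sketch for confirming that the installation worked: it reports the Python version, whether `botok` is importable, and, on Python 3.8+ where `importlib.metadata` exists, the installed version. The helper names `installed_version` and `check_installation` are hypothetical and not part of Botok.

```python
import importlib.util
import sys
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.7: importlib.metadata does not exist
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_installation(package: str = "botok") -> bool:
    """Print the Python version and report whether `package` can be imported."""
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    if sys.version_info < (3, 7):
        print("Botok needs Python 3.7 or newer.")
        return False
    if importlib.util.find_spec(package) is None:
        print(f"'{package}' is not importable; try `pip install {package}`.")
        return False
    print(f"'{package}' is importable (version: {installed_version(package) or 'unknown'}).")
    return True


if __name__ == "__main__":
    check_installation()
```

Run it with the same interpreter you used for `pip install`; otherwise the package may have been installed into a different environment.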
### Run Botok

```python
from botok import WordTokenizer

# Build a word tokenizer with the default configuration
wt = WordTokenizer()

text = "བོད་ཡིག"  # "Tibetan script/language"
tokens = wt.tokenize(text)

# Each token is an object; print the surface text of each one
for token in tokens:
    print(token.text)
```
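
The tokenizer returns token objects rather than plain strings. To collect the optional attributes described above in a JSON-friendly form, a small helper can read them defensively. This is a sketch under assumptions: the attribute names `text`, `text_cleaned`, `lemma`, and `pos` are assumed to match Botok's token objects, so check them against the API reference for your installed version; `getattr` with a default keeps the helper from failing if an attribute is missing. `token_to_dict` and `tokens_to_json` are hypothetical helpers, and the stand-in tokens use `types.SimpleNamespace` so the sketch runs without Botok installed.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List

# Assumed attribute names on Botok token objects; verify against the API reference.
ATTRIBUTES = ("text", "text_cleaned", "lemma", "pos")


def token_to_dict(token: Any) -> Dict[str, Any]:
    """Copy the assumed attributes of one token into a dict (None if absent)."""
    return {name: getattr(token, name, None) for name in ATTRIBUTES}


def tokens_to_json(tokens: Iterable[Any]) -> str:
    """Serialize tokens to a JSON array, keeping Tibetan characters unescaped."""
    records: List[Dict[str, Any]] = [token_to_dict(tok) for tok in tokens]
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Stand-in tokens with illustrative values, not real Botok output;
    # in practice, pass the list returned by the tokenizer instead.
    fake_tokens = [
        SimpleNamespace(text="བོད་ཡིག", lemma="བོད་ཡིག", pos="NOUN"),
        SimpleNamespace(text="ཀ"),  # missing attributes become null in the JSON
    ]
    print(tokens_to_json(fake_tokens))
```

Passing `ensure_ascii=False` to `json.dumps` keeps Tibetan characters readable in the output instead of escaping them as `\uXXXX` sequences.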

## Directory Structure

```
Botok/
├── .github/          # GitHub configuration
├── botok/            # Main package source code
├── docs/             # Documentation
├── tests/            # Test suite
├── requirements.txt  # Development dependencies
├── setup.py          # Package setup configuration
├── setup.cfg         # Package metadata
├── README.md         # Project documentation
├── CHANGELOG.md      # Release history
├── LICENSE           # License file
└── usage.py          # Usage examples
```

## Contributing Guidelines

Contributions are welcome! Please read the [contributing guidelines](https://github.com/OpenPecha/Botok/blob/master/README.md#contributing) for details on how to submit pull requests.

## Additional Documentation

* [Full Documentation](https://botok.readthedocs.io/)
* [API Reference](https://botok.readthedocs.io/en/latest/)

## How to Get Help

* [GitHub Issues](https://github.com/OpenPecha/Botok/issues)
* [OpenPecha Community](https://openpecha.org)

## Terms of Use

Botok is licensed under the MIT License.