Releases: OpenPecha/BEC
v1.9.0
🚀 Sprint Release Notes: Sprint 9 - BEC
Date: 2026-04-29 | Version: v1.9.0 | Status: Shipped | Repository: OpenPecha/BEC
Summary
This sprint delivered 3 item(s) in OpenPecha/BEC.
✨ New Features
- AI text outline detection with TOC images: We have implemented an AI-assisted TOC detection package. The package performs well on well-organised text samples.
- Rule-based outline detection in text without a TOC (mostly pecha style): We need to add a rule-based mechanism to detect outline breakpoints in pecha-style texts that have no TOC at all.
- Glyph matching for Unicode fonts in PDF-to-XML conversion: When PDFs are converted to XML, fonts with Unicode encodings tend to be misconverted in the output XML because of glyph conversion issues.
[The issue](https://github.
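As a rough illustration of the rule-based outline detection item above, the sketch below finds candidate breakpoints in a text that has no TOC. The rule itself is an assumption made for the example: it treats each occurrence of the Tibetan opening mark ༄༅ (yig mgo) as the start of a new section. The function name `find_breakpoints` is hypothetical and is not taken from the BEC codebase; the actual rules may differ.

```python
import re
from typing import List

# Assumption for illustration only: a new section opens with the
# Tibetan yig mgo mark (U+0F04 followed by U+0F05).
SECTION_START = re.compile("\u0f04\u0f05")


def find_breakpoints(text: str) -> List[int]:
    """Return the character offset of every section-start mark in text."""
    return [match.start() for match in SECTION_START.finditer(text)]
```

A real rule set would likely combine several such markers; keeping each rule as a compiled pattern makes it straightforward to add or remove rules.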
Generated from GitHub Project Board - Sprint 9
v1.8.0
🚀 Sprint Release Notes: Sprint 8 - BEC
Date: 2026-04-09 | Version: v1.8.0 | Status: Shipped | Repository: OpenPecha/BEC
Summary
This sprint delivered 7 item(s) in OpenPecha/BEC.
✨ New Features
- Outline detection using the TOC from text: AI Text Outline, project overview. ai-text-outline is a Python package designed to extract the Table of Contents (དཀར་ཆག) from Tibetan texts with high accuracy using Google's Gemini
- Namgyal PageXML dataset: In this card we update the previous pipeline to handle the Namgyal and TibSchool datasets. Both datasets are already available as Transkribus PageXML with image-t
- Verify Lhasa Kangyur pipeline and publish: In this card we verify that the pipeline produces the aligned files in the correct folder and that the aligned files correctly match the [benchmark format](https://github.com/buda-base/bec
- Verify Lithang Kangyur pipeline and publish: In this card we verify that the pipeline produces the aligned files in the correct folder and that the aligned files correctly match the [benchmark format](https://github.com/buda-base/bec
- Publish source Etext on S3: In this card we upload the source Etext for each aligned collection to S3.
Resources
- Read the documents and implement conversion of one file: Understand the conversion flow step by step and implement the conversion. Subtasks:
  - Read the docs
  - Set up Git and GitHub
  - Convert one file (PDF to XML)
- Update outline tool detection package return value: We need to update the return value of the package to send both the text breakpoints and the TOC found using AI. This will help the outline tool check the accuracy of the TOC content on th
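To make the updated return value in the last item concrete, here is a minimal sketch of a result object that carries both the text breakpoints and the AI-detected TOC. The names `OutlineResult` and `toc_titles_found` are hypothetical and are not the package's actual API; the check shown (whether each TOC title occurs verbatim in the text) is only one assumed way a consumer could gauge TOC accuracy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutlineResult:
    """Hypothetical return shape: breakpoints plus the detected TOC."""
    breakpoints: List[int]  # character offsets where sections start
    toc: List[str]          # TOC titles as extracted by the AI step


def toc_titles_found(text: str, result: OutlineResult) -> List[bool]:
    """For each TOC title, report whether it occurs verbatim in text."""
    return [title in text for title in result.toc]
```

Returning both fields in one object lets the outline tool cross-check the TOC against the text without a second call to the package.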
Generated from GitHub Project Board - Sprint 8