Releases: OpenPecha/BEC

v1.9.0

29 Apr 10:02
1834417

🚀 Sprint Release Notes: Sprint 9 - BEC

Date: 2026-04-29 | Version: v1.9.0 | Status: Shipped | Repository: OpenPecha/BEC


Summary

This sprint delivered 3 item(s) in OpenPecha/BEC.

✨ New Features

  • AI text outline detection with TOC images:
    We have implemented an AI-assisted TOC detection package. The package performs well on well-organised text samples.
  • Rule-based outline detection in text without a TOC (mostly pecha style):
    We need to add a rule-based mechanism to detect outline breakpoints in pecha-style texts that have no TOC at all.
  • Glyph matching for Unicode fonts in PDF-to-XML conversion:
    When PDFs are converted to XML, fonts with Unicode encodings tend to be misconverted in the output XML because of glyph conversion issues.
    [The issue](https://github.
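
As an illustration of the rule-based outline detection item above, here is a minimal sketch of one possible rule. It assumes that a new section in a pecha-style text opens with the yig mgo mark "༄༅" (U+0F04 U+0F05). The actual rule set is not described in these notes, and the name `find_breakpoints` is hypothetical.

```python
import re

# Assumption: a section opens with the yig mgo mark (U+0F04 followed by one
# or more U+0F05). This single rule is illustrative, not the project's rules.
SECTION_OPENER = re.compile("\u0f04\u0f05+")


def find_breakpoints(text: str) -> list[int]:
    """Return the character offsets where a candidate section starts."""
    return [match.start() for match in SECTION_OPENER.finditer(text)]
```

A real detector would likely combine several such rules (for example title markers or closing formulas) and rank the candidates. The offsets returned here are only candidate breakpoints.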

Generated from GitHub Project Board - Sprint 9

v1.8.0

09 Apr 10:32
1834417

🚀 Sprint Release Notes: Sprint 8 - BEC

Date: 2026-04-09 | Version: v1.8.0 | Status: Shipped | Repository: OpenPecha/BEC


Summary

This sprint delivered 7 item(s) in OpenPecha/BEC.

✨ New Features

  • Outline detection using the TOC from text: AI Text Outline - Project Overview
    ai-text-outline is a Python package designed to extract the Table of Contents (དཀར་ཆག) from Tibetan texts with high accuracy using Google's Gemini
  • Namgyal PageXML dataset:
    In this card we update the previous pipeline to handle the Namgyal and TibSchool datasets. The Namgyal and TibSchool datasets are already available as Transkribus PageXML with image-t
  • Verify Lhasa Kangyur pipeline and publish:
    In this card we verify that the pipeline produces the aligned files in the correct folder and that the aligned files correctly match the [benchmark format](https://github.com/buda-base/bec
  • Verify Lithang Kangyur pipeline and publish:
    In this card we verify that the pipeline produces the aligned files in the correct folder and that the aligned files correctly match the [benchmark format](https://github.com/buda-base/bec
  • Publish source Etext on S3:
    In this card we upload the source Etext for each aligned collection to S3.

Resources

  • Read the documents and implement conversion of one file:
    Understand the conversion flow step by step and implement the conversion.
    Subtasks:
      • Read the docs
      • Set up Git and GitHub
      • Convert one file (PDF to XML)
    Reviewer:
  • Update outline tool detection package return value:
    We need to update the return value of the package to send both the text breakpoints and the TOC we have found using AI.
    This will help the outline tool check the accuracy of the TOC content on th
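
The return-value change described in the last item could look roughly like the sketch below. The names `OutlineResult`, `breakpoints`, `toc`, and `titles_missing_from_text` are hypothetical and chosen for illustration only. The package's real return type is not shown in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class OutlineResult:
    """Hypothetical container returning both outputs together."""

    # Character offsets of the detected section starts.
    breakpoints: list[int] = field(default_factory=list)
    # TOC entries found by the AI step, in reading order.
    toc: list[str] = field(default_factory=list)


def titles_missing_from_text(result: OutlineResult, text: str) -> list[str]:
    """List TOC entries that do not occur verbatim in the text."""
    return [title for title in result.toc if title not in text]
```

Returning both values in one object lets a caller such as the outline tool cross-check the TOC against the text, for example by flagging entries that never appear in it.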

Generated from GitHub Project Board - Sprint 8