modify usfm for chapter-level drafting to avoid import issues; move remarks to chapters #285

Draft: mshannon-sil wants to merge 1 commit into main from incremental_draft

@mshannon-sil (Collaborator) commented Mar 26, 2026

This PR addresses issue #284.

I'm mostly looking for high-level feedback on the approach at the moment. As we discussed, does this functionality belong in the get_usfm() method, as essentially a post-processing step? Or should we implement it in process_tokens() (and maybe move the remark logic there as well)?

Some initial thoughts:
Pros for putting it in get_usfm():

  • The code stays together in one cohesive unit, which could make it easier to maintain than logic spread across process_token().
  • If it's only for the purposes of importing, it can be thought of as a kind of "view" that Paratext needs to avoid import issues, while the true model is kept unmodified in handler._tokens. This leaves the option of accessing the unmodified USFM if needed in the future.

Pros for putting it in process_token():

  • Faster execution, since it's all part of the same iteration.
  • If this is an essential change to the USFM structure, so that alternative views are unnecessary, it could make more structural sense to include it here.
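
To make the "view" argument concrete, here is a minimal sketch of the get_usfm() option. It assumes the post-processing amounts to keeping only the requested chapters, which the PR description does not spell out. `ToyHandler` and the `(chapter, text)` token tuples are hypothetical stand-ins, not the real handler or token classes.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical stand-in token: (chapter number, USFM fragment).
# Chapter 0 marks book-level header material that precedes the first \c.
ToyToken = Tuple[int, str]


class ToyHandler:
    """Toy model of a handler that keeps its parsed tokens unmodified."""

    def __init__(self, tokens: List[ToyToken]) -> None:
        self._tokens = tokens  # the unmodified "model"

    def get_usfm(self, chapters: Optional[Set[int]] = None) -> str:
        # Work on a copy so the stored tokens are never mutated.
        tokens = list(self._tokens)
        if chapters is not None:
            # Assumption: the view keeps the header plus the requested chapters.
            tokens = [t for t in tokens if t[0] == 0 or t[0] in chapters]
        return "\n".join(text for _, text in tokens)


handler = ToyHandler(
    [(0, "\\id MAT"), (1, "\\c 1"), (1, "\\v 1 a"), (2, "\\c 2"), (2, "\\v 1 b")]
)
chapter_two_view = handler.get_usfm({2})
full_usfm = handler.get_usfm()
```

Because the filtering happens on a copy, both the chapter-level view and the full USFM remain available from the same handler. The cost is a second pass over the tokens, which is the execution-time tradeoff noted above.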

@ddaspit (Contributor) left a comment

@ddaspit reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Enkidu93 and mshannon-sil).


machine/corpora/update_usfm_parser_handler.py line 345 at r1 (raw file):

        tokens = list(self._tokens)
        if chapters is not None:
            tokens = self._get_incremental_draft_tokens(tokens, chapters)

I think we can do something similar, but before we parse instead of after. Instead of calling parse_usfm in update_usfm, we can do something like this:

tokenizer = UsfmTokenizer(self._settings.stylesheet)
tokens = tokenizer.tokenize(usfm)
tokens = filter_tokens_by_chapter(tokens, chapters)
parser = UsfmParser(tokens, handler, self._settings.stylesheet, self._settings.versification)
parser.process_tokens()

This would avoid updating the whole book.
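
The review does not define filter_tokens_by_chapter, so the following is only a rough sketch of what such a helper might look like. `Token` and `TokenType` are simplified stand-ins written for this example, not machine.py's own token classes. Keeping the tokens that precede the first chapter marker (such as `\id`) is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, List, Optional, Sequence


class TokenType(Enum):
    BOOK = auto()
    CHAPTER = auto()
    VERSE = auto()
    TEXT = auto()


@dataclass
class Token:
    """Hypothetical stand-in for a USFM token (not the library's class)."""

    type: TokenType
    marker: Optional[str] = None
    data: Optional[str] = None  # e.g. the chapter number for a \c token
    text: Optional[str] = None


def filter_tokens_by_chapter(
    tokens: Sequence[Token], chapters: Optional[Iterable[int]]
) -> List[Token]:
    """Keep book-level header tokens plus every token in the given chapters."""
    if chapters is None:
        return list(tokens)
    wanted = set(chapters)
    kept: List[Token] = []
    keep = True  # assumption: tokens before the first \c are always kept
    for token in tokens:
        if token.type is TokenType.CHAPTER:
            number = (token.data or "").strip()
            keep = number.isdigit() and int(number) in wanted
        if keep:
            kept.append(token)
    return kept


tokens = [
    Token(TokenType.BOOK, "id", "MAT"),
    Token(TokenType.CHAPTER, "c", "1"),
    Token(TokenType.VERSE, "v", "1"),
    Token(TokenType.TEXT, text="a"),
    Token(TokenType.CHAPTER, "c", "2"),
    Token(TokenType.VERSE, "v", "1"),
    Token(TokenType.TEXT, text="b"),
    Token(TokenType.CHAPTER, "c", "3"),
    Token(TokenType.TEXT, text="c"),
]
result = filter_tokens_by_chapter(tokens, [2])
```

With a filter like this, the parser only ever sees the header and the selected chapters, so the handler does no work for the rest of the book. Whether the header must survive for a valid import is a question for the real implementation.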
