fix: (cont.) Remove Extra Space Before and After Group Items using Inline Boundaries #605
Open
wanadzhar913 wants to merge 1 commit into
…CO error Signed-off-by: wanadzhar913 <adzhar.faiq@gmail.com>
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviewer for test updates
Waiting for
This rule is failing. When test data is updated, we require two reviewers
🟢 Enforce conventional commit
Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
✅ DCO Check Passed
Thanks @wanadzhar913, all your commits are properly signed off. 🎉
Codecov Report
❌ Patch coverage is
Details
This is a continuation of the work in Pull Request #458, which removes extra space before and after group items to resolve the issue raised in #2745.
Resolves #371
Resolves docling-project/docling#2745
Approach
Refactors inline spacing in docling_core/transforms/serializer/common.py into a clearer decision flow centered on _classify_inline_boundary(), instead of the old approach in #458, where we simply removed the space (" ") when joining all parts without separators.
Control Flow
_join_inline_parts() is the entry point. It walks adjacent inline chunks, calls _classify_inline_boundary() for each boundary, and inserts a space only when that classifier returns InlineBoundary.SPACE.
_classify_inline_boundary() now handles boundaries in a fixed order:
1. … JOIN and avoids adding another space.
2. _classify_provenance_boundary() checks original text and provenance/source positions to decide whether the boundary should join or include a space.
3. … TextItems, _classify_text_boundary() applies rules for styled/plain text transitions, punctuation, and short word continuations.
4. _is_semantic_inline_atom() identifies code, formulas, and links so they can be visually separated from regular text when needed.
5. _classify_ambiguous_word_boundary() handles uncertain styled/plain word splits by joining likely continuations like Pars + ing, otherwise preferring readability with a space.
Helper Roles
- _is_styled_text() detects whether a text item has visible formatting or a hyperlink, which feeds the text-boundary logic.
- _is_semantic_inline_atom() marks inline items that should usually stand apart from regular text, especially code, formulas, and links.
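
For reviewers who want the shape of the control flow without opening the diff, here is a minimal standalone sketch of the "classify each boundary, then join" pattern. It is not the code in docling_core/transforms/serializer/common.py: it works on plain strings rather than doc items, the un-prefixed function names are stand-ins, and the whitespace and punctuation rules (including the two punctuation constants) are assumptions made up for illustration. Only the InlineBoundary.JOIN / InlineBoundary.SPACE outcomes are taken from the description above.

```python
from __future__ import annotations

from enum import Enum, auto


class InlineBoundary(Enum):
    JOIN = auto()   # concatenate the two chunks directly
    SPACE = auto()  # insert exactly one space between the chunks


# Hypothetical rule data, used only by this sketch.
CLOSING_PUNCTUATION = ".,;:!?)]}"
OPENING_PUNCTUATION = "([{"


def classify_inline_boundary(left: str, right: str) -> InlineBoundary:
    """Decide how two adjacent inline chunks should be joined (simplified)."""
    # Nothing on one side: there is no boundary to separate.
    if not left or not right:
        return InlineBoundary.JOIN
    # Assumed rule: whitespace already present at the boundary, so do not add more.
    if left[-1].isspace() or right[0].isspace():
        return InlineBoundary.JOIN
    # Assumed rule: no space before closing or after opening punctuation.
    if right[0] in CLOSING_PUNCTUATION or left[-1] in OPENING_PUNCTUATION:
        return InlineBoundary.JOIN
    # Default: prefer readability.
    return InlineBoundary.SPACE


def join_inline_parts(parts: list[str]) -> str:
    """Walk adjacent chunks and add a space only when the classifier says SPACE."""
    if not parts:
        return ""
    pieces = [parts[0]]
    for left, right in zip(parts, parts[1:]):
        if classify_inline_boundary(left, right) is InlineBoundary.SPACE:
            pieces.append(" ")
        pieces.append(right)
    return "".join(pieces)


if __name__ == "__main__":
    print(join_inline_parts(["see", "(", "x", ")", "."]))
    print(join_inline_parts(["Hello ", "world"]))
```

Keeping the decision in one classifier and the string assembly in the joiner means each new rule is a single early return, which is the property the fixed ordering in this PR relies on.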