✨ add support for crop & split extractions #395
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| from mindee.v2.file_operations.crop import Crop | ||
| from mindee.v2.file_operations.split import Split | ||
| | ||
| __all__ = ["Crop", "Split"] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| from typing import List | ||
| | ||
| from mindee.error import MindeeError | ||
| from mindee.extraction import ExtractedImage, extract_multiple_images_from_source | ||
| from mindee.geometry import Polygon | ||
| from mindee.input.sources.local_input_source import LocalInputSource | ||
| from mindee.parsing.v2.field import FieldLocation | ||
| from mindee.v2.product.crop.crop_box import CropBox | ||
| | ||
| | ||
| class Crop: | ||
|     """Crop operations for V2.""" | ||
| | ||
|     @classmethod | ||
|     def extract_single_crop( | ||
|         cls, input_source: LocalInputSource, crop: FieldLocation | ||
|     ) -> ExtractedImage: | ||
|         """ | ||
|         Extracts a single crop from the document as an image. | ||
| | ||
|         :param input_source: Local input source to extract the crop from. | ||
|         :param crop: Location (page and polygon) of the crop to extract. | ||
|         :return: ExtractedImage. | ||
|         """ | ||
| | ||
|         return extract_multiple_images_from_source( | ||
|             input_source, crop.page, [crop.polygon] | ||
|         )[0] | ||
| | ||
|     @classmethod | ||
|     def extract_crops( | ||
|         cls, input_source: LocalInputSource, crops: List[CropBox] | ||
|     ) -> List[ExtractedImage]: | ||
|         """ | ||
|         Extracts every crop from the document as an individual image. | ||
| | ||
|         :param input_source: Local input source to extract the crops from. | ||
|         :param crops: List of crops. | ||
|         :return: Extracted crops as a list of ExtractedImage, grouped by page. | ||
|         """ | ||
|         if not crops: | ||
|             raise MindeeError("No possible candidates found for Crop extraction.") | ||
|         images: List[ExtractedImage] = [] | ||
|         # One bucket of polygons per page, so each page is processed once. | ||
|         polygons: List[List[Polygon]] = [[] for _ in range(input_source.page_count)] | ||
|         for crop in crops: | ||
|             polygons[crop.location.page].append(crop.location.polygon) | ||
|         for page_index, page_polygons in enumerate(polygons): | ||
|             images.extend( | ||
|                 extract_multiple_images_from_source( | ||
|                     input_source, | ||
|                     page_index, | ||
|                     page_polygons, | ||
|                 ) | ||
|             ) | ||
|         return images | ||
| | ||
|     @classmethod | ||
|     def apply( | ||
|         cls, | ||
|         input_source: LocalInputSource, | ||
|         crops: List[CropBox], | ||
|     ) -> List[ExtractedImage]: | ||
|         """Crop a document into multiple images. | ||
| | ||
|         :param input_source: Input source to crop. | ||
|         :param crops: List of crops. | ||
|         :return: Extracted crops as a list of ExtractedImage. | ||
|         """ | ||
| | ||
|         return cls.extract_crops(input_source, crops) | ||
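The per-page grouping done in `extract_crops` can be illustrated without the SDK. The sketch below is only an approximation of that step: `CropLocation` and `group_polygons_by_page` are hypothetical stand-ins, not names from the `mindee` package, and the page bounds check is an addition that the diff itself does not perform.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a polygon: a list of (x, y) points.
Point = Tuple[float, float]


@dataclass
class CropLocation:
    """Hypothetical stand-in for a crop location: a page index and a polygon."""

    page: int
    polygon: List[Point]


def group_polygons_by_page(
    page_count: int, locations: Sequence[CropLocation]
) -> List[List[List[Point]]]:
    """Bucket polygons by zero-based page index, one bucket per page."""
    if not locations:
        raise ValueError("No crop locations provided.")
    grouped: List[List[List[Point]]] = [[] for _ in range(page_count)]
    for location in locations:
        # Assumption: reject out-of-range pages explicitly instead of
        # letting the list indexing fail (or wrap around on negatives).
        if not 0 <= location.page < page_count:
            raise IndexError(f"Page {location.page} is outside 0..{page_count - 1}.")
        grouped[location.page].append(location.polygon)
    return grouped
```

One consequence of grouping this way is that results come back in page order rather than in the order of the `crops` list, which may matter to callers that match images to crops by index.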
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| from typing import List, Union | ||
| | ||
| from mindee.error import MindeeError | ||
| from mindee.extraction import ExtractedPdf, PdfExtractor | ||
| from mindee.input.sources.local_input_source import LocalInputSource | ||
| from mindee.v2.product.split.split_range import SplitRange | ||
| | ||
| | ||
| class Split: | ||
|     """Split operations for V2.""" | ||
| | ||
|     @classmethod | ||
|     def extract_splits( | ||
|         cls, | ||
|         input_source: LocalInputSource, | ||
|         splits: Union[List[SplitRange], List[List[int]]], | ||
|     ) -> List[ExtractedPdf]: | ||
|         """ | ||
|         Extracts splits as complete PDFs from the document. | ||
| | ||
|         :param input_source: Input source to split. | ||
|         :param splits: List of split ranges, or of [first, last] page pairs (inclusive). | ||
|         :return: A list of extracted PDFs. | ||
|         """ | ||
|         # Validate before doing any work on the input source. | ||
|         if not splits: | ||
|             raise MindeeError("No indexes provided.") | ||
|         pdf_extractor = PdfExtractor(input_source) | ||
|         page_groups = [] | ||
|         for split in splits: | ||
|             if isinstance(split, SplitRange): | ||
|                 lower_bound = split.page_range[0] | ||
|                 upper_bound = split.page_range[1] | ||
|             else: | ||
|                 lower_bound = split[0] | ||
|                 upper_bound = split[1] | ||
|             page_groups.append(list(range(lower_bound, upper_bound + 1))) | ||
|         return pdf_extractor.extract_sub_documents(page_groups) | ||
| | ||
|     @classmethod | ||
|     def apply( | ||
|         cls, input_source: LocalInputSource, splits: List[SplitRange] | ||
|     ) -> List[ExtractedPdf]: | ||
|         """Split a document into multiple PDFs. | ||
| | ||
|         :param input_source: Input source to split. | ||
|         :param splits: List of splits. | ||
|         :return: A list of extracted PDFs. | ||
|         """ | ||
| | ||
|         return cls.extract_splits(input_source, splits) | ||
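The range handling in `extract_splits` treats each split as an inclusive `[first, last]` pair and expands it into an explicit list of page indexes. A minimal sketch of that expansion, independent of the SDK, is shown below. `expand_page_ranges` is a hypothetical helper name, and the check for reversed bounds is an assumption added here; the diff does not include it. Whether the indexes are zero-based is not stated in the diff and is left open.

```python
from typing import List, Sequence


def expand_page_ranges(ranges: Sequence[Sequence[int]]) -> List[List[int]]:
    """Expand inclusive [first, last] pairs into explicit page index lists."""
    if not ranges:
        raise ValueError("No indexes provided.")
    page_groups: List[List[int]] = []
    for page_range in ranges:
        lower_bound, upper_bound = page_range[0], page_range[1]
        # Assumption: a reversed range is treated as an error here; without
        # this check, range() would silently produce an empty group.
        if lower_bound > upper_bound:
            raise ValueError(f"Invalid range: {lower_bound} > {upper_bound}.")
        # upper_bound + 1 because the upper bound is inclusive.
        page_groups.append(list(range(lower_bound, upper_bound + 1)))
    return page_groups


print(expand_page_ranges([[0, 1], [2, 4]]))  # [[0, 1], [2, 3, 4]]
```

A single-page split is expressed with equal bounds, for example `[3, 3]`, which expands to `[3]`.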
Submodule data updated (3 files)
| +1 −1 | v2/products/crop/crop_multiple.json | |
| +1 −1 | v2/products/crop/crop_multiple.rst | |
| +2 −2 | v2/products/split/split_multiple.json | |