Feat: Implement hierarchical split functionality for PDF bookmarks by Chapoly1305 · Pull Request #799 · torakiki/pdfsam

Chapoly1305 · 2025-11-13T17:51:26Z

This pull request introduces support for hierarchical splitting of PDF files by bookmarks, enabling the creation of directory structures based on bookmark levels. The main changes include the addition of a new task and parameters to handle hierarchical output, enhancements to the task execution controller to route hierarchical split requests, and updates to module dependencies.

Demo
In Settings.

Exported PDF Destination Folders.

Tested on macOS.

Hierarchical Split Functionality

Added HierarchicalSplitByBookmarksTask, which splits PDFs by bookmarks and creates directories for parent bookmarks, placing child splits within them. It supports overlap and auto-detection of overlapping pages for more accurate splits.
Introduced HierarchicalSplitByOutlineLevelParameters, an extension of SplitByOutlineLevelParameters, to configure hierarchical output, overlap pages, and auto-detect overlap options.
Updated SplitByOutlineLevelParametersBuilder to include options for hierarchical output, overlap pages, and auto-detect overlap.

Task Execution Routing

Modified TaskExecutionController to detect and route hierarchical split requests to the new task using reflection, avoiding direct dependencies and circular references. [1] [2]

Dependency and Module Configuration

Added necessary dependencies (sejda-core, sejda-commons, sejda-sambox) to pdfsam-split-by-bookmarks and updated the module descriptor to require these modules, ensuring the new functionality can access required PDF processing features. [1] [2]

…rlap options

Copilot

Pull request overview

Adds a new “hierarchical output” mode to the Split by bookmarks tool, including optional overlap-page handling, by extending the parameters and introducing a custom split task that creates a folder per parent bookmark.

Changes:

Adds UI controls to enable hierarchical directory output and configure overlap pages, persisting these in workspace state.
Extends the split-by-bookmarks parameters builder to emit a new HierarchicalSplitByOutlineLevelParameters type.
Introduces HierarchicalSplitByBookmarksTask and routes execution via a special-case path in TaskExecutionController using reflection.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 13 comments.

Show a summary per file

File	Description
pdfsam-tools/pdfsam-split-by-bookmarks/src/main/java/org/pdfsam/tools/splitbybookmarks/SplitOptionsPane.java	Adds hierarchical output + overlap UI, reset/apply logic, and workspace persistence.
pdfsam-tools/pdfsam-split-by-bookmarks/src/main/java/org/pdfsam/tools/splitbybookmarks/SplitByOutlineLevelParametersBuilder.java	Adds hierarchical/overlap fields and builds hierarchical parameters when enabled.
pdfsam-tools/pdfsam-split-by-bookmarks/src/main/java/org/pdfsam/tools/splitbybookmarks/HierarchicalSplitByOutlineLevelParameters.java	New parameters subtype carrying hierarchical + overlap settings.
pdfsam-tools/pdfsam-split-by-bookmarks/src/main/java/org/pdfsam/tools/splitbybookmarks/HierarchicalSplitByBookmarksTask.java	New task implementing hierarchical splitting and overlap logic.
pdfsam-tools/pdfsam-split-by-bookmarks/src/main/java/module-info.java	Adds module requirements for Sejda/SAMBox + logging needed by the new task.
pdfsam-tools/pdfsam-split-by-bookmarks/pom.xml	Adds Sejda dependencies needed by the new task implementation.
pdfsam-service/src/main/java/org/pdfsam/service/task/TaskExecutionController.java	Special-cases hierarchical params and executes the hierarchical task via reflection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-12T17:16:50Z

+        var overlapLabel = new Label(i18n().tr("Additional overlap pages:"));
+        GridPane.setValignment(overlapLabel, VPos.BOTTOM);
+        GridPane.setHalignment(overlapLabel, HPos.LEFT);
+        add(overlapLabel, 0, 3);
+        overlapPagesField.setId("overlapPages");
+        overlapPagesField.setPromptText("0");
+        overlapPagesField.setPrefWidth(100);


overlapLabel is not associated with overlapPagesField (missing setLabelFor), which makes the new input harder to use with assistive technologies and breaks the pattern used for the other fields in this pane (e.g., regexLabel.setLabelFor(regexpField)).

Copilot · 2026-03-12T17:16:50Z

+        builder.hierarchicalOutput(hierarchicalOutputCheck.isSelected());
+
+        // Auto-detect is now always enabled when hierarchical output is selected
+        builder.autoDetectOverlap(hierarchicalOutputCheck.isSelected());
+
+        if (isNotBlank(overlapPagesField.getText())) {
+            try {
+                int overlap = Integer.parseInt(overlapPagesField.getText().trim());
+                if (overlap >= 0) {
+                    builder.overlapPages(overlap);
+                } else {


The overlap pages input is parsed and applied to the builder even when hierarchical output is not selected. However, SplitByOutlineLevelParametersBuilder.build() ignores overlapPages unless hierarchicalOutput is true, so user input can be silently dropped. Consider disabling/hiding this field unless hierarchical output is enabled, or only calling builder.overlapPages(...) when hierarchicalOutputCheck.isSelected() is true (and possibly surfacing a validation error if the user entered a value while hierarchical mode is off).

Copilot · 2026-03-12T17:16:50Z

+                if (overlap >= 0) {
+                    builder.overlapPages(overlap);
+                } else {
+                    onError.accept(i18n().tr("Overlap pages must be a positive number"));


The validation/message for negative overlap values is inconsistent: the code accepts overlap >= 0 (so 0 is valid) but the error says "must be a positive number" (which usually excludes 0). Either change the check to > 0 or update the message to explicitly allow 0 (e.g., non-negative / zero or positive).

Suggested change

onError.accept(i18n().tr("Overlap pages must be a positive number"));

onError.accept(i18n().tr("Overlap pages must be zero or a positive number"));

Copilot · 2026-03-12T17:16:50Z

-        SplitByOutlineLevelParameters params = new SplitByOutlineLevelParameters(level);
+        SplitByOutlineLevelParameters params;
+        if (hierarchicalOutput) {
+            params = new HierarchicalSplitByOutlineLevelParameters(level, true, overlapPages, autoDetectOverlap);


build() passes a constant true for the hierarchicalOutput constructor parameter (new HierarchicalSplitByOutlineLevelParameters(level, true, ...)). This makes the builder state and constructor API inconsistent and can hide bugs if the constructor argument is later used for behavior. Pass the hierarchicalOutput field instead, or remove the constructor parameter/field entirely since the type already implies hierarchical mode.

Suggested change

params = new HierarchicalSplitByOutlineLevelParameters(level, true, overlapPages, autoDetectOverlap);

params = new HierarchicalSplitByOutlineLevelParameters(level, hierarchicalOutput, overlapPages, autoDetectOverlap);

Copilot · 2026-03-12T17:16:51Z

+    void hierarchicalOutput(boolean hierarchicalOutput) {
+        this.hierarchicalOutput = hierarchicalOutput;
+    }
+
+    boolean isHierarchicalOutput() {
+        return hierarchicalOutput;
+    }
+
+    void overlapPages(int overlapPages) {
+        this.overlapPages = overlapPages;
+    }
+
+    int getOverlapPages() {
+        return overlapPages;
+    }
+
+    void autoDetectOverlap(boolean autoDetectOverlap) {
+        this.autoDetectOverlap = autoDetectOverlap;
+    }
+
+    boolean isAutoDetectOverlap() {
+        return autoDetectOverlap;
+    }
+
    @Override
    public SplitByOutlineLevelParameters build() {
-        SplitByOutlineLevelParameters params = new SplitByOutlineLevelParameters(level);
+        SplitByOutlineLevelParameters params;
+        if (hierarchicalOutput) {
+            params = new HierarchicalSplitByOutlineLevelParameters(level, true, overlapPages, autoDetectOverlap);
+        } else {
+            params = new SplitByOutlineLevelParameters(level);
+        }


There is existing unit coverage for this builder, but the newly added hierarchical settings are untested. Please add tests that verify: (1) hierarchicalOutput(true) causes build() to return HierarchicalSplitByOutlineLevelParameters, and (2) overlapPages/autoDetectOverlap are propagated as expected.

Copilot · 2026-03-12T17:16:52Z

+            // Calculate the actual end page with overlap
+            int totalPages = documentHandler.getUnderlyingPDDocument().getNumberOfPages();
+            int actualEndPage = Math.min(section.endPage + overlapPages, totalPages);
+
+            LOG.debug("Splitting section '{}' from page {} to {} (with {} pages overlap)",
+                    section.title, section.startPage, actualEndPage, overlapPages);
+
+            // Extract pages for this section (including overlap)
+            for (int pageNum = section.startPage; pageNum < actualEndPage; pageNum++) {
+                handler.addPage(documentHandler.getUnderlyingPDDocument().getPage(pageNum - 1)); // SAMBox uses 0-based indexing
+            }


The end-page math in splitSection is inconsistent with the loop bounds: actualEndPage is clamped to totalPages, and the loop uses < actualEndPage, so page totalPages is never included. This becomes visible for the last section and also when overlap pushes the end past the last page. Use a consistent convention (exclusive end bound usually implies clamping to totalPages + 1 and iterating < endExclusive, or inclusive bound implies clamping to totalPages and iterating <= endInclusive).

Copilot · 2026-03-12T17:16:52Z

+            File outputDirectory = parameters.getOutput().getDestination();
+
+            // Get overlap configuration if this is a hierarchical split
+            int overlapPages = 0;
+            boolean autoDetectOverlap = false;
+            if (parameters instanceof HierarchicalSplitByOutlineLevelParameters) {
+                HierarchicalSplitByOutlineLevelParameters hierarchicalParams = (HierarchicalSplitByOutlineLevelParameters) parameters;
+                overlapPages = hierarchicalParams.getOverlapPages();
+                autoDetectOverlap = hierarchicalParams.isAutoDetectOverlap();
+                LOG.debug("Using overlap of {} pages, auto-detect: {}", overlapPages, autoDetectOverlap);
+            }


This task writes output PDFs directly to parentDir using sanitizeFilename(section.title) and does not apply several SplitByOutlineLevelParameters settings that the UI/builder sets (e.g., ExistingOutputPolicy, outputPrefix, and parameters.getOutput().accept(...) handling). This can lead to overwriting files even when the policy is SKIP/FAIL and to filenames that ignore the configured prefix. Consider using the same output writing/name-generation approach used by other tasks in pdfsam-service (e.g., OutputWriters.newMultipleOutputWriter(...) + NameGenerator), or otherwise explicitly honor the parameters here.

Copilot · 2026-03-12T17:16:52Z

+    private void splitSection(BookmarkSection section, File parentDir, SplitByOutlineLevelParameters parameters,
+            int currentStep, int overlapPages) throws TaskException {
+        try {
+            PDDocumentHandler handler = new PDDocumentHandler();
+            handler.setCreatorOnPDDocument();
+            handler.setVersionOnPDDocument(
+                    parameters.getVersion() != null ? parameters.getVersion() : PdfVersion.VERSION_1_6);
+            handler.setCompress(parameters.isCompress());
+
+            // Calculate the actual end page with overlap
+            int totalPages = documentHandler.getUnderlyingPDDocument().getNumberOfPages();
+            int actualEndPage = Math.min(section.endPage + overlapPages, totalPages);
+
+            LOG.debug("Splitting section '{}' from page {} to {} (with {} pages overlap)",
+                    section.title, section.startPage, actualEndPage, overlapPages);
+
+            // Extract pages for this section (including overlap)
+            for (int pageNum = section.startPage; pageNum < actualEndPage; pageNum++) {
+                handler.addPage(documentHandler.getUnderlyingPDDocument().getPage(pageNum - 1)); // SAMBox uses 0-based indexing
+            }
+
+            String filename = sanitizeFilename(section.title) + ".pdf";
+            File outputFile = new File(parentDir, filename);
+
+            handler.savePDDocument(outputFile);
+            handler.close();
+
+            LOG.debug("Created file: {}", outputFile);
+
+            // Report progress
+            BigDecimal percentComplete = BigDecimal.valueOf((currentStep / (double) totalSteps) * 100);
+            GlobalNotificationContext.getContext()
+                    .notifyListeners(new PercentageOfWorkDoneChangedEvent(percentComplete, NotifiableTaskMetadata.NULL));
+        } catch (Exception e) {
+            throw new TaskException("Failed to split section: " + section.title, e);
+        }
+    }


splitSection only closes the PDDocumentHandler handler on the success path. If any exception is thrown while adding pages or saving, the handler/document may leak. Use a try/finally (or try-with-resources if supported) to ensure the handler is always closed.

Copilot · 2026-03-12T17:16:52Z

+        // Replace invalid filename characters with underscores
+        String sanitized = filename.replaceAll("[\\\\/:*?\"<>|]", "_");
+        // Remove leading/trailing spaces and dots
+        sanitized = sanitized.trim().replaceAll("^\\.+", "");


The comment in sanitizeFilename says it removes "leading/trailing spaces and dots", but the implementation only trims spaces and removes leading dots (replaceAll("^\\.+", "")). Either update the implementation to also strip trailing dots (and possibly trailing spaces already handled by trim()), or correct the comment so it matches behavior.

Suggested change

sanitized = sanitized.trim().replaceAll("^\\.+", "");

sanitized = sanitized.trim().replaceAll("^\\.+", "").replaceAll("\\.+$", "");

Copilot · 2026-03-12T17:16:52Z

+            // Use reflection to instantiate and execute the hierarchical task
+            Class<?> taskClass = Class.forName(
+                    "org.pdfsam.tools.splitbybookmarks.HierarchicalSplitByBookmarksTask");
+            Object task = taskClass.getDeclaredConstructor().newInstance();
+
+            // Execute the task
+            java.lang.reflect.Method executeMethod = taskClass.getMethod("execute",
+                    SplitByOutlineLevelParameters.class);
+            executeMethod.invoke(task, params);
+        } catch (Exception e) {
+            LOG.error("Failed to execute hierarchical split task", e);
+            GlobalNotificationContext.getContext()
+                    .notifyListeners(new TaskExecutionFailedEvent(e, NotifiableTaskMetadata.NULL));


executeHierarchicalSplit invokes the task via reflection but only calls execute(...). Since HierarchicalSplitByBookmarksTask relies on BaseTask.before(...) to compile the regex (titleMatchingPattern) and on after() to close the opened documentHandler, skipping the lifecycle means: regex filtering won’t work and the input document may never be closed (resource leak). Also, failures will likely emit duplicate TaskExecutionFailedEvents because the task already notifies failures and the controller notifies again when the reflective call throws. Please route hierarchical execution through the standard TaskExecutionService (preferred) or, if reflection is unavoidable, explicitly call before(...) and after() in a finally block and avoid double-notifying on failures (unwrap InvocationTargetException).

Suggested change

// Use reflection to instantiate and execute the hierarchical task

Class<?> taskClass = Class.forName(

"org.pdfsam.tools.splitbybookmarks.HierarchicalSplitByBookmarksTask");

Object task = taskClass.getDeclaredConstructor().newInstance();

// Execute the task

java.lang.reflect.Method executeMethod = taskClass.getMethod("execute",

SplitByOutlineLevelParameters.class);

executeMethod.invoke(task, params);

} catch (Exception e) {

LOG.error("Failed to execute hierarchical split task", e);

GlobalNotificationContext.getContext()

.notifyListeners(new TaskExecutionFailedEvent(e, NotifiableTaskMetadata.NULL));

// Route hierarchical split through the standard TaskExecutionService so that

// task lifecycle (before/after) and notifications are handled correctly.

executionService.execute(params);

} catch (Exception e) {

// Let TaskExecutionService / task infrastructure handle failure notifications

// to avoid duplicate TaskExecutionFailedEvent emissions.

LOG.error("Failed to execute hierarchical split task", e);

Chapoly1305 and others added 2 commits November 13, 2025 12:30

Implement hierarchical split functionality for PDF bookmarks with ove…

d253ded

…rlap options

Merge branch 'torakiki:master' into pdfsplit-tree

a26c1df

Copilot AI review requested due to automatic review settings March 12, 2026 17:10

Copilot started reviewing on behalf of Chapoly1305 March 12, 2026 17:11 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat: Implement hierarchical split functionality for PDF bookmarks#799

Feat: Implement hierarchical split functionality for PDF bookmarks#799
Chapoly1305 wants to merge 2 commits intotorakiki:masterfrom
Chapoly1305:pdfsplit-tree

Chapoly1305 commented Nov 13, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Copilot AI Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	onError.accept(i18n().tr("Overlap pages must be a positive number"));
	onError.accept(i18n().tr("Overlap pages must be zero or a positive number"));

	params = new HierarchicalSplitByOutlineLevelParameters(level, true, overlapPages, autoDetectOverlap);
	params = new HierarchicalSplitByOutlineLevelParameters(level, hierarchicalOutput, overlapPages, autoDetectOverlap);

	sanitized = sanitized.trim().replaceAll("^\\.+", "");
	sanitized = sanitized.trim().replaceAll("^\\.+", "").replaceAll("\\.+$", "");

Uh oh!

Conversation

Chapoly1305 commented Nov 13, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants