feat(components): add MinerU Document Loader with flash and precision modes by chaserRen · Pull Request #6063 · FlowiseAI/Flowise

chaserRen · 2026-03-26T03:11:00Z

Summary
Add MinerU Document Loader to Flowise for parsing documents via MinerU APIs in flash (token-free) and precision (token-required) modes.

Changes
Add new node implementation at packages/components/nodes/documentloaders/MinerU/MinerU.ts and icon at packages/components/nodes/documentloaders/MinerU/mineru.svg
Support two input modes: URL and file upload
Support MinerU flash and precision workflows
Support precision options: model (vlm/pipeline/html), OCR, formula, table, language, page range, timeout
Support split-pages behavior for PDF sources when page range is provided
Return Flowise Document/Text outputs with metadata (source, mode, language, model, page info)
Add env fallbacks for configuration: MINERU_TOKEN, MINERU_FLASH_BASE_URL, MINERU_API_BASE_URL, MINERU_SOURCE_HEADER
Include polling, timeout, and explicit error handling for MinerU task lifecycle
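The polling, timeout, and error-handling item above can be pictured as a generic loop. This is a minimal sketch, not the code in MinerU.ts. The `TaskStatus` shape and its state names are hypothetical, because this PR does not describe MinerU's status payload. `fetchStatus` stands in for whatever HTTP status call the node makes.

```typescript
// Hypothetical task states; MinerU's real status values are not described in this PR.
type TaskState = 'pending' | 'running' | 'done' | 'failed'

interface TaskStatus {
    state: TaskState
    resultUrl?: string
    error?: string
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Polls `fetchStatus` until the task finishes, fails, or `timeoutMs` elapses.
async function pollUntilDone(
    fetchStatus: () => Promise<TaskStatus>,
    timeoutMs: number,
    intervalMs = 2000
): Promise<TaskStatus> {
    const deadline = Date.now() + timeoutMs
    for (;;) {
        const status = await fetchStatus()
        if (status.state === 'done') return status
        if (status.state === 'failed') {
            // Surface the service's error message instead of a generic failure.
            throw new Error(`MinerU task failed: ${status.error ?? 'unknown error'}`)
        }
        const remaining = deadline - Date.now()
        if (remaining <= 0) {
            throw new Error(`MinerU task did not finish within ${timeoutMs} ms (last state: ${status.state})`)
        }
        // Never sleep past the deadline.
        await sleep(Math.min(intervalMs, remaining))
    }
}
```

Capping each sleep at the remaining budget keeps the loop from oversleeping past the deadline. A slow `fetchStatus` call can still overrun it, so a per-request timeout would also be needed in practice.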

Why
This integration enables Flowise users to parse PDF/image/office/HTML documents with MinerU directly in Document Loaders, covering both fast extraction and higher-accuracy extraction scenarios.

Testing
Confirmed branch diff only adds MinerU loader source and icon files
Verified code paths enforce mode-specific token/file-type requirements and produce clear errors for invalid inputs/timeouts
Verified node outputs and metadata mapping logic for both flash and precision modes
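The mode-specific token requirement checked above, combined with the env fallbacks listed under Changes, could be resolved roughly as follows. This is a hypothetical sketch rather than the code in MinerU.ts. The `NodeInputs` shape, the `resolveMinerUConfig` name, and the rule that a non-blank node input overrides its environment variable are assumptions. Only the four variable names and the rule that precision needs a token while flash does not come from the PR. No default base URLs are filled in because the PR does not state them.

```typescript
type MinerUMode = 'flash' | 'precision'

// Hypothetical node inputs; only the env variable names used below come from the PR.
interface NodeInputs {
    mode: MinerUMode
    token?: string
    flashBaseUrl?: string
    apiBaseUrl?: string
    sourceHeader?: string
}

interface ResolvedConfig {
    mode: MinerUMode
    token?: string
    flashBaseUrl?: string
    apiBaseUrl?: string
    sourceHeader?: string
}

type Env = Record<string, string | undefined>

// Returns the first value that is set and not blank.
function pick(...candidates: Array<string | undefined>): string | undefined {
    for (const value of candidates) {
        if (value !== undefined && value.trim() !== '') return value.trim()
    }
    return undefined
}

// Explicit node inputs take precedence; environment variables are the fallback.
function resolveMinerUConfig(inputs: NodeInputs, env: Env): ResolvedConfig {
    const config: ResolvedConfig = {
        mode: inputs.mode,
        token: pick(inputs.token, env.MINERU_TOKEN),
        flashBaseUrl: pick(inputs.flashBaseUrl, env.MINERU_FLASH_BASE_URL),
        apiBaseUrl: pick(inputs.apiBaseUrl, env.MINERU_API_BASE_URL),
        sourceHeader: pick(inputs.sourceHeader, env.MINERU_SOURCE_HEADER)
    }
    // Precision mode is token-required; flash mode is token-free.
    if (config.mode === 'precision' && !config.token) {
        throw new Error('MinerU precision mode requires a token (node input or MINERU_TOKEN)')
    }
    return config
}
```

In a node this would be called as `resolveMinerUConfig(inputs, process.env)`. Passing the environment in explicitly keeps the function easy to test.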

gemini-code-assist · 2026-03-26T03:11:15Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new MinerU Document Loader into Flowise, significantly expanding its document processing capabilities. Users can now leverage MinerU's powerful API to extract content from a wide array of document formats, choosing between a rapid, token-free 'Flash' mode or a more detailed, token-required 'Precision' mode with advanced configuration options. This integration streamlines the ingestion of complex documents into Flowise workflows, providing greater flexibility and accuracy in data extraction.

Highlights

New Document Loader: Added the MinerU Document Loader to Flowise, enabling parsing of various document types (PDF, image, office, HTML) via MinerU APIs.
Dual Modes Supported: Implemented support for both 'Flash' (token-free, fast extraction) and 'Precision' (token-required, higher-accuracy extraction) modes.
Flexible Input Options: Provided options for document input via URL or file upload.
Advanced Precision Configuration: Included extensive configuration options for precision mode, such as model selection (VLM, Pipeline, HTML), OCR, formula and table extraction, language hints, page range specification, and timeout settings.
Enhanced Document Output: Documents are returned as Flowise Document/Text outputs, enriched with metadata including source, mode, language, model, and page information.
Robust API Integration: Incorporated environment variable fallbacks for API configuration (token, base URLs, source header) and implemented polling, timeout, and explicit error handling for the MinerU task lifecycle.
PDF Page Splitting: Added functionality to split PDF sources into individual documents per page when a page range is provided.
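One plausible way to implement page splitting when a page range is provided is to expand the range into individual page numbers and emit one task, and later one output document with the page number in its metadata, per page. The range syntax (`"1-3,5"`), the `parsePageRange` and `splitIntoPageTasks` names, and the `PageTask` shape are all hypothetical. The PR does not show how the node accepts or applies the range.

```typescript
// Hypothetical syntax: comma-separated pages and inclusive ranges, e.g. "1-3,5".
function parsePageRange(spec: string): number[] {
    const pages = new Set<number>()
    for (const rawPart of spec.split(',')) {
        const part = rawPart.trim()
        if (part === '') continue
        const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part)
        if (!match) throw new Error(`Invalid page range segment: "${part}"`)
        const start = Number(match[1])
        const end = match[2] !== undefined ? Number(match[2]) : start
        // Pages are 1-based and ranges must not run backwards.
        if (start < 1 || end < start) throw new Error(`Invalid page range segment: "${part}"`)
        for (let p = start; p <= end; p++) pages.add(p)
    }
    // Deduplicated and sorted so overlapping segments produce each page once.
    return Array.from(pages).sort((a, b) => a - b)
}

// Hypothetical per-page task descriptor.
interface PageTask {
    source: string
    page: number
}

// With split-pages enabled for a PDF, produce one task (and later one document) per page.
function splitIntoPageTasks(source: string, spec: string): PageTask[] {
    return parsePageRange(spec).map((page) => ({ source, page }))
}
```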


gemini-code-assist

Code Review

This pull request introduces a new MinerU Document Loader, enabling document parsing in both 'flash' and 'precision' modes, supporting file uploads and URL inputs for various document types. The review identifies a high-severity security concern due to the custom ZIP parsing implementation, recommending the use of a robust, battle-tested library. Additionally, it suggests improving performance by parallelizing task processing and addressing the non-deterministic generation of filenames.


gemini-code-assist · 2026-03-26T03:13:09Z

packages/components/nodes/documentloaders/MinerU/MinerU.ts

+        const results: MinerUTaskResult[] = []
+        for (const task of tasks) {
+            const result = await this.runTask(task, config)
+            results.push(result)
+        }


The current implementation processes tasks sequentially using a for...of loop. This can be inefficient when there are multiple files or URLs to process. To improve performance, you can execute these tasks in parallel using Promise.all.

Suggested change

-        const results: MinerUTaskResult[] = []
-        for (const task of tasks) {
-            const result = await this.runTask(task, config)
-            results.push(result)
-        }
+        const results: MinerUTaskResult[] = await Promise.all(tasks.map((task) => this.runTask(task, config)))

gemini-code-assist · 2026-03-26T03:13:09Z

packages/components/nodes/documentloaders/MinerU/MinerU.ts

+
+        const mimeMatch = contentPart.match(/^data:([^;]+);base64,/i)
+        const ext = mimeMatch?.[1] ? MIME_EXTENSION_MAP[mimeMatch[1].toLowerCase()] || 'bin' : 'bin'
+        const guessedName = fileNameFromPayload || `upload_${Date.now()}_${index}.${ext}`


Using Date.now() to generate filenames introduces non-determinism, which can make testing more difficult and less reliable. While collisions are unlikely, it's a good practice to use a more deterministic approach. Consider using a cryptographic hash of the file content for a unique and deterministic name, or a simpler counter if the scope is limited.

chaserRen · 2026-03-26T04:09:37Z

Pushed fixes for the MinerU review feedback in MinerU.ts:

ZIP parsing hardening (Security - High Priority)
Replaced the custom ZIP parsing logic with yauzl (a battle-tested library) in the precision markdown extraction path.
Fix details:

Removed custom low-level ZIP parsing methods.
Added safe ZIP handling with entry-based reading.
Added size guards:
- max ZIP buffer size (MAX_ZIP_BUFFER_BYTES)
- max markdown entry size (MAX_MARKDOWN_BYTES)
Only .md entries are accepted for extraction.
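The entry filtering and size guards listed above reduce to a couple of pure checks that a yauzl `entry` handler could call before opening a read stream. This is a sketch under assumptions. The limit values are hypothetical, since the PR names `MAX_ZIP_BUFFER_BYTES` and `MAX_MARKDOWN_BYTES` but not their values. `ZipEntryInfo` mirrors only the `fileName` and `uncompressedSize` fields that yauzl entries expose. The path-traversal check is an extra safeguard not mentioned in the PR.

```typescript
// Hypothetical limits; the PR names these constants but not their values.
const MAX_ZIP_BUFFER_BYTES = 100 * 1024 * 1024
const MAX_MARKDOWN_BYTES = 20 * 1024 * 1024

// Subset of the fields a yauzl Entry exposes.
interface ZipEntryInfo {
    fileName: string
    uncompressedSize: number
}

// Reject oversized archives before handing the buffer to the ZIP parser.
function assertZipBufferSize(byteLength: number): void {
    if (byteLength > MAX_ZIP_BUFFER_BYTES) {
        throw new Error(`ZIP archive too large: ${byteLength} bytes (limit ${MAX_ZIP_BUFFER_BYTES})`)
    }
}

// Extract only markdown files within the size limit, skipping directories
// and any absolute or parent-relative paths.
function shouldExtractEntry(entry: ZipEntryInfo): boolean {
    const name = entry.fileName
    if (name.endsWith('/')) return false // directory entry
    if (!name.toLowerCase().endsWith('.md')) return false
    if (name.startsWith('/') || name.indexOf('\\') !== -1 || name.split('/').indexOf('..') !== -1) return false
    return entry.uncompressedSize <= MAX_MARKDOWN_BYTES
}
```

Because `uncompressedSize` is the size declared in the archive, the bytes actually read from each entry stream should also be counted against `MAX_MARKDOWN_BYTES` while reading.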

Task execution performance (Medium Priority)
Changed sequential task processing to controlled parallel execution.
Fix details:

Replaced the for...of + await flow with batched concurrent execution via runTasksWithConcurrency(...).
Added default concurrency limit (DEFAULT_TASK_CONCURRENCY = 3) to avoid unbounded Promise.all fan-out.
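A bounded-concurrency runner in the spirit of `runTasksWithConcurrency(...)` might look like this. It is a hypothetical sketch, not the PR's implementation. The signature is an assumption, and it uses a worker-pool variant rather than the batched execution the comment describes. The default limit of 3 matches `DEFAULT_TASK_CONCURRENCY`.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight, preserving input order.
async function runTasksWithConcurrency<T, R>(
    items: readonly T[],
    worker: (item: T, index: number) => Promise<R>,
    limit = 3
): Promise<R[]> {
    if (!Number.isInteger(limit) || limit < 1) throw new Error('limit must be a positive integer')
    const results = new Array<R>(items.length)
    let next = 0
    // Each lane repeatedly claims the next unclaimed index; JS runs this synchronously, so `next++` is race-free.
    async function lane(): Promise<void> {
        while (next < items.length) {
            const index = next++
            results[index] = await worker(items[index], index)
        }
    }
    const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane())
    await Promise.all(lanes)
    return results
}
```

Compared with fixed batches, a pool refills a slot as soon as any task finishes, so one slow document does not hold up the rest of its batch. In both designs a single rejected task rejects the whole call, while the remaining lanes keep working through the items that are left.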

Deterministic fallback filename generation (Medium Priority)
Removed Date.now()-based fallback naming in inline upload decoding.
Fix details:

Added buildDeterministicUploadName(...) using a SHA-256 hash of the file content, plus the index and extension.
Keeps filenames stable and test-friendly; explicitly provided filenames are still used as before.
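A sketch of the deterministic fallback naming described above, built around the data-URL regex quoted in the review. Several details are illustrative assumptions: the `upload_<hash>_<index>.<ext>` format, truncating the digest to 16 hex characters, the `nameInlineUpload` helper, and the `MIME_EXTENSION_MAP` entries. Only the inputs (content hash, index, extension) come from the comment.

```typescript
import { createHash } from 'crypto'

// Hypothetical entries; the PR's MIME_EXTENSION_MAP contents are not shown.
const MIME_EXTENSION_MAP: Record<string, string> = {
    'application/pdf': 'pdf',
    'image/png': 'png',
    'image/jpeg': 'jpg'
}

// Same bytes, index, and extension always yield the same name; no clock is involved.
function buildDeterministicUploadName(content: Buffer, index: number, ext: string): string {
    const digest = createHash('sha256').update(content).digest('hex').slice(0, 16)
    return `upload_${digest}_${index}.${ext}`
}

// Names an inline "data:<mime>;base64,<payload>" upload.
function nameInlineUpload(dataUrl: string, index: number, explicitName?: string): string {
    // An explicitly supplied filename is kept as-is.
    if (explicitName) return explicitName
    const mimeMatch = dataUrl.match(/^data:([^;]+);base64,/i)
    const ext = mimeMatch?.[1] ? MIME_EXTENSION_MAP[mimeMatch[1].toLowerCase()] || 'bin' : 'bin'
    const base64 = mimeMatch ? dataUrl.slice(mimeMatch[0].length) : dataUrl
    return buildDeterministicUploadName(Buffer.from(base64, 'base64'), index, ext)
}
```

Keeping the index in the name means two identical files uploaded together still get distinct names.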

Additional:

Added direct dependencies for ZIP parsing support in packages/components/package.json:
- yauzl
- @types/yauzl

chaserRen and others added 3 commits March 23, 2026 14:45

add MinerU Document Reader (74e2573)
update (7b6663d)
Merge branch 'FlowiseAI:main' into feat/mineru (9821942)

gemini-code-assist bot reviewed Mar 26, 2026


fix:fix code review (9bd0206)

chaserRen wants to merge 4 commits into FlowiseAI:main from chaserRen:feat/mineru
