Skip to content

Bump kreuzberg from 4.2.9 to 4.3.1#69

Closed
dependabot[bot] wants to merge 1 commit into
masterfrom
dependabot/uv/kreuzberg-4.3.1
Closed

Bump kreuzberg from 4.2.9 to 4.3.1#69
dependabot[bot] wants to merge 1 commit into
masterfrom
dependabot/uv/kreuzberg-4.3.1

Conversation

@dependabot
Copy link
Copy Markdown
Contributor

@dependabot dependabot Bot commented on behalf of github Feb 13, 2026

Bumps kreuzberg from 4.2.9 to 4.3.1.

Release notes

Sourced from kreuzberg's releases.

Release v4.3.1

Fixed

Elixir Package Checksums (#383)

  • Fixed checksum mismatch for Elixir 4.3.0 Hex package: Updated checksum-Elixir.Kreuzberg.Native.exs with correct SHA256 checksums for all 8 precompiled NIF binaries (NIF 2.16/2.17 across aarch64-apple-darwin, aarch64-unknown-linux-gnu, x86_64-unknown-linux-gnu, x86_64-pc-windows-gnu). The 4.3.0 release shipped with outdated 4.2.10 checksums, causing installation failures.

Dependency Updates

  • Updated all dependencies across 10 language ecosystems: Rust, Python, Node/TypeScript, Ruby, PHP, Go, Java, C#, Elixir, WASM, and pre-commit hooks all updated to latest compatible versions.
  • Enhanced dependency update tasks: All language-specific `task update` commands now upgrade to latest major versions (not just respecting version constraints). PHP, Ruby, C#, Elixir, and Python update tasks enhanced with major version upgrade support.

WASM Compatibility

  • Fixed WASM build failures: Added explicit `getrandom 0.3.4` dependency with `wasm_js` feature to `kreuzberg-wasm` crate to ensure transitive dependencies (ahash, lopdf, rand_core) have WebAssembly support enabled.

Dependency Pins

  • Pinned lzma-rust2 to 0.15.7: The 0.16.1 upgrade is incompatible with crc 3.4.0. Keeping 0.15.7 until upstream compatibility is restored.

See CHANGELOG.md for full details.

v4.3.0

Added

Blank Page Detection

  • is_blank field on PageInfo and PageContent: Pages with fewer than 3 non-whitespace characters and no tables or images are flagged as blank. Detection uses a two-phase approach: text-only analysis during extraction, then refinement after table/image assignment. Available across all 9 language bindings (Python, TypeScript, Ruby, Java, Go, C#, PHP, Elixir, WASM). Closes #378.

PaddleOCR Backend

  • PaddleOCR backend via ONNX Runtime: New OCR backend (kreuzberg-paddle-ocr) using PaddlePaddle's PP-OCRv4 models converted to ONNX format, run via ONNX Runtime. Supports 6 languages (English, Chinese, Japanese, Korean, German, French) with automatic model downloading and caching. Provides superior CJK recognition compared to Tesseract.
  • PaddleOCR support in all bindings: Available across Python, Rust, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, and Elixir bindings via the paddle-ocr feature flag.
  • PaddleOCR CLI support: The kreuzberg-cli binary supports --ocr-backend paddle-ocr for PaddleOCR extraction.

Unified OCR Element Output

  • Structured OCR element data: Extraction results now include OcrElement data with bounding geometry (rectangles and quadrilaterals), per-element confidence scores, rotation information, and hierarchical levels (word, line, block, page). Available from both PaddleOCR and Tesseract backends.

Shared ONNX Runtime Discovery

  • ort_discovery module: Finds ONNX Runtime shared libraries across platforms, shared between PaddleOCR and future ONNX-based backends.

Document Structure Output

  • DocumentStructure support across all bindings: Added structured document output with include_document_structure configuration option across Python, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, Elixir, and WASM bindings.

Native DOC/PPT Extraction

  • OLE/CFB-based extraction: Added native DOC and PPT extraction via OLE/CFB binary parsing. Legacy Office formats no longer require any external tools.

musl Linux Support

  • Re-enabled musl targets: Added x86_64-unknown-linux-musl and aarch64-unknown-linux-musl targets for CLI binaries, Python wheels (musllinux), and Node.js native bindings. Resolves glibc 2.38+ requirement for prebuilt CLI binaries on older distros like Ubuntu 22.04 (#364).

Fixed

MSG Extraction Hang on Large Attachments (#372)

  • Fixed .msg (Outlook) extraction hanging indefinitely on files with large attachments. Replaced the msg_parser crate with direct OLE/CFB parsing using the cfb crate — attachment binary data is now read directly without hex-encoding overhead.
  • Added lenient FAT padding for MSG files with truncated sector tables produced by some Outlook versions.

... (truncated)

Changelog

Sourced from kreuzberg's changelog.

[4.3.1] - 2026-02-12

Fixed

Elixir Package Checksums (#383)

  • Fixed checksum mismatch for Elixir 4.3.0 Hex package: Updated checksum-Elixir.Kreuzberg.Native.exs with correct SHA256 checksums for all 8 precompiled NIF binaries (NIF 2.16/2.17 across aarch64-apple-darwin, aarch64-unknown-linux-gnu, x86_64-unknown-linux-gnu, x86_64-pc-windows-gnu). The 4.3.0 release shipped with outdated 4.2.10 checksums, causing installation failures.

Dependency Updates

  • Updated all dependencies across 10 language ecosystems: Rust, Python, Node/TypeScript, Ruby, PHP, Go, Java, C#, Elixir, WASM, and pre-commit hooks all updated to latest compatible versions.
  • Enhanced dependency update tasks: All language-specific task update commands now upgrade to latest major versions (not just respecting version constraints). PHP, Ruby, C#, Elixir, and Python update tasks enhanced with major version upgrade support.

WASM Compatibility

  • Fixed WASM build failures: Added explicit getrandom 0.3.4 dependency with wasm_js feature to kreuzberg-wasm crate to ensure transitive dependencies (ahash, lopdf, rand_core) have WebAssembly support enabled.

Dependency Pins

  • Pinned lzma-rust2 to 0.15.7: The 0.16.1 upgrade is incompatible with crc 3.4.0. Keeping 0.15.7 until upstream compatibility is restored.

[4.3.0] - 2026-02-11

Added

Blank Page Detection

  • is_blank field on PageInfo and PageContent: Pages with fewer than 3 non-whitespace characters and no tables or images are flagged as blank. Detection uses a two-phase approach: text-only analysis during extraction, then refinement after table/image assignment. Available across all 9 language bindings (Python, TypeScript, Ruby, Java, Go, C#, PHP, Elixir, WASM). Closes #378.

PaddleOCR Backend

  • PaddleOCR backend via ONNX Runtime: New OCR backend (kreuzberg-paddle-ocr) using PaddlePaddle's PP-OCRv4 models converted to ONNX format, run via ONNX Runtime. Supports 6 languages (English, Chinese, Japanese, Korean, German, French) with automatic model downloading and caching. Provides superior CJK recognition compared to Tesseract.
  • PaddleOCR support in all bindings: Available across Python, Rust, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, and Elixir bindings via the paddle-ocr feature flag.
  • PaddleOCR CLI support: The kreuzberg-cli binary supports --ocr-backend paddle-ocr for PaddleOCR extraction.

Unified OCR Element Output

  • Structured OCR element data: Extraction results now include OcrElement data with bounding geometry (rectangles and quadrilaterals), per-element confidence scores, rotation information, and hierarchical levels (word, line, block, page). Available from both PaddleOCR and Tesseract backends.

Shared ONNX Runtime Discovery

  • ort_discovery module: Finds ONNX Runtime shared libraries across platforms, shared between PaddleOCR and future ONNX-based backends.

Document Structure Output

  • DocumentStructure support across all bindings: Added structured document output with include_document_structure configuration option across Python, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, Elixir, and WASM bindings.

Native DOC/PPT Extraction

  • OLE/CFB-based extraction: Added native DOC and PPT extraction via OLE/CFB binary parsing. Legacy Office formats no longer require any external tools.

musl Linux Support

  • Re-enabled musl targets: Added x86_64-unknown-linux-musl and aarch64-unknown-linux-musl targets for CLI binaries, Python wheels (musllinux), and Node.js native bindings. Resolves glibc 2.38+ requirement for prebuilt CLI binaries on older distros like Ubuntu 22.04 (#364).

Fixed

MSG Extraction Hang on Large Attachments (#372)

  • Fixed .msg (Outlook) extraction hanging indefinitely on files with large attachments. Replaced the msg_parser crate with direct OLE/CFB parsing using the cfb crate — attachment binary data is now read directly without hex-encoding overhead.

... (truncated)

Commits
  • 990a4ac chore: release v4.3.1
  • fa4d2e4 fix(elixir): update checksums for 4.3.0 precompiled NIF binaries
  • 7e67b9a fix: enable wasm_js feature for getrandom 0.3.4 transitive deps
  • a51ea30 fix: pin lzma-rust2 to 0.15.7 for crc compatibility
  • 29b38f5 chore: update all dependencies across all language ecosystems
  • 0df39d5 fix: correct uv sync syntax for Python update task
  • f197bf2 fix: correct YAML syntax in elixir update task
  • dce2db8 feat: enhance all language update tasks to upgrade to latest major versions
  • 4888236 feat: enhance PHP update task to upgrade to latest major versions
  • 5385fdf fix: upgrade PHPUnit to ^12.5 to match e2e/php tests
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) from 4.2.9 to 4.3.1.
- [Release notes](https://github.com/kreuzberg-dev/kreuzberg/releases)
- [Changelog](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CHANGELOG.md)
- [Commits](kreuzberg-dev/kreuzberg@v4.2.9...v4.3.1)

---
updated-dependencies:
- dependency-name: kreuzberg
  dependency-version: 4.3.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update python code labels Feb 13, 2026
@dependabot @github
Copy link
Copy Markdown
Contributor Author

dependabot Bot commented on behalf of github Feb 20, 2026

Superseded by #71.

@dependabot dependabot Bot closed this Feb 20, 2026
@dependabot dependabot Bot deleted the dependabot/uv/kreuzberg-4.3.1 branch February 20, 2026 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python Pull requests that update python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants