Add file type validation #13802

Merged
yingfeng merged 6 commits into infiniflow:main from spider-yamet:fix/validation-file-type
Apr 2, 2026

@spider-yamet (Contributor) commented Mar 26, 2026

What problem does this PR solve?

This PR fixes WebDAV sync behavior for unsupported file types (#13795).

Previously, the WebDAV connector selected files primarily by modified time (and size threshold) and could still pass unsupported extensions into the download/document-generation path. This caused unnecessary processing and inconsistent behavior compared with connectors that validate file type earlier.

This change adds extension validation in two places:

  1. Early filter during recursive listing to skip unsupported files before they enter the download flow.
  2. Defensive filter before download/document creation to prevent unsupported files from being processed if any listing edge case slips through.
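For readers who want a picture of the two checks, a minimal sketch follows. It is not the code from this PR: `SUPPORTED_EXTENSIONS`, `is_supported`, `list_candidates`, and `sync` are hypothetical names, and the allow-list only contains the three supported types mentioned in the testing notes.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list; the connector's real list is not shown in this PR.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}


def is_supported(path: str) -> bool:
    """Return True if the path's final extension is in the allow-list."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


def list_candidates(entries):
    """Early filter: drop unsupported files while building the listing."""
    return [e for e in entries if is_supported(e["path"])]


def sync(entries, download):
    """Defensive filter: re-check each entry right before it is downloaded."""
    docs = []
    for entry in entries:
        if not is_supported(entry["path"]):
            continue  # an unsupported entry slipped past the listing step
        docs.append(download(entry["path"]))
    return docs
```

The second check is redundant when the listing filter works, which is the point: `sync` stays safe even if it is handed an unfiltered listing.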

It also wires allow_images into the WebDAV sync path so image extension handling follows connector policy.
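One way to read "follows connector policy" is that the `allow_images` setting widens or narrows the allow-list for a sync run. The sketch below illustrates that idea under stated assumptions: the extension sets and the helper names `accepted_extensions` and `should_sync` are hypothetical and are not taken from the connector.

```python
from pathlib import PurePosixPath

# Hypothetical extension sets, for illustration only.
DOCUMENT_EXTENSIONS = frozenset({".pdf", ".txt", ".md"})
IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg"})


def accepted_extensions(allow_images: bool) -> frozenset:
    """Build the allow-list for one sync run from the connector setting."""
    if allow_images:
        return DOCUMENT_EXTENSIONS | IMAGE_EXTENSIONS
    return DOCUMENT_EXTENSIONS


def should_sync(path: str, allow_images: bool) -> bool:
    """Decide whether a remote file is eligible for download."""
    suffix = PurePosixPath(path).suffix.lower()
    return suffix in accepted_extensions(allow_images)
```

With this shape, `should_sync("/docs/scan.PNG", allow_images=False)` rejects the image while the same path is accepted when `allow_images=True`.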

Scope is intentionally limited to WebDAV for a focused bug-fix PR.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

How was this tested?

  • Manual verification with mixed file types under the configured WebDAV path:
    • supported: .pdf, .txt, .md
    • unsupported: .exe, .bin, .dat
  • Triggered full sync and polling sync.
  • Confirmed unsupported files are skipped before download.
  • Confirmed supported files are still indexed normally.
  • Confirmed image handling follows allow_images setting.
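The manual steps above could also be captured in an automated check. The sketch below is only an illustration: `FakeWebDAV` is an in-memory stand-in rather than a real client API, and `walk`, `full_sync`, and the `SUPPORTED` set are hypothetical names chosen for this example.

```python
from pathlib import PurePosixPath

SUPPORTED = {".pdf", ".txt", ".md"}  # assumption: only the types listed above


class FakeWebDAV:
    """In-memory stand-in for a WebDAV server; not a real client API."""

    def __init__(self, tree):
        self.tree = tree      # maps a directory path to its child paths
        self.downloaded = []  # records every path that was fetched

    def ls(self, directory):
        return self.tree.get(directory, [])

    def download(self, path):
        self.downloaded.append(path)
        return b"content of " + path.encode()


def walk(client, directory):
    """Recursively yield supported file paths; directory paths end with '/'."""
    for child in client.ls(directory):
        if child.endswith("/"):
            yield from walk(client, child)
        elif PurePosixPath(child).suffix.lower() in SUPPORTED:
            yield child


def full_sync(client, root):
    """Download every supported file under root and return the synced paths."""
    synced = []
    for path in walk(client, root):
        client.download(path)
        synced.append(path)
    return synced
```

Because the fake records each download, a test can assert that `.exe`, `.bin`, and `.dat` files never reach `download` at all, which is the "skipped before download" behavior described above.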

Fixes: #13795

@dosubot (bot) added the labels size:S (This PR changes 10-29 lines, ignoring generated files.) and 🐞 bug (Something isn't working, pull request that fix bug.) Mar 26, 2026
@spider-yamet (Contributor, Author) commented

@yingfeng Would love to hear your opinion on this PR. Thanks

@Magicbook1108 added the ci (Continue Integration) label Mar 26, 2026
@codecov (bot) commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.52%. Comparing base (dd52913) to head (410da22).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #13802   +/-   ##
=======================================
  Coverage   96.52%   96.52%           
=======================================
  Files          10       10           
  Lines         690      690           
  Branches      108      108           
=======================================
  Hits          666      666           
  Misses          8        8           
  Partials       16       16           

@spider-yamet (Contributor, Author) commented

Would appreciate your feedback @Magicbook1108 @yingfeng :)

@spider-yamet (Contributor, Author) commented

@Magicbook1108 @yingfeng I would appreciate your feedback. I hope you can take a look once this PR passes CI.

@yingfeng merged commit 6b7989b into infiniflow:main Apr 2, 2026
1 check passed
sirj0k3r pushed a commit to sirj0k3r/ragflow that referenced this pull request Apr 2, 2026
SailedApple1991 pushed a commit to SailedApple1991/greenlaw-ragflow that referenced this pull request Apr 12, 2026
pennbay pushed a commit to pennbay/ragflow that referenced this pull request Apr 14, 2026
pennbay pushed a commit to pennbay/ragflow that referenced this pull request Apr 15, 2026
pennbay pushed a commit to pennbay/ragflow that referenced this pull request Apr 21, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: WebDAV sync does not filter unsupported files before processing

3 participants