Skip to content

Constrain server data file paths#1423

Open
fallintoplace wants to merge 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/server-data-file-path-boundary
Open

Constrain server data file paths#1423
fallintoplace wants to merge 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/server-data-file-path-boundary

Conversation

@fallintoplace

Copy link
Copy Markdown

Summary

The local-file request path for cuopt-data-file now enforces the documented CUOPT_DATA_DIR boundary.

Previously the server joined CUOPT_DATA_DIR with the header value and only checked existence. Absolute paths and .. segments could resolve outside the configured data directory, and symlinks could do the same.

This change rejects absolute input paths, resolves both the configured root and requested file with realpath, verifies the result stays under CUOPT_DATA_DIR, and requires the target to be a regular file.

Validation

  • python3 -m py_compile python/cuopt_server/cuopt_server/webserver.py python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
  • git diff --check
  • uvx --with ruff ruff check python/cuopt_server/cuopt_server/webserver.py python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
  • PYTHONPATH=python/cuopt_server uvx --with 'pytest<9.0' --with fastapi --with jsonref==1.1.0 --with msgpack==1.1.2 --with msgpack-numpy==0.4.8 --with psutil --with uvicorn --with numpy --with pandas pytest python/cuopt_server/cuopt_server/tests/test_file_path_validation.py -q

The pytest run passed with existing Pydantic deprecation warnings during server import.

@fallintoplace fallintoplace requested a review from a team as a code owner June 10, 2026 21:42
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR rewrites the validate_file_path function to enforce CUOPT data directory containment, rejecting absolute paths and directory-traversal/symlink-based escapes, with comprehensive test coverage verifying all validation scenarios and error conditions.

Changes

File Path Validation Security

Layer / File(s) Summary
Path validation implementation
python/cuopt_server/cuopt_server/webserver.py
validate_file_path now validates CUOPT_DATA_DIR is configured, rejects absolute input paths, resolves real paths and enforces containment within the data directory using os.path.commonpath to prevent directory traversal and symlink escapes, confirms the resolved path is an existing file, and raises HTTP 400 with specific, clear error messages.
Path validation test suite
python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
Autouse fixture saves and restores settings state; four test cases verify: valid relative paths within data directory return the absolute file path; absolute paths are rejected; parent-directory traversal escapes are rejected; symlinks resolving outside the data directory are rejected. All rejections assert HTTP 400 status and expected error message content.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: constraining server data file paths to enforce CUOPT_DATA_DIR boundaries, which is the core objective of the PR.
Description check ✅ Passed The description is directly related to the changeset, explaining the security issue, the solution implemented, and validation steps performed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuopt_server/cuopt_server/webserver.py (1)

1047-1052: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The boundary check is still bypassable via TOCTOU.

Line 1051 validates the pathname, but Lines 1142-1154 enqueue only the string for later use in another worker. If CUOPT_DATA_DIR is writable by an untrusted process, the validated file can be replaced with a different file or symlink after the check and before the solver opens it, which reintroduces the escape this PR is trying to close. Open the file here and pass an fd/bytes, or re-resolve and validate again at the actual open site.

Also applies to: 1142-1154

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt_server/cuopt_server/webserver.py` around lines 1047 - 1052, The
code validates the pathname via validate_file_path(cuopt_data_file) and then
later enqueues only the string (file_path), leaving a TOCTOU window if
CUOPT_DATA_DIR is writable; fix by opening the file immediately after
validate_file_path (e.g., obtain a file descriptor or read bytes) and pass that
safe handle/data into the queue instead of the filename, or if you must enqueue
a path, re-resolve and re-run validate_file_path at the actual open site (the
worker that processes the queue, i.e., where you currently handle lines
~1142-1154) and ensure you use openat with the directory fd or verify inode/dev
after open to prevent symlink/race attacks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuopt_server/cuopt_server/tests/test_file_path_validation.py`:
- Around line 22-74: Add tests in test_file_path_validation.py to cover the two
missing branches in webserver.validate_file_path: (1) the CUOPT_DATA_DIR-unset
path by clearing/unsetting settings.set_data_dir (or not calling it) and calling
validate_file_path to assert it raises the expected HTTPException/status and
message about CUOPT_DATA_DIR being unset; and (2) the "must be a regular file"
path by creating a non-regular target inside the data dir (e.g., a subdirectory
or FIFO) and calling validate_file_path on that name to assert it raises
HTTPException with the "regular file" (or similar) detail. Use the existing test
helpers and exception assertions (pytest.raises and exc_info.value.detail) and
reference validate_file_path and settings.set_data_dir so the new tests exercise
those branches.

In `@python/cuopt_server/cuopt_server/webserver.py`:
- Around line 216-220: The log call leaks the resolved internal filesystem path
via file_path; update the error logging in webserver.py so it does not include
file_path (use cuopt_data_file or a generic message instead) and ensure any
HTTPException detail or other logs never expose the resolved server path; locate
the check using file_path and cuopt_data_file and replace the
logging.error(f"File path '{file_path}'...") with a safe message that references
only cuopt_data_file or a non-path generic string.

---

Outside diff comments:
In `@python/cuopt_server/cuopt_server/webserver.py`:
- Around line 1047-1052: The code validates the pathname via
validate_file_path(cuopt_data_file) and then later enqueues only the string
(file_path), leaving a TOCTOU window if CUOPT_DATA_DIR is writable; fix by
opening the file immediately after validate_file_path (e.g., obtain a file
descriptor or read bytes) and pass that safe handle/data into the queue instead
of the filename, or if you must enqueue a path, re-resolve and re-run
validate_file_path at the actual open site (the worker that processes the queue,
i.e., where you currently handle lines ~1142-1154) and ensure you use openat
with the directory fd or verify inode/dev after open to prevent symlink/race
attacks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cadfe213-c21f-4a01-af0b-1bfba8893d48

📥 Commits

Reviewing files that changed from the base of the PR and between c99a5ce and 0c5676c.

📒 Files selected for processing (2)
  • python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
  • python/cuopt_server/cuopt_server/webserver.py

Comment on lines +22 to +74
def test_validate_file_path_returns_file_if_relative_path_stays_in_data_dir(
tmp_path,
):
data_dir = tmp_path / "data"
data_dir.mkdir()
data_file = data_dir / "input.json"
data_file.write_text("{}", encoding="utf-8")
settings.set_data_dir(str(data_dir))

assert webserver.validate_file_path("input.json") == str(data_file)


def test_validate_file_path_rejects_absolute_path(tmp_path):
data_dir = tmp_path / "data"
data_dir.mkdir()
outside_file = tmp_path / "input.json"
outside_file.write_text("{}", encoding="utf-8")
settings.set_data_dir(str(data_dir))

with pytest.raises(HTTPException) as exc_info:
webserver.validate_file_path(str(outside_file))

assert exc_info.value.status_code == 400
assert "relative to CUOPT_DATA_DIR" in exc_info.value.detail


def test_validate_file_path_rejects_parent_directory_escape(tmp_path):
data_dir = tmp_path / "data"
data_dir.mkdir()
outside_file = tmp_path / "input.json"
outside_file.write_text("{}", encoding="utf-8")
settings.set_data_dir(str(data_dir))

with pytest.raises(HTTPException) as exc_info:
webserver.validate_file_path("../input.json")

assert exc_info.value.status_code == 400
assert "stay inside CUOPT_DATA_DIR" in exc_info.value.detail


def test_validate_file_path_rejects_symlink_escape(tmp_path):
data_dir = tmp_path / "data"
data_dir.mkdir()
outside_file = tmp_path / "input.json"
outside_file.write_text("{}", encoding="utf-8")
os.symlink(outside_file, data_dir / "linked-input.json")
settings.set_data_dir(str(data_dir))

with pytest.raises(HTTPException) as exc_info:
webserver.validate_file_path("linked-input.json")

assert exc_info.value.status_code == 400
assert "stay inside CUOPT_DATA_DIR" in exc_info.value.detail

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cover the remaining new validation branches.

This suite never exercises the CUOPT_DATA_DIR-unset branch or the new "must be a regular file" branch. A regression in either path would still leave this PR green even though both are part of the new validator behavior.

Suggested tests
+def test_validate_file_path_rejects_when_data_dir_is_unset():
+    settings.set_data_dir("")
+
+    with pytest.raises(HTTPException) as exc_info:
+        webserver.validate_file_path("input.json")
+
+    assert exc_info.value.status_code == 400
+    assert "data directory not set" in exc_info.value.detail
+
+
+def test_validate_file_path_rejects_directory_inside_data_dir(tmp_path):
+    data_dir = tmp_path / "data"
+    nested_dir = data_dir / "subdir"
+    nested_dir.mkdir(parents=True)
+    settings.set_data_dir(str(data_dir))
+
+    with pytest.raises(HTTPException) as exc_info:
+        webserver.validate_file_path("subdir")
+
+    assert exc_info.value.status_code == 400
+    assert "specified data file does not exist" in exc_info.value.detail

As per coding guidelines, **/*test*.{cpp,py}: "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths" and python/**/tests/**: "Regression coverage for fixed bugs".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt_server/cuopt_server/tests/test_file_path_validation.py` around
lines 22 - 74, Add tests in test_file_path_validation.py to cover the two
missing branches in webserver.validate_file_path: (1) the CUOPT_DATA_DIR-unset
path by clearing/unsetting settings.set_data_dir (or not calling it) and calling
validate_file_path to assert it raises the expected HTTPException/status and
message about CUOPT_DATA_DIR being unset; and (2) the "must be a regular file"
path by creating a non-regular target inside the data dir (e.g., a subdirectory
or FIFO) and calling validate_file_path on that name to assert it raises
HTTPException with the "regular file" (or similar) detail. Use the existing test
helpers and exception assertions (pytest.raises and exc_info.value.detail) and
reference validate_file_path and settings.set_data_dir so the new tests exercise
those branches.

Source: Coding guidelines

Comment on lines +216 to +220
if not os.path.isfile(file_path):
logging.error(f"File path '{file_path}' doesn't exist")
raise HTTPException(
status_code=400,
detail="unable to read "
"optimization data from file %s, %s" % (file_path, str(e)),
detail=f"specified data file does not exist: {cuopt_data_file}",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging the resolved server path for missing files.

Line 217 logs file_path, which includes the fully resolved CUOPT_DATA_DIR path for a user-controlled miss. That leaks internal filesystem layout into server logs. Log cuopt_data_file or a generic message instead.

Suggested change
-        logging.error(f"File path '{file_path}' doesn't exist")
+        logging.error(
+            "Requested cuopt data file '%s' does not exist inside CUOPT_DATA_DIR",
+            cuopt_data_file,
+        )

As per coding guidelines, python/cuopt_server/**: "No credential or internal-path leakage in error responses or logs".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt_server/cuopt_server/webserver.py` around lines 216 - 220, The
log call leaks the resolved internal filesystem path via file_path; update the
error logging in webserver.py so it does not include file_path (use
cuopt_data_file or a generic message instead) and ensure any HTTPException
detail or other logs never expose the resolved server path; locate the check
using file_path and cuopt_data_file and replace the logging.error(f"File path
'{file_path}'...") with a safe message that references only cuopt_data_file or a
non-path generic string.

Source: Coding guidelines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant