fix(sec): resolve CodeQL CWE-22 path-injection (#90,#86) and weak-hash (#84)#57
Merged
Conversation
…h (#84)
Path injection — mcp_server/server/http_file_diff.py (#90 :108, #86 :179):
the prior inline pathlib sanitiser (`.resolve()` + `is_relative_to`) was
not recognised by CodeQL's dataflow — the `.resolve()` sink fired on raw
tainted input and the `is_relative_to` barrier did not propagate across
the `_contained_resolved` return. Replace with the canonical, CodeQL-
recognised barrier: `os.path.realpath` + `os.path.commonpath(...) == root`.
* `_within(real, root)` — segment-aware commonpath containment (fixes the
naive-startswith sibling-prefix bug: /home/user-evil ⊄ /home/user).
* `_contained_resolved` — realpath then `_within`; returns the validated
Path (contract preserved for http_standalone_trace caller).
* `_first_existing_dir_within` — ancestor walk that re-asserts `_within`
before every `is_dir()` sink, so the barrier dominates locally even as
the walk ascends; the probe can never climb out to / or /etc.
* Remove dead `_under_allowed_root` (referenced only in docstrings).
Weak hashing — workflow_graph_schema.py:126 (#84) + workflow_graph_source.py:47:
SHA-1 → SHA-256 for the deterministic node-id factories. These IDs feed only
`workflow_graph_layout`, a position cache keyed by
(node_id, topology_fingerprint, layout_version) — when the id scheme changes
the fingerprint changes and the layout recomputes, so there is no cross-build
stability requirement to preserve. SHA-256 is a genuine fix (non-broken algo),
not `usedforsecurity=` suppression.
Add 14 CWE-22 containment regression tests (tests_py/server/
test_http_file_diff_path_containment.py): /etc blocked, .. rejected,
null-byte rejected, sibling-prefix rejected, happy-path repo resolution
preserved. All 49 server tests pass; 39 workflow-graph node-id tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first refactor moved the two path-injection sinks to a single `contained.is_dir()` in `_first_existing_dir_within`, but CodeQL still flagged it (new alert, http_file_diff.py:156): the `commonpath` containment check was behind the `_within` helper + an `any(...)` generator, so CodeQL's barrier-guard dataflow could not connect the sanitiser to the sink. Inline the guard into the exact shape CodeQL recognises (and that the query's own Recommendation documents): the `is_dir()` filesystem access now sits directly inside `if os.path.commonpath([root, real]) == root:`, so the barrier dominates the sink on `real`. Behaviour and the 14 CWE-22 containment tests are unchanged; 49/49 server tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The inline-commonpath guard (6f10991) still left CodeQL alert #92 at the is_dir() sink: the ancestor up-walk re-derived the probed path from realpath(user_input) on every iteration, so the loop-carried value stayed tainted and the per-root commonpath guard was not accepted as a sanitiser. Redesign: replace the up-walk with a DOWNWARD descent from a constant allowed probe root via os.scandir (_descend_trusted). User path components select among trusted enumerated entries by `entry.name == name` equality only — they never construct a probed path. The value reaching is_dir() / `git rev-parse --cwd` is composed entirely from the constant root plus scandir output, breaking the taint flow exactly as git_diff._match_in_whitelist does for the file whitelist. _contained_resolved / _within / _allowed_probe_roots unchanged (no sinks; external caller http_standalone_trace.py contract preserved). Tests: 14/14 CWE-22 containment, 49/49 server. ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves three open CodeQL alerts on
main.#90, #86 — Uncontrolled data in path expression (CWE-22, High)
mcp_server/server/http_file_diff.py:108 / :179The prior inline
pathlibsanitiser (.resolve()+is_relative_to, from #53) did not clear CodeQL: the.resolve()sink fired on raw tainted input and theis_relative_tobarrier didn't propagate across the_contained_resolvedreturn boundary. Replaced with the canonical, CodeQL-recognised barrier —os.path.realpath+os.path.commonpath(...) == root— placed so it locally dominates every filesystem sink:_within(real, root)— segment-awarecommonpathcontainment (also fixes the naive-startswithsibling-prefix bug:/home/user-evil⊄/home/user)._contained_resolved—realpaththen_within; returns the validatedPath(contract preserved for thehttp_standalone_tracecaller)._first_existing_dir_within— ancestor walk that re-asserts_withinbefore everyis_dir(), so the barrier dominates even as the walk ascends; the probe can't climb out to/or/etc._under_allowed_root(docstring-only references).Defenses unchanged in spirit (loopback-only server,
..rejection, containment to$HOME/cwd/temp) — just expressed in the form CodeQL's dataflow recognises.#84 — Weak cryptographic hash (CWE-327/328, High)
mcp_server/core/workflow_graph_schema.py:126 (+workflow_graph_source.py:47)SHA-1 → SHA-256 for the deterministic node-id factories. A genuine fix, not
usedforsecurity=suppression. These IDs feed onlyworkflow_graph_layout, a position cache keyed by(node_id, topology_fingerprint, layout_version)— when the id scheme changes the fingerprint changes and the layout recomputes, so there's no cross-build stability to preserve.Tests
New
tests_py/server/test_http_file_diff_path_containment.py— 14 CWE-22 regression tests (/etcblocked,..rejected, null-byte rejected, sibling-prefix rejected, happy-path repo resolution preserved). 49/49 server tests pass; 39/39 workflow-graph node-id tests pass; ruff check + format clean.🤖 Generated with Claude Code