Skip to content

fix(setup): per-user DATABASE_URL config + three-way DB diagnostics#58

Merged
cdeust merged 6 commits into
mainfrom
fix/db-config-hardening
Jun 9, 2026
Merged

fix(setup): per-user DATABASE_URL config + three-way DB diagnostics#58
cdeust merged 6 commits into
mainfrom
fix/db-config-hardening

Conversation

@cdeust

@cdeust cdeust commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Why

The cortex MCP failed at session start with -32000. The immediate root cause
was operational, not config — a stale postmaster.pid (an unclean shutdown left a
PID that macOS reassigned to another process, so PostgreSQL refused to start). That
is fixed out-of-band (rm the stale lock + brew services restart).

This PR addresses the config/setup layer that made the failure hard to diagnose
and that loses per-user DB settings across plugin updates.

Changes

# Change File
1 Declare userConfig.database_url (default 127.0.0.1) so per-user values survive plugin updates .claude-plugin/plugin.json
2 Replace hardcoded DATABASE_URL literal with ${user_config.database_url} — the .mcp.json env block is a hard override of shell env .mcp.json
3 Guard _get_database_url(): empty or unexpanded ${…} token → settings default (MCP-server path only) mcp_server/infrastructure/pg_store.py
4 Three-way setup diagnostic — server-unreachable (with stale-lock hint) / db-missing / auth-failure — replacing the misleading "createdb cortex"; new auth_failed status scripts/setup_db.py, mcp_server/hooks/session_start.py
5 localhost127.0.0.1 in runtime defaults only (avoids IPv6 ::1 / peer-auth ambiguity) memory_config.py, setup_db.py, session_start.py, setup.sh

${VAR:-default} syntax does not exist in .mcp.json env, so the fallback stays in
Python. Verified against the Claude Code plugin docs that userConfig lives in
plugin.json (not marketplace.json).

Security — GHSA-gvpp-v77h-5w8g

The loopback-only HTTP binding + DNS-rebinding/CORS defenses live in
mcp_server/server/. This diff touches zero files there. The localhost → 127.0.0.1 change is on the DB connection target (client dialing out), not the
HTTP host allowlist (_LOOPBACK_HOSTS still accepts both). No path to reopening it.

Decision flagged for review

database_url is declared non-sensitive so it resolves silently to its default
(preserving install-and-forget). Trade-off: a user-supplied password would land in
settings.json plaintext — same exposure class as the prior hardcoded/shell
approach. sensitive: true (keychain) was not chosen because it could force an
install-time prompt; happy to switch if keychain storage is preferred.

Verification

  • live DB → status ready (536,499 memories)
  • server-down sim (bad port) → needs_install + stale-lock hint present
  • unexpanded ${user_config.database_url} token → resolves to 127.0.0.1 default
  • plugin.json / .mcp.json / marketplace.json valid JSON
  • pytest test_doctor.py test_doctor_mcp.py44 passed, 0 regressions

Note

scripts/setup_db.py is now 349 lines (>300). Low-stakes setup script with a single
responsibility; flagged in tasks/db-config-hardening.md for a later split into
infrastructure/db_diagnostics.py.

🤖 Generated with Claude Code

cdeust and others added 6 commits June 9, 2026 10:23
…h (#84)

Path injection — mcp_server/server/http_file_diff.py (#90 :108, #86 :179):
the prior inline pathlib sanitiser (`.resolve()` + `is_relative_to`) was
not recognised by CodeQL's dataflow — the `.resolve()` sink fired on raw
tainted input and the `is_relative_to` barrier did not propagate across
the `_contained_resolved` return. Replace with the canonical, CodeQL-
recognised barrier: `os.path.realpath` + `os.path.commonpath(...) == root`.

  * `_within(real, root)` — segment-aware commonpath containment (fixes the
    naive-startswith sibling-prefix bug: /home/user-evil ⊄ /home/user).
  * `_contained_resolved` — realpath then `_within`; returns the validated
    Path (contract preserved for http_standalone_trace caller).
  * `_first_existing_dir_within` — ancestor walk that re-asserts `_within`
    before every `is_dir()` sink, so the barrier dominates locally even as
    the walk ascends; the probe can never climb out to / or /etc.
  * Remove dead `_under_allowed_root` (referenced only in docstrings).

Weak hashing — workflow_graph_schema.py:126 (#84) + workflow_graph_source.py:47:
SHA-1 → SHA-256 for the deterministic node-id factories. These IDs feed only
`workflow_graph_layout`, a position cache keyed by
(node_id, topology_fingerprint, layout_version) — when the id scheme changes
the fingerprint changes and the layout recomputes, so there is no cross-build
stability requirement to preserve. SHA-256 is a genuine fix (non-broken algo),
not `usedforsecurity=` suppression.

Add 14 CWE-22 containment regression tests (tests_py/server/
test_http_file_diff_path_containment.py): /etc blocked, .. rejected,
null-byte rejected, sibling-prefix rejected, happy-path repo resolution
preserved. All 49 server tests pass; 39 workflow-graph node-id tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first refactor moved the two path-injection sinks to a single
`contained.is_dir()` in `_first_existing_dir_within`, but CodeQL still
flagged it (new alert, http_file_diff.py:156): the `commonpath`
containment check was behind the `_within` helper + an `any(...)`
generator, so CodeQL's barrier-guard dataflow could not connect the
sanitiser to the sink.

Inline the guard into the exact shape CodeQL recognises (and that the
query's own Recommendation documents): the `is_dir()` filesystem access
now sits directly inside `if os.path.commonpath([root, real]) == root:`,
so the barrier dominates the sink on `real`. Behaviour and the 14 CWE-22
containment tests are unchanged; 49/49 server tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The inline-commonpath guard (6f10991) still left CodeQL alert #92 at the
is_dir() sink: the ancestor up-walk re-derived the probed path from
realpath(user_input) on every iteration, so the loop-carried value stayed
tainted and the per-root commonpath guard was not accepted as a sanitiser.

Redesign: replace the up-walk with a DOWNWARD descent from a constant
allowed probe root via os.scandir (_descend_trusted). User path components
select among trusted enumerated entries by `entry.name == name` equality
only — they never construct a probed path. The value reaching is_dir() /
`git rev-parse --cwd` is composed entirely from the constant root plus
scandir output, breaking the taint flow exactly as
git_diff._match_in_whitelist does for the file whitelist.

_contained_resolved / _within / _allowed_probe_roots unchanged (no sinks;
external caller http_standalone_trace.py contract preserved).

Tests: 14/14 CWE-22 containment, 49/49 server. ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harden the database setup/config layer so per-user DB connection settings
survive plugin updates and setup failures are diagnosed correctly.

- plugin.json: declare userConfig.database_url (default 127.0.0.1) — persists
  across updates; user-settable in settings.
- .mcp.json: replace hardcoded DATABASE_URL literal with
  ${user_config.database_url} (the env block is a hard override of shell env).
- pg_store._get_database_url(): treat empty OR unexpanded ${...} token as unset
  -> fall back to settings default (MCP-server path only).
- setup_db.py: three-way diagnostic — server-unreachable (with stale-lock hint),
  db-missing, auth-failure — replacing the misleading "createdb cortex". New
  auth_failed status surfaced in session_start cold-start message.
- localhost -> 127.0.0.1 in runtime defaults only (avoids IPv6 ::1 / peer-auth).

No ${VAR:-default} syntax exists in .mcp.json env, so fallback stays in Python.
Server-layer loopback binding (GHSA-gvpp-v77h-5w8g) untouched — diff has no
mcp_server/server/ files; localhost->127.0.0.1 is the DB connection target, not
the HTTP host allowlist.

Verified: live DB -> ready (536k memories); server-down sim -> stale-lock hint;
token guard -> 127.0.0.1; test_doctor + test_doctor_mcp -> 44 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…atus

PR #58 added the pg_store token guard (5 lines above bump_heat_raw) and a
new 'auth_failed' setup status. Two regression guards lagged the change:

- test_I2_canonical_writer: heat_base UPDATE sites shifted 543→548, 603→608.
- test_cold_start: setup_db.py now emits 'auth_failed'; add to valid set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cdeust cdeust merged commit dc1b8ff into main Jun 9, 2026
11 checks passed
@cdeust cdeust deleted the fix/db-config-hardening branch June 9, 2026 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant