
Commit f0160c3

Fix PR Metadata: SSRF Guard Across All api_url Configs (#5)
PR title: Close SSRF gaps in all `api_url` configs and block metadata IP encodings

## Summary

Consolidate `api_url` SSRF validation into a single shared helper and apply it to all six SDK configs that accept an `api_url` field. Previously, `AtomicMemoryClientConfig`, `StorageClientConfig`, and `EntitiesClientConfig` each carried their own copy of a scheme/host check with no SSRF defense, and the three provider configs accepted any string.

## Changes

- Add `atomicmemory/core/url.py` with a single `validate_api_url` helper that centralizes the scheme/host checks and the SSRF defense.
- Wire the shared validator into all six configs: `AtomicMemoryClientConfig`, `StorageClientConfig`, `EntitiesClientConfig`, `AtomicMemoryProviderConfig`, `HindsightProviderConfig`, and `Mem0ProviderConfig`.
- Switch to posture B (matching the Node SDK): loopback/private/reserved IP literals are **allowed by default** (the SDK routinely connects to local and self-hosted cores), while link-local and cloud-metadata addresses (`169.254.169.254`, `fe80::/10`) are always blocked.
- Canonicalize IPv4-mapped IPv6 addresses (`::ffff:169.254.169.254` → `169.254.169.254`) so the metadata block holds on every supported interpreter, including Python 3.10/3.11, where `IPv6Address.is_link_local` does not look through the mapping.
- Canonicalize legacy IPv4 encodings (decimal `2852039166`, hex `0xA9FEA9FE`, dotted-octal `0251.0376.0251.0376`) via `socket.inet_aton` so they cannot bypass the link-local block.
- Add an `allow_private_networks` field (alias `allowPrivateNetworks`, default `True`) to every config; setting it to `False` also rejects loopback/private/reserved literals.
- Add `tests/core/test_url.py`: 12 parametrized cases covering allowed, always-blocked, strict-mode, IPv4-mapped IPv6, whitespace-stripping, and encoded-address bypass scenarios.
- Add `tests/providers/test_config_ssrf.py`: integration coverage for each config surface plus a **reflective enumeration test** (`test_every_api_url_config_blocks_imds`) that auto-discovers every `BaseModel` subclass with an `api_url` field and fails if any future config omits the guard.
- Bump the package version to 1.1.2.

## Why

The three provider configs had no URL validation at all, and the client, storage, and entities configs checked only the scheme and host. As a result, a crafted `apiUrl` could reach the AWS IMDS endpoint (`169.254.169.254`) or one of its non-canonical encodings (decimal, hex, octal, IPv4-mapped IPv6). The scheme/host check was also duplicated per surface, so a newly added config got no protection unless its author remembered to copy it. The shared helper closes both gaps at one chokepoint.

## Validation

- `uv run pytest tests/core/test_url.py tests/providers/test_config_ssrf.py`: all new tests pass.
- The reflective enumeration test asserts that at least six configs are discovered and that each rejects the IMDS literal. It fails automatically if a future config with an `api_url` field omits the validator.
- `uv run mypy atomicmemory --strict` and `uv run ruff check .` both pass cleanly.
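Both canonicalization steps listed under Changes rely only on the standard library. The sketch below is illustrative and is not taken from this commit. It decodes legacy spellings of the IMDS address with `socket.inet_aton`, then collapses the IPv4-mapped form with `IPv6Address.ipv4_mapped`. It assumes a glibc- or BSD-style C library, because `inet_aton` delegates to the platform. `canonical_ipv4` and `ENCODINGS` are hypothetical names.

```python
import ipaddress
import socket

# Spellings of the AWS IMDS address that the C resolver accepts.
# Assumption: a glibc/BSD-style inet_aton (decimal, hex, octal, short forms).
ENCODINGS = [
    "169.254.169.254",      # canonical dotted quad
    "2852039166",           # single 32-bit decimal
    "0xA9FEA9FE",           # single 32-bit hex
    "0251.0376.0251.0376",  # dotted octal
    "169.254.43518",        # short form: last part holds 16 bits
    "169.16689662",         # short form: last part holds 24 bits
]


def canonical_ipv4(host: str) -> ipaddress.IPv4Address:
    """Hypothetical helper: decode any inet_aton-accepted spelling."""
    # inet_aton returns 4 packed bytes; IPv4Address accepts packed bytes.
    return ipaddress.IPv4Address(socket.inet_aton(host))


for host in ENCODINGS:
    ip = canonical_ipv4(host)
    print(f"{host:>20} -> {ip} (link-local: {ip.is_link_local})")

# IPv4-mapped IPv6: classify the embedded IPv4, not the IPv6 wrapper.
mapped = ipaddress.IPv6Address("::ffff:169.254.169.254")
collapsed = mapped.ipv4_mapped if mapped.ipv4_mapped is not None else mapped
print(f"{mapped} -> {collapsed} (link-local: {collapsed.is_link_local})")
```

`ipaddress.ip_address` rejects every numeric spelling except the canonical one. A validator that relied on it alone would therefore treat these spellings as hostnames, and the HTTP client would later resolve them to the metadata address.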
1 parent 842b205 commit f0160c3

14 files changed

Lines changed: 447 additions & 36 deletions


CHANGELOG.md

Lines changed: 17 additions & 0 deletions
```diff
@@ -4,6 +4,23 @@ All notable changes to `atomicmemory` will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.1.2] - 2026-06-15
+
+### Security
+- `api_url` is now validated against SSRF across all six SDK configs (the three
+provider configs, the storage/client configs, and `EntitiesClientConfig`) via
+one shared validator. It always rejects link-local / cloud-metadata addresses
+(AWS IMDS `169.254.169.254`, IPv6 `fe80::/10`) — including their decimal
+(`http://2852039166/`), hex, octal, short-form, and IPv4-mapped-IPv6
+(`::ffff:169.254.169.254`) encodings, which are canonicalized so they cannot
+bypass the guard. Loopback / private / reserved IP literals remain allowed by
+default — the SDK routinely connects to local and self-hosted cores — and are
+rejected only when you opt into strict mode with `allowPrivateNetworks=False`.
+Hostnames (incl. the `localhost` default) are intentionally not DNS-resolved
+at config time. This matches the Node SDK's posture for cross-SDK parity, and
+a reflective enumeration test fails if a new `api_url` config omits the guard.
+(FailSafe AGNT-PY-001.)
+
 ## [1.1.1] - 2026-06-11
 
 ### Added
```

CLAUDE.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -43,6 +43,7 @@ Before changing code, read the relevant local files first:
 - Snake_case for Python attributes; Pydantic `Field(alias="apiUrl")` aliases preserve TS camelCase wire format.
 - Keep public API behavior aligned with `atomicmemory-sdk` where both SDKs expose the same concept.
 - Prefer integration tests with a real HTTP path for client behavior; use mocks only for narrow transport errors.
+- **Cross-cutting controls live at one chokepoint, enumerated and bypass-tested.** When a security/correctness rule must hold for *all* of a category (every config with an `api_url`, every input reaching a sink), apply it through one shared helper, not per-surface — and back it with a **reflective enumeration test** that fails when a new surface lacks it (e.g. `test_every_api_url_config_blocks_imds` discovers every `BaseModel` with an `api_url` field). Tests must exercise the **adversarial bypass** (the encoding, the key, the header), not just the canonical example, and validate against the **downstream consumer's interpretation** (the resolver, Postgres, the server), not your own parser. This is the gap that caused AGNT-PY-001's missed `EntitiesClientConfig` and numeric-IP bypass.
 
 ## Pre-commit verification
 
```
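The reflective enumeration test named in this rule lives in `tests/providers/test_config_ssrf.py`, which this view does not show. The standalone sketch below illustrates the pattern only. It assumes stdlib dataclasses in place of pydantic `BaseModel` so that it runs without dependencies. The sketch walks every subclass of a common base, keeps those that declare an `api_url` field, and reports any that accept the IMDS literal. `BaseConfig`, `guard`, and the three config classes are hypothetical.

```python
import dataclasses
import ipaddress
from urllib.parse import urlparse

IMDS_URL = "http://169.254.169.254/latest/meta-data/"


def guard(url: str) -> str:
    """Hypothetical stand-in for the shared validator: block link-local literals."""
    host = urlparse(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return url  # a hostname, not an IP literal; not resolved here
    if ip.is_link_local:
        raise ValueError(f"api_url must not target {ip}")
    return url


@dataclasses.dataclass
class BaseConfig:
    """Hypothetical common base (pydantic.BaseModel in the real SDK)."""


@dataclasses.dataclass
class GuardedConfig(BaseConfig):
    api_url: str = "http://localhost:8080"

    def __post_init__(self) -> None:
        self.api_url = guard(self.api_url)


@dataclasses.dataclass
class ForgetfulConfig(BaseConfig):
    api_url: str = "http://localhost:8080"  # new surface that skipped guard()


@dataclasses.dataclass
class TimeoutOnlyConfig(BaseConfig):
    timeout_seconds: float = 30.0  # no api_url, so the enumeration skips it


def all_subclasses(cls: type) -> list[type]:
    """Walk cls.__subclasses__() transitively, including indirect subclasses."""
    found: list[type] = []
    stack = list(cls.__subclasses__())
    while stack:
        sub = stack.pop()
        if sub not in found:
            found.append(sub)
            stack.extend(sub.__subclasses__())
    return found


def unguarded_api_url_configs(base: type) -> tuple[list[str], list[str]]:
    """Return (discovered api_url configs, those that accept the IMDS URL)."""
    discovered: list[str] = []
    offenders: list[str] = []
    for cls in sorted(all_subclasses(base), key=lambda c: c.__name__):
        if "api_url" not in {f.name for f in dataclasses.fields(cls)}:
            continue
        discovered.append(cls.__name__)
        try:
            cls(api_url=IMDS_URL)
        except ValueError:
            continue  # guard present: IMDS literal rejected
        offenders.append(cls.__name__)
    return discovered, offenders


print(unguarded_api_url_configs(BaseConfig))
```

In a real test suite, the final step would assert that `offenders` is empty and that `discovered` meets a minimum count. The build then breaks as soon as a config that skips the guard lands.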

atomicmemory/_version.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,4 +4,4 @@
 __version__: The current package version string (PEP 440).
 """
 
-__version__ = "1.1.1"
+__version__ = "1.1.2"
```

atomicmemory/client/atomic_memory_client.py

Lines changed: 5 additions & 10 deletions
```diff
@@ -10,14 +10,14 @@
 
 from types import TracebackType
 from typing import Any
-from urllib.parse import urlparse
 
 from pydantic import BaseModel, ConfigDict, Field, SecretStr, field_validator, model_validator
 from pydantic import ValidationError as PydanticValidationError
 
 from atomicmemory.client.async_memory_client import AsyncMemoryClient
 from atomicmemory.client.memory_client import MemoryClient, MemoryProviderConfigs
 from atomicmemory.core.errors import ConfigError
+from atomicmemory.core.url import validate_api_url
 from atomicmemory.core.validation import sanitized_pydantic_errors
 from atomicmemory.entities import AsyncEntitiesClient, EntitiesClient
 from atomicmemory.entities.client import EntitiesClientConfig
@@ -48,17 +48,11 @@ class AtomicMemoryClientConfig(BaseModel):
     api_key: SecretStr = Field(alias="apiKey")
     user_id: str = Field(alias="userId")
     timeout_seconds: float = Field(default=30.0, alias="timeoutSeconds")
+    allow_private_networks: bool = Field(default=True, alias="allowPrivateNetworks")
+    """Permit loopback/private/reserved IP literals in ``api_url`` (default True;
+    set False to harden). Link-local / cloud-metadata stay blocked regardless."""
     memory: MemoryNamespaceConfig | None = None
 
-    @field_validator("api_url")
-    @classmethod
-    def _validate_api_url(cls, value: str) -> str:
-        stripped = value.strip()
-        parsed = urlparse(stripped)
-        if parsed.scheme not in {"http", "https"} or not parsed.netloc:
-            raise ValueError("api_url must be an http(s) URL")
-        return stripped
-
     @field_validator("api_key", mode="before")
     @classmethod
     def _validate_api_key(cls, value: object) -> object:
@@ -88,6 +82,7 @@ def _validate_timeout(cls, value: float) -> float:
     def _require_non_empty(self) -> AtomicMemoryClientConfig:
         if not self.api_url:
             raise ValueError("api_url is required")
+        self.api_url = validate_api_url(self.api_url, allow_private_networks=self.allow_private_networks)
         # api_key is always truthy as SecretStr; empty string rejected by _validate_api_key above.
         if not self.user_id:
             raise ValueError("user_id is required")
```

atomicmemory/core/url.py

Lines changed: 114 additions & 0 deletions
```diff
@@ -0,0 +1,114 @@
+"""Shared ``api_url`` validation used by every SDK config boundary.
+
+Centralizes the rule that an ``api_url`` must be an http(s) URL with a
+host, and adds SSRF defense: link-local / cloud-metadata addresses
+(notably the ``169.254.169.254`` IMDS endpoint) are always rejected.
+Loopback / private / reserved IP literals are *allowed by default* — the
+SDK routinely connects to local and self-hosted cores — and only rejected
+when the caller opts into strict mode via ``allow_private_networks=False``.
+This mirrors the Node SDK's posture for cross-SDK parity.
+
+Hostnames are intentionally NOT resolved here. Config-time DNS resolution
+would be slow, racy, and still bypassable via DNS rebinding, so a literal
+hostname (including ``localhost`` and ``metadata.google.internal``) passes
+the scheme/host checks. Deployments that must defend against
+hostname-based metadata access should pin ``api_url`` to a vetted host.
+"""
+
+from __future__ import annotations
+
+import ipaddress
+import socket
+from urllib.parse import urlparse
+
+_ALLOWED_SCHEMES = frozenset({"http", "https"})
+
+
+def _parse_ip(host: str) -> ipaddress.IPv4Address | ipaddress.IPv6Address | None:
+    """Return the parsed IP when ``host`` is an IP literal, else ``None``.
+
+    Covers canonical literals AND the legacy IPv4 encodings the C resolver
+    (``inet_aton``/``getaddrinfo``) still accepts — decimal (``2852039166``),
+    hex (``0xA9FEA9FE``), octal (``0251.0376.0251.0376``) and short forms
+    (``127.1``). Without this they slip through as un-resolved "hostnames" and
+    defeat the SSRF checks, since the HTTP client resolves them to the real
+    address (e.g. ``http://2852039166/`` → ``169.254.169.254``).
+
+    Args:
+        host: The URL host component.
+
+    Returns:
+        The parsed/canonicalized IP address, or ``None`` when ``host`` is a
+        genuine (non-numeric) hostname.
+    """
+    try:
+        return _collapse_mapped(ipaddress.ip_address(host))
+    except ValueError:
+        pass
+    try:
+        return ipaddress.IPv4Address(socket.inet_aton(host))
+    except (OSError, ValueError):
+        return None
+
+
+def _collapse_mapped(
+    ip: ipaddress.IPv4Address | ipaddress.IPv6Address,
+) -> ipaddress.IPv4Address | ipaddress.IPv6Address:
+    """Reclassify an IPv4-mapped IPv6 address (``::ffff:a.b.c.d``) as its IPv4.
+
+    ``IPv6Address.is_link_local`` only delegates to the embedded IPv4 on
+    newer CPython, so on Python 3.10/3.11 ``::ffff:169.254.169.254`` would
+    otherwise read as a benign global IPv6 and bypass the metadata block.
+    Collapsing to the embedded IPv4 makes classification deterministic
+    across all supported interpreters and matches the Node SDK.
+
+    Args:
+        ip: A parsed IP literal.
+
+    Returns:
+        The embedded IPv4 when ``ip`` is IPv4-mapped, otherwise ``ip``.
+    """
+    mapped = getattr(ip, "ipv4_mapped", None)
+    return mapped if mapped is not None else ip
+
+
+def validate_api_url(value: str, *, allow_private_networks: bool = True) -> str:
+    """Validate and normalize an ``api_url``, guarding against SSRF.
+
+    Args:
+        value: The candidate URL.
+        allow_private_networks: Defaults to ``True`` — loopback / private /
+            reserved IP literals are permitted because the SDK routinely
+            connects to local and self-hosted cores. Pass ``False`` to reject
+            those too (hardened multi-tenant deployments). Link-local /
+            cloud-metadata addresses are rejected regardless of this flag.
+
+    Returns:
+        The whitespace-stripped URL.
+
+    Raises:
+        ValueError: If the scheme is not http(s), the host is missing, or
+            the host is a disallowed IP literal.
+    """
+    stripped = value.strip()
+    parsed = urlparse(stripped)
+    if parsed.scheme not in _ALLOWED_SCHEMES or not parsed.netloc:
+        raise ValueError("api_url must be an http(s) URL")
+    host = parsed.hostname
+    if not host:
+        raise ValueError("api_url must include a host")
+
+    ip = _parse_ip(host)
+    if ip is None:
+        return stripped
+
+    if ip.is_link_local:
+        raise ValueError("api_url must not target a link-local or cloud-metadata address")
+    if not allow_private_networks and (
+        ip.is_loopback or ip.is_private or ip.is_reserved or ip.is_multicast or ip.is_unspecified
+    ):
+        raise ValueError(
+            "api_url must not target a loopback, private, or reserved address; "
+            "set allow_private_networks=True to permit it"
+        )
+    return stripped
```
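This view does not include `tests/core/test_url.py`. The standalone harness below is a hypothetical reconstruction of the adversarial case table that the PR description and the CLAUDE.md rule call for. It does not import `atomicmemory`. Instead, it re-implements the decision logic of `url.py` in condensed form: the same checks, in the same order, with the error messages merged. It then runs allowed, always-blocked, strict-mode, whitespace, and encoded-bypass cases through that logic. Like the module itself, the numeric-encoding cases assume a glibc- or BSD-style `inet_aton`.

```python
import ipaddress
import socket
from typing import Optional, Union
from urllib.parse import urlparse

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_ip(host: str) -> Optional[IPAddr]:
    """Condensed copy of url.py's _parse_ip + _collapse_mapped."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        try:  # legacy spellings: decimal, hex, octal, short forms
            return ipaddress.IPv4Address(socket.inet_aton(host))
        except (OSError, ValueError):
            return None  # a genuine hostname
    mapped = getattr(ip, "ipv4_mapped", None)  # ::ffff:a.b.c.d -> a.b.c.d
    return mapped if mapped is not None else ip


def validate_api_url(value: str, *, allow_private_networks: bool = True) -> str:
    """Condensed copy of url.py's validate_api_url (error messages merged)."""
    stripped = value.strip()
    parsed = urlparse(stripped)
    if parsed.scheme not in {"http", "https"} or not parsed.hostname:
        raise ValueError("api_url must be an http(s) URL with a host")
    ip = parse_ip(parsed.hostname)
    if ip is None:
        return stripped  # hostnames are not DNS-resolved at config time
    if ip.is_link_local:
        raise ValueError("link-local / cloud-metadata address")
    if not allow_private_networks and (
        ip.is_loopback or ip.is_private or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    ):
        raise ValueError("loopback/private/reserved address in strict mode")
    return stripped


CASES = [
    # (url, allow_private_networks, should_pass)
    ("http://localhost:8080", False, True),             # hostname: never resolved
    ("  https://api.example.com  ", True, True),        # whitespace is stripped
    ("http://127.0.0.1:3000", True, True),              # loopback: allowed by default
    ("http://127.0.0.1:3000", False, False),            # ...rejected in strict mode
    ("http://10.0.0.5", False, False),                  # private: strict mode
    ("http://169.254.169.254/", True, False),           # IMDS: always blocked
    ("http://[fe80::1]/", True, False),                 # IPv6 link-local
    ("http://[::ffff:169.254.169.254]/", True, False),  # IPv4-mapped IPv6
    ("http://2852039166/", True, False),                # decimal IMDS
    ("http://0xA9FEA9FE/", True, False),                # hex IMDS
    ("http://0251.0376.0251.0376/", True, False),       # dotted-octal IMDS
    ("ftp://api.example.com", True, False),             # wrong scheme
]


def mismatches() -> list:
    """Return the URLs whose outcome differs from the expected one."""
    bad = []
    for url, allow, should_pass in CASES:
        try:
            validate_api_url(url, allow_private_networks=allow)
            passed = True
        except ValueError:
            passed = False
        if passed != should_pass:
            bad.append(url)
    return bad


print(mismatches() or "all cases behave as expected")
```

The first case shows the documented trade-off. Strict mode does not reject `localhost`, because only IP literals are classified. Hostname-based metadata access has to be handled by pinning `api_url` to a vetted host, as the module docstring recommends.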

atomicmemory/entities/client.py

Lines changed: 10 additions & 10 deletions
```diff
@@ -25,12 +25,13 @@
 import json
 from types import TracebackType
 from typing import Any, TypeVar, cast
-from urllib.parse import quote, urlencode, urlparse
+from urllib.parse import quote, urlencode
 
 import httpx
-from pydantic import BaseModel, ConfigDict, Field, SecretStr, field_validator
+from pydantic import BaseModel, ConfigDict, Field, SecretStr, field_validator, model_validator
 from pydantic import ValidationError as PydanticValidationError
 
+from atomicmemory.core.url import validate_api_url
 from atomicmemory.entities.errors import EntitiesClientError
 from atomicmemory.entities.types import (
     DeleteEntityResult,
@@ -63,15 +64,14 @@ class EntitiesClientConfig(BaseModel):
     api_url: str = Field(alias="apiUrl")
     api_key: SecretStr = Field(alias="apiKey")
     timeout_seconds: float = Field(default=30.0, alias="timeoutSeconds")
+    allow_private_networks: bool = Field(default=True, alias="allowPrivateNetworks")
+    """Permit loopback/private/reserved IP literals in ``api_url`` (default True;
+    set False to harden). Link-local / cloud-metadata stay blocked regardless."""
 
-    @field_validator("api_url")
-    @classmethod
-    def _validate_api_url(cls, value: str) -> str:
-        stripped = value.strip()
-        parsed = urlparse(stripped)
-        if parsed.scheme not in {"http", "https"} or not parsed.netloc:
-            raise ValueError("api_url must be an http(s) URL")
-        return stripped
+    @model_validator(mode="after")
+    def _validate_api_url(self) -> EntitiesClientConfig:
+        self.api_url = validate_api_url(self.api_url, allow_private_networks=self.allow_private_networks)
+        return self
 
     @field_validator("api_key", mode="before")
     @classmethod
```

atomicmemory/providers/atomicmemory/config.py

Lines changed: 11 additions & 1 deletion
```diff
@@ -5,8 +5,9 @@
 
 from __future__ import annotations
 
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
 
+from atomicmemory.core.url import validate_api_url
 from atomicmemory.memory.meta_fact_filter import MetaFactFilterConfig
 
 ATOMICMEMORY_DEFAULT_TIMEOUT_SECONDS: float = 30.0
@@ -41,3 +42,12 @@ class AtomicMemoryProviderConfig(BaseModel):
 
     meta_fact_filter: MetaFactFilterConfig | None = Field(default=None, alias="metaFactFilter")
     """Optional opt-in post-retrieval meta-fact filter. Off when unset."""
+
+    allow_private_networks: bool = Field(default=True, alias="allowPrivateNetworks")
+    """Permit loopback/private/reserved IP literals in ``api_url`` (default True;
+    set False to harden). Link-local / cloud-metadata stay blocked regardless."""
+
+    @model_validator(mode="after")
+    def _validate_api_url(self) -> AtomicMemoryProviderConfig:
+        self.api_url = validate_api_url(self.api_url, allow_private_networks=self.allow_private_networks)
+        return self
```

atomicmemory/providers/hindsight/config.py

Lines changed: 10 additions & 1 deletion
```diff
@@ -9,8 +9,9 @@
 from collections.abc import Awaitable, Callable
 from typing import Any, Literal
 
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
 
+from atomicmemory.core.url import validate_api_url
 from atomicmemory.memory.types import IngestInput, Scope
 
 HindsightRecallBudget = Literal["low", "mid", "high"]
@@ -38,6 +39,14 @@ class HindsightProviderConfig(BaseModel):
     project_id: str = Field(default=HINDSIGHT_DEFAULT_PROJECT_ID, alias="projectId")
     default_budget: HindsightRecallBudget | None = Field(default=None, alias="defaultBudget")
     default_max_tokens: int | None = Field(default=None, alias="defaultMaxTokens")
+    allow_private_networks: bool = Field(default=True, alias="allowPrivateNetworks")
+    """Permit loopback/private/reserved IP literals in ``api_url`` (default True;
+    set False to harden). Link-local / cloud-metadata stay blocked regardless."""
+
+    @model_validator(mode="after")
+    def _validate_api_url(self) -> HindsightProviderConfig:
+        self.api_url = validate_api_url(self.api_url, allow_private_networks=self.allow_private_networks)
+        return self
 
 
 class HindsightRetainResponse(BaseModel):
```

atomicmemory/providers/mem0/config.py

Lines changed: 12 additions & 1 deletion
```diff
@@ -5,7 +5,9 @@
 
 from __future__ import annotations
 
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
+
+from atomicmemory.core.url import validate_api_url
 
 MEM0_DEFAULT_TIMEOUT_SECONDS: float = 30.0
 MEM0_DEFAULT_PATH_PREFIX: str = "/v1"
@@ -49,3 +51,12 @@ class Mem0ProviderConfig(BaseModel):
 
     org_id: str | None = Field(default=None, alias="orgId")
     project_id: str | None = Field(default=None, alias="projectId")
+
+    allow_private_networks: bool = Field(default=True, alias="allowPrivateNetworks")
+    """Permit loopback/private/reserved IP literals in ``api_url`` (default True;
+    set False to harden). Link-local / cloud-metadata stay blocked regardless."""
+
+    @model_validator(mode="after")
+    def _validate_api_url(self) -> Mem0ProviderConfig:
+        self.api_url = validate_api_url(self.api_url, allow_private_networks=self.allow_private_networks)
+        return self
```

atomicmemory/storage/types.py

Lines changed: 6 additions & 10 deletions
```diff
@@ -8,10 +8,11 @@
 from __future__ import annotations
 
 from typing import Any, Literal
-from urllib.parse import urlparse
 
 from pydantic import BaseModel, ConfigDict, Field, SecretStr, field_validator, model_validator
 
+from atomicmemory.core.url import validate_api_url
+
 StorageArtifactStatus = Literal[
     "stored",
     "pending",
@@ -42,15 +43,9 @@ class StorageClientConfig(BaseModel):
     api_key: SecretStr = Field(alias="apiKey")
     user_id: str = Field(alias="userId")
     timeout_seconds: float = Field(default=30.0, alias="timeoutSeconds")
-
-    @field_validator("api_url")
-    @classmethod
-    def _validate_api_url(cls, value: str) -> str:
-        stripped = value.strip()
-        parsed = urlparse(stripped)
-        if parsed.scheme not in {"http", "https"} or not parsed.netloc:
-            raise ValueError("api_url must be an http(s) URL")
-        return stripped
+    allow_private_networks: bool = Field(default=True, alias="allowPrivateNetworks")
+    """Permit loopback/private/reserved IP literals in ``api_url`` (default True;
+    set False to harden). Link-local / cloud-metadata stay blocked regardless."""
 
     @field_validator("api_key", mode="before")
     @classmethod
@@ -81,6 +76,7 @@ def _validate_timeout(cls, value: float) -> float:
     def _require_non_empty(self) -> StorageClientConfig:
         if not self.api_url:
             raise ValueError("api_url is required")
+        self.api_url = validate_api_url(self.api_url, allow_private_networks=self.allow_private_networks)
         # api_key is always truthy as SecretStr; empty string rejected by _validate_api_key above.
         if not self.user_id:
             raise ValueError("user_id is required")
```
