10xHub · Iamsdt · Jun 13, 2026
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
@@ -0,0 +1,26 @@
+version: 2
+updates:
+  # Python dependencies (pyproject.toml + uv.lock)
+  - package-ecosystem: "uv"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5
+    labels:
+      - "dependencies"
+      - "python"
+    groups:
+      python-dev-dependencies:
+        dependency-type: "development"
+        patterns:
+          - "*"
+
+  # GitHub Actions used in CI / release workflows
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5
+    labels:
+      - "dependencies"
+      - "github-actions"
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -50,3 +50,16 @@ repos:
         args: [-c, pyproject.toml]
         additional_dependencies: ["bandit[toml]"]
         exclude: ^(tests|docs|examples)/
+
+  # Run mypy from the project venv (synced via `uv sync --dev`) so it resolves
+  # real dependencies and honours the [tool.mypy] config in pyproject.toml.
+  # Checks the whole package once (pass_filenames: false) rather than per-file.
+  - repo: local
+    hooks:
+      - id: mypy
+        name: mypy
+        entry: uv run mypy agentflow/
+        language: system
+        types: [python]
+        pass_filenames: false
+        exclude: ^(tests|docs|examples|normal_tests)/
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -24,7 +24,7 @@ persistence, tools, memory, evaluation, and event publishing. Inspired by LangGr
   the README and several docstrings still show pre-refactor paths (see Known Doc Drift).
 - **Surgical edits.** This is `Development Status :: 5 - Production/Stable`. Don't refactor
   module boundaries or rename exports without checking every `__init__.py` that re-exports them.
-- **Keep coverage green.** `pytest` enforces `--cov-fail-under=70`. New code needs tests.
+- **Keep coverage green.** `pytest` enforces `--cov-fail-under=80`. New code needs tests.
 - **Optional deps are optional.** Provider SDKs, MCP, Postgres, Redis, Qdrant, Mem0, Kafka,
   RabbitMQ, OTEL, a2a are all extras. Guard imports; never make core import a hard optional dep.
 
@@ -152,7 +152,7 @@ already present.
 
 ```bash
 # from this folder (agentflow/)
-.venv/bin/python -m pytest               # full suite (enforces coverage >= 70%)
+.venv/bin/python -m pytest               # full suite (enforces coverage >= 80%)
 .venv/bin/python -m pytest tests/graph   # one area
 ruff check . && ruff format .            # lint + format (line-length 100, py312)
 # editable install with extras for local dev:

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,112 @@
+# Contributing to Agentflow
+
+Thanks for your interest in improving `10xscale-agentflow`. This guide covers the
+core Python framework that lives in this folder. For the API server, TypeScript
+client, docs, or playground, see the `CONTRIBUTING.md` or `CLAUDE.md` in their
+respective packages.
+
+- Package (PyPI): `10xscale-agentflow`
+- Requires: Python >= 3.12
+- The importable package is the nested `agentflow/` directory; this folder is the
+  repo root for the core library.
+
+## Getting set up
+
+We use [`uv`](https://docs.astral.sh/uv/) for environment and dependency
+management.
+
+```bash
+# from this folder (the core library root)
+uv sync --dev          # create .venv and install the package + dev tools
+uv run pre-commit install   # enable the git hooks (optional but recommended)
+```
+
+If you work on optional subsystems, install the matching extras, e.g.:
+
+```bash
+uv pip install -e ".[google-genai,openai,mcp,pg_checkpoint]"
+```
+
+## Before you open a pull request
+
+Run the same checks CI runs. All must pass:
+
+```bash
+uv run pre-commit run --all-files     # ruff format + lint, bandit, mypy, hooks
+uv run pytest --cov --cov-branch      # tests + coverage gate (>= 80%)
+```
+
+You can also run pieces individually:
+
+```bash
+uv run ruff check . && uv run ruff format .
+uv run mypy agentflow/
+uv run pytest tests/graph             # one area
+```
+
+### What the gates enforce
+
+- **Formatting & linting:** `ruff` (line length 100, target py312). Most issues
+  are auto-fixed by `ruff format` / `ruff check --fix`.
+- **Types:** `mypy` runs in pre-commit. The codebase is on *phased* typing: a set
+  of modules with pre-existing errors is listed under `[[tool.mypy.overrides]]`
+  in `pyproject.toml` with `ignore_errors = true`. New code is type-checked.
+  Improving a listed module's types and removing it from that list is a welcome
+  contribution; please don't add new modules to it.
+- **Security:** `bandit`.
+- **Coverage:** `pytest` fails under 80% coverage (`--cov-fail-under=80`). New code needs tests.
+
+## Tests
+
+- Tests live in `tests/`, mirroring the package layout (`graph/`, `state/`,
+  `storage/`, `publisher/`, `prebuilt/`, `evaluation/`, `testing/`, plus
+  `chaos/`, `benchmarks/`, `integration/`).
+- Markers: `asyncio`, `integration` (needs real databases — Redis/Postgres),
+  `slow`. Integration tests are skipped unless their backends are available.
+- Prefer the in-repo test helpers in `agentflow.qa.testing` (`TestAgent`,
+  `MockMCPClient`, `MockToolRegistry`) to exercise graphs without live LLM calls.
+
+## Import paths (read this before referencing symbols)
+
+The package is organised into `core/`, `storage/`, `runtime/`, `qa/`. There are
+**no** top-level `agentflow.graph` / `agentflow.state` / `agentflow.checkpointer`
+shims — use the canonical paths:
+
+```python
+from agentflow.core.graph import StateGraph, Agent, ToolNode, CompiledGraph
+from agentflow.core.state import AgentState, Message
+from agentflow.core.llm import call_llm, create_llm_client, detect_provider
+from agentflow.storage.checkpointer import InMemoryCheckpointer, PgCheckpointer
+```
+
+`examples/` uses current import paths and is the most reliable usage reference.
+
+## Optional dependencies
+
+Provider SDKs (OpenAI, Google GenAI), MCP, Postgres, Redis, Qdrant, Mem0, Kafka,
+RabbitMQ, OTEL, and a2a are all **extras**. Guard their imports inside the
+functions that need them so the core package never hard-imports an optional
+dependency. See `agentflow/core/llm/client_factory.py` for the pattern.
+
+## Commit and PR conventions
+
+- Use clear, conventional-style commit subjects (`feat:`, `fix:`, `docs:`,
+  `refactor:`, `test:`, `chore:`), matching the existing history.
+- Keep changes surgical. This package is `Development Status :: 5 -
+  Production/Stable`; avoid renaming exports or moving module boundaries without
+  checking every `__init__.py` that re-exports the symbol.
+- Update docs/examples when you change public behaviour. Prefer fixing a stale
+  doc/example to match the code over the reverse.
+- One logical change per PR. Describe the motivation and how you tested it.
+
+## Reporting bugs and security issues
+
+- **Bugs / feature requests:** open an issue at
+  https://github.com/10xHub/agentflow/issues with a minimal reproduction.
+- **Security vulnerabilities:** do **not** open a public issue — follow
+  [`SECURITY.md`](SECURITY.md).
+
+## License
+
+By contributing, you agree that your contributions are licensed under the
+project's [MIT License](LICENSE).
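Review note on the "Optional dependencies" section of `CONTRIBUTING.md`: the guarded-import rule might look roughly like the sketch below. This is not the code in `agentflow/core/llm/client_factory.py`, which is not shown in this diff. The helper name `require_optional`, the function `make_client`, and the error wording are hypothetical and exist only for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency at call time, not at package import time.

    Hypothetical helper for illustration; not part of agentflow.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the extra that provides the missing module.
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f'Install it with: pip install "10xscale-agentflow[{extra}]"'
        ) from exc


def make_client() -> object:
    """Hypothetical factory: the SDK is imported only when a client is requested."""
    # Importing the core package never reaches this line, so the provider
    # SDK stays optional until someone actually asks for a client.
    openai = require_optional("openai", "openai")
    return openai.OpenAI()
```

Keeping the import inside the function means a missing extra surfaces as a clear error at the call site rather than as a failure when the package is first imported.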
diff --git a/SECURITY.md b/SECURITY.md
@@ -0,0 +1,64 @@
+# Security Policy
+
+## Supported versions
+
+`10xscale-agentflow` is pre-1.0 and ships from a single release line. Security
+fixes are applied to the latest published release only. Pin a known-good version
+in production and upgrade promptly when a security release is announced.
+
+| Version | Supported          |
+| ------- | ------------------ |
+| 0.7.x   | :white_check_mark: |
+| < 0.7   | :x:                |
+
+## Reporting a vulnerability
+
+**Please do not open a public GitHub issue for security problems.**
+
+Report privately through either channel:
+
+- **GitHub Security Advisories** (preferred): open a private report at
+  https://github.com/10xHub/agentflow/security/advisories/new
+- **Email:** contact@10xscale.ai (you may also CC shudiptotrafder@gmail.com)
+
+Include as much of the following as you can:
+
+- A description of the issue and the impact you believe it has.
+- The affected version(s) and, if known, the affected module/import path
+  (e.g. `agentflow.core.llm.client_factory`).
+- A minimal reproduction or proof of concept.
+- Any suggested remediation.
+
+### What to expect
+
+- **Acknowledgement** within 3 business days.
+- An initial assessment and severity triage within 7 business days.
+- Coordinated disclosure: we will agree on a disclosure timeline with you and
+  credit you in the advisory unless you prefer to remain anonymous.
+
+## Scope
+
+This policy covers the `10xscale-agentflow` core Python package in this
+repository. Issues in the API server (`10xscale-agentflow-cli`), the TypeScript
+client, or third-party dependencies should be reported against their respective
+projects, though we are happy to help route a report.
+
+### Things that are expected behaviour, not vulnerabilities
+
+- **Tools execute arbitrary code by design.** Tools you register with a
+  `ToolNode` run with the privileges of the host process. Only register trusted
+  tools and treat tool inputs derived from model output as untrusted.
+- **Provider API keys are read from the environment** (`OPENAI_API_KEY`,
+  `GEMINI_API_KEY`, etc.). Protecting that environment is the deployer's
+  responsibility.
+- **Prompt injection** against an LLM is a property of the model/application
+  design. Reports demonstrating a concrete privilege escalation or data
+  exfiltration path *through the framework* are in scope; generic "the model can
+  be jailbroken" reports are not.
+
+## Good practice for deployers
+
+- Keep `IS_DEBUG=false` and `MODE=production` in production.
+- Never set `ORIGINS=*` in production.
+- Use a secrets manager rather than committing `.env` files.
+- Constrain which tools and MCP servers an agent can reach.
diff --git a/agentflow/core/graph/agent_internal/circuit_breaker.py b/agentflow/core/graph/agent_internal/circuit_breaker.py
@@ -0,0 +1,95 @@
+"""A small circuit breaker for LLM calls.
+
+Complements retry + fallback: once a model/provider has failed
+``failure_threshold`` times in a row, its circuit *opens* and further calls to
+it are short-circuited (skipped, moving straight to the next fallback) for
+``reset_timeout`` seconds. After that cooldown a single trial is allowed
+(*half-open*); success closes the circuit, another failure re-opens it.
+
+This stops a dead provider from being retried on every single invocation.
+"""
+
+from __future__ import annotations
+
+import time
+from collections.abc import Callable
+from enum import Enum
+
+
+class CircuitState(str, Enum):
+    """Lifecycle state of a :class:`CircuitBreaker`."""
+
+    closed = "closed"  # normal operation, calls allowed
+    open = "open"  # failing, calls skipped until the cooldown elapses
+    half_open = "half_open"  # cooldown elapsed, one trial call allowed
+
+
+class CircuitBreakerOpenError(RuntimeError):
+    """Raised/used as the recorded error when a call is skipped by an open circuit."""
+
+    def __init__(self, key: object, retry_after: float) -> None:
+        self.key = key
+        self.retry_after = retry_after
+        super().__init__(f"Circuit breaker open for {key!r}; retry in {retry_after:.1f}s")
+
+
+class CircuitBreaker:
+    """Per-target failure tracker with open/half-open/closed states.
+
+    Args:
+        failure_threshold: Consecutive failures that trip the circuit (>= 1).
+        reset_timeout: Seconds to stay open before allowing a half-open trial.
+        time_func: Monotonic clock source; injectable for testing.
+    """
+
+    def __init__(
+        self,
+        failure_threshold: int = 5,
+        reset_timeout: float = 30.0,
+        time_func: Callable[[], float] = time.monotonic,
+    ) -> None:
+        if failure_threshold < 1:
+            raise ValueError("failure_threshold must be >= 1")
+        if reset_timeout <= 0:
+            raise ValueError("reset_timeout must be > 0")
+        self.failure_threshold = failure_threshold
+        self.reset_timeout = reset_timeout
+        self._time = time_func
+        self._failures = 0
+        self._state = CircuitState.closed
+        self._opened_at = 0.0
+
+    @property
+    def state(self) -> CircuitState:
+        return self._state
+
+    @property
+    def failure_count(self) -> int:
+        return self._failures
+
+    def allow(self) -> bool:
+        """Return True if a call may proceed, transitioning open -> half-open if due."""
+        if self._state is CircuitState.open:
+            if self._time() - self._opened_at >= self.reset_timeout:
+                self._state = CircuitState.half_open
+                return True
+            return False
+        return True
+
+    def record_success(self) -> None:
+        """Reset the breaker to closed after a successful call."""
+        self._failures = 0
+        self._state = CircuitState.closed
+
+    def record_failure(self) -> None:
+        """Register a failure, opening the circuit at/over threshold or from half-open."""
+        self._failures += 1
+        if self._state is CircuitState.half_open or self._failures >= self.failure_threshold:
+            self._state = CircuitState.open
+            self._opened_at = self._time()
+
+    def retry_after(self) -> float:
+        """Seconds remaining before an open circuit allows a half-open trial."""
+        if self._state is not CircuitState.open:
+            return 0.0
+        return max(0.0, self.reset_timeout - (self._time() - self._opened_at))
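Review note: the closed, open, and half-open transitions described in the module docstring can be traced with an injected clock. The sketch below uses a condensed copy of the class from this diff so that it runs standalone (validation, `retry_after`, and the read-only properties are omitted). `FakeClock` is a hypothetical test double, not something this PR adds.

```python
import time
from collections.abc import Callable
from enum import Enum


class CircuitState(str, Enum):
    closed = "closed"
    open = "open"
    half_open = "half_open"


class CircuitBreaker:
    """Condensed copy of the breaker in this diff, trimmed for the walkthrough."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        time_func: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._time = time_func
        self._failures = 0
        self.state = CircuitState.closed
        self._opened_at = 0.0

    def allow(self) -> bool:
        if self.state is CircuitState.open:
            if self._time() - self._opened_at >= self.reset_timeout:
                self.state = CircuitState.half_open
                return True
            return False
        return True

    def record_success(self) -> None:
        self._failures = 0
        self.state = CircuitState.closed

    def record_failure(self) -> None:
        self._failures += 1
        if self.state is CircuitState.half_open or self._failures >= self.failure_threshold:
            self.state = CircuitState.open
            self._opened_at = self._time()


class FakeClock:
    """Hypothetical test double: a clock that only moves when told to."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, time_func=clock)
trace: list[object] = []

breaker.record_failure()
trace.append(breaker.state.value)  # "closed": one failure, threshold is two
breaker.record_failure()
trace.append(breaker.state.value)  # "open": threshold reached
trace.append(breaker.allow())      # False: still inside the cooldown
clock.now = 10.0
trace.append(breaker.allow())      # True: cooldown elapsed, trial call allowed
trace.append(breaker.state.value)  # "half_open"
breaker.record_failure()
trace.append(breaker.state.value)  # "open": the trial failed, cooldown restarts
clock.now = 20.0
breaker.allow()
breaker.record_success()
trace.append(breaker.state.value)  # "closed": the trial succeeded
```

One thing the walkthrough makes visible: while the state is `half_open`, `allow()` keeps returning True until a result is recorded, so the "single trial" guarantee relies on callers recording an outcome before the next `allow()` call.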
diff --git a/agentflow/core/graph/agent_internal/constants.py b/agentflow/core/graph/agent_internal/constants.py
@@ -16,6 +16,13 @@ class RetryConfig:
         max_delay: Upper-bound cap on exponential back-off delay (default ``30.0``).
         backoff_factor: Multiplier applied after each retry (default ``2.0``).
         retryable_status_codes: HTTP status codes considered transient/retryable.
+        circuit_breaker_enabled: When True, track failures per (provider, model)
+            and skip a target whose circuit is open, moving straight to the next
+            fallback instead of retrying a known-dead provider (default ``False``).
+        circuit_breaker_threshold: Consecutive failures that open a circuit
+            (default ``5``).
+        circuit_breaker_reset_timeout: Seconds a circuit stays open before a
+            single half-open trial is allowed (default ``30.0``).
     """
 
     max_retries: int = 3
@@ -25,6 +32,9 @@ class RetryConfig:
     retryable_status_codes: frozenset[int] = field(
         default_factory=lambda: frozenset({429, 500, 502, 503, 529}),
     )
+    circuit_breaker_enabled: bool = False
+    circuit_breaker_threshold: int = 5
+    circuit_breaker_reset_timeout: float = 30.0
 
 
 DEFAULT_RETRY_CONFIG = RetryConfig()
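Review note: the wiring that consumes these three fields is not part of this excerpt. The sketch below shows one way a fallback loop could honour them, assuming one breaker per `(provider, model)` key as the docstring describes. `RetryConfig` here is a stand-in with only the new fields, `Breaker` is a reduced stand-in for `CircuitBreaker`, and `call_with_fallbacks`, `fake_call`, and the model names are hypothetical.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass

Target = tuple[str, str]  # (provider, model)


@dataclass(frozen=True)
class RetryConfig:
    """Stand-in holding only the three fields this diff adds."""

    circuit_breaker_enabled: bool = False
    circuit_breaker_threshold: int = 5
    circuit_breaker_reset_timeout: float = 30.0


class Breaker:
    """Reduced stand-in for CircuitBreaker: consecutive failures plus a cooldown."""

    def __init__(self, threshold: int, reset_timeout: float) -> None:
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()


def call_with_fallbacks(
    targets: list[Target],
    call: Callable[[Target], str],
    config: RetryConfig,
    breakers: dict[Target, Breaker],
) -> tuple[str, list[Target]]:
    """Try targets in order; return the first result and the targets skipped."""
    skipped: list[Target] = []
    for target in targets:
        breaker = None
        if config.circuit_breaker_enabled:
            if target not in breakers:
                breakers[target] = Breaker(
                    config.circuit_breaker_threshold,
                    config.circuit_breaker_reset_timeout,
                )
            breaker = breakers[target]
            if not breaker.allow():
                skipped.append(target)  # open circuit: go straight to the next fallback
                continue
        try:
            result = call(target)
        except RuntimeError:
            if breaker is not None:
                breaker.record(ok=False)
            continue
        if breaker is not None:
            breaker.record(ok=True)
        return result, skipped
    raise RuntimeError("all targets failed or were skipped")


def fake_call(target: Target) -> str:
    # Pretend the first provider is down and the second is healthy.
    if target[0] == "openai":
        raise RuntimeError("provider down")
    return f"ok from {target[1]}"


config = RetryConfig(circuit_breaker_enabled=True, circuit_breaker_threshold=1)
breakers: dict[Target, Breaker] = {}
targets = [("openai", "model-a"), ("google", "model-b")]

first = call_with_fallbacks(targets, fake_call, config, breakers)
second = call_with_fallbacks(targets, fake_call, config, breakers)
```

In this run the first invocation still calls the failing provider once and trips its circuit; the second invocation skips it without a call, which is the saving the new fields are meant to provide.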