AzureCosmosDB · aryan-410 · Jun 26, 2026 · Jun 30, 2026
diff --git a/.env.example b/.env.example
@@ -54,6 +54,22 @@ ASPNETCORE_LOGGING__LOGLEVEL__DEFAULT=Information
 # Optional: Server URLs (default: http://+:8080)
 ASPNETCORE_URLS=http://+:8080
 
+# ============================================================================
+# OPTIONAL: agentic_search TOOL (Cosmos retriever HTTP service)
+# ============================================================================
+# The `agentic_search` MCP tool calls the trained Harness-1 multi-turn
+# retrieval agent, which runs as a long-lived FastAPI service started with
+# `python -m cosmos_retriever serve`. See docs/AGENTIC_SEARCH.md.
+# Both variables below are optional and fall back to the defaults shown; if the
+# service is not reachable, agentic_search returns a JSON error envelope to the
+# caller.
+
+# Base URL of the cosmos-retriever FastAPI service (default http://127.0.0.1:9000).
+# COSMOS_RETRIEVER_URL=http://127.0.0.1:9000
+
+# Per-request wall-clock cap in seconds (default 600).
+# COSMOS_RETRIEVER_TIMEOUT_S=600
+
 # ============================================================================
 # DOCKER COMPOSE NOTES
 # ============================================================================

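Reviewer sketch: the two optional variables added above could be resolved with their documented defaults roughly as follows. This is a minimal Python illustration, not the server's actual implementation; `RetrieverSettings`, `load_retriever_settings`, and `error_envelope` are hypothetical names, and the envelope's field names are assumptions because the diff does not show its shape.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the comments in .env.example.
DEFAULT_URL = "http://127.0.0.1:9000"
DEFAULT_TIMEOUT_S = 600.0


@dataclass(frozen=True)
class RetrieverSettings:
    base_url: str
    timeout_s: float


def load_retriever_settings(env: Optional[Mapping[str, str]] = None) -> RetrieverSettings:
    """Read COSMOS_RETRIEVER_URL / COSMOS_RETRIEVER_TIMEOUT_S, falling back to defaults."""
    env = os.environ if env is None else env
    # Unset or empty values fall back to the default; a trailing slash is dropped.
    base_url = (env.get("COSMOS_RETRIEVER_URL") or DEFAULT_URL).rstrip("/")
    raw_timeout = env.get("COSMOS_RETRIEVER_TIMEOUT_S") or ""
    try:
        timeout_s = float(raw_timeout)
    except ValueError:
        timeout_s = DEFAULT_TIMEOUT_S
    if timeout_s <= 0:
        timeout_s = DEFAULT_TIMEOUT_S
    return RetrieverSettings(base_url=base_url, timeout_s=timeout_s)


def error_envelope(message: str) -> str:
    """Hypothetical JSON error shape; the real field names are not shown in this diff."""
    return json.dumps({"ok": False, "error": message})
```

Passing the environment as a mapping keeps the function easy to test; calling `load_retriever_settings()` with no argument reads `os.environ`.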
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,32 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.2.0] - 2026-06-18
+
+### Added
+- **`agentic_search` tool**: Runs the trained Harness-1 multi-turn retrieval
+  agent (`pat-jj/harness-1` served by vLLM) against a Cosmos DB corpus and
+  returns ranked, curated documents that best answer the query. The agent
+  issues hybrid (vector + full-text) RRF searches, optionally reranks with
+  Qwen3-Reranker-8B, reads full documents, and prunes its context across
+  multiple turns. Implemented as an HTTP call to the bundled
+  [`cosmos-retriever/`](cosmos-retriever/) FastAPI service; see
+  [`docs/AGENTIC_SEARCH.md`](docs/AGENTIC_SEARCH.md) for setup and
+  deployment details.
+- Optional `database` and `container` arguments on `agentic_search` so a
+  single MCP server can target multiple Cosmos corpora at request time. When
+  the corpus registry (`CORPUS_REGISTRY` / `CORPUS_REGISTRY_FILE`) is set
+  in the host environment, the matching account, database, and embedding
+  model are picked automatically per call.
+- New service: `AgenticSearchExecutor` (HTTP request lifecycle, timeout, error
+  envelope generation).
+- New env vars: `COSMOS_RETRIEVER_URL` and `COSMOS_RETRIEVER_TIMEOUT_S`;
+  see [`.env.example`](.env.example).
+
+### Changed
+- `AppState` now also exposes `ILoggerFactory` so static `[McpServerTool]`
+  methods can obtain a properly-named logger.
+
 ## [1.1.2] - 2026-05-29
 
 ### Added

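Reviewer sketch: the changelog entry above refers to hybrid (vector + full-text) searches fused with RRF. For readers unfamiliar with Reciprocal Rank Fusion, the idea can be illustrated in a few lines of Python. This is a standalone illustration of the general technique, not code from this PR; `rrf_fuse` is a hypothetical helper, the document ids are made up, and `k=60` is a conventional constant assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result lists standing in for a vector search and a full-text search.
vector_hits = ["doc-a", "doc-b", "doc-c"]
fulltext_hits = ["doc-b", "doc-d", "doc-a"]
fused = rrf_fuse([vector_hits, fulltext_hits])
```

Documents that appear near the top of both lists (`doc-b`, `doc-a`) outrank documents that appear in only one, which is the property that makes RRF a common way to combine vector and keyword results.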
diff --git a/README.md b/README.md
@@ -40,6 +40,7 @@ This toolkit provides:
 | `text_search` | Search for documents where a property contains a search phrase |
 | `vector_search` | Perform vector search using Azure OpenAI embeddings |
 | `hybrid_search` | Perform hybrid search combining vector similarity and full-text keyword search using Reciprocal Rank Fusion (RRF) |
+| `agentic_search` | Run the trained Harness-1 multi-turn retrieval agent against a Cosmos DB corpus. Backed by the bundled [`cosmos-retriever/`](cosmos-retriever/) FastAPI service; see [docs/AGENTIC_SEARCH.md](docs/AGENTIC_SEARCH.md) for setup and per-corpus configuration. |
 
 ## Project Structure
 

diff --git a/cosmos-retriever/.env.example b/cosmos-retriever/.env.example
@@ -0,0 +1,61 @@
+# ----- Inference backend -----
+# "harmony_vllm" (default): the fine-tuned pat-jj/harness-1 checkpoint served by
+#   vLLM, driven with raw Harmony token-IDs (set VLLM_* below).
+# "openai_chat": ANY OpenAI-compatible chat model (Azure AI Foundry deployment,
+#   OpenAI, local server, ...) driven with standard function/tool calling
+#   (set CHAT_* below).
+INFERENCE_BACKEND=harmony_vllm
+
+# ----- vLLM serving the Harness-1 model (harmony_vllm backend) -----
+# URL of the OpenAI-compatible vLLM endpoint that serves `pat-jj/harness-1`.
+VLLM_BASE_URL=http://127.0.0.1:8000
+VLLM_MODEL_NAME=harness-1
+VLLM_TIMEOUT_S=900
+
+# ----- Generic chat LLM endpoint (openai_chat backend) -----
+# Any OpenAI-compatible chat-completions endpoint. For Azure AI Foundry,
+# CHAT_BASE_URL is the deployment's OpenAI-compatible URL and CHAT_MODEL is the
+# deployment name. Set CHAT_API_VERSION to use the Azure OpenAI client instead.
+# CHAT_BASE_URL=https://your-resource.services.ai.azure.com/openai/v1
+# CHAT_API_KEY=
+# CHAT_MODEL=gpt-4o
+# CHAT_API_VERSION=
+# CHAT_TEMPERATURE=0.7
+# CHAT_MAX_TOKENS=4096
+# CHAT_MAX_TURNS=20
+
+# ----- Cosmos DB (required) -----
+# The corpus container must already be ingested with `id`, `docid`, `chunk_idx`,
+# `text`, and `embedding` fields per the Harness-1 schema.
+ACCOUNT_URI=https://your-cosmos-account.documents.azure.com:443/
+COSMOS_DATABASE=your-database-name
+COSMOS_CORPUS_CONTAINER=your-corpus-container
+# Optional: leave unset to use AzureCliCredential / DefaultAzureCredential.
+# COSMOS_KEY=
+
+# ----- Embeddings for SearchCorpusTool (required) -----
+# OpenAI by default; set AZURE_OPENAI_* to route through Azure OpenAI instead.
+OPENAI_API_KEY=sk-...
+OPENAI_EMBEDDING_MODEL=text-embedding-3-small
+# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
+# AZURE_OPENAI_API_KEY=
+# Optional instruction prepended to embedding queries (used by some Qwen embedders).
+# EMBED_QUERY_INSTRUCTION=
+
+# ----- Reranker (optional) -----
+# Pick at most one of these. Leave both unset to disable reranking.
+# Baseten:
+# BASETEN_API_KEY=
+# BASETEN_MODEL_URL=https://model-xyz.api.baseten.co/environments/production/sync
+# Local vLLM Qwen3-Reranker-8B (run on a separate port):
+# VLLM_RERANKER_URL=http://127.0.0.1:8011
+
+# ----- Retriever budget knobs (optional) -----
+# COSMOS_RETRIEVER_MAX_TURNS=35
+# COSMOS_RETRIEVER_THRESHOLD_BUDGET=16384
+# COSMOS_RETRIEVER_TOKEN_BUDGET=32268
+
+# ----- HTTP server -----
+HOST=0.0.0.0
+PORT=9000
+LOG_LEVEL=info
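Reviewer sketch: the template above implies two constraints, namely that `INFERENCE_BACKEND` takes one of two values and that at most one reranker is configured. A startup check along these lines could surface misconfiguration early. This is a hedged Python sketch, not code from the package; `check_retriever_env` is a hypothetical name, and treating a reranker as "configured" once its URL variable is non-empty is an assumption.

```python
from typing import List, Mapping

# The two backend names documented in the template.
VALID_BACKENDS = ("harmony_vllm", "openai_chat")


def check_retriever_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the env looks consistent."""
    problems: List[str] = []

    # The template documents harmony_vllm as the default backend.
    backend = env.get("INFERENCE_BACKEND") or "harmony_vllm"
    if backend not in VALID_BACKENDS:
        problems.append(
            f"INFERENCE_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}"
        )

    # Assumption: a reranker counts as configured once its URL variable is non-empty.
    baseten = bool(env.get("BASETEN_MODEL_URL"))
    local_vllm = bool(env.get("VLLM_RERANKER_URL"))
    if baseten and local_vllm:
        problems.append("Set only one of BASETEN_MODEL_URL and VLLM_RERANKER_URL")

    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.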
diff --git a/cosmos-retriever/.github/workflows/ci.yml b/cosmos-retriever/.github/workflows/ci.yml
@@ -0,0 +1,37 @@
+name: ci
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+concurrency:
+  group: ci-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  lint-and-test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+        with:
+          enable-cache: true
+
+      - name: Install package with dev extras
+        run: uv pip install --system -e ".[dev]"
+
+      - name: Ruff lint
+        run: ruff check src tests
+
+      - name: Pytest
+        run: pytest -q
diff --git a/cosmos-retriever/.gitignore b/cosmos-retriever/.gitignore
@@ -0,0 +1,37 @@
+# --- Python ---
+__pycache__/
+*.py[cod]
+*$py.class
+*.egg-info/
+.eggs/
+build/
+dist/
+.coverage
+.coverage.*
+htmlcov/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+
+# --- Virtual envs ---
+.venv/
+venv/
+env/
+
+# --- IDE ---
+.vscode/
+.idea/
+*.swp
+
+# --- Secrets / local config ---
+.env
+.env.local
+.env.*.local
+
+# --- Logs / scratch ---
+*.log
+tmp/
+runs/
+
+# --- Build artefacts ---
+src/*.egg-info/
diff --git a/cosmos-retriever/.python-version b/cosmos-retriever/.python-version
@@ -0,0 +1 @@
+3.11