16 changes: 16 additions & 0 deletions .env.example
@@ -54,6 +54,22 @@ ASPNETCORE_LOGGING__LOGLEVEL__DEFAULT=Information
# Optional: Server URLs (default: http://+:8080)
ASPNETCORE_URLS=http://+:8080

# ============================================================================
# OPTIONAL: agentic_search TOOL (Cosmos retriever HTTP service)
# ============================================================================
# The `agentic_search` MCP tool calls the trained Harness-1 multi-turn
# retrieval agent, which runs as a long-lived FastAPI service started with
# `python -m cosmos_retriever serve`. See docs/AGENTIC_SEARCH.md.
# Both variables below are optional and fall back to the defaults shown; if the
# service is unreachable, agentic_search returns a JSON error envelope to the
# caller.

# Base URL of the cosmos-retriever FastAPI service (default http://127.0.0.1:9000).
# COSMOS_RETRIEVER_URL=http://127.0.0.1:9000

# Per-request wall-clock cap in seconds (default 600).
# COSMOS_RETRIEVER_TIMEOUT_S=600

# ============================================================================
# DOCKER COMPOSE NOTES
# ============================================================================
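To illustrate how a caller might consume `COSMOS_RETRIEVER_URL` and `COSMOS_RETRIEVER_TIMEOUT_S`, here is a minimal Python sketch. It is not the server's actual client code: the `/search` path, the `{"query": ...}` request body, and the shape of the error envelope are all assumptions made for illustration. Only the two variable names and their defaults come from the comments above.

```python
import json
import os
import urllib.error
import urllib.request

DEFAULT_URL = "http://127.0.0.1:9000"  # default stated in .env.example
DEFAULT_TIMEOUT_S = 600.0              # default stated in .env.example


def load_retriever_config(env=None):
    """Resolve (base_url, timeout_seconds), falling back to the defaults."""
    env = os.environ if env is None else env
    url = (env.get("COSMOS_RETRIEVER_URL") or DEFAULT_URL).rstrip("/")
    try:
        timeout = float(env.get("COSMOS_RETRIEVER_TIMEOUT_S") or DEFAULT_TIMEOUT_S)
    except ValueError:
        timeout = DEFAULT_TIMEOUT_S
    if timeout <= 0:
        timeout = DEFAULT_TIMEOUT_S
    return url, timeout


def agentic_search(query, env=None, opener=urllib.request.urlopen):
    """POST the query; return parsed JSON, or an error envelope on failure."""
    url, timeout = load_retriever_config(env)
    try:
        request = urllib.request.Request(
            url + "/search",  # hypothetical endpoint path
            data=json.dumps({"query": query}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with opener(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # URLError and timeouts are OSError subclasses; a malformed URL or
        # invalid JSON raises ValueError. The envelope shape is hypothetical.
        return {"error": {"type": type(exc).__name__, "message": str(exc)}}
```

Passing `opener` as a parameter keeps the failure path testable without a live service; a real client would likely use whatever HTTP stack the host already depends on.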
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,32 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.2.0] - 2026-06-18

### Added
- **`agentic_search` tool**: Runs the trained Harness-1 multi-turn retrieval
agent (`pat-jj/harness-1` served by vLLM) against a Cosmos DB corpus and
returns ranked, curated documents that best answer the query. The agent
issues hybrid (vector + full-text) RRF searches, optionally reranks with
Qwen3-Reranker-8B, reads full documents, and prunes its context across
multiple turns. Implemented as an HTTP call to the bundled
[`cosmos-retriever`](cosmos-retriever/) FastAPI service; see
[`docs/AGENTIC_SEARCH.md`](docs/AGENTIC_SEARCH.md) for
the deployment story.
- Optional `database` and `container` arguments on `agentic_search` so a
single MCP server can target multiple Cosmos corpora at request time. When
the corpus registry (`CORPUS_REGISTRY` / `CORPUS_REGISTRY_FILE`) is set
in the host environment, the matching account, database, and embedding
model are picked automatically per call.
- New service: `AgenticSearchExecutor` (subprocess lifecycle, timeout, error
envelope generation).
- New env vars: `COSMOS_RETRIEVER_URL`,
`COSMOS_RETRIEVER_TIMEOUT_S`; see [`.env.example`](.env.example).

### Changed
- `AppState` now also exposes `ILoggerFactory` so static `[McpServerTool]`
methods can obtain a properly-named logger.

## [1.1.2] - 2026-05-29

### Added
1 change: 1 addition & 0 deletions README.md
@@ -40,6 +40,7 @@ This toolkit provides:
| `text_search` | Search for documents where a property contains a search phrase |
| `vector_search` | Perform vector search using Azure OpenAI embeddings |
| `hybrid_search` | Perform hybrid search combining vector similarity and full-text keyword search using Reciprocal Rank Fusion (RRF) |
| `agentic_search` | Run the trained Harness-1 multi-turn retrieval agent against a Cosmos DB corpus. Backed by the bundled [`cosmos-retriever/`](cosmos-retriever/) FastAPI service; see [docs/AGENTIC_SEARCH.md](docs/AGENTIC_SEARCH.md) for setup and per-corpus configuration. |

## Project Structure

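For readers unfamiliar with the Reciprocal Rank Fusion mentioned in the `hybrid_search` row, the following is a minimal sketch of the general technique, not the toolkit's or Cosmos DB's implementation. Each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The function name, the sample document IDs, and the constant `k = 60` (a value commonly used in the literature) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    `rankings` is a list of lists, each ordered best-first. Returns the
    document IDs ordered by descending fused score (ties broken by ID).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a full-text search.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-b", "doc-d", "doc-a"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Because only ranks enter the score, the two result lists do not need comparable relevance scores, which is why the method suits combining vector similarity with keyword matching.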
61 changes: 61 additions & 0 deletions cosmos-retriever/.env.example
@@ -0,0 +1,61 @@
# ----- Inference backend -----
# "harmony_vllm" (default): the fine-tuned pat-jj/harness-1 checkpoint served by
# vLLM, driven with raw Harmony token-IDs (set VLLM_* below).
# "openai_chat": ANY OpenAI-compatible chat model (Azure AI Foundry deployment,
# OpenAI, local server, ...) driven with standard function/tool calling
# (set CHAT_* below).
INFERENCE_BACKEND=harmony_vllm

# ----- vLLM serving the Harness-1 model (harmony_vllm backend) -----
# URL of the OpenAI-compatible vLLM endpoint that serves `pat-jj/harness-1`.
VLLM_BASE_URL=http://127.0.0.1:8000
VLLM_MODEL_NAME=harness-1
VLLM_TIMEOUT_S=900

# ----- Generic chat LLM endpoint (openai_chat backend) -----
# Any OpenAI-compatible chat-completions endpoint. For Azure AI Foundry,
# CHAT_BASE_URL is the deployment's OpenAI-compatible URL and CHAT_MODEL is the
# deployment name. Set CHAT_API_VERSION to use the Azure OpenAI client instead.
# CHAT_BASE_URL=https://your-resource.services.ai.azure.com/openai/v1
# CHAT_API_KEY=
# CHAT_MODEL=gpt-4o
# CHAT_API_VERSION=
# CHAT_TEMPERATURE=0.7
# CHAT_MAX_TOKENS=4096
# CHAT_MAX_TURNS=20

# ----- Cosmos DB (required) -----
# The corpus container must already be ingested with `id`, `docid`, `chunk_idx`,
# `text`, and `embedding` fields per the Harness-1 schema.
ACCOUNT_URI=https://your-cosmos-account.documents.azure.com:443/
COSMOS_DATABASE=your-database-name
COSMOS_CORPUS_CONTAINER=your-corpus-container
# Optional: leave unset to use AzureCliCredential / DefaultAzureCredential.
# COSMOS_KEY=

# ----- Embeddings for SearchCorpusTool (required) -----
# OpenAI by default; set AZURE_OPENAI_* to route through Azure OpenAI instead.
OPENAI_API_KEY=sk-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
# AZURE_OPENAI_API_KEY=
# Optional instruction prepended to embedding queries (used by some Qwen embedders).
# EMBED_QUERY_INSTRUCTION=

# ----- Reranker (optional) -----
# Pick at most one of these. Leave both unset to disable reranking.
# Baseten:
# BASETEN_API_KEY=
# BASETEN_MODEL_URL=https://model-xyz.api.baseten.co/environments/production/sync
# Local vLLM Qwen3-Reranker-8B (run on a separate port):
# VLLM_RERANKER_URL=http://127.0.0.1:8011

# ----- Retriever budget knobs (optional) -----
# COSMOS_RETRIEVER_MAX_TURNS=35
# COSMOS_RETRIEVER_THRESHOLD_BUDGET=16384
# COSMOS_RETRIEVER_TOKEN_BUDGET=32268

# ----- HTTP server -----
HOST=0.0.0.0
PORT=9000
LOG_LEVEL=info
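The constraints spelled out in the comments of this file (one of two backends, required Cosmos DB settings, an embedding provider, at most one reranker) could be checked at startup along the following lines. This is a hedged sketch rather than the package's real validation: the function name is hypothetical, and which variables count as mandatory for each backend is an assumption read off the comments, not confirmed behavior.

```python
import os

VALID_BACKENDS = ("harmony_vllm", "openai_chat")


def validate_retriever_env(env=None):
    """Return a list of configuration problems; an empty list means OK."""
    env = os.environ if env is None else env

    def missing(*names):
        return [name for name in names if not env.get(name)]

    problems = []
    backend = env.get("INFERENCE_BACKEND") or "harmony_vllm"
    if backend not in VALID_BACKENDS:
        problems.append(
            f"INFERENCE_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}"
        )
        needed = []
    elif backend == "harmony_vllm":
        needed = ["VLLM_BASE_URL"]  # assumed mandatory for this backend
    else:
        needed = ["CHAT_BASE_URL", "CHAT_MODEL"]  # assumed mandatory

    # The section header marks the Cosmos DB settings as required.
    needed += ["ACCOUNT_URI", "COSMOS_DATABASE", "COSMOS_CORPUS_CONTAINER"]
    problems += [f"{name} is not set" for name in missing(*needed)]

    # Embeddings: OpenAI by default, or Azure OpenAI when its endpoint is set.
    if missing("OPENAI_API_KEY") and missing("AZURE_OPENAI_ENDPOINT"):
        problems.append("set OPENAI_API_KEY or AZURE_OPENAI_ENDPOINT for embeddings")

    # Reranker: "pick at most one"; leaving both unset disables reranking.
    uses_baseten = bool(env.get("BASETEN_API_KEY") or env.get("BASETEN_MODEL_URL"))
    uses_local = bool(env.get("VLLM_RERANKER_URL"))
    if uses_baseten and uses_local:
        problems.append("configure Baseten or VLLM_RERANKER_URL, not both")

    return problems
```

Returning a list of problems instead of raising on the first one lets an operator fix a misconfigured `.env` in a single pass.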
37 changes: 37 additions & 0 deletions cosmos-retriever/.github/workflows/ci.yml
@@ -0,0 +1,37 @@
name: ci

on:
  push:
    branches: [main]
  pull_request:

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true

      - name: Install package with dev extras
        run: uv pip install --system -e ".[dev]"

      - name: Ruff lint
        run: ruff check src tests

      - name: Pytest
        run: pytest -q
37 changes: 37 additions & 0 deletions cosmos-retriever/.gitignore
@@ -0,0 +1,37 @@
# --- Python ---
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
.eggs/
build/
dist/
.coverage
.coverage.*
htmlcov/
.pytest_cache/
.mypy_cache/
.ruff_cache/

# --- Virtual envs ---
.venv/
venv/
env/

# --- IDE ---
.vscode/
.idea/
*.swp

# --- Secrets / local config ---
.env
.env.local
.env.*.local

# --- Logs / scratch ---
*.log
tmp/
runs/

# --- Build artefacts ---
src/*.egg-info/
1 change: 1 addition & 0 deletions cosmos-retriever/.python-version
@@ -0,0 +1 @@
3.11