16 changes: 15 additions & 1 deletion README.md
@@ -188,6 +188,8 @@ inference gateways.
| `anthropic_proxy` | `ANTHROPIC_PROXY_API_KEY` + `ANTHROPIC_PROXY_ENDPOINT_URL` | Any Vertex-style raw-predict proxy | `claude-sonnet-4-6` |
| `bedrock` | `AWS_PROFILE` (optional) + `AWS_REGION` — SigV4 via boto3 | AWS Bedrock Runtime | `us.anthropic.claude-sonnet-4-6-20250915-v1:0` |
| `nv_build` | `NVIDIA_INFERENCE_KEY` | build.nvidia.com | `deepseek-ai/deepseek-v4-flash` |
| `claude_cli` | _(none — uses local CLI auth)_ | local `claude` binary | `claude-sonnet-4-6` |
| `codex_cli` | _(none — uses local CLI auth)_ | local `codex` binary | `o4-mini` |

```bash
# Stock OpenAI
@@ -224,6 +226,16 @@ export SKILLSPECTOR_PROVIDER=nv_build
export NVIDIA_INFERENCE_KEY=nvapi-...
skillspector scan ./my-skill/

# Local Claude CLI — no API key; uses your existing `claude auth login` session
# Requires: claude CLI installed and authenticated (claude auth login)
export SKILLSPECTOR_PROVIDER=claude_cli
skillspector scan ./my-skill/

# Local Codex CLI — no API key; uses your existing `codex login` session
# Requires: codex CLI installed and authenticated
export SKILLSPECTOR_PROVIDER=codex_cli
skillspector scan ./my-skill/

# Local Ollama or any OpenAI-compatible endpoint
export SKILLSPECTOR_PROVIDER=openai
export OPENAI_API_KEY=ollama
@@ -514,7 +526,7 @@ Issues (2)

| Variable | Description | Required |
|----------|-------------|----------|
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai`, `anthropic`, `anthropic_proxy`, `bedrock`, or `nv_build`. Each provider has its own bundled `model_registry.yaml` and default model (see the LLM Analysis table above). Defaults to `nv_build`. | Optional |
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai`, `anthropic`, `anthropic_proxy`, `bedrock`, `nv_build`, `claude_cli`, `codex_cli`, or `gemini_cli`. Each provider has its own bundled `model_registry.yaml` and default model (see the LLM Analysis table above). Defaults to `nv_build`. | Optional |
| `NVIDIA_INFERENCE_KEY` | Credential for the `nv_build` provider (build.nvidia.com). | Required for LLM analysis when `SKILLSPECTOR_PROVIDER=nv_build` |
| `OPENAI_API_KEY` | Credential for the OpenAI provider (`SKILLSPECTOR_PROVIDER=openai`). Also serves as the tier-2 fallback in the credential waterfall when the active provider returns no credentials. | Required for LLM analysis when `SKILLSPECTOR_PROVIDER=openai` |
| `OPENAI_BASE_URL` | Override the OpenAI endpoint (e.g. point at Ollama). | Optional |
@@ -528,6 +540,8 @@ Issues (2)
| `SKILLSPECTOR_MODEL_REGISTRY` | Override the bundled per-provider YAML registry (`src/skillspector/providers/<provider>/model_registry.yaml`) with a custom path. | Optional |
| `SKILLSPECTOR_LOG_LEVEL` | Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` (default: `WARNING`). | Optional |

> **CLI providers** (`claude_cli`, `codex_cli`): No API key is needed. Authentication is managed entirely by the agent CLI's own login session (`claude auth login` / `codex login`). SkillSpector never reads or forwards API keys when these providers are active. The subprocess runs in a hardened configuration: tools are disabled, MCP is not used, Codex runs in read-only sandbox mode, and untrusted skill content is delivered only via stdin.

### CLI Options

```bash
20 changes: 16 additions & 4 deletions docs/DEVELOPMENT.md
@@ -265,12 +265,14 @@ Copy [.env.example](../.env.example) to `.env` in the project root and set value

| Variable | Description | Example |
|----------|-------------|---------|
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai` \| `anthropic` \| `nv_build`. Defaults to `nv_build`. | `openai` |
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai` \| `anthropic` \| `nv_build` \| `claude_cli` \| `codex_cli`. Defaults to `nv_build`. | `claude_cli` |
| `NVIDIA_INFERENCE_KEY` | Credential for `nv_build`. | `nvapi-...` |
| `OPENAI_API_KEY` | Credential for `SKILLSPECTOR_PROVIDER=openai`. Also tier-2 fallback for non-OpenAI providers. | `sk-...` |
| `OPENAI_BASE_URL` | Override the OpenAI endpoint (e.g. point at Ollama). | `http://localhost:11434/v1` |
| `ANTHROPIC_API_KEY` | Credential for `SKILLSPECTOR_PROVIDER=anthropic`. | `sk-ant-...` |
| `SKILLSPECTOR_MODEL` | Override the active provider's bundled default model (see [README.md](../README.md) for per-provider defaults). | `gpt-5.2` |
| `SKILLSPECTOR_MODEL` | Override the active provider's bundled default model (see [README.md](../README.md) for per-provider defaults). For `claude_cli`, this is passed as `--model` to the `claude` binary. | `gpt-5.2` |

> **CLI providers** (`claude_cli`, `codex_cli`): no credential env var is needed. Authentication is managed by the agent CLI's own session (`claude auth login` / `codex login`). The subprocess is heavily sandboxed — see [providers/_agent_cli.py](../src/skillspector/providers/_agent_cli.py).
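
The hardening properties named for the CLI providers (no shell, stdin-only delivery, environment scrubbing, timeout, fail-closed errors) can be illustrated with a minimal sketch. This is not the contents of `_agent_cli.py`: the names `run_agent_cli`, `scrub_env`, and `SECRET_MARKERS` are hypothetical, and the deny-list heuristic is an assumption made for illustration only.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Hypothetical deny-list: drop any env var whose name looks like a credential.
SECRET_MARKERS = ("API_KEY", "TOKEN", "SECRET", "INFERENCE_KEY")


def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env without variables that look like credentials."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


def run_agent_cli(argv: list[str], untrusted: str, timeout: float = 30.0) -> str:
    """Run argv without a shell, pass untrusted text on stdin, fail closed."""
    try:
        proc = subprocess.run(
            argv,                 # list form, so nothing is shell-interpolated
            input=untrusted,      # untrusted content never appears in argv
            capture_output=True,
            text=True,
            shell=False,
            env=scrub_env(dict(os.environ)),
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or timeout: surface one error type to the caller.
        raise RuntimeError(f"agent CLI failed: {exc}") from exc
    if proc.returncode != 0:
        raise RuntimeError(f"agent CLI exited with status {proc.returncode}")
    return proc.stdout


if __name__ == "__main__":
    # The Python interpreter stands in for the real agent binary here.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(run_agent_cli(echo, "untrusted skill text; $(not executed)"))
```

Passing the skill content through `input=` rather than as an argument means shell metacharacters in the content are only ever data.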

### Live provider tests

@@ -291,8 +293,18 @@ Base URL env vars are not needed for live provider tests; the tests intentionall
- **`get_max_input_tokens(model)`** — input budget per LLM request (75% of resolved context window).
- **`get_max_output_tokens(model)`** — output budget per LLM request (min of 25% context, registry's `max_output_tokens` cap if set).
- Batch budget overhead is computed per-prompt via `estimate_tokens(base_prompt)` rather than a fixed constant.
- **Providers** ([providers/](../src/skillspector/providers/)): pluggable credential + token-budget resolvers. Each provider is a subpackage with its own `provider.py` and bundled `model_registry.yaml`; [registry.py](../src/skillspector/providers/registry.py) exposes `lookup_context_length` / `lookup_max_output_tokens` utilities the providers call directly. The active provider is chosen by `SKILLSPECTOR_PROVIDER` (default: `nv_build`) — see [providers/`__init__`.py](../src/skillspector/providers/__init__.py): `nv_build/` (build.nvidia.com), `openai/`, or `anthropic/`.
- **LLM calls** ([llm_utils.py](../src/skillspector/llm_utils.py)): **`get_chat_model()`** and **`chat_completion()`** resolve credentials in two tiers — active NVIDIA provider (`NVIDIA_INFERENCE_KEY` → endpoint) → standard `OPENAI_API_KEY` / `OPENAI_BASE_URL` — against any OpenAI-compatible endpoint. `max_tokens` is auto-bound to `get_max_output_tokens(model)` from `model_info`.
- **Providers** ([providers/](../src/skillspector/providers/)): pluggable credential + token-budget resolvers. Each provider is a subpackage with its own `provider.py` and bundled `model_registry.yaml`; [registry.py](../src/skillspector/providers/registry.py) exposes `lookup_context_length` / `lookup_max_output_tokens` utilities the providers call directly. The active provider is chosen by `SKILLSPECTOR_PROVIDER` (default: `nv_build`):
- `nv_build/` — build.nvidia.com (HTTP, `NVIDIA_INFERENCE_KEY`)
- `openai/` — api.openai.com or any OpenAI-compatible URL (`OPENAI_API_KEY`)
- `anthropic/` — api.anthropic.com (`ANTHROPIC_API_KEY`)
- `claude_cli/` — **local `claude` binary; no API key**. Uses the CLI's own auth session (`claude auth login`). Set `SKILLSPECTOR_PROVIDER=claude_cli`.
- `codex_cli/` — **local `codex` binary; no API key**. Uses the CLI's own auth session (`codex login`). Set `SKILLSPECTOR_PROVIDER=codex_cli`.

CLI providers (`claude_cli`, `codex_cli`) implement the optional `AgentCLICapable` interface (`is_available()` + `complete()`) defined in [providers/base.py](../src/skillspector/providers/base.py). `has_cli_capability(provider)` detects this at runtime. All subprocess calls go through the hardened helper [providers/_agent_cli.py](../src/skillspector/providers/_agent_cli.py) which enforces: no shell (`shell=False`), untrusted content via stdin only, capability stripping (tools disabled / sandboxed), environment scrubbing (no API keys forwarded), per-call timeout, and fail-closed error handling.

- **LLM calls** ([llm_utils.py](../src/skillspector/llm_utils.py)): **`get_chat_model()`** and **`chat_completion()`** dispatch based on the active provider:
- **HTTP providers**: resolve credentials in two tiers: first the active provider's own credential (`NVIDIA_INFERENCE_KEY` / `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` → endpoint), then the standard `OPENAI_API_KEY` / `OPENAI_BASE_URL` fallback when the provider's own credential is unset. `max_tokens` is auto-bound to `get_max_output_tokens(model)` from `model_info`.
- **CLI providers** (`claude_cli`, `codex_cli`): `get_chat_model()` returns an `AgentCLIChatModel` adapter backed by `provider.complete()`, so the analyzers' `.invoke()` / `.with_structured_output(schema).invoke()` calls work with no API key (structured output is produced by prompting for JSON, then Pydantic-validating). `chat_completion()` routes through `get_chat_model()` as well. `is_llm_available()` calls `provider.is_available()` instead of credential resolution.
- **LLM analyzer base** ([llm_analyzer_base.py](../src/skillspector/nodes/llm_analyzer_base.py)): `LLMAnalyzerBase` provides per-file/per-chunk batching, token-budget-aware chunking, and a run loop for all LLM-based analyzers. `LLMMetaAnalyzer` extends it for filter/enrich (meta_analyzer node). Future semantic analyzers extend `LLMAnalyzerBase` for discovery mode.
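
One way to picture the optional `AgentCLICapable` interface and the runtime check is a `typing.Protocol` with `runtime_checkable`. The diff does not show `providers/base.py`, so the sketch below is an assumption about the shape, not the actual implementation; the method signatures are inferred from how `llm_utils.py` calls them, and `FakeCLIProvider` / `FakeHTTPProvider` are hypothetical.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class AgentCLICapable(Protocol):
    """Assumed shape of the optional CLI interface (inferred from call sites)."""

    def is_available(self) -> tuple[bool, str | None]: ...

    def complete(self, prompt: str, *, model: str, max_output_tokens: int) -> str: ...


def has_cli_capability(provider: object) -> bool:
    """Structural check: does the provider expose both CLI methods?"""
    return isinstance(provider, AgentCLICapable)


class FakeCLIProvider:
    """Hypothetical CLI-style provider used only for this illustration."""

    def is_available(self) -> tuple[bool, str | None]:
        return True, None

    def complete(self, prompt: str, *, model: str, max_output_tokens: int) -> str:
        return f"[{model}] {prompt}"


class FakeHTTPProvider:
    """Hypothetical HTTP-style provider: no CLI methods at all."""

    def resolve_credentials(self) -> tuple[str, str | None]:
        return "key", None


if __name__ == "__main__":
    for provider in (FakeCLIProvider(), FakeHTTPProvider()):
        print(type(provider).__name__, has_cli_capability(provider))
```

A runtime-checkable protocol only verifies that the methods exist, not their signatures, so a dispatcher built this way still relies on providers honoring the contract.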

---
204 changes: 187 additions & 17 deletions src/skillspector/llm_utils.py
@@ -13,13 +13,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""Shared LLM utilities.
"""Shared LLM utilities (OpenAI-compatible chat models + agent CLI transports).

Credentials are resolved in this order:
1. The active SkillSpector provider (see :mod:`skillspector.providers`) —
reads its own credential env var and supplies the matching client.
1. The active provider (see :mod:`skillspector.providers`):
- CLI providers (``claude_cli``, ``codex_cli``, ``gemini_cli``): use
``is_available()`` and ``complete()`` — no API key needed.
- HTTP providers (``anthropic``, ``openai``, ``nv_build``): read their
respective credential env vars and supply a base URL.
2. ``OPENAI_API_KEY`` / ``OPENAI_BASE_URL`` (the langchain-openai
defaults).
defaults) — only consulted for HTTP providers when the provider's
own credential env var is unset.

There is no SkillSpector-specific credential env var: setting
``NVIDIA_INFERENCE_KEY`` configures whichever NVIDIA endpoint the
@@ -30,13 +34,18 @@

from __future__ import annotations

import asyncio
import json
from typing import NoReturn

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import BaseMessage

from skillspector.model_info import get_max_input_tokens, get_max_output_tokens
from skillspector.providers import (
create_chat_model,
get_active_provider,
get_metadata_provider,
has_cli_capability,
raise_no_llm_api_key_configured,
resolve_chat_model_credentials,
resolve_provider_credentials,
@@ -47,8 +56,9 @@
def _resolve_llm_credentials() -> tuple[str, str | None]:
    """Return ``(api_key, base_url)`` resolved from the environment.

    Tries the active NVIDIA provider first; falls back to ``OPENAI_API_KEY``
    / ``OPENAI_BASE_URL`` when the provider is not configured.
    Tries the active SkillSpector provider first; falls back to
    ``OPENAI_API_KEY`` / ``OPENAI_BASE_URL`` when the provider is not
    configured.

    Raises:
        ValueError: when no API key can be resolved from any source.
@@ -72,7 +82,15 @@ def _resolve_default_chat_model() -> str:


def is_llm_available() -> tuple[bool, str | None]:
    """Return ``(available, error_message)`` describing LLM credential status."""
    """Return ``(available, error_message)`` describing LLM availability.

    For CLI providers (``claude_cli``, ``codex_cli``, ``gemini_cli``) the check
    delegates to the provider's ``is_available()`` method (binary on PATH +
    auth). For HTTP providers, it falls back to credential resolution.
    """
    provider = get_active_provider()
    if has_cli_capability(provider):
        return provider.is_available()  # type: ignore[attr-defined]
    try:
        _resolve_llm_credentials()
    except ValueError as exc:
@@ -85,12 +103,157 @@ def fetch_model_token_limits(model_label: str) -> tuple[int, int]:
return get_max_input_tokens(model_label), get_max_output_tokens(model_label)


def get_chat_model(model: str | None = None) -> BaseChatModel:
"""Return the active provider's native LangChain chat model.
# ---------------------------------------------------------------------------
# Agent CLI chat-model adapter
# ---------------------------------------------------------------------------
#
# The LLM analyzers (meta_analyzer, semantic_*) obtain a model from
# ``get_chat_model()`` and call ``.invoke()`` / ``.with_structured_output(
# schema).invoke()`` on it (see ``llm_analyzer_base``) — they never go through
# ``chat_completion``. To support CLI providers there, ``get_chat_model``
# returns this minimal adapter, which mimics the slice of the ``ChatOpenAI``
# interface the analyzers rely on, backed by the provider's ``complete()``
# subprocess transport.


class _AgentCLIMessage:
    """Minimal stand-in for a LangChain message: exposes ``.content``."""

    def __init__(self, content: str) -> None:
        self.content = content


def _extract_json_object(raw: str) -> dict:
    """Extract a single JSON object from a CLI model's text response.

    Tolerates markdown code fences and surrounding prose. Raises ``ValueError``
    (fail-closed) when no JSON object can be parsed.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```json) and any closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        fence = text.rfind("```")
        if fence != -1:
            text = text[:fence]
        text = text.strip()
    try:
        obj = json.loads(text)
        if isinstance(obj, dict):
            return obj
    except json.JSONDecodeError:
        pass
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            obj = json.loads(text[start : end + 1])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
    raise ValueError(f"could not extract a JSON object from CLI response: {raw[:200]!r}")
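
To see which response shapes the extraction tolerates, the sketch below restates the same logic in a condensed, standalone form (not byte-identical to the diff) and runs it on a few typical CLI replies. The name `extract_json_object` and the `FENCE` constant are introduced here only for illustration.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block simple


def extract_json_object(raw: str) -> dict:
    """Condensed restatement of the fence-stripping and brace-scanning logic."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and anything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        closing = text.rfind(FENCE)
        if closing != -1:
            text = text[:closing]
        text = text.strip()
    candidates = [text]
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        # Second attempt: the outermost {...} span inside surrounding prose.
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError(f"could not extract a JSON object from CLI response: {raw[:200]!r}")


if __name__ == "__main__":
    samples = [
        '{"risk": "high"}',
        FENCE + 'json\n{"risk": "low"}\n' + FENCE,
        'Here is the result: {"risk": "none"} Hope that helps.',
    ]
    for sample in samples:
        print(extract_json_object(sample))
```

A top-level JSON array or plain prose without braces raises `ValueError`, which matches the fail-closed behavior described in the docstring.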


class _StructuredAgentCLIModel:
    """Mimics ``ChatOpenAI.with_structured_output(schema)`` for a CLI provider.

    ``invoke`` augments the prompt with the schema, calls the provider's
    ``complete()``, then parses and validates the response into *schema*.
    """

    def __init__(self, provider: object, model: str, max_output_tokens: int, schema: type) -> None:
        self._provider = provider
        self._model = model
        self._max_output_tokens = max_output_tokens
        self._schema = schema

    def _augment(self, prompt: str) -> str:
        schema_json = json.dumps(self._schema.model_json_schema(), indent=2)
        return (
            f"{prompt}\n\n"
            "Respond with ONLY a single JSON object conforming to the JSON Schema "
            "below. Do not wrap it in markdown code fences and do not add any prose "
            f"before or after the JSON.\n\nJSON Schema:\n{schema_json}"
        )

    def invoke(self, prompt: str) -> object:
        raw = self._provider.complete(  # type: ignore[attr-defined]
            self._augment(prompt),
            model=self._model,
            max_output_tokens=self._max_output_tokens,
        )
        return self._schema.model_validate(_extract_json_object(raw))

    async def ainvoke(self, prompt: str) -> object:
        return await asyncio.to_thread(self.invoke, prompt)
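
The prompt-for-JSON-then-validate flow can be exercised without a real CLI or Pydantic. In the sketch below, `Verdict` is a dataclass standing in for a Pydantic model (it only mimics the two classmethods the adapter calls), and `FakeProvider` and `structured_invoke` are hypothetical names; none of this is code from the PR.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Verdict:
    """Stand-in for a Pydantic model, kept dependency-free for the sketch."""

    risk: str
    score: int

    @classmethod
    def model_json_schema(cls) -> dict:
        return {
            "type": "object",
            "properties": {"risk": {"type": "string"}, "score": {"type": "integer"}},
            "required": ["risk", "score"],
        }

    @classmethod
    def model_validate(cls, obj: dict) -> "Verdict":
        if not isinstance(obj.get("risk"), str) or not isinstance(obj.get("score"), int):
            raise ValueError(f"invalid Verdict payload: {obj!r}")
        return cls(risk=obj["risk"], score=obj["score"])


class FakeProvider:
    """Hypothetical provider: records the prompt and returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.last_prompt: str | None = None

    def complete(self, prompt: str, *, model: str, max_output_tokens: int) -> str:
        self.last_prompt = prompt
        return self.reply


def structured_invoke(provider: FakeProvider, schema: type, prompt: str) -> object:
    """Append the schema to the prompt, call the provider, validate the JSON."""
    schema_json = json.dumps(schema.model_json_schema(), indent=2)
    augmented = f"{prompt}\n\nRespond with ONLY a single JSON object.\n\nJSON Schema:\n{schema_json}"
    raw = provider.complete(augmented, model="demo-model", max_output_tokens=256)
    return schema.model_validate(json.loads(raw))


if __name__ == "__main__":
    provider = FakeProvider('{"risk": "high", "score": 9}')
    print(structured_invoke(provider, Verdict, "Assess this skill."))
```

Because validation happens after the call, a reply that is valid JSON but misses a required field is rejected rather than silently accepted.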


class AgentCLIChatModel:
    """Minimal ``ChatOpenAI``-compatible adapter backed by a CLI provider.

    Implements only the surface the analyzers use: ``invoke`` (returns an
    object with ``.content``), ``ainvoke``, and ``with_structured_output``.
    The rest of the ``BaseChatModel`` surface (``batch``, ``stream``,
    callbacks) is intentionally unsupported; the stubs below make that boundary
    explicit so a future analyzer reaching for it fails loudly with a clear
    message rather than a confusing ``AttributeError``.
    """

    def __init__(self, provider: object, model: str, max_output_tokens: int) -> None:
        self._provider = provider
        self._model = model
        self._max_output_tokens = max_output_tokens

    def batch(self, *args: object, **kwargs: object) -> NoReturn:
        raise NotImplementedError(
            "AgentCLIChatModel supports only invoke/ainvoke/with_structured_output; "
            "batch() is not available for CLI providers."
        )

    def stream(self, *args: object, **kwargs: object) -> NoReturn:
        raise NotImplementedError(
            "AgentCLIChatModel supports only invoke/ainvoke/with_structured_output; "
            "stream() is not available for CLI providers."
        )

    def invoke(self, prompt: str) -> _AgentCLIMessage:
        text = self._provider.complete(  # type: ignore[attr-defined]
            prompt,
            model=self._model,
            max_output_tokens=self._max_output_tokens,
        )
        return _AgentCLIMessage(text)

    async def ainvoke(self, prompt: str) -> _AgentCLIMessage:
        return await asyncio.to_thread(self.invoke, prompt)

    def with_structured_output(self, schema: type) -> _StructuredAgentCLIModel:
        return _StructuredAgentCLIModel(
            self._provider, self._model, self._max_output_tokens, schema
        )
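
The `ainvoke` methods wrap the blocking `invoke` in `asyncio.to_thread`, which lets several subprocess-backed calls overlap without blocking the event loop. A minimal sketch of that pattern follows; `BlockingModel` is a hypothetical stand-in and `time.sleep` substitutes for the blocking CLI call.

```python
import asyncio
import time


class BlockingModel:
    """Hypothetical model whose invoke() blocks, like a subprocess call."""

    def invoke(self, prompt: str) -> str:
        time.sleep(0.1)  # stands in for waiting on the CLI subprocess
        return prompt.upper()

    async def ainvoke(self, prompt: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.invoke, prompt)


async def run_concurrently(prompts: list[str]) -> list[str]:
    model = BlockingModel()
    # gather() preserves input order even though the calls overlap in time.
    return await asyncio.gather(*(model.ainvoke(p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(run_concurrently(["a", "b", "c"])))
```

Each concurrent call occupies one thread from the default executor, so very wide fan-out is bounded by that pool rather than by the event loop.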


def get_chat_model(model: str | None = None) -> BaseChatModel | AgentCLIChatModel:
    """Return a chat model for the active provider.

    For CLI providers (``claude_cli``, ``codex_cli``, ``gemini_cli``) this
    returns an :class:`AgentCLIChatModel` adapter backed by the provider's
    ``complete()`` subprocess transport — so the LLM analyzers (which use
    ``.invoke()`` and ``.with_structured_output()``) work with no API key.

    For HTTP providers it delegates to
    :func:`skillspector.providers.create_chat_model`, which uses the
    provider's own native client (e.g. ``ChatAnthropic`` for Anthropic) with
    an ``OPENAI_API_KEY`` / ``ChatOpenAI`` fallback.

    Raises:
        ValueError: when no API key is configured (see ``is_llm_available``).
        ValueError: when an HTTP provider has no API key configured.
    """
    provider = get_active_provider()
    if has_cli_capability(provider):
        resolved_model = model or provider.resolve_model()
        return AgentCLIChatModel(provider, resolved_model, get_max_output_tokens(resolved_model))

    model = model or _resolve_default_chat_model()
    return create_chat_model(
        model=model,
@@ -100,9 +263,16 @@ def get_chat_model(model: str | None = None) -> BaseChatModel:


def chat_completion(prompt: str, *, model: str | None = None) -> str:
    """Request a single chat completion and return the assistant text."""
    llm = get_chat_model(model=model)
    response = llm.invoke(prompt)
    if not isinstance(response, BaseMessage):
        raise TypeError(f"Expected BaseMessage from chat model, got {type(response).__name__}")
    return str(response.text)
    """Request a single chat completion and return the assistant content.

    Routes through :func:`get_chat_model`, which dispatches to the CLI adapter
    for CLI providers and to the provider's native chat model for HTTP providers.

    Uses ``.text`` when available (real LangChain ``BaseMessage`` objects,
    which normalise content blocks to a single string) and falls back to
    ``.content`` for the CLI adapter's ``_AgentCLIMessage``.
    """
    response = get_chat_model(model=model).invoke(prompt)
    if hasattr(response, "text"):
        return response.text  # type: ignore[union-attr]
    return response.content or ""  # type: ignore[union-attr]
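
The `.text` / `.content` fallback in `chat_completion` can be checked in isolation with two stand-in message types. `RichMessage`, `PlainMessage`, and `message_text` are hypothetical names for this sketch, and it assumes `.text` is a property, as the code in the diff does; whether that holds for a real LangChain message depends on the installed `langchain-core` version.

```python
class PlainMessage:
    """Stand-in for the CLI adapter's message: only ``.content``."""

    def __init__(self, content: str) -> None:
        self.content = content


class RichMessage:
    """Stand-in for a message whose content is a list of typed blocks."""

    def __init__(self, blocks: list) -> None:
        self.content = blocks

    @property
    def text(self) -> str:
        # Join only the text blocks, ignoring any other block types.
        return "".join(b["text"] for b in self.content if b.get("type") == "text")


def message_text(response: object) -> str:
    """Prefer ``.text`` when present; otherwise fall back to ``.content``."""
    if hasattr(response, "text"):
        return str(response.text)
    return response.content or ""


if __name__ == "__main__":
    rich = RichMessage([{"type": "text", "text": "a"}, {"type": "image"}, {"type": "text", "text": "b"}])
    print(message_text(rich))
    print(message_text(PlainMessage("plain reply")))
```

The sketch wraps the value in `str(...)`, as the pre-change `chat_completion` did, so the declared `str` return type holds even if `.text` yields a string subclass.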