One interface for every model. Authentication, routing, streaming, retries, caching — handled.
eyrie is the LLM provider runtime that powers the hawk coding agent. It handles everything between your application and LLM APIs — authentication, model resolution, streaming, retries, rate limiting, and caching.
When your app calls a model, eyrie figures out which provider to use, how to talk to it, and how to stream the response back. Switch from Anthropic to Ollama? eyrie handles the translation. API returns 529? eyrie retries with backoff. Response hits max_tokens? eyrie continues automatically.
Your app never talks to an LLM API directly. eyrie does.
```sh
go get github.com/GrayCodeAI/eyrie
```

Requires Go 1.26+. Minimal dependencies (UUID, OpenTelemetry, SQLite, keyring).
```go
import (
	"log"

	"github.com/GrayCodeAI/eyrie/client"
)

// Create a client; the provider is auto-detected from the environment.
c := client.NewEyrieClient(&client.EyrieConfig{
	Provider: client.DetectProvider(),
})

// Stream a response (ctx and messages are supplied by the caller).
sr, err := c.StreamChat(ctx, messages, client.ChatOptions{
	Model: "claude-sonnet-4-6",
})
if err != nil {
	log.Fatal(err) // sr is not usable when err != nil, so check before deferring Close
}
defer sr.Close()

for evt := range sr.Events {
	switch evt.Type {
	case "content": // stream text
	case "tool_call": // execute tool
	case "done": // response complete
	}
}
```

eyrie detects the provider automatically from environment variables or config files, or uses one you select explicitly, and routes each request to it.
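A minimal sketch of environment-based detection, assuming a simple first-match rule: the hypothetical detectProvider helper walks an ordered list of provider IDs and returns the first whose credential variable is set. The IDs and variable names come from the providers table in this README, but the priority order used here is an assumption, not eyrie's actual order.

```go
package main

import (
	"fmt"
	"os"
)

// candidate pairs a provider ID with the environment variable that signals it.
type candidate struct {
	ID     string
	EnvVar string
}

// priority is a hypothetical detection order; eyrie's real order may differ.
var priority = []candidate{
	{"anthropic", "ANTHROPIC_API_KEY"},
	{"openai", "OPENAI_API_KEY"},
	{"gemini", "GEMINI_API_KEY"},
	{"openrouter", "OPENROUTER_API_KEY"},
	{"ollama", "OLLAMA_BASE_URL"},
}

// detectProvider returns the first provider whose variable is non-empty.
// lookup is injected so tests can supply a fake environment.
func detectProvider(lookup func(string) string) (string, bool) {
	for _, c := range priority {
		if lookup(c.EnvVar) != "" {
			return c.ID, true
		}
	}
	return "", false
}

func main() {
	if id, ok := detectProvider(os.Getenv); ok {
		fmt.Println("detected provider:", id)
	} else {
		fmt.Println("no provider credentials found")
	}
}
```

Passing the lookup function instead of calling os.Getenv directly keeps the rule testable without touching the real process environment.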
The model tier system maps abstract tiers (opus/sonnet/haiku) to concrete model IDs for each provider. It ships with an embedded catalog of pricing, context windows, and capabilities.
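To show what tier resolution involves, here is a sketch of a provider-by-tier lookup table. Only the anthropic/sonnet mapping (claude-sonnet-4-6) comes from this README; the "example" provider and its model IDs are placeholders, and resolveModel is a hypothetical helper rather than eyrie's catalog API.

```go
package main

import "fmt"

// Tier is an abstract capability level, independent of any provider.
type Tier string

const (
	TierOpus   Tier = "opus"
	TierSonnet Tier = "sonnet"
	TierHaiku  Tier = "haiku"
)

// tierTable maps provider -> tier -> concrete model ID. Only the
// anthropic/sonnet entry is from the README; the rest are placeholders.
var tierTable = map[string]map[Tier]string{
	"anthropic": {TierSonnet: "claude-sonnet-4-6"},
	"example":   {TierOpus: "example-large", TierHaiku: "example-small"},
}

// resolveModel looks up the concrete model ID for a provider and tier.
func resolveModel(provider string, tier Tier) (string, error) {
	models, ok := tierTable[provider]
	if !ok {
		return "", fmt.Errorf("unknown provider %q", provider)
	}
	id, ok := models[tier]
	if !ok {
		return "", fmt.Errorf("provider %q has no model for tier %q", provider, tier)
	}
	return id, nil
}

func main() {
	id, err := resolveModel("anthropic", TierSonnet)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id) // claude-sonnet-4-6
}
```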
The streaming parser handles Server-Sent Events (SSE) in both Anthropic and OpenAI formats, covering text, tool calls, and thinking blocks.
- Retries on 429/500/529 with exponential backoff and Retry-After support
- Auto-continuation when stop_reason == max_tokens
- Provider fallback chains for high availability
A token-bucket rate limiter per provider throttles requests before they exceed the provider's API limits.
- Response caching with configurable TTL
- Semantic similarity caching for repeated prompts
- Anthropic prompt-caching breakpoints on the system prompt and the conversation prefix
Each call gets a built-in cost estimate based on per-provider pricing from the embedded model catalog.
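Per-call cost estimation reduces to arithmetic over token counts and per-million-token prices: cost = input_tokens / 1,000,000 × input_price + output_tokens / 1,000,000 × output_price. For example, 1,000 input tokens at $3/M plus 500 output tokens at $15/M gives $0.003 + $0.0075 = $0.0105. The prices and the estimateCost helper are hypothetical, not values or functions from eyrie's catalog.

```go
package main

import "fmt"

// pricing holds USD prices per million tokens (hypothetical values below).
type pricing struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

// estimateCost returns the USD cost of one call.
func estimateCost(p pricing, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1e6*p.InputPerMTok +
		float64(outputTokens)/1e6*p.OutputPerMTok
}

func main() {
	p := pricing{InputPerMTok: 3, OutputPerMTok: 15}
	fmt.Printf("$%.4f\n", estimateCost(p, 1000, 500)) // $0.0105
}
```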
eyrie passes reasoning_effort and Anthropic extended-thinking thinking_budget_tokens through to models that support them; both are omitted from the request when unset.
For cloud deployments, eyrie supports keyless GitHub OIDC authentication: it mints a short-lived token in GitHub Actions and exchanges it for AWS Bedrock credentials (STS AssumeRoleWithWebIdentity) or GCP Vertex credentials (Workload Identity Federation), so no secrets need to be stored.
eyrie serves an OpenAI-compatible POST /v1/chat/completions endpoint, so existing OpenAI SDK clients can talk to eyrie without modification.
Beyond weighted random selection, the router supports named strategies: simple-shuffle, least-busy, latency-based, cost-based, and usage-based.
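Several of these strategies reduce to "score every deployment and take the lowest". The sketch below, with a hypothetical deployment type, expresses least-busy and cost-based routing as scoring functions; latency-based routing would follow the same pattern with a measured latency field. It is an illustration, not eyrie's router.

```go
package main

import (
	"errors"
	"fmt"
)

// deployment is a hypothetical routing target with live stats.
type deployment struct {
	Name        string
	InFlight    int     // requests currently running
	CostPerMTok float64 // blended USD price per million tokens
}

// pick returns the deployment with the lowest score; ties keep the earlier entry.
func pick(ds []deployment, score func(deployment) float64) (deployment, error) {
	if len(ds) == 0 {
		return deployment{}, errors.New("no deployments")
	}
	best := ds[0]
	for _, d := range ds[1:] {
		if score(d) < score(best) {
			best = d
		}
	}
	return best, nil
}

// Two of the named strategies, expressed as scoring functions.
func leastBusy(d deployment) float64 { return float64(d.InFlight) }
func cheapest(d deployment) float64  { return d.CostPerMTok }

func main() {
	ds := []deployment{
		{Name: "a", InFlight: 4, CostPerMTok: 1.0},
		{Name: "b", InFlight: 1, CostPerMTok: 5.0},
		{Name: "c", InFlight: 2, CostPerMTok: 0.5},
	}
	busy, _ := pick(ds, leastBusy)
	cost, _ := pick(ds, cheapest)
	fmt.Println(busy.Name, cost.Name) // b c
}
```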
Two pluggable interfaces: a distributed CacheBackend (in-memory by default, RESP/Redis-capable) and an AuditSink (no-op by default, with a JSONL file sink) that records privacy-preserving call metadata.
eyrie provides named primary, weak, and editor model slots that fall back to the primary slot, plus an LLM-based summarizing condenser that uses the weak model to shrink long conversation histories.
The HTTP API adds a POST /rerank endpoint (provider-backed, with a lexical fallback) and a GET /ready readiness probe alongside the existing health check.
A dependency-free gRPC API skeleton sits behind the grpc build tag; it is wired up when generated stubs are available.
Detailed documentation is available in the docs/ directory:
- Architecture — System design, data flow, and reliability features
- Provider Setup Guide — Credential configuration and provider setup
- Dynamic Model Discovery — Live model discovery architecture
Runnable examples are in the examples/ directory:
- Basic Chat — Simple synchronous chat
- Streaming — SSE streaming with event handling
- Multi-Provider — Fallback chains across providers
Run any example with:
```sh
ANTHROPIC_API_KEY=sk-... go run ./examples/basic/
```

eyrie ships 12 setup gateways, defined in catalog/registry/providers.go (hawk's /config uses the same list):
| Provider | ID | Env variable |
|---|---|---|
| Anthropic | anthropic | ANTHROPIC_API_KEY |
| OpenAI | openai | OPENAI_API_KEY |
| Google Gemini | gemini | GEMINI_API_KEY |
| OpenRouter | openrouter | OPENROUTER_API_KEY |
| xAI (Grok) | grok | XAI_API_KEY |
| Z.AI | z-ai | ZAI_API_KEY |
| CanopyWave | canopywave | CANOPYWAVE_API_KEY |
| OpenCode Go | opencodego | OPENCODEGO_API_KEY |
| Kimi (Moonshot) | kimi | MOONSHOT_API_KEY |
| Xiaomi (MiMo) Pay-as-you-go | xiaomi_mimo_payg | XIAOMI_MIMO_PAYG_API_KEY |
| Xiaomi (MiMo) Token Plan | xiaomi_mimo_token_plan | XIAOMI_MIMO_TOKEN_PLAN_API_KEY (+ region cn / sgp / ams) |
| Ollama | ollama | OLLAMA_BASE_URL (local; no API key) |
Runtime auto-detection uses a separate priority order for chat when no deployment is pinned; see config profiles.
```go
// Non-streaming chat
resp, err := c.Chat(ctx, messages, client.ChatOptions{
	Model: "gpt-4o",
})
if err != nil {
	log.Fatal(err)
}
```

```go
// Auto-continues when max_tokens is hit
resp, err := client.ChatWithContinuation(ctx, provider, messages,
	client.ChatOptions{Model: model},
	client.DefaultContinuationConfig(),
)
if err != nil {
	log.Fatal(err)
}
```

```go
// Mock provider: no real API calls are made, which makes it suitable for tests
mock := client.NewMockProvider(client.MockModeFixed)
mock.Response = "Here is the code you asked for..."
resp, _ := mock.Chat(ctx, messages, opts)
```

```go
cat := catalog.DefaultModelCatalog()

// Get the best model for a tier
model := catalog.GetPreferredProviderModel("anthropic", catalog.TierSonnet, &cat)
// → "claude-sonnet-4-6"

// Check deprecation warnings
warn := catalog.GetModelDeprecationWarning("claude-3-7-sonnet", "anthropic")
```

```go
cfg := config.LoadProviderConfig("")             // load from disk
config.ApplyProviderConfigToEnv(cfg, false, nil) // apply to environment
config.SaveProviderConfig(cfg, "")               // save changes
```

```
eyrie/
├── cmd/eyrie/ # CLI binary
├── client/ # Provider client & streaming interface
├── config/ # Provider configuration & routing
│ └── credential/ # Credential file management
├── catalog/ # Model catalog & tier system
│ ├── discover/ # Model discovery
│ ├── legacy/ # Legacy model support
│ ├── live/ # Live model data
│ └── registry/ # Model registry
├── codeagent/ # Code agent retry & fallback strategies
├── conversation/ # Conversation engine with branching
├── credentials/ # Credential management
├── docs/ # Documentation & guides
├── examples/ # Runnable code examples
├── router/ # Weighted provider router
├── runtime/ # Runtime manifest & routing policies
├── storage/ # SQLite conversation DAG store
├── types/ # Branded types & API errors
├── errors/ # Error message constants
├── constants/ # API limits
├── utils/ # Error utilities
├── internal/
│ ├── api/ # HTTP API handlers
│ ├── cache/ # Response cache warmer
│ ├── health/ # Provider health checker
│ ├── observability/ # OpenTelemetry spans & metrics
│ ├── sdk/ # Go, Python, TypeScript client SDKs
│ └── version/ # Version information
└── assets/ # Logo and branding
```
See docs/ARCHITECTURE.md for detailed system design and data flows.
eyrie is part of the hawk-eco:
| Component | Repository | Purpose |
|---|---|---|
| hawk | GrayCodeAI/hawk | AI coding agent |
| eyrie | This repo | LLM provider runtime |
| tok | GrayCodeAI/tok | Tokenizer & compression |
| yaad | GrayCodeAI/yaad | Graph-based memory |
| trace | GrayCodeAI/trace | Session capture |
- Go 1.26+
```sh
go build ./...       # Verify the library compiles
go test -race ./...  # Run all tests with the race detector
make ci              # Run the full CI suite (lint, test, security)
make cover           # Generate a coverage report
```

We welcome contributions! Please see CONTRIBUTING.md for development setup, commit conventions, and the PR process.
Quick start:
- Fork the repository and create a branch: git checkout -b feat/short-description
- Make changes in small, focused commits
- Run make ci locally
- Open a pull request
Use Conventional Commits for commit messages — release-please uses them for versioning.
MIT — see LICENSE for details.
© 2026 GrayCode AI