This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is the Inference Gateway CLI - a Go-based command-line interface for managing and interacting with AI inference services. It provides interactive chat, autonomous agent capabilities, and extensive tool execution for AI models.
Key Technology Stack:
- Language: Go 1.26+
- UI Framework: Bubble Tea (TUI framework)
- Gateway Integration: Via inference-gateway/sdk and inference-gateway/adk
- Storage Backends: JSONL (default), SQLite, PostgreSQL, Redis, In-memory
- Build Tool: Task (Taskfile)
- Environment: Flox (development environment manager)
# Build the binary
task build
# Run all tests
task test
# Run tests with verbose output
task test:verbose
# Run tests with coverage
task test:coverage
# Format code
task fmt
# Run linter
task lint
# Run locally without building
task run CLI_ARGS="chat"
task run CLI_ARGS="status"
task run CLI_ARGS="version"
# Or after building
./infer chat
./infer agent "task description"
./infer status
# Download Go modules
task mod:download
# Install pre-commit hooks
task precommit:install
# Run pre-commit on all files
task precommit:run
# Regenerate all mocks (uses counterfeiter)
task mocks:generate
# Clean generated mocks
task mocks:clean
# Build for current platform
task release:build
# Build macOS binary
task release:build:darwin
# Build portable Linux binary (via Docker)
task release:build:linux
# Build and push container images
task container:build
task container:push
cmd/ # CLI commands (cobra-based)
├── agent.go # Autonomous agent command
├── channels.go # Channel listener daemon command
├── chat.go # Interactive chat command
├── config.go # Configuration management commands
├── agents.go # A2A agent management
└── root.go # Root command and global flags
internal/
├── app/ # Application initialization
├── container/ # Dependency injection container
├── domain/ # Domain interfaces and models
│ ├── interfaces.go # Core service interfaces
│ └── filewriter/ # File writing domain logic
├── handlers/ # Message/event handlers
│ ├── chat_handler.go # Main chat orchestrator
│ ├── chat_message_processor.go # Message processing logic
│ └── chat_shortcut_handler.go # Shortcut command handling
├── services/ # Business logic implementations
│ ├── agent.go # Agent service
│ ├── conversation.go # Conversation management
│ ├── conversation_optimizer.go # Conversation compaction
│ ├── approval_policy.go # Tool approval logic
│ ├── tools/ # Tool implementations
│ │ ├── registry.go # Tool registry
│ │ ├── bash.go # Bash execution
│ │ ├── read.go, write.go # File I/O
│ │ ├── edit.go, multiedit.go # File editing
│ │ ├── web_search.go # Web search
│ │ └── mcp_tool.go # MCP integration
│ ├── channels/ # Pluggable messaging channels
│ │ └── telegram.go # Telegram Bot API channel
│ └── filewriter/ # File writing services
├── infra/ # Infrastructure layer
│ ├── storage/ # Conversation storage backends
│ │ ├── factory.go # Storage factory
│ │ ├── sqlite.go # SQLite implementation
│ │ ├── postgres.go # PostgreSQL implementation
│ │ ├── redis.go # Redis implementation
│ │ └── memory.go # In-memory implementation
│ └── adapters/ # External service adapters
├── ui/ # Terminal UI components
│ ├── components/ # Reusable UI components
│ ├── styles/ # Theme and styling
│ └── keybinding/ # Keyboard handling
├── shortcuts/ # Shortcut system
│ └── registry.go # Shortcut management
├── web/ # Web terminal interface
└── utils/ # Shared utilities
config/ # Configuration structs
└── config.go # Main config definition
The application uses a service container pattern (internal/container/container.go) for dependency management.
All services are initialized once and injected where needed:
- Configuration service
- Model service
- Agent service
- Tool service
- Conversation repository
- Storage backends
- MCP manager
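The container idea can be sketched roughly as follows. This is a minimal illustration, not the code in internal/container/container.go: the Config, ToolService, AgentService, and Container types and the NewContainer constructor are hypothetical stand-ins, chosen only to show services being built once and shared.

```go
package main

import "fmt"

// Hypothetical service types standing in for the real ones.
type Config struct{ Model string }

type ToolService struct{ cfg *Config }

type AgentService struct {
	cfg   *Config
	tools *ToolService
}

// Container builds each service once and hands out the shared instances.
type Container struct {
	config *Config
	tools  *ToolService
	agent  *AgentService
}

// NewContainer wires dependencies in order: config first, then the services
// that depend on it.
func NewContainer(cfg *Config) *Container {
	c := &Container{config: cfg}
	c.tools = &ToolService{cfg: cfg}
	c.agent = &AgentService{cfg: cfg, tools: c.tools}
	return c
}

// Accessors return the single shared instance of each service.
func (c *Container) Agent() *AgentService { return c.agent }
func (c *Container) Tools() *ToolService  { return c.tools }

func main() {
	c := NewContainer(&Config{Model: "example-model"})
	// The agent holds the same ToolService instance the container exposes.
	fmt.Println(c.Agent().tools == c.Tools())
}
```

Callers ask the container for a service instead of constructing one, which keeps initialization order in one place and makes it easy to swap in mocks during tests.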
Tools are self-contained modules that implement the domain.Tool interface:
- Tool Interface (internal/domain/interfaces.go): Defines Execute(), Definition(), Validate(), IsEnabled()
- Tool Registry (internal/services/tools/registry.go): Manages tool registration and lookup
- Tool Implementations (internal/services/tools/*.go): Individual tool logic
- Approval System (internal/services/approval_policy.go): Handles user approval for sensitive operations
- User input → ChatHandler.Handle() → routes to appropriate handler
- ChatMessageProcessor processes user message
- Tool calls → ToolService.Execute() → Tool registry → Individual tool
- Tool approval (if required) → Approval UI → Execute or reject
- LLM response → Stream to UI via Bubble Tea messages
- Conversation saved to storage backend
- Chat Mode: Interactive TUI with real-time user input and approval
- Agent Mode: Autonomous background execution with minimal user interaction
- Both use the same AgentService but different handlers and UI flows
The conversation storage uses a factory pattern with pluggable backends:
- JSONL: Default, file-based, human-readable, zero-config
- SQLite: SQL-based, file-based, structured queries
- PostgreSQL: Production-grade, concurrent access
- Redis: Fast, in-memory, distributed setups
- Memory: Testing and ephemeral sessions
Backend selection is config-driven via config.yaml or environment variables.
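A factory with pluggable backends could be sketched like this. Everything here is an assumption for illustration: the ConversationStore interface, the NewStore function, and the backend type strings are hypothetical, and only the in-memory backend is implemented. The actual factory lives in internal/infra/storage/factory.go.

```go
package main

import (
	"fmt"
	"sync"
)

// ConversationStore is a hypothetical, minimal storage interface.
type ConversationStore interface {
	Save(id, content string) error
	Load(id string) (string, bool)
}

// memoryStore keeps conversations in a map guarded by a mutex.
type memoryStore struct {
	mu   sync.RWMutex
	data map[string]string
}

func (m *memoryStore) Save(id, content string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[id] = content
	return nil
}

func (m *memoryStore) Load(id string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	content, ok := m.data[id]
	return content, ok
}

// NewStore selects a backend by its configured type name.
// The type strings are assumptions, not the project's actual config values.
func NewStore(backend string) (ConversationStore, error) {
	switch backend {
	case "memory":
		return &memoryStore{data: make(map[string]string)}, nil
	case "jsonl", "sqlite", "postgres", "redis":
		return nil, fmt.Errorf("backend %q is not implemented in this sketch", backend)
	default:
		return nil, fmt.Errorf("unknown storage backend %q", backend)
	}
}

func main() {
	store, err := NewStore("memory")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_ = store.Save("conv-1", "hello")
	content, ok := store.Load("conv-1")
	fmt.Println(content, ok)
}
```

Because callers only see the interface, switching backends is a configuration change and not a code change.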
ChatHandler Responsibilities:
- Orchestrates message flow between user, LLM, and tools
- Manages conversation state
- Routes shortcuts to ChatShortcutHandler
- Handles tool approval workflow
- Manages background bash shells
- Integrates with message queue for async operations
Key Handler Methods:
- Handle(): Main entry point, routes messages
- handleUserMessage(): Processes user input
- handleToolCalls(): Executes tool requests from LLM
- handleShortcut(): Delegates to shortcut handler
When adding a new tool:
- Create tool file: internal/services/tools/your_tool.go
- Implement domain.Tool interface:
  - Definition(): Returns SDK tool definition with JSON schema
  - Execute(ctx, args): Tool execution logic
  - Validate(args): Parameter validation
  - IsEnabled(): Check if tool is enabled
- Register tool: Add to registry.go in registerTools()
- Add config: Update config/config.go if tool needs configuration
- Write tests: Create your_tool_test.go
- Update approval policy: If tool needs approval, configure in approval_policy.go
Tool Parameter Extraction:
Use ParameterExtractor for type-safe parameter extraction:
extractor := tools.NewParameterExtractor(args)
filePath, err := extractor.GetString("file_path")
lineNum, err := extractor.GetInt("line_number")
Important Tool Conventions:
- Always respect ctx for cancellation
- Return *domain.ToolExecutionResult with meaningful output
- Use config to check if tool is enabled
- File operations should use absolute paths
- Validate all user inputs before execution
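To illustrate why type-safe extraction helps, here is a rough sketch of how an extractor over JSON-decoded arguments might behave. It is not the project's ParameterExtractor: the paramExtractor type below is a hypothetical stand-in, and its error messages and integer handling are assumptions. The one firm fact it relies on is that encoding/json decodes JSON numbers into float64 when the target is map[string]any.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// paramExtractor is a hypothetical stand-in for tools.ParameterExtractor.
type paramExtractor struct{ args map[string]any }

// GetString returns the named argument if it is present and a string.
func (p paramExtractor) GetString(key string) (string, error) {
	v, ok := p.args[key]
	if !ok {
		return "", fmt.Errorf("missing parameter %q", key)
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("parameter %q must be a string, got %T", key, v)
	}
	return s, nil
}

// GetInt accepts int values and whole-number float64 values, since
// JSON-decoded numbers arrive as float64.
func (p paramExtractor) GetInt(key string) (int, error) {
	v, ok := p.args[key]
	if !ok {
		return 0, fmt.Errorf("missing parameter %q", key)
	}
	switch n := v.(type) {
	case int:
		return n, nil
	case float64:
		if n != math.Trunc(n) {
			return 0, fmt.Errorf("parameter %q must be a whole number, got %v", key, n)
		}
		return int(n), nil
	}
	return 0, fmt.Errorf("parameter %q must be a number, got %T", key, v)
}

func main() {
	var args map[string]any
	raw := []byte(`{"file_path": "/tmp/main.go", "line_number": 42}`)
	if err := json.Unmarshal(raw, &args); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	p := paramExtractor{args: args}
	path, err := p.GetString("file_path")
	fmt.Println(path, err)
	line, err := p.GetInt("line_number")
	fmt.Println(line, err)
}
```

Centralizing these checks keeps each tool's Validate() and Execute() free of repeated type assertions.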
The CLI uses a 2-layer configuration system (project and userspace config files), which environment variables and command flags can override:
- Project config: .infer/config.yaml (project-specific)
- Userspace config: ~/.infer/config.yaml (user defaults)
- Environment variables: INFER_* prefix (highest priority)
- Command flags: Override config values
Key Config Sections:
- gateway.*: Gateway connection settings
- agent.*: Agent behavior (model, max_turns, system_prompt, custom_instructions)
- tools.*: Tool-specific configuration
- chat.*: Chat UI settings (theme, keybindings, status bar)
- web.*: Web terminal settings
- pricing.*: Cost tracking configuration
- computer_use.*: Computer use tool settings
Environment variable format: INFER_<PATH> (dots become underscores)
Example: agent.model → INFER_AGENT_MODEL
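The mapping rule is simple enough to express in a few lines. The envVarName function below is a hypothetical helper written for illustration; the CLI's own config loader may implement the mapping differently.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName converts a dotted config path such as "agent.model" into the
// INFER_-prefixed environment variable name: dots become underscores and
// the result is upper-cased.
func envVarName(path string) string {
	return "INFER_" + strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

func main() {
	for _, p := range []string{"agent.model", "agent.context.git_context_enabled"} {
		fmt.Printf("%s -> %s\n", p, envVarName(p))
	}
}
```

Keys that already contain underscores keep them, so nested paths and multi-word keys both flatten into one underscore-separated name.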
The CLI automatically enhances the model's context with project awareness to reduce confusion and improve accuracy.
When operating in a git repository, the model receives:
- Repository name (extracted from remote URL, e.g., "inference-gateway/cli")
- Current branch (e.g., "main", "feature/xyz")
- Main branch name (detected as "main" or "master")
- Recent commits (last 5 commits with hashes and messages)
This context is automatically injected into the system prompt on every request. The git context is cached and refreshed every N turns (configurable) to balance performance with up-to-date information.
The model receives the current working directory path, helping it understand:
- Where files should be read from or written to
- Which directory commands will execute in
- Project location context
- First prompt: +50-100ms (git command execution)
- Subsequent prompts: <1ms (cached)
- Token overhead: ~100-300 tokens (depends on git history)
- Git refresh: Every 10 turns by default (configurable)
Control via .infer/config.yaml:
agent:
  context:
    git_context_enabled: true # Enable git repository context
    working_dir_enabled: true # Enable working directory context
    git_context_refresh_turns: 10 # Refresh git context every N turns
Or via environment variables:
INFER_AGENT_CONTEXT_GIT_CONTEXT_ENABLED=true
INFER_AGENT_CONTEXT_WORKING_DIR_ENABLED=true
INFER_AGENT_CONTEXT_GIT_CONTEXT_REFRESH_TURNS=10
Before:
- Model confused about repository name ("inference-gateway" vs "inference-gateway/cli" vs "inference-gateway/infer")
- No awareness of current branch or git state
- Unclear about working directory
After:
- Model knows exact repository: inference-gateway/cli
- Aware of current branch and recent commits
- Understands working directory context
- Reduced need for clarifying questions
- Location: internal/services/agent_utils.go
- Context builders: buildGitContextInfo(), buildWorkingDirectoryInfo()
- Git helpers: isGitRepository(), getGitRepositoryName(), getGitBranch(), getGitMainBranch(), getRecentCommits()
- Caching: Thread-safe caching via sync.RWMutex in AgentServiceImpl
- Error handling: All git operations fail gracefully (log debug, return empty string)
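The caching behavior described above (build once, serve from cache, refresh every N turns) might be structured along these lines. This is a sketch under assumptions, not the code in AgentServiceImpl: the gitContextCache type, its fields, and the Get() and NextTurn() methods are hypothetical, and the build function stands in for the real git helpers.

```go
package main

import (
	"fmt"
	"sync"
)

// gitContextCache caches a context string and rebuilds it once
// refreshTurns turns have passed since the last build.
type gitContextCache struct {
	mu           sync.RWMutex
	value        string
	loaded       bool
	turns        int           // turns since the last rebuild
	refreshTurns int           // e.g. 10, matching git_context_refresh_turns
	build        func() string // stand-in for the real git context builder
}

// Get returns the cached value on the fast path (read lock only) and
// rebuilds under the write lock when the cache is empty or stale.
func (c *gitContextCache) Get() string {
	c.mu.RLock()
	if c.loaded && c.turns < c.refreshTurns {
		v := c.value
		c.mu.RUnlock()
		return v
	}
	c.mu.RUnlock()

	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check: another goroutine may have refreshed while we waited.
	if !c.loaded || c.turns >= c.refreshTurns {
		c.value = c.build()
		c.loaded = true
		c.turns = 0
	}
	return c.value
}

// NextTurn records that one conversation turn has completed.
func (c *gitContextCache) NextTurn() {
	c.mu.Lock()
	c.turns++
	c.mu.Unlock()
}

func main() {
	builds := 0
	cache := &gitContextCache{
		refreshTurns: 10,
		build: func() string {
			builds++
			return fmt.Sprintf("git context (build %d)", builds)
		},
	}
	for turn := 0; turn < 25; turn++ {
		_ = cache.Get()
		cache.NextTurn()
	}
	fmt.Println("rebuilds over 25 turns:", builds)
}
```

The read lock keeps the common cached path cheap, while the second check under the write lock prevents two goroutines from rebuilding at the same time.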
Shortcuts are YAML-defined commands stored in .infer/shortcuts/:
- Built-in shortcuts: /clear, /exit, /help, /switch, /theme, /cost
- Git shortcuts: /git status, /git commit, /git push
- SCM shortcuts: /scm issues, /scm pr-create
- Custom shortcuts: User-defined in project
Shortcuts support:
- Subcommands (e.g., /git commit)
- AI-powered snippets (LLM-generated content)
- Command chaining
- Dynamic context injection
Test Organization:
- Unit tests: *_test.go files alongside implementation
- Mocks: tests/mocks/ (generated via counterfeiter)
Running Specific Tests:
# Test specific package
go test ./internal/services/tools
# Test specific function
go test ./internal/services/tools -run TestBashTool
# With race detector
go test -race ./...
The CLI supports MCP servers for extended tool capabilities:
- MCP manager: internal/services/mcp_manager.go
- MCP tools: internal/services/tools/mcp_tool.go
- Configuration: config.Tools.MCPServers
MCP servers are configured in .infer/config.yaml and tools are dynamically registered at runtime.
A2A enables agents to delegate tasks to specialized agents:
- Agent registry: ~/.infer/agents.yaml
- A2A tools: A2A_SubmitTask, A2A_QueryAgent, A2A_QueryTask
- Agent polling: Background monitor for task status
- Configuration: Via infer agents commands
Channels provide pluggable messaging transports (Telegram, WhatsApp, etc.)
for remote-controlling the agent from external platforms. The
infer channels-manager command runs as a standalone daemon, completely
decoupled from the agent. Each incoming message triggers
infer agent --session-id <id> as a subprocess.
- Channels command: cmd/channels.go
- Channel Manager: internal/services/channel_manager.go
- Telegram channel: internal/services/channels/telegram.go
- Domain types: Channel, InboundMessage, OutboundMessage in internal/domain/interfaces.go
- Configuration: config.Channels in config/config.go
Channels are configured in .infer/config.yaml under the channels key.
Each channel has its own allowlist for security.
See docs/channels.md for full documentation.
When channels.require_approval is true (default), the channel manager
enables interactive tool approval via stdin/stdout IPC with the agent subprocess:
- Channel manager passes --require-approval to infer agent
- Agent emits ApprovalRequest JSON on stdout, blocks reading stdin
- Channel manager detects request, sends approval prompt to user
- User replies "yes"/"no"; reply intercepted in routeInbound() before handleMessage() to avoid sender mutex deadlock
- Channel manager writes ApprovalResponse JSON to agent stdin
- 5-minute timeout auto-rejects if no reply
- IPC types: internal/domain/ipc.go (ApprovalRequest, ApprovalResponse)
- Agent side: cmd/agent.go (executeToolCallsWithApproval, readApprovalResponses, outputApprovalRequest)
- Channel manager side: internal/services/channel_manager.go (handleApprovalRequest, parseApprovalRequest, isApprovalReply)
- Reuses existing tools.*.require_approval and tools.safety.require_approval config
- Read-only tools (Tree, Read, Grep) default to require_approval: false
- Implement domain.Channel interface in internal/services/channels/
- Add config type to config/config.go
- Register in registerChannels() in cmd/channels.go
- Add allowlist case in channel_manager.go isAllowedUser()
When models use extended thinking (reasoning), their internal thought process is displayed as collapsible blocks above responses.
- Data Storage: Thinking content is stored in ConversationEntry.ThinkingContent field
- Event Flow: Reasoning content flows through StreamingContentEvent.ReasoningContent during streaming
- Rendering: Thinking blocks are rendered before assistant message content in renderStandardEntry() and renderAssistantWithToolCalls()
- Display State: Collapsed by default, showing first sentence with ellipsis
- Styling: Rendered using dim color (theme-aware) with 💭 icon
- Expansion: Toggled via keybinding (configurable as display_toggle_thinking, defaults to ctrl+k)
- internal/domain/interfaces.go: ConversationEntry.ThinkingContent field
- internal/domain/ui_events.go: StreamingContentEvent.ReasoningContent field
- internal/ui/components/conversation_view.go: Rendering logic and expansion state
- config/keybindings.go: Keybinding definition
- internal/ui/keybinding/actions.go: Action handler registration
- Toggle thinking block expansion/collapse using the configured keybinding (default: ctrl+k)
- Default state: collapsed (first sentence visible)
- Expanded state: full thinking content with word wrapping
- Keybinding can be customized via chat.keybindings.bindings.display_toggle_thinking in config
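The collapsed state (first sentence plus ellipsis) could be produced by something like the function below. This is only a sketch: collapsedPreview is a hypothetical name, and the sentence detection is a naive assumption that treats the first '.', '!' or '?' as the end of the sentence. The rendering code in conversation_view.go may use different rules.

```go
package main

import (
	"fmt"
	"strings"
)

// collapsedPreview returns the first sentence of a thinking block followed
// by an ellipsis. Whitespace and newlines are collapsed to single spaces.
func collapsedPreview(thinking string) string {
	text := strings.Join(strings.Fields(thinking), " ")
	if text == "" {
		return ""
	}
	// Naive sentence boundary: cut at the first terminal punctuation mark.
	if i := strings.IndexAny(text, ".!?"); i >= 0 {
		text = text[:i]
	}
	return text + "..."
}

func main() {
	thinking := "The user wants a refactor.\nI should read the file first, then plan edits."
	fmt.Println(collapsedPreview(thinking))
}
```

A naive boundary like this will cut early on text such as version numbers or abbreviations, so a production renderer would likely need a more careful rule.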
This project uses Conventional Commits:
<type>[optional scope]: <description>
[optional body]
[optional footer]
Types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
Breaking changes: Add ! after type (e.g., feat!:) or footer BREAKING CHANGE:
Pre-commit hooks automatically validate commit messages.
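For a feel of what the format allows, here is a small header check written in Go. It is an illustrative sketch, not the project's pre-commit hook: the commitHeader pattern and the isConventionalHeader and isBreaking functions are hypothetical, and the pattern covers only the types listed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// commitHeader matches "<type>[(scope)][!]: <description>" for the listed types.
var commitHeader = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^()\s]+\))?(!)?: \S.*$`)

// isConventionalHeader reports whether the first line of a commit message
// follows the Conventional Commits header format.
func isConventionalHeader(line string) bool {
	return commitHeader.MatchString(line)
}

// isBreaking reports whether a valid message marks a breaking change, either
// with "!" after the type/scope or with a "BREAKING CHANGE:" footer line.
func isBreaking(msg string) bool {
	header, rest, _ := strings.Cut(msg, "\n")
	m := commitHeader.FindStringSubmatch(header)
	if m == nil {
		return false
	}
	if m[3] == "!" {
		return true
	}
	for _, line := range strings.Split(rest, "\n") {
		if strings.HasPrefix(line, "BREAKING CHANGE:") {
			return true
		}
	}
	return false
}

func main() {
	for _, h := range []string{
		"feat(chat): add thinking toggle",
		"fix!: drop legacy config keys",
		"update stuff",
	} {
		fmt.Printf("%-35q valid=%v breaking=%v\n", h, isConventionalHeader(h), isBreaking(h))
	}
}
```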
- Make changes following Go best practices
- Run quality checks: task precommit:run (runs formatting, linting, validation)
- Test thoroughly: task test
- Commit with conventional commit message
task test - Commit with conventional commit message
- Pre-commit hooks run automatically on commit
- Push and create PR
Release Process:
Automated via semantic-release on main branch:
- Commit types determine version bumps
- Binaries built for macOS (Intel/ARM64) and Linux (AMD64/ARM64)
- GitHub releases created automatically with changelogs
- No CGO: Project uses pure Go dependencies for portability
- Flox environment: Use flox activate for consistent dev environment
- Binary name: Built as infer (not cli)
- Gateway dependency: CLI requires Inference Gateway (auto-managed in Docker/binary mode)
- Storage migrations: SQLite and PostgreSQL use automatic schema migrations
- Tool safety: File modification tools require user approval by default
- Context limits: Conversation optimizer handles token limits automatically