HelixCode is a distributed AI development platform designed for enterprise-grade scalability and reliability. The architecture follows microservices principles with clear separation of concerns and robust distributed computing capabilities.
- SSHWorkerPool: Manages SSH-accessible worker nodes
- Worker Registration: Automatic discovery and registration
- Health Monitoring: Real-time worker health checks
- Capability Detection: Automatic hardware and software capability detection
- Auto-Install: Automatic Helix CLI installation on worker nodes
- WorkerManager: Central worker lifecycle management
- Resource Allocation: Dynamic resource allocation based on capabilities
- Load Balancing: Intelligent task distribution across workers
- Failure Recovery: Automatic worker recovery and task reassignment
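
The allocation and load-balancing bullets above can be illustrated with a minimal sketch. It assumes a hypothetical `Worker` struct and a `PickWorker` helper (neither is taken from the HelixCode codebase) and chooses the healthy worker that advertises a required capability and has the fewest active tasks.

```go
package main

import (
	"errors"
	"fmt"
)

// Worker is a hypothetical, simplified view of a registered worker node.
type Worker struct {
	ID           string
	Healthy      bool
	Capabilities map[string]bool
	ActiveTasks  int
}

// ErrNoWorker is returned when no healthy worker offers the capability.
var ErrNoWorker = errors.New("no healthy worker with required capability")

// PickWorker returns the healthy worker that advertises the required
// capability and currently has the fewest active tasks.
func PickWorker(workers []*Worker, capability string) (*Worker, error) {
	var best *Worker
	for _, w := range workers {
		if !w.Healthy || !w.Capabilities[capability] {
			continue // skip unhealthy or unsuitable workers
		}
		if best == nil || w.ActiveTasks < best.ActiveTasks {
			best = w
		}
	}
	if best == nil {
		return nil, ErrNoWorker
	}
	return best, nil
}

func main() {
	workers := []*Worker{
		{ID: "w1", Healthy: true, Capabilities: map[string]bool{"llm-inference": true}, ActiveTasks: 3},
		{ID: "w2", Healthy: true, Capabilities: map[string]bool{"llm-inference": true}, ActiveTasks: 1},
		{ID: "w3", Healthy: false, Capabilities: map[string]bool{"llm-inference": true}, ActiveTasks: 0},
	}
	w, err := PickWorker(workers, "llm-inference")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected:", w.ID) // w2: healthy and least loaded
}
```

Failure recovery could reuse the same helper: when a worker is marked unhealthy, its tasks are passed through `PickWorker` again to find a new owner.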
type LLMProvider interface {
	Generate(ctx context.Context, req *LLMRequest) (*LLMResponse, error)
	GenerateStream(ctx context.Context, req *LLMRequest, ch chan<- LLMResponse) error
	GetModels() []Model
	GetCapabilities() []ModelCapability
	GetHealth(ctx context.Context) (*ProviderHealth, error)
	IsAvailable(ctx context.Context) bool
	Close() error
}

- Local Models:
  - Llama.cpp: Direct local inference
  - Ollama: Streamlined local model management
- Cloud Providers:
  - Anthropic Claude: Extended thinking, prompt caching, tool caching, 200K context
  - Google Gemini: 2M token context, function calling, safety settings
  - OpenAI: GPT-4, GPT-3.5-turbo with function calling
  - xAI: Grok models with reasoning capabilities
  - Qwen: Chinese language models with OAuth2
- Aggregators:
  - OpenRouter: Multi-provider access with unified API
  - GitHub Copilot: GitHub integration for multiple models
- Extended Thinking: Automatic reasoning mode for complex tasks (Anthropic)
  - Keyword-based detection
  - 80% token budget allocation
  - Transparent thinking process
- Prompt Caching: Multi-layer caching for cost optimization (Anthropic)
  - System message caching (5-minute TTL)
  - Conversation history caching
  - Tool definition caching
  - Up to 90% cost reduction
- Massive Context: 2M token context windows (Gemini)
  - Full codebase analysis
  - Long-form documentation processing
  - Complex multi-file reasoning
- Function Calling: Structured tool integration
  - AUTO mode: Automatic tool selection
  - ANY mode: Force tool usage
  - NONE mode: Disable tools
- Vision Capabilities: Image understanding (Anthropic, Gemini)
  - Code screenshot analysis
  - Diagram interpretation
  - UI/UX review
- Streaming: Real-time response generation
  - Server-Sent Events (SSE)
  - Chunk-based updates
  - Progress indicators
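
As a rough illustration of keyword-based detection combined with an 80% budget, the sketch below reserves 80% of a request's maximum tokens for reasoning when the prompt contains a trigger word. The keyword list, the function names, and the reading of "80% token budget allocation" as a share of the maximum output tokens are all assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// thinkingKeywords is an illustrative list; the real trigger words are not
// specified in this document.
var thinkingKeywords = []string{"analyze", "design", "debug", "step by step"}

// NeedsExtendedThinking reports whether the prompt contains any trigger keyword.
func NeedsExtendedThinking(prompt string) bool {
	lower := strings.ToLower(prompt)
	for _, kw := range thinkingKeywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

// ThinkingBudget returns the number of tokens reserved for reasoning:
// 80% of maxTokens when extended thinking is triggered, otherwise 0.
func ThinkingBudget(prompt string, maxTokens int) int {
	if !NeedsExtendedThinking(prompt) {
		return 0
	}
	return maxTokens * 80 / 100
}

func main() {
	fmt.Println(ThinkingBudget("Design a caching layer for the API", 4096))
	fmt.Println(ThinkingBudget("Rename this variable", 4096))
}
```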
- Chain-of-Thought: Step-by-step reasoning with intermediate results
- Tree-of-Thoughts: Multiple reasoning paths with selection
- Self-Reflection: Error correction and improvement cycles
- Progressive Reasoning: Incremental reasoning with tool integration
- Extended Thinking: Deep reasoning with transparent thought process
- Stdio Transport: Process-based communication
- SSE Transport: Server-Sent Events for real-time updates
- HTTP Transport: RESTful API communication
- WebSocket Transport: Bidirectional real-time communication
- Dynamic Tool Registration: Runtime tool discovery
- Multi-Server Support: Concurrent MCP server management
- Authentication: OAuth2 and API key support
- Resource Management: Efficient resource allocation and sampling
- REST API: Comprehensive HTTP API with OpenAPI specification
- Terminal UI: Rich interactive terminal interface
- CLI: Command-line interface for scripting and automation
- Mobile Apps: Native iOS and Android applications
- HTTP/REST: Standard RESTful API
- WebSocket: Real-time bidirectional communication
- SSH: Secure shell for worker communication
- MCP: Model Context Protocol for tool integration
- Slack: Webhook and bot integration
- Discord: Bot API with rich embeds
- Telegram: Bot API with media support
- Email: SMTP with HTML templates
- Yandex Messenger: Russian platform integration
- Max: Enterprise communication platform
- Rule-Based Routing: Configurable notification rules
- Template System: Customizable message templates
- Priority System: Priority-based delivery
- Fallback Strategies: Multi-channel fallback
- users: User authentication and profiles
- user_sessions: Active user sessions
- workers: Distributed worker nodes
- worker_metrics: Performance metrics collection
- distributed_tasks: Task management with work preservation
- task_checkpoints: Automatic checkpointing system
- projects: Project management
- sessions: Development sessions
- notifications: Notification system
- mcp_servers: MCP server configurations
- llm_models: LLM model management
- Automatic Checkpointing: Periodic task state saving
- Dependency Management: Task dependency tracking
- Criticality Levels: Task importance classification
- Rollback System: Automatic rollback on failures
- Graceful Degradation: System stability during failures
- JWT-Based Authentication: Secure token-based authentication
- Role-Based Access Control: Fine-grained permission system
- Multi-Factor Authentication: Enhanced security options
- Session Management: Secure session handling
- End-to-End Encryption: All communications encrypted
- Secure Key Management: Proper key rotation and storage
- Input Validation: Comprehensive input sanitization
- Security Headers: HTTP security headers
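
A role-based permission check can be as small as a lookup table. The sketch below is illustrative only: the role names, permission strings, and the `Can` helper are assumptions, and token validation (JWT) is deliberately left out.

```go
package main

import "fmt"

// Role and Permission are simple string identifiers.
type Role string
type Permission string

// rolePermissions is an illustrative policy table; the role and permission
// names are hypothetical.
var rolePermissions = map[Role][]Permission{
	"admin":     {"task:create", "task:cancel", "worker:register"},
	"developer": {"task:create", "task:cancel"},
	"viewer":    {},
}

// Can reports whether any of the given roles grants the permission.
func Can(roles []Role, want Permission) bool {
	for _, r := range roles {
		for _, p := range rolePermissions[r] {
			if p == want {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(Can([]Role{"developer"}, "task:create"))
	fmt.Println(Can([]Role{"developer"}, "worker:register"))
}
```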
- Response Time: <500ms for all operations
- Resource Efficiency: >85% hardware utilization
- Scalability: Support for 100+ concurrent workers
- Availability: 99.9% uptime for core features
- Horizontal Scaling: Worker pool expansion
- Load Balancing: Intelligent task distribution
- Distributed Caching: Efficient state management
- Resource Optimization: Dynamic resource allocation
services:
  helixcode-server:
    image: helixcode/server:latest
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/helixcode
      - REDIS_URL=redis://redis:6379
    ports:
      - "8080:8080"
  worker-node-1:
    image: helixcode/worker:latest
    environment:
      - HELIX_SERVER_URL=http://helixcode-server:8080
      - WORKER_CAPABILITIES=llm-inference,code-generation

- Database Replication: PostgreSQL streaming replication
- Load Balancer: Round-robin worker distribution
- Health Checks: Comprehensive system monitoring
- Backup Strategy: Automated backup and recovery
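
Round-robin distribution, as named in the load balancer bullet, can be sketched with an atomic counter so that concurrent callers rotate through the worker list safely. The `RoundRobin` type and worker names here are hypothetical, and the sketch assumes Go 1.19 or later for `atomic.Uint64`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin cycles through a fixed list of worker addresses.
type RoundRobin struct {
	workers []string
	next    atomic.Uint64
}

// NewRoundRobin returns a balancer over a copy of the given workers.
func NewRoundRobin(workers []string) *RoundRobin {
	return &RoundRobin{workers: append([]string(nil), workers...)}
}

// Next returns the next worker in rotation; ok is false if the list is empty.
func (r *RoundRobin) Next() (worker string, ok bool) {
	if len(r.workers) == 0 {
		return "", false
	}
	n := r.next.Add(1) - 1 // index of this call, starting at 0
	return r.workers[n%uint64(len(r.workers))], true
}

func main() {
	rr := NewRoundRobin([]string{"worker-node-1", "worker-node-2"})
	for i := 0; i < 3; i++ {
		w, _ := rr.Next()
		fmt.Println(w)
	}
}
```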
- System Metrics: CPU, memory, disk usage
- Application Metrics: Request rates, error rates, response times
- Business Metrics: User activity, task completion rates
- Worker Metrics: Health status, performance metrics
- Structured Logging: JSON-formatted logs
- Log Levels: Debug, Info, Warn, Error
- Log Aggregation: Centralized log collection
- Audit Logging: Security and compliance logging
- Distributed project analysis
- Multi-source technology research
- Architecture design with collaborative input
- Resource requirement calculation
- Distributed compilation and building
- Parallel code generation
- Build artifact caching
- Cross-platform build support
- Distributed test execution
- Parallel test suites
- Comprehensive quality scanning
- Performance testing across workers
- Distributed refactoring operations
- Cross-file refactoring coordination
- Safety validation and rollback
- Collaborative refactoring sessions
- LLM Providers:
  - Local: Llama.cpp, Ollama
  - Cloud: Anthropic Claude, Google Gemini, OpenAI, xAI, Qwen
  - Aggregators: OpenRouter, GitHub Copilot
- Version Control: Git integration
- CI/CD Systems: Jenkins, GitHub Actions
- Monitoring Tools: Prometheus, Grafana
- Extension Points: Well-defined extension interfaces
- Hot Reloading: Runtime plugin loading
- Dependency Management: Plugin dependency resolution
- Security Sandboxing: Secure plugin execution
- Edge Computing: Edge device integration
- Federated Learning: Distributed model training
- Blockchain Integration: Immutable task tracking
- Quantum Computing: Quantum algorithm support
- Microservices: Further service decomposition
- Event-Driven Architecture: Event sourcing implementation
- Service Mesh: Advanced service communication
- Multi-Region Deployment: Global distribution
Architecture Version: 1.1.0
Last Updated: 2025-11-05
Compatibility: Go 1.26+, PostgreSQL 15+, Redis 7+
- Anthropic Claude Provider: Full API implementation with advanced features
  - Extended thinking with automatic detection
  - Multi-layer prompt caching (system/messages/tools)
  - Tool caching for repeated operations
  - Vision support for image analysis
  - Streaming with Server-Sent Events
- Google Gemini Provider: Complete API integration with massive context support
  - 2M token context windows (Gemini 2.5 Pro, 1.5 Pro)
  - Function calling with AUTO/ANY/NONE modes
  - Configurable safety settings
  - System instruction separation
  - Vision and multimodal capabilities
- Unified provider interface for all LLM backends
- Health monitoring and availability checks
- Model capability introspection
- Streaming support across all providers
- Error handling with context-aware retries