feat: implement two-level model pooling and clustering #1
Merged
Level 1 Pools: aggregate the same model across multiple providers under a single virtual model ID. Supports priority and round-robin selection strategies, with cross-member failover on 429/503 errors.

Level 2 Clusters: group multiple pools (or raw model IDs) under semantic names (e.g., coding-high, coding-fast, chat, reasoning).

Changes:
- internal/config/config.go: ModelPoolConfig, ModelPool, ModelCluster, PoolMember, ClusterMember types; Config.ModelPools field
- internal/pool/resolver.go: Resolver with Resolve, Reload, IsPoolOrCluster, ListVirtualModels; thread-safe hot-reload
- internal/pool/resolver_test.go: 11 unit tests covering all scenarios
- sdk/api/handlers/handlers.go: PoolResolver field on BaseAPIHandler; VirtualModels() helper; UpdateConfig hook for hot-reload; pool failover wired into Execute*, ExecuteCount*, ExecuteStream*
- sdk/api/handlers/pool_execution.go: ExecuteWithPoolFailover and ExecuteStreamWithPoolFailover; isPoolFailoverError; member iteration
- sdk/api/handlers/*/: Models() in OpenAI/Claude/Gemini handlers now append virtual pool+cluster models to /v1/models listing
- internal/api/handlers/management/model_pools.go: GET/PUT/PATCH/DELETE management API endpoints for pools and clusters
- internal/api/server.go: register /v0/management/model-pools routes
- config.example.yaml: full documentation with examples
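The failover behavior described above can be illustrated with a minimal sketch. This is not the code from sdk/api/handlers/pool_execution.go: the `StatusError` type, the `Priority` field semantics (lower value first), and the `executeWithFailover` signature are assumptions made for illustration, and only the priority strategy is shown (round-robin is omitted).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// PoolMember is a hypothetical stand-in for the PR's PoolMember type.
type PoolMember struct {
	Provider string
	Model    string
	Priority int // assumed convention: lower value is tried first
}

// StatusError is a hypothetical error carrying an upstream HTTP status code.
type StatusError struct{ Code int }

func (e *StatusError) Error() string { return fmt.Sprintf("upstream status %d", e.Code) }

// isFailoverError reports whether err should move execution to the next
// pool member. Only 429 and 503 qualify, as stated in the PR description.
func isFailoverError(err error) bool {
	var se *StatusError
	if errors.As(err, &se) {
		return se.Code == 429 || se.Code == 503
	}
	return false
}

// executeWithFailover tries members in priority order and stops at the first
// success or at the first error that is not a failover error.
func executeWithFailover(members []PoolMember, call func(PoolMember) (string, error)) (string, error) {
	// Copy before sorting so the caller's slice is left untouched.
	ordered := append([]PoolMember(nil), members...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority < ordered[j].Priority
	})

	lastErr := errors.New("pool has no members")
	for _, m := range ordered {
		out, err := call(m)
		if err == nil {
			return out, nil
		}
		lastErr = err
		if !isFailoverError(err) {
			break // e.g. a 400 is returned to the client as-is
		}
	}
	return "", lastErr
}

func main() {
	// Illustrative pool; the provider names are placeholders.
	pool := []PoolMember{
		{Provider: "claude-api", Model: "claude-sonnet-4", Priority: 2},
		{Provider: "kiro", Model: "claude-sonnet-4", Priority: 1},
	}
	out, err := executeWithFailover(pool, func(m PoolMember) (string, error) {
		if m.Provider == "kiro" {
			return "", &StatusError{Code: 429} // simulate a rate-limited member
		}
		return "served by " + m.Provider, nil
	})
	fmt.Println(out, err)
}
```

Stopping on non-failover errors keeps client-side mistakes (such as a malformed request) from being retried against every member of the pool.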
Summary
Implements a two-level model pooling and clustering system.
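As a rough sketch of how the two levels might compose, the example below resolves a requested ID first as a cluster, then as a pool, and otherwise passes it through unchanged. The `Snapshot` shape, the string-based member format, and the pass-through behavior are assumptions for illustration; the actual Resolver in internal/pool/resolver.go may differ. A `sync.RWMutex` stands in for the thread-safe hot-reload the PR mentions.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot is a hypothetical, simplified view of the pool and cluster config.
type Snapshot struct {
	Pools    map[string][]string // pool ID -> concrete "provider/model" targets
	Clusters map[string][]string // cluster name -> pool IDs or raw model IDs
}

// Resolver maps a requested model ID to concrete targets and supports
// swapping its configuration while requests are in flight.
type Resolver struct {
	mu   sync.RWMutex
	snap Snapshot
}

// Reload replaces the whole snapshot under the write lock.
func (r *Resolver) Reload(s Snapshot) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.snap = s
}

// Resolve expands a cluster into its pools' members, expands a pool into its
// members, and returns any other ID unchanged as a single-element slice.
func (r *Resolver) Resolve(id string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()

	if refs, ok := r.snap.Clusters[id]; ok {
		var out []string
		for _, ref := range refs {
			if members, isPool := r.snap.Pools[ref]; isPool {
				out = append(out, members...) // level 2 -> level 1 -> members
			} else {
				out = append(out, ref) // raw model ID inside a cluster
			}
		}
		return out
	}
	if members, ok := r.snap.Pools[id]; ok {
		return append([]string(nil), members...) // copy to avoid aliasing
	}
	return []string{id}
}

func main() {
	r := &Resolver{}
	// All member strings below are placeholders, not values from the PR.
	r.Reload(Snapshot{
		Pools: map[string][]string{
			"claude-sonnet-4": {"kiro/claude-sonnet-4", "claude-api/claude-sonnet-4"},
		},
		Clusters: map[string][]string{
			"coding-high": {"claude-sonnet-4", "some-raw-model"},
		},
	})
	fmt.Println(r.Resolve("coding-high"))
	fmt.Println(r.Resolve("claude-sonnet-4"))
	fmt.Println(r.Resolve("unknown-model"))
}
```

Replacing the snapshot as a whole, rather than mutating maps in place, means readers holding the read lock never observe a half-updated configuration.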
Level 1 — Pools
Aggregate the same logical model from multiple providers under a single virtual model ID.
Example: Request claude-sonnet-4 → gateway tries Kiro first, then Claude API on 429/503.
Level 2 — Clusters
Group multiple pools under semantic intent-based names.
Example: coding-high, coding-fast, chat, reasoning, vision.
Changes
New Files
- internal/pool/resolver.go — thread-safe pool/cluster resolver with hot-reload
- internal/pool/resolver_test.go — 11 unit tests
- sdk/api/handlers/pool_execution.go — ExecuteWithPoolFailover + ExecuteStreamWithPoolFailover
- internal/api/handlers/management/model_pools.go — management API handlers
Modified Files
- internal/config/config.go — ModelPoolConfig, ModelPool, ModelCluster types
- sdk/api/handlers/handlers.go — PoolResolver field, hot-reload in UpdateConfig, pool failover in all Execute* methods
- sdk/api/handlers/*/ — Models() includes pool/cluster virtual models
- internal/api/server.go — management routes registered
- config.example.yaml — full documentation with examples
Testing