LLM Module - Architecture Guide

Version: 1.0
Last Updated: 2026-05-31
Module Path: src/llm/

1. Overview

The LLM module provides inference execution, routing, model and adapter lifecycle control, streaming, policy enforcement, and runtime safety surfaces.

2. Architecture Surfaces

Surface	Source files
Core inference engines	src/llm/async_inference_engine.cpp, src/llm/inference_engine_enhanced.cpp
Scheduling and queueing	src/llm/shared_worker_pool.cpp, src/llm/continuous_batch_scheduler.cpp
Routing and orchestration	src/llm/model_router.cpp, src/llm/ai_orchestrator.cpp
Model and plugin lifecycle	src/llm/llm_plugin_manager.cpp, src/llm/model_loader.cpp, src/llm/model_downloader.cpp
Adapter and LoRA lifecycle	src/llm/multi_lora_manager.cpp, src/llm/adapter_load_balancer.cpp, src/llm/lora_router.cpp
Streaming and response shaping	src/llm/streaming_handler.cpp, src/llm/openai_compat_adapter.cpp
Policy and safety controls	src/llm/prompt_policy.cpp, src/llm/llm_security_utils.cpp, src/llm/production_validator.cpp
Caching and resource controls	src/llm/llm_response_cache.cpp, src/llm/kv_cache_buffer.cpp, src/llm/token_quota_manager.cpp

3. Runtime Control Flow

1. A request enters the engine's submit path.
2. Policy and guard checks run before any backend inference call.
3. The router and scheduler choose the model and worker execution path.
4. The plugin or backend executes inference under cache, adapter, and resource controls.
5. The result is emitted either as a full response or as a sequence of stream callback frames.
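
The ordering of the steps above can be illustrated with a minimal C++ sketch. Every name in it (`Request`, `Result`, `passes_policy`, `route_model`, `run_backend`, `submit`) is hypothetical and is not taken from the files under src/llm/. The policy rule, the routing rule, and the echo backend are placeholders chosen only to make the stage ordering visible.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types: illustration only, not the module's real API.
struct Request {
    std::string prompt;
    bool stream = false;
};

struct Result {
    bool accepted = false;
    std::string model;                // model picked by the routing stage
    std::vector<std::string> frames;  // one frame for a full response, several when streaming
};

// Policy/guard stage. Placeholder rule: reject empty prompts.
inline bool passes_policy(const Request& req) {
    return !req.prompt.empty();
}

// Routing stage. Placeholder rule: long prompts go to a larger model.
inline std::string route_model(const Request& req) {
    return req.prompt.size() > 32 ? "large-model" : "small-model";
}

// Backend stage. Stand-in backend that echoes the prompt split on spaces.
inline std::vector<std::string> run_backend(const Request& req) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : req.prompt) {
        if (c == ' ') {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Submit path: policy check, routing, execution, then emission.
inline Result submit(const Request& req,
                     const std::function<void(const std::string&)>& on_frame = nullptr) {
    Result result;
    if (!passes_policy(req)) return result;  // rejected before any backend call
    result.accepted = true;
    result.model = route_model(req);
    assert(!result.model.empty());           // routing must always pick a model
    const std::vector<std::string> tokens = run_backend(req);
    if (req.stream) {
        // Streaming: one callback frame per token.
        for (const std::string& token : tokens) {
            result.frames.push_back(token);
            if (on_frame) on_frame(token);
        }
    } else {
        // Full response: join the tokens into a single frame.
        std::string full;
        for (std::size_t i = 0; i < tokens.size(); ++i) {
            if (i > 0) full += ' ';
            full += tokens[i];
        }
        result.frames.push_back(full);
        if (on_frame) on_frame(full);
    }
    return result;
}
```

In this sketch a rejected request never reaches `run_backend`, which mirrors the rule that guard checks run before the backend call. A caller that wants streaming sets `stream = true` and passes a callback to `submit`; otherwise the single frame in `Result::frames` holds the full response.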

4. Integration Boundaries

Direction	Integration
Used by	API handlers, orchestration layers, AI runtime features
Uses	LLM backends/plugins, optional acceleration stacks, module-local safety controls
Exposes	inference APIs, routing hooks, adapter lifecycle hooks, streaming callbacks

5. Concurrency Model

Shared worker and scheduler components coordinate concurrent inference jobs.
Engine and manager components coordinate lifecycle state for models/adapters/plugins.
Streaming callbacks and cancellation paths are coordinated with request lifecycle state.
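
One way to picture how streaming callbacks and cancellation can share request lifecycle state is the sketch below. It is an assumption-laden illustration, not the implementation in src/llm/shared_worker_pool.cpp: `RequestState`, `WorkerPool`, and `stream_job` are hypothetical names, and the pool is a plain mutex-and-condition-variable queue built from the C++ standard library.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-request lifecycle state shared by the submitter and the worker.
struct RequestState {
    std::atomic<bool> cancelled{false};
    std::atomic<int> frames_emitted{0};
};

// Minimal fixed-size worker pool (illustration only).
class WorkerPool {
public:
    explicit WorkerPool(unsigned workers) {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Lets queued jobs finish, then joins all workers.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

// Emits up to max_frames frames and checks the cancellation flag before each one.
inline void stream_job(const std::shared_ptr<RequestState>& state, int max_frames,
                       const std::function<void(int)>& on_frame) {
    for (int i = 0; i < max_frames; ++i) {
        if (state->cancelled.load()) return;  // cancellation observed between frames
        on_frame(i);
        state->frames_emitted.fetch_add(1);
    }
}
```

The design choice shown here is that cancellation is cooperative: the worker only observes the flag between frames, so a frame that has already started is always delivered in full. Holding the state in a `std::shared_ptr` keeps it alive for as long as either the submitter or the running job still references it.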

6. Known Limits

Some distributed and federated execution paths depend on deployment wiring and are not default-on.
Runtime behavior can vary across backend and acceleration configurations.
Cross-node and topology-sensitive benchmark coverage remains an ongoing hardening area.

7. Source Code Verification (Module: llm/architecture)

Verified files:
- src/llm/async_inference_engine.cpp
- src/llm/inference_engine_enhanced.cpp
- src/llm/shared_worker_pool.cpp
- src/llm/continuous_batch_scheduler.cpp
- src/llm/model_router.cpp
- src/llm/llm_plugin_manager.cpp
- src/llm/multi_lora_manager.cpp
- src/llm/streaming_handler.cpp
- src/llm/openai_compat_adapter.cpp
- src/llm/prompt_policy.cpp
- src/llm/llm_security_utils.cpp
- src/llm/token_quota_manager.cpp

Verified interfaces and behavior:
- request submit/schedule/execute flow
- routing and lifecycle integration points
- streaming and policy guard control surfaces

Notes:
- Wave B tracking issue: https://github.com/makr-code/ThemisDB/issues/5039
- Dependent Wave A issue: https://github.com/makr-code/ThemisDB/issues/5038
- Follow-on Wave C issue: https://github.com/makr-code/ThemisDB/issues/5040
