Feature/llm cache redis semantic issue 362 by Francis6-git · Pull Request #417 · Traqora/astroml

Francis6-git · 2026-06-27T00:23:34Z

Description

Implements a provider-agnostic semantic caching layer for LLM responses (#362). To meet the required sub-50ms lookup latency without adding an external vector database dependency, the cache computes cosine similarity inline in Python over a bounded window of recent candidates tracked in Redis Sorted Sets (ZSET).

Technical Architecture

Time-Ordered Candidate Bucketing (astroml/cache/llm_semantic_cache.py): Candidates are indexed per target LLM model in a ZSET keyed {namespace}:idx:{model}:all. Each new completion is added with its creation time in epoch milliseconds as the score.
Soft Capacity Caps: The index is pruned via zremrangebyrank to a soft cap of $\max(100,\ 10 \times K)$ entries, where $K$ is the lookback size. This bounds memory use and keeps sorted-set updates at O(log(N)).
Provider Decoupled Core Layer (astroml/llm/llm_cached_client.py): Exposes an extensible, structurally typed LLMProvider protocol and an underlying LLMEmbeddingProvider constructor, so embedding models can be swapped without changes to calling code.
Telemetry Metrics Integration: Tracks running hit and miss counts and lookup durations measured with time.perf_counter(), written through Redis pipelines. Hit rate and average lookup latency are derived from these counters.
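
The index layout and soft cap described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names index_key, soft_cap, and record_entry are hypothetical, and client is assumed to be a redis-py style client (for example redis.Redis) exposing pipeline(), zadd(), and zremrangebyrank().

```python
import time


def index_key(namespace: str, model: str) -> str:
    """Build the per-model index key in the {namespace}:idx:{model}:all layout."""
    return f"{namespace}:idx:{model}:all"


def soft_cap(lookback_k: int, floor: int = 100) -> int:
    """Soft capacity: 10 x lookback K, never below the floor of 100 items."""
    return max(floor, 10 * lookback_k)


def record_entry(client, namespace: str, model: str, entry_id: str, lookback_k: int) -> None:
    """Add an entry scored by epoch milliseconds, then trim the oldest overflow.

    Assumption: `client` behaves like a redis-py client (e.g. redis.Redis).
    """
    key = index_key(namespace, model)
    now_ms = int(time.time() * 1000)
    pipe = client.pipeline()
    pipe.zadd(key, {entry_id: now_ms})
    # Ranks are ascending by score, so the range 0..-(cap + 1) covers
    # every member except the `cap` newest ones.
    pipe.zremrangebyrank(key, 0, -(soft_cap(lookback_k) + 1))
    pipe.execute()
```

Batching the add and the trim in one pipeline keeps the write path to a single round trip, and trimming by rank removes the oldest entries first because scores are timestamps.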

Validation Status against Acceptance Criteria

1. Lookup Latency SLA (< 50ms)

Limiting the candidates evaluated per lookup to a tunable candidate_top_k bounds the cost of the linear _cosine_similarity scan.
Redis operations are batched with pipeline() to minimize network round trips.
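
As a rough illustration of why the scan stays bounded, the sketch below compares a query embedding against at most candidate_top_k candidates. The names cosine_similarity and best_match and the (entry_id, embedding) candidate shape are assumptions made for this example and may differ from the PR's _cosine_similarity helper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    candidate_top_k: int,
    threshold: float,
) -> Optional[str]:
    """Return the id of the most similar candidate at or above `threshold`.

    `candidates` is assumed to be ordered newest first; only the first
    `candidate_top_k` entries are scored, so the work is O(top_k * dim).
    """
    best_id: Optional[str] = None
    best_score = -1.0
    for entry_id, embedding in candidates[:candidate_top_k]:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = entry_id, score
    return best_id if best_score >= threshold else None
```

With candidate_top_k fixed, lookup cost no longer grows with the total number of cached entries, only with the embedding dimension.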

2. Targeted Cache Hit Rate (> 40%)

Adds a configurable similarity threshold, LLM_CACHE_SIMILARITY_THRESHOLD (default 0.88), so match strictness can be tuned at runtime for different kinds of prompt text.
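
One way such a threshold could be loaded is sketched below. It assumes LLM_CACHE_SIMILARITY_THRESHOLD is read from the process environment, which this description does not confirm; load_similarity_threshold is a hypothetical helper, and falling back to the default on invalid input is a design choice of this example.

```python
import os
from typing import Mapping, Optional

DEFAULT_SIMILARITY_THRESHOLD = 0.88


def load_similarity_threshold(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the threshold, falling back to 0.88 when unset or invalid.

    Assumption: the setting is an environment variable. Pass a dict as
    `env` to override the lookup source, for example in tests.
    """
    source = os.environ if env is None else env
    raw = source.get("LLM_CACHE_SIMILARITY_THRESHOLD")
    if raw is None:
        return DEFAULT_SIMILARITY_THRESHOLD
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_SIMILARITY_THRESHOLD
    # Cosine similarity lies in [-1, 1]; anything else (including NaN)
    # is treated as a misconfiguration.
    if not -1.0 <= value <= 1.0:
        return DEFAULT_SIMILARITY_THRESHOLD
    return value
```

Lowering the threshold should raise the hit rate at the cost of looser matches, so the value is best tuned against real prompt traffic.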

Closes #362

drips-wave · 2026-06-27T00:23:43Z

@Francis6-git Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

Francis6-git added 3 commits June 27, 2026 00:42

feat(llm): implement token usage tracking and cost monitoring (Traqora#361)

2724c1d

feat(llm): implement redis semantic caching layer (Traqora#362)

5028528

del: removed the ToDo.md file

3430718

Francis6-git wants to merge 3 commits into Traqora:main from Francis6-git:feature/llm-cache-redis-semantic-issue-362