Skip to content

Latest commit

 

History

History
549 lines (407 loc) · 18.2 KB

File metadata and controls

549 lines (407 loc) · 18.2 KB

HomeArchitectureComparison

How cachekit Compares

cachekit vs Alternatives


Tip

TL;DR: Need caching? → cachekit. Multi-pod or single-process, with or without Redis. Built-in reliability features you didn't know you needed.


Quick Decision Tree

Are you caching in Python?
│
├─ Single process, no Redis needed? → @cache(backend=None) [L1-only]
├─ Multiple pods, need distributed cache? → @cache [L1+L2, out-of-the-box]
├─ Need to encrypt cached data? → @cache.secure [AES-256-GCM]
├─ Building libraries/packages? → cachekit batteries included
│
└─ Nothing else needed? [GOOD] Competitors can't match all features together

Feature Comparison Matrix

Feature lru_cache cachetools aiocache redis-cache dogpile.cache cachekit
L1-only mode - - -
Unhashable args (list/dict) - - - -
Async support Broken¹ - - -
TTL support -
Multi-pod - -
Circuit Breaker - - - - Partial
Distributed Locking - - - -
Zero-Knowledge Encryption - - - - -
Prometheus Metrics - - - - -
Pluggable Backends - - - - -
Managed Cloud Backend - - - - -
Upgrade Path None None Rewrite Rewrite Rewrite ✅ Seamless

Note

Type preservation: The default serializer (MessagePack/StandardSerializer) converts tuples to lists and frozensets to lists — this is consistent across all backends and modes, and ensures cross-language SDK compatibility (Rust, TypeScript, PHP).

If you need type preservation, use serializer='auto':

@cache(serializer='auto', ttl=300)
def fn(): return (1, 2, 3)  # tuple preserved on cache hit

AutoSerializer preserves tuples, sets, frozensets, datetime, UUID, and Decimal through serialization via type markers. The tradeoff: Python-only (other language SDKs won't understand the type markers). See Serializer Guide for details.

¹ lru_cache on async functions caches the coroutine object, not the result. The second await raises RuntimeError. See #77.


Feature Breakdown by Use Case

1. Single-Process Apps: @cache(backend=None) Beats lru_cache

Scenario: Building a CLI tool, batch processor, or single-instance service.

Tip

Why cachekit wins:

  • TTL support (lru_cache only has maxsize)
  • Unhashable arguments: Cache functions that take lists, dicts, nested structures — lru_cache and cachetools raise TypeError
  • Async support: Same @cache decorator works on async functions (lru_cache breaks on async)
  • Prometheus metrics built-in (zero setup)
  • Zero code changes to upgrade: Remove backend=None → distributed at any time

Tradeoff: The default serializer converts tuples to lists for cross-language compatibility. Use serializer='auto' to preserve Python types (tuples, sets, frozensets). See note above.

# Single-process, local development
@cache(backend=None, ttl=300)
def expensive_computation(x: int) -> dict:
    return {"result": x * 2, "computed_at": datetime.now()}

result = expensive_computation(42)  # Cached, no Redis required

Upgrade path when you scale:

# Just remove backend=None. That's it.
@cache(ttl=300)  # Now uses Redis automatically if REDIS_URL set
def expensive_computation(x: int) -> dict:
    return {"result": x * 2, "computed_at": datetime.now()}

Limitations of alternatives:

  • functools.lru_cache: No TTL, no metrics, no upgrade path. Crashes on unhashable args (lists, dicts). Breaks on async functions (caches coroutine, not result).
  • cachetools: More complex (choose TTLCache/LRUCache/etc), less ergonomic, no upgrade path. Crashes on unhashable args.

2. Multi-Pod Apps: @cache Beats aiocache/redis-cache

Scenario: Kubernetes deployment, multiple services, need distributed cache.

Important

Why cachekit wins:

  • L1+L2 caching: L1 hits ~50ns (local memory), L1 miss → L2 Redis (~2-7ms)
  • Circuit breaker: Redis down? Cache gracefully, don't cascade failures
  • Distributed locking: Prevents cache stampedes across pods
  • Encryption: Client-side AES-256-GCM, Redis never sees plaintext
  • Adaptive timeouts: Auto-adjust to Redis latency, not static
  • Metrics: Prometheus counters for hits/misses/errors
from cachekit import cache

@cache(ttl=3600)  # All features enabled by default
def get_user_data(user_id: int) -> dict:
    # First pod (same instance): L1 hit (~50ns)
    # Different pod: L2 hit from Redis (~2-7ms)
    # Redis down: Circuit breaker returns stale/None, doesn't crash
    # Multiple pods calling simultaneously: Distributed lock prevents stampede
    return fetch_user_from_db(user_id)

Architecture advantage (cache lookup flow):

sequenceDiagram
    actor Client
    participant L1 as L1 Cache<br/>(in-memory)
    participant L2 as Redis L2<br/>(shared)
    participant Lock as Distributed<br/>Lock
    participant Fn as Function<br/>(expensive)

    Client->>L1: get(key)
    alt L1 Hit ✅
        L1-->>Client: return (~50ns)
    else L1 Miss
        L1->>L2: get(key)
        alt L2 Hit ✅
            L2-->>L1: value (~2-7ms)
            L1->>L1: populate L1
            L1-->>Client: return value
        else L2 Miss (Cache Stampede Risk)
            L2->>Lock: acquire_lock()
            Lock-->>L2: locked (1 pod wins)
            L2->>L2: double-check cache
            L2->>Fn: execute()
            Fn-->>L2: result
            L2->>L2: store in Redis
            L2->>L1: populate L1
            L2->>Lock: release_lock()
            L2-->>Client: return result
        end
    end
    Note over L2,Lock: Lock ONLY on L2 miss<br/>Prevents 1000 pods → 1 pod<br/>executing expensive function
Loading

Limitations of alternatives:

  • aiocache: L2-only (every hit is 2-7ms network), no circuit breaker, no locking
  • redis-cache: Minimal features, no encryption, no metrics
  • dogpile.cache: More complex API, heavier dependencies

3. Infrastructure-Agnostic: Works Everywhere

Scenario: Need offline-first app, then add Redis, then add encryption.

Why cachekit wins:

  • Same code: @cache decorator works with or without Redis
  • Pluggable backends: Redis, CachekitIO, File, Memcached built-in, plus custom protocol
  • Graceful degradation: Redis down? L1 cache serves, app continues
  • No vendor lock-in: Backend protocol is documented, implement your own
# Development: No Redis needed
@cache(backend=None, ttl=300)
def compute(x):
    return x * 2

# Staging: Add Redis (one env var, zero code change)
# REDIS_URL=redis://localhost:6379
# @cache automatically uses Redis when available

# Production: Add encryption (one parameter)
@cache.secure(ttl=300)  # Enables AES-256-GCM
# CACHEKIT_MASTER_KEY=hex_encoded_key

Limitations of alternatives:

  • lru_cache: Locked to in-memory, no scaling path
  • aiocache: Locked to Redis/Memcached, async-only
  • dogpile.cache: Locked to configured backend, heavier setup

4. Production Reliability: Built-In Batteries

Scenario: Service scales and Redis starts failing. Need reliability without rewriting.

Why cachekit wins:

Circuit Breaker: Prevents cascading failures

@cache(ttl=300)  # Circuit breaker enabled by default
def fetch_data(key):
    # Redis is down → Circuit breaker catches errors
    # After N failures → Open circuit → return None instead of errors
    # Caller can handle gracefully, app stays up
    return db.query(key)

Distributed Locking: Prevents cache stampedes

@cache(ttl=300)  # Distributed locking enabled by default
def expensive_compute(key):
    # 1000 pods call simultaneously on cache miss
    # Only ONE pod calls expensive_compute
    # Others wait for L2 cache to be populated
    # No cascade of simultaneous DB queries
    return compute_expensive_value(key)

Adaptive Timeouts: Auto-tune to your infrastructure

@cache(ttl=300)  # Adaptive timeout enabled by default
def get_data(key):
    # Monitors Redis latency (P99)
    # If Redis is slower today → increase timeout automatically
    # No need to tune timeout constants for environment
    return fetch_data(key)

Limitations of alternatives:

  • lru_cache/cachetools: No failure handling
  • aiocache: No circuit breaker, manual locking complex
  • dogpile.cache: Lock implementation is complex, heavy API

5. Observability: Metrics Without Custom Code

Scenario: Service goes slow. Need to understand cache behavior.

Why cachekit wins:

Automatic Prometheus metrics:

cachekit_cache_hits_total{function="fetch_user"} 9847
cachekit_cache_misses_total{function="fetch_user"} 203
cachekit_cache_errors_total{function="fetch_user"} 5
cachekit_cache_hit_latency_seconds{...} histogram
cachekit_cache_miss_latency_seconds{...} histogram

No instrumentation code needed:

@cache(ttl=300)  # Metrics exported automatically
def get_user(id):
    return db.query(User).get(id)

# Prometheus scrape endpoint just works
# Grafana dashboards available

Limitations of alternatives:

  • lru_cache: No metrics at all
  • cachetools: No metrics integration
  • aiocache: Requires custom instrumentation
  • dogpile.cache: Manual metric collection needed

6. Managed Cloud Backend: No Redis Required

Scenario: You want distributed caching without standing up or operating Redis infrastructure.

Why cachekit wins:

  • CachekitIOBackend: Drop-in L2 backend backed by api.cachekit.io — no Redis cluster to provision, patch, or scale
  • Zero-knowledge compatible: Pair with @cache.secure and the managed backend stores only ciphertext, never your data
  • Same decorator API: Swap backend by setting CACHEKIT_API_KEY — zero code changes
from cachekit import cache

# Uses cachekit.io as the L2 backend (CACHEKIT_API_KEY in environment)
@cache.io(ttl=3600)
def get_user_data(user_id: int) -> dict:
    return fetch_from_db(user_id)

No other caching library offers a managed cloud backend with first-class decorator support.


Claim: "I need async caching → use aiocache"

Reality: cachekit has native async support with zero configuration — and unlike lru_cache, it actually works.

Warning

lru_cache on async functions is broken. It caches the coroutine object, not the result. The second await raises RuntimeError: cannot reuse already awaited coroutine. This is a well-known Python footgun.

# lru_cache on async — BROKEN
@lru_cache(maxsize=128)
async def fn(x): return x * 2

await fn(1)  # Works
await fn(1)  # RuntimeError: cannot reuse already awaited coroutine

# cachekit on async — works correctly
@cache(backend=None, ttl=300)
async def fn(x): return x * 2

await fn(1)  # Works — returns 2
await fn(1)  # Works — returns 2 (from cache)

cachekit auto-detects async functions and wraps them correctly. Same @cache decorator for both sync and async.

Note

Async cache management uses await fn.ainvalidate_cache() instead of fn.cache_clear(). See #76.

Evidence: tests/competitive/test_head_to_head.py::TestAsyncSupport — verified against real lru_cache behavior


Claim: "lru_cache is simpler"

Reality: Simplicity without features is expensive

# lru_cache: Simple but limited
@lru_cache(maxsize=128)
def compute(x): return x * 2
# What about TTL? What about metrics? What about scaling?

# cachekit: Simple AND powerful
@cache(ttl=300)  # TTL, metrics, scaling, circuit breaker, all included
def compute(x): return x * 2

Your choice: Simple but limited, or simple AND complete?


Claim: "dogpile.cache is battle-tested"

Reality: Maturity ≠ fitness for modern architectures

  • dogpile.cache designed for older patterns (thread-local cache, manual regions)
  • cachekit designed for cloud-native (pods, stateless, multi-tenant)
  • Dogpile's region pattern adds complexity for modern use cases

Migration Examples

From lru_cache → cachekit (No Rewrite)

# OLD (lru_cache)
from functools import lru_cache

@lru_cache(maxsize=128)
def compute_score(user_id):
    return db.get_user_score(user_id)

# NEW (cachekit) - just swap decorator and add TTL
from cachekit import cache

@cache(ttl=3600)
def compute_score(user_id):
    return db.get_user_score(user_id)

# That's it. No other changes.

From aiocache → cachekit (Direct Equivalent)

# OLD (aiocache)
from aiocache import cached
import asyncio

@cached(cache=RedisCache)
async def get_user(user_id):
    return await db.get_user(user_id)

# NEW (cachekit) - sync-first approach
from cachekit import cache

@cache(ttl=3600)
def get_user(user_id):
    return db.get_user(user_id)  # Works in sync/async contexts

From Single-Process → Multi-Pod (Zero Code Changes)

# Development (single process, no Redis)
@cache(backend=None, ttl=300)
def expensive_operation(x):
    return compute(x)

# Staging → Production (Redis added, just remove backend=None)
# Same decorator, same function, automatically distributed
@cache(ttl=300)  # REDIS_URL env var makes this distributed
def expensive_operation(x):
    return compute(x)

Validation Evidence

All competitive claims validated by automated tests against real libraries (not mocks):

Head-to-Head Suite: pytest tests/competitive/test_head_to_head.py -v

  • 50 tests across 10 data type categories and 7 behavioral dimensions
  • Tests against functools.lru_cache, cachetools, aiocache
  • Covers: primitives, collections, special floats, binary data, rich types, unhashable arguments, TTL, cache management, concurrency (10 threads), async support, edge cases

Key verified findings:

  • lru_cache and cachetools crash on unhashable args (TypeError) — cachekit handles them
  • lru_cache on async functions caches the coroutine, not the result (RuntimeError on second await) — no stdlib fix as of Python 3.12+
  • cachekit serializes in all modes (including L1-only) — tuples become lists via MessagePack
  • All libraries handle primitives, bytes, datetime, Decimal, UUID, Enum identically in-memory

Legacy Suite: pytest tests/competitive/ -v (includes older assertion-based tests)


Performance Benchmarks

Latest Benchmarks: Measured on real Redis instances

Operation cachekit L1 cachekit L2 lru_cache aiocache
Cache hit (in-process) ~50ns n/a ~50ns n/a
Cache hit (distributed) n/a ~2-7ms n/a ~2-7ms
Cache miss varies varies varies varies
Circuit breaker overhead <10ns <10ns N/A N/A
Encryption (AES-256-GCM) N/A ~500μs N/A N/A

Conclusion: cachekit L1 matches lru_cache, L2 matches competitors, plus all additional features.


Getting Started

Installation

pip install cachekit

Basic Usage (Local Development)

from cachekit import cache

@cache(backend=None, ttl=300)  # L1-only, no Redis needed
def get_user_profile(user_id: int) -> dict:
    # Expensive operation
    return fetch_from_db(user_id)

profile = get_user_profile(123)  # Cached!

Multi-Pod Production

# Set environment variable
export REDIS_URL="redis://redis.default:6379"
from cachekit import cache

@cache(ttl=3600)  # Automatically uses Redis, L1+L2
def get_user_profile(user_id: int) -> dict:
    return fetch_from_db(user_id)

With Encryption

export CACHEKIT_MASTER_KEY="your_hex_encoded_key"
from cachekit import cache

@cache.secure(ttl=3600)  # AES-256-GCM encryption
def get_sensitive_data(user_id: int) -> dict:
    return fetch_sensitive_data(user_id)

FAQ

Q: Which should I use, cachekit or lru_cache? A: lru_cache for simple in-process caching (5% of use cases). cachekit for everything else because the upgrade path is trivial and features are free.

Q: Why not just use Redis directly? A: cachekit is the right level of abstraction - adds L1 cache, reliability, metrics, encryption. Raw Redis requires building all this yourself.

Q: Is cachekit production-ready? A: Yes. Used in production by early adopters. Full test coverage, fuzzing validation, security audit completed.

Q: Can I use cachekit with FastAPI? A: Yes. Works with any framework (FastAPI, Django, Flask, etc). Metrics integrate with Prometheus/Grafana.

Q: What if Redis goes down? A: Circuit breaker catches errors, returns stale cache or None, app continues working.

Q: Can I use my own backend? A: Yes. Four built-in backends (Redis, CachekitIO, File, Memcached) or implement the BaseBackend protocol (~100 LOC) for custom storage.


Next Steps

Previous: Performance Guide - Real benchmarks and optimization Previous Alternative: Data Flow Architecture - System design details

  1. Single-process? Start with Getting Started
  2. Multi-pod? Read Circuit Breaker + Distributed Locking
  3. Need encryption? See Zero-Knowledge Encryption
  4. Want metrics? Check out Prometheus Metrics
  5. Performance critical? Review Serializer Guide

See Also