- Channel/Interviewer: System Design Interview
- Duration: 00:34:32
- Original Video: https://www.youtube.com/watch?v=iuqZvajTOyA
This document summarizes the key content of a system design mock interview. Watching the full video is still recommended.
- Problem Prompt (One-liner): Design a fast, scalable, highly available distributed cache that supports `PUT(key, value)` and `GET(key)` for string keys and values.
- Primary Scope: Local LRU cache → sharded distributed cache → consistent hashing → replication (leader/follower) → discovery/config service → client/proxy variations → TTL/expiration → observability and security notes.
- Non-Functional Priorities: Performance first, plus scalability and availability; durability is secondary because it’s “just a cache.”
- Key Constraints & Numbers: No specific QPS/latency numbers provided. Not stated in video.
High-Level Architecture (Text)
- App calls the cache client; the client first checks an in-process local cache (LRU).
- On a miss, the client selects a shard via consistent hashing and calls the remote cache over TCP/UDP (see the sketch after this list).
- Shards run LRU store; writes go to leader; reads can go to leader or replicas.
- Configuration service discovers servers, monitors health, and (optionally) performs leader election/failover.
- Optional proxy or server-side routing instead of client-side hashing.
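The read path above as a minimal sketch; the helper names (`pick_shard`, `remote_get`, `load_from_datastore`) are placeholders for illustration, not APIs from the video:

```python
from typing import Callable, Optional

def cache_get(
    key: str,
    local_cache: dict,                                # in-process LRU (sketched later)
    pick_shard: Callable[[str], str],                 # consistent hashing: key -> "host:port"
    remote_get: Callable[[str, str], Optional[str]],  # network call to the owning shard
    load_from_datastore: Callable[[str], str],        # fallback on a cache miss
) -> str:
    # 1. Cheapest possible hit: the in-process local cache.
    if key in local_cache:
        return local_cache[key]
    # 2. Route to the shard that owns this key.
    shard = pick_shard(key)
    value = remote_get(shard, key)
    # 3. A miss (or an unavailable shard) falls back to the backing datastore.
    if value is None:
        value = load_from_datastore(key)
    local_cache[key] = value
    return value
```

PUT takes the symmetric path: hash the key to its owning shard, write to that shard's leader, and optionally refresh the local cache.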
Top Trade-offs
- Client-side hashing vs proxy/server-side routing
- Dedicated cache cluster vs co-located cache
- Asynchronous replication (availability/perf) vs synchronous (consistency)
- Simplicity of host-list files vs dynamic discovery service
Biggest Risks/Failure Modes
- Hot shards / hot keys
- Domino effect on node failure with vanilla consistent hashing
- Inconsistency across replicas (async replication)
- Clients holding divergent server lists
- Data loss if the leader crashes before replication completes (acceptable: treated as cache misses)
5-Min Review Flashcards
- Q: Why does an LRU cache need a hashmap plus a doubly linked list? A: The hashmap gives O(1) lookups; the doubly linked list reorders recency in O(1).
- Q: Why not simple `hash % N`? A: Rehash storm on membership changes → many misses. Use consistent hashing.
- Q: What improves availability and hot-shard handling? A: Leader/follower replication; add read replicas.
- Q: Who knows shard membership? A: Cache client (or a proxy/servers); kept in sync via file, shared storage polling, or config service.
- Q: Two flaws of vanilla consistent hashing? A: Domino effect on failure; uneven ring split. Mitigate with virtual nodes, Jump Hash, proportional hashing.
- Q: Why is async replication acceptable here? A: Cache prioritizes speed; occasional loss → cache miss only.
- Product Pattern: caching
- System Concerns: high-availability, low-latency, eventual-consistency, hot-key
- Infra/Tech (mentioned): microservices, redis, memcached, zookeeper [Personal note: Many stacks today prefer etcd or Consul for service discovery/coordination in cloud/Kubernetes environments due to operational fit.]
Requirements & Constraints
- Original Prompt: Add a distributed in-memory cache to speed up reads and shield the datastore from latency/outages. Support `PUT` and `GET`.
- Use Cases: Faster repeated reads; resilience during datastore slowness/outages; reduced backend load.
- Out of Scope: Persistent storage guarantees beyond typical cache semantics. Exact API schemas not specified.
- APIs: Not stated in video.
Given in Video
- Functional: `PUT(key, value)`, `GET(key)`; strings for both.
- Non-Functional: Performance (top priority), scalability, availability; durability lower priority.
- Consistency: Eventual (async replication, multi-replica reads).
Assumptions (conservative)
- Values are moderately sized (<1 MB) and compressible.
- Read-heavy workload (e.g., 90/10).
- Latency budgets in low ms for p50/p95.
- Multi-DC deployment for resilience. Assumption—validate per stack.
- Numerical Estimation: Not stated in video; skipped.
High-Level Architecture (Components)
- Local Cache: In-process LRU (hashmap + doubly linked list). TTL support via passive/active expiration.
- Remote Cache Cluster: Sharded nodes (LRU per node). Client-side consistent hashing. TCP or UDP transport. [Personal note: Prefer TCP for reliability; if using UDP, application-level retries/ordering are required.]
- Replication: Leader/follower per shard; writes to leader; reads from any. Cross-DC replicas. [Personal note: The video uses “master-slave” as an alias; prefer “leader/follower” terminology.]
- Discovery & Failover: File/shared storage polling or configuration service (e.g., ZooKeeper; Redis Sentinel). [Personal note: Where you already run Kubernetes, controller-based discovery or etcd-backed coordination often reduces extra infra.]
- Client vs Proxy: Either client library does hashing/routing or use a proxy (e.g., twemproxy) / server-side routing (Redis Cluster).
- Observability & Security: Metrics (latency, hit/miss, CPU/mem, I/O), access logging; network ACLs; optional at-rest/in-flight encryption (perf impact).
Subsystem - Local LRU Cache
- Role: Fast in-process hits; reduces remote calls.
- Core Idea: Hashmap for O(1) key→node lookup; doubly linked list for O(1) move-to-head and tail eviction (see the sketch after this list).
- Eviction: Remove the tail entry when at capacity; move an entry to the head on GET/PUT.
- TTL/Expiration: Passive on access; active via periodic sampler to avoid scanning all entries.
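A minimal sketch of that structure, assuming invented names and a passive TTL check on access (illustrative only, not the video's code):

```python
import time
from typing import Optional

class _Node:
    __slots__ = ("key", "value", "expires_at", "prev", "next")
    def __init__(self, key, value, expires_at):
        self.key, self.value, self.expires_at = key, value, expires_at
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity: int, default_ttl: float = 60.0):
        self.capacity = capacity
        self.default_ttl = default_ttl
        self.map = {}                        # hashmap: key -> node, O(1) lookup
        self.head = _Node(None, None, 0.0)   # sentinel on the most-recently-used side
        self.tail = _Node(None, None, 0.0)   # sentinel on the least-recently-used side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        if node.expires_at < time.time():    # passive expiration on access
            self._unlink(node)
            del self.map[key]
            return None
        self._unlink(node)                   # O(1) recency update: move to head
        self._push_front(node)
        return node.value

    def put(self, key, value, ttl: Optional[float] = None):
        if key in self.map:
            self._unlink(self.map.pop(key))
        elif len(self.map) >= self.capacity: # evict the least recently used (tail side)
            lru = self.tail.prev
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value, time.time() + (ttl or self.default_ttl))
        self.map[key] = node
        self._push_front(node)
```

Active expiration can be layered on top by periodically sampling a few entries and dropping the expired ones, which avoids scanning the whole cache.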
Subsystem - Sharding & Hashing
- Role: Distribute keys across nodes; minimize rehashing on membership changes.
- Mechanics: Place servers on a ring; each server owns the keys from its position up to the next server clockwise (see the ring sketch after this list).
- Hot/Failure Mitigation: Virtual nodes; Jump Hash; proportional hashing.
- Client List Sync: All clients must share the same server list to avoid split-brain routing.
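A toy ring with virtual nodes, assuming invented names and using MD5 only for placement (any stable hash works):

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, vnodes: int = 100):
        # Each server gets `vnodes` positions on the ring to even out key ownership.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def pick(self, key: str) -> str:
        # Binary search for the first position clockwise of the key's hash (wrap at the end).
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]

ring = HashRing(["cache-1:6379", "cache-2:6379", "cache-3:6379"])
print(ring.pick("user:42"))   # the same key always routes to the same shard
```

Adding or removing one server only remaps the keys adjacent to its ring positions, which is why membership changes do not cause a rehash storm.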
Subsystem - Replication & Consistency
- Protocol: Leader/follower; async replication favored. Reads from replicas allowed (toy sketch after this list).
- Consistency: Eventual; possible divergent reads across leader vs replica.
- Failure: Promote follower on leader failure (via config service or intra-cluster election). [Personal note: If requirements change to strong consistency, quorum writes/reads or synchronous replication with fewer replicas can be considered—expect higher latency.]
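A toy leader/follower pair, with invented names, showing why follower reads can briefly lag: the leader acknowledges a write as soon as it holds the value and replicates in the background:

```python
import queue
import threading

class FollowerShard:
    def __init__(self):
        self.store = {}
    def apply(self, key, value):
        self.store[key] = value
    def get(self, key):
        return self.store.get(key)

class LeaderShard:
    def __init__(self, followers):
        self.store = {}
        self.followers = followers
        self._log = queue.Queue()                # pending replication events
        threading.Thread(target=self._replicate_loop, daemon=True).start()

    def put(self, key, value):
        self.store[key] = value                  # acknowledged once the leader has it
        self._log.put((key, value))              # replication happens asynchronously

    def get(self, key):
        return self.store.get(key)

    def _replicate_loop(self):
        while True:
            key, value = self._log.get()
            for follower in self.followers:
                try:
                    follower.apply(key, value)   # a follower read may still see old data
                except Exception:
                    pass                         # a lost update is at worst a future cache miss
```

If requirements shifted toward stronger consistency, the replication step would move into `put` (synchronous or quorum acknowledgements), at the cost of write latency.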
Subsystem - Discovery & Client/Proxy Routing
- Options: (1) Static file deployed with the app; (2) Shared storage polling (e.g., S3) via a daemon; (3) Config service with heartbeats and automatic registration/deregistration (see the polling sketch after this list).
- Leader Election: Config service can monitor leaders and promote followers.
- Topology Source of Truth: Clients query config service for up-to-date membership.
- Client Library: Lightweight; binary search over sorted ring; handles failures; emits metrics.
- Proxy: e.g., twemproxy—moves sharding/routing logic out of clients and centralizes it.
- Server-side Routing: Client connects to any node; node forwards to correct shard (Redis Cluster).
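One way to keep client server lists in sync is to poll the config service and rebuild the ring on change; the endpoint and class below are hypothetical (a ZooKeeper/etcd watch would replace the polling loop):

```python
import json
import threading
import time
import urllib.request

class MembershipPoller:
    def __init__(self, config_url: str, on_change, interval: float = 5.0):
        self.config_url = config_url   # hypothetical, e.g. "http://config-service/cache/members"
        self.on_change = on_change     # callback that rebuilds the HashRing from the new list
        self.interval = interval
        self._members = []
        threading.Thread(target=self._poll_loop, daemon=True).start()

    def _poll_loop(self):
        while True:
            try:
                with urllib.request.urlopen(self.config_url, timeout=2) as resp:
                    members = sorted(json.load(resp))   # e.g. ["cache-1:6379", "cache-2:6379"]
                if members != self._members:            # membership changed: rebuild the ring
                    self._members = members
                    self.on_change(members)
            except Exception:
                pass                                    # keep the last known-good membership
            time.sleep(self.interval)
```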
| Topic | Option A | Option B | Video’s Leaning | Rationale |
|---|---|---|---|---|
| Deployment | Dedicated cache cluster | Co-located per service | Both viable | Dedicated: isolation and scaling independence; Co-located: cheaper & auto-scales with service. |
| Routing | Client-side hashing | Proxy/server-side | Neutral | Client is simpler infra; proxy centralizes logic and reduces client complexity. |
| Replication | Async | Sync | Async | Lower latency; accepts eventual consistency and rare loss. |
| Discovery | Static/shared file | Config service | Config service for automation | Automatic health-based membership, failover, and single source of truth. |
| Hashing | Vanilla ring | Virtual nodes / Jump Hash | Enhanced | Better balance; reduces domino effect. |
[Personal note: Jump Hash (2014) is still a solid choice in 2025 for simple, well-balanced distribution.]
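For reference, the published Jump Hash function (Lamping & Veach, 2014) is only a few lines. It maps a 64-bit key hash to a bucket index in [0, N), so it assumes a stable, ordered server list that grows or shrinks at the tail; the `hash64` helper in the usage comment is hypothetical:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets) with minimal movement on resize."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Usage sketch (hash64 is any 64-bit hash of the string key, e.g. built from hashlib):
# shard = servers[jump_hash(hash64("user:42"), len(servers))]
```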
Reliability & Performance
- Replication/Quorum: Leader/follower; async; failover via config service.
- Latency Budget: Emphasis on constant-time cache ops; TCP/UDP network overhead is minimal compared to hitting the datastore.
- Degradation: Treat missing/unavailable shard as cache miss; rely on backing store.
- Hot Shards: Add read replicas; advanced hashing; potential app-level key splitting.
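Degradation can be as simple as converting shard errors into ordinary misses so callers fall back to the datastore; a minimal sketch with assumed names:

```python
from typing import Callable, Optional

def get_or_miss(remote_get: Callable[[str, str], Optional[str]], shard: str, key: str) -> Optional[str]:
    """Fail open: report any shard failure as a plain cache miss."""
    try:
        return remote_get(shard, key)
    except (TimeoutError, ConnectionError, OSError):
        return None   # the caller falls back to the backing datastore
```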
Observability & Security
- Network: Restrict cache ports with firewalls; don’t expose directly to the internet unless required.
- Data: Optional client-side encryption before storing (with performance caveats). [Personal note: If encrypting, ensure keys are managed outside the cache path and weigh CPU/memory overhead.]
- Metrics: Faults, latency, hit/miss rate, CPU/memory, network I/O.
- Logs: Who/when/key/status; keep entries small but informative.
- Why: Many dependent services will blame the cache during incidents—be ready with data.
- None explicitly modeled; the video instead proposes a future topic: heavy hitters / Top-K frequent items.
- What are the read/write ratios and latency targets that define “fast enough”?
- Are hot keys expected (e.g., trending content)? Should we pre-split or cache-bust?
- Do we require cross-region reads/writes, or is replication within region sufficient?
- Who owns membership truth—config service or orchestration layer (e.g., Kubernetes)?
- Is data sensitivity high enough to justify encryption overhead in cache?
- Start simple: a local LRU with a hashmap + doubly linked list gives O(1) operations.
- Scale out via sharding and consistent hashing to avoid rehash storms.
- Availability and hot shards improve with leader/follower replication and more read replicas.
- Keep clients’ server lists consistent—prefer dynamic discovery and health monitoring.
- Consider proxy/server-side routing to simplify clients.
- Embrace eventual consistency; treat cache loss as acceptable misses for speed.
- LRU: Eviction policy that removes least recently accessed items first.
- Consistent Hashing: Hashing scheme minimizing key movement on membership changes.
- Leader/Follower Replication: Writes go to leader; followers replicate and serve reads.
- Virtual Nodes: Multiple positions per server on ring to balance load.
- Domino Effect: Cascading failures when one node’s load overwhelms the next after failure.
- TTL: Time-to-live; expiry metadata for entries.
- Implement LRU (hashmap + doubly linked list).
- Prototype a toy consistent hashing ring with virtual nodes and Jump Hash.
- Add async replication to a toy cache; simulate failures and failover.
- Build a tiny config service (heartbeats, membership, client sync).
- Measure hit/miss/latency with and without local cache layer.
- Source Video: https://www.youtube.com/watch?v=iuqZvajTOyA
- Channel: System Design Interview
- Note: This document is a summary of the linked mock interview.
I’m Ali Sol, a PHP Developer.
- Website: alisol.ir
- LinkedIn: linkedin.com/in/alisolphp