Skip to content

Alpy16/Zero-KV

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zero-KV

Rust Tokio License: MIT

About

Zero-KV is a disk-backed, read-optimized key-value storage engine for low-latency local environments.

The project explores systems-level performance techniques in Rust: immutable binary storage, memory-mapped reads, fixed-width binary frames, Unix Domain Sockets, batched request handling, and vectored writes.

The engine is intentionally scoped around a simple workload: fast local lookups from a pre-baked immutable dataset.

Features

  • Memory-mapped storage: Uses memmap2 to expose the baked database file through the OS page cache.
  • Fixed binary protocol: Uses 16-byte request frames and 8-byte response headers for predictable parsing.
  • Unix Domain Socket transport: Uses UDS for local inter-process communication without TCP loopback overhead.
  • Sorted immutable index: Stores fixed-width index entries and performs binary search over the mapped index region.
  • Input-driven baking: Compiles a simple key,value input file into an immutable binary storage.db.
  • Batched request handling: Reads 4KB request batches and processes up to 256 pipelined requests per batch.
  • Vectored responses: Uses write_vectored to write response headers and value slices without assembling a separate contiguous response buffer.
  • Partial-write safety: Handles partial vectored writes by advancing through written slices until the full response batch is flushed.
  • Cache-conscious layout: Aligns stored value regions to 8-byte boundaries to simplify offset calculation and reduce unaligned access concerns.
  • Reusable hot-path buffers: Reuses stack-allocated response metadata inside the server loop to reduce allocator pressure during steady-state handling.

Quickstart

Requirements

  • Rust stable
  • Linux or WSL2
  • Unix Domain Socket support

Installation

Clone the repository and build the release binaries:

git clone https://github.com/Alpy16/Zero-KV.git
cd Zero-KV
cargo build --release

Generate a Dataset

Zero-KV expects a simple CSV-like input file:

key,value

Example:

1,hello
2,world
100,First value content

Generate a 100,000-entry benchmark dataset:

seq 1 100000 | awk '{print $1 ",value_content_at_key_" $1}' > bench_input.csv

Bake the Storage File

Compile the input dataset into an immutable binary database:

cargo run --release --bin baker -- bench_input.csv

This creates:

storage.db

The server loads this file at startup.

Run the Server

RUST_LOG=error cargo run --release --bin kv_store

By default, the server listens on:

/tmp/zero-kv.sock

Run the Benchmark

In another terminal:

cargo run --release --bin benchmarker

The benchmarker launches 100 concurrent Unix socket clients, each sending 256-request pipelines against keys in the baked dataset.

Full Lifecycle Script

The repository includes cycle.sh, which runs the full build, bake, serve, and benchmark cycle.

chmod +x cycle.sh
./cycle.sh

The script performs the following steps:

  1. Builds release binaries
  2. Generates a 100,000-entry dataset
  3. Bakes storage.db
  4. Starts the server in the background
  5. Waits for the Unix socket to become available
  6. Runs the saturation benchmark
  7. Checks that the server survived the benchmark
  8. Cleans up temporary input files

Performance Metrics

The benchmark is designed to measure saturated local read throughput under pipelined Unix Domain Socket traffic.

Latest benchmark configuration:

Parameter Value
Dataset size 100,000 entries
Concurrent clients 100
Pipeline depth 256 requests
Run duration 10 seconds
Transport Unix Domain Sockets
Storage mode Immutable mmap-backed file
Workload Existing-key reads
Hit rate 100%

Latest observed result:

Metric Result
Throughput 10,405,284 requests/sec
Batch P50 latency 2.387ms
Batch P99 latency 5.413ms
Total requests 104,117,504

Latency values represent completion time for a full 256-request pipeline, not individual request latency.

Benchmark results are environment-dependent. The result above was collected in a local Linux/WSL2 environment using Unix Domain Sockets.

Performance Telemetry

Runtime behavior was inspected with samply and Firefox Profiler.

View interactive profile trace

In the captured steady-state profile, allocator activity was not visible in the request hot path. Most runtime was concentrated around socket polling, batched request handling, and mmap-backed lookup logic.

Architecture

Zero-KV
├── src/
│   ├── lib.rs              # Protocol types, binary frame layout, shared constants
│   ├── storage.rs          # Mmap-backed storage engine and index lookup
│   ├── main.rs             # UDS server and request handling loop
│   └── bin/
│       ├── baker.rs        # Dataset compiler: input file -> immutable storage.db
│       └── benchmarker.rs  # Saturation benchmark client
├── Cargo.toml
├── cycle.sh                # Build, bake, serve, benchmark lifecycle script
└── README.md

Storage Format

The baked database is immutable and laid out as:

+------------------+
| Header           |
+------------------+
| IndexEntry[0]    |
| IndexEntry[1]    |
| ...              |
| IndexEntry[n]    |
+------------------+
| Value bytes      |
| 8-byte padding   |
| Value bytes      |
| 8-byte padding   |
| ...              |
+------------------+

Each index entry stores:

key:        u64
val_offset: u64
val_len:    u32
padding:    u32

Because index entries are fixed-width and sorted by key, lookups use binary search over the mapped index region.

Protocol

Each request is a fixed 16-byte frame:

op:      u32
padding: u32
key:     u64

Supported operations:

Opcode Operation
0 Get
1 Exists

Each response begins with an 8-byte header:

status: u32
length: u32

Response statuses:

Status Meaning
0 Ok
1 Not found
2 Error

If length > 0, the response header is followed by the value bytes.

Architecture Decision Records

ADR 1: Unix Domain Sockets over TCP

Context: The engine is designed for local IPC workloads where TCP loopback introduces unnecessary protocol overhead.

Decision: Use Unix Domain Sockets as the default transport.

Consequence: The transport remains local-only, but avoids TCP/IP framing and checksum overhead for same-machine clients.

ADR 2: Immutable Baked Storage

Context: Supporting writes, deletes, and compaction would require additional synchronization and recovery logic.

Decision: Compile input data into a sorted immutable binary file before server startup.

Consequence: Runtime lookup logic stays simple and fast, while mutation support is intentionally out of scope.

ADR 3: Memory-Mapped I/O

Context: Standard read calls require explicit user-space buffers and repeated syscall interaction.

Decision: Use memmap2 to map the database file and let the OS page cache manage file-backed memory.

Consequence: Lookups can return borrowed slices from the mapped file without copying values into a separate storage buffer.

ADR 4: Fixed-Width Binary Frames

Context: Text protocols and variable-width request formats add parsing overhead and allocation pressure.

Decision: Use fixed-width request and response headers with explicit binary layouts.

Consequence: Request decoding becomes predictable and cheap, but the protocol is less flexible than a self-describing format.

ADR 5: Batched Reads and Vectored Writes

Context: Handling one request per syscall limits throughput under high request rates.

Decision: Read 4KB request batches and flush response headers/value slices through vectored I/O.

Consequence: Syscall cost is amortized across many pipelined requests.

ADR 6: Partial-Write-Safe Vectored Output

Context: A single vectored write is not guaranteed to flush every byte in the provided slice list.

Decision: Track the number of bytes written, advance through completed slices, trim partially written slices, and continue writing until the full response batch is sent.

Consequence: The server preserves response framing correctness while keeping the zero-copy response path.

Safety Notes

The storage engine uses a raw pointer internally to reference the mapped index region.

This avoids self-referential lifetime issues while allowing the read-only storage object to be shared across worker tasks. The pointer is derived from the owned mmap, the mmap length is validated before pointer construction, and the mapped file is never mutated by the engine after startup.

The server response path uses borrowed slices into preallocated response headers and mmap-backed values. Vectored writes are advanced carefully so partial writes do not corrupt the client-visible response stream.

Limitations

  • Read-only storage engine
  • No writes, deletes, compaction, or replication
  • No crash-recovery layer beyond immutable file loading
  • Local IPC only through Unix Domain Sockets
  • Benchmark is optimized for local saturated read throughput
  • Batch latency is reported per 256-request pipeline, not per individual request
  • Current input format is simple CSV-like key,value and does not support escaping commas inside values
  • Benchmark workload currently measures 100% existing-key reads

License

MIT

About

A high-performance, disk-backed storage engine leveraging memory-mapped I/O and a custom binary protocol to achieve zero-copy data transfers and sub-30μs median lookup latencies.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors