Zero-KV

About

Zero-KV is a disk-backed, read-optimized key-value storage engine for low-latency local environments.

The project explores systems-level performance techniques in Rust: immutable binary storage, memory-mapped reads, fixed-width binary frames, Unix Domain Sockets, batched request handling, and vectored writes.

The engine is intentionally scoped around a simple workload: fast local lookups from a pre-baked immutable dataset.

Features

Memory-mapped storage: Uses memmap2 to expose the baked database file through the OS page cache.
Fixed binary protocol: Uses 16-byte request frames and 8-byte response headers for predictable parsing.
Unix Domain Socket transport: Uses UDS for local inter-process communication without TCP loopback overhead.
Sorted immutable index: Stores fixed-width index entries and performs binary search over the mapped index region.
Input-driven baking: Compiles a simple key,value input file into an immutable binary storage.db.
Batched request handling: Reads 4KB request batches and processes up to 256 pipelined requests per batch.
Vectored responses: Uses write_vectored to write response headers and value slices without assembling a separate contiguous response buffer.
Partial-write safety: Handles partial vectored writes by advancing through written slices until the full response batch is flushed.
Cache-conscious layout: Aligns stored value regions to 8-byte boundaries to simplify offset calculation and reduce unaligned access concerns.
Reusable hot-path buffers: Reuses stack-allocated response metadata inside the server loop to reduce allocator pressure during steady-state handling.

Quickstart

Requirements

Rust stable
Linux or WSL2
Unix Domain Socket support

Installation

Clone the repository and build the release binaries:

git clone https://github.com/Alpy16/Zero-KV.git
cd Zero-KV
cargo build --release

Generate a Dataset

Zero-KV expects a simple CSV-like input file:

key,value

Example:

1,hello
2,world
100,First value content

Generate a 100,000-entry benchmark dataset:

seq 1 100000 | awk '{print $1 ",value_content_at_key_" $1}' > bench_input.csv

Bake the Storage File

Compile the input dataset into an immutable binary database:

cargo run --release --bin baker -- bench_input.csv

This creates:

storage.db

The server loads this file at startup.

Run the Server

RUST_LOG=error cargo run --release --bin kv_store

By default, the server listens on:

/tmp/zero-kv.sock

Run the Benchmark

In another terminal:

cargo run --release --bin benchmarker

The benchmarker launches 100 concurrent Unix socket clients, each sending 256-request pipelines against keys in the baked dataset.

Full Lifecycle Script

The repository includes cycle.sh, which runs the full build, bake, serve, and benchmark cycle.

chmod +x cycle.sh
./cycle.sh

The script performs the following steps:

Builds release binaries
Generates a 100,000-entry dataset
Bakes storage.db
Starts the server in the background
Waits for the Unix socket to become available
Runs the saturation benchmark
Checks that the server survived the benchmark
Cleans up temporary input files

Performance Metrics

The benchmark is designed to measure saturated local read throughput under pipelined Unix Domain Socket traffic.

Latest benchmark configuration:

Parameter	Value
Dataset size	100,000 entries
Concurrent clients	100
Pipeline depth	256 requests
Run duration	10 seconds
Transport	Unix Domain Sockets
Storage mode	Immutable mmap-backed file
Workload	Existing-key reads
Hit rate	100%

Latest observed result:

Metric	Result
Throughput	10,405,284 requests/sec
Batch P50 latency	2.387ms
Batch P99 latency	5.413ms
Total requests	104,117,504

Latency values represent completion time for a full 256-request pipeline, not individual request latency.

Benchmark results are environment-dependent. The result above was collected in a local Linux/WSL2 environment using Unix Domain Sockets.

Performance Telemetry

Runtime behavior was inspected with samply and Firefox Profiler.

View interactive profile trace

In the captured steady-state profile, allocator activity was not visible in the request hot path. Most runtime was concentrated around socket polling, batched request handling, and mmap-backed lookup logic.

Architecture

Zero-KV
├── src/
│   ├── lib.rs              # Protocol types, binary frame layout, shared constants
│   ├── storage.rs          # Mmap-backed storage engine and index lookup
│   ├── main.rs             # UDS server and request handling loop
│   └── bin/
│       ├── baker.rs        # Dataset compiler: input file -> immutable storage.db
│       └── benchmarker.rs  # Saturation benchmark client
├── Cargo.toml
├── cycle.sh                # Build, bake, serve, benchmark lifecycle script
└── README.md

Storage Format

The baked database is immutable and laid out as:

+------------------+
| Header           |
+------------------+
| IndexEntry[0]    |
| IndexEntry[1]    |
| ...              |
| IndexEntry[n]    |
+------------------+
| Value bytes      |
| 8-byte padding   |
| Value bytes      |
| 8-byte padding   |
| ...              |
+------------------+

Each index entry stores:

key:        u64
val_offset: u64
val_len:    u32
padding:    u32

Because index entries are fixed-width and sorted by key, lookups use binary search over the mapped index region.

Protocol

Each request is a fixed 16-byte frame:

op:      u32
padding: u32
key:     u64

Supported operations:

Opcode	Operation
0	Get
1	Exists

Each response begins with an 8-byte header:

status: u32
length: u32

Response statuses:

Status	Meaning
0	Ok
1	Not found
2	Error

If length > 0, the response header is followed by the value bytes.

Architecture Decision Records

ADR 1: Unix Domain Sockets over TCP

Context: The engine is designed for local IPC workloads where TCP loopback introduces unnecessary protocol overhead.

Decision: Use Unix Domain Sockets as the default transport.

Consequence: The transport remains local-only, but avoids TCP/IP framing and checksum overhead for same-machine clients.

ADR 2: Immutable Baked Storage

Context: Supporting writes, deletes, and compaction would require additional synchronization and recovery logic.

Decision: Compile input data into a sorted immutable binary file before server startup.

Consequence: Runtime lookup logic stays simple and fast, while mutation support is intentionally out of scope.

ADR 3: Memory-Mapped I/O

Context: Standard read calls require explicit user-space buffers and repeated syscall interaction.

Decision: Use memmap2 to map the database file and let the OS page cache manage file-backed memory.

Consequence: Lookups can return borrowed slices from the mapped file without copying values into a separate storage buffer.

ADR 4: Fixed-Width Binary Frames

Context: Text protocols and variable-width request formats add parsing overhead and allocation pressure.

Decision: Use fixed-width request and response headers with explicit binary layouts.

Consequence: Request decoding becomes predictable and cheap, but the protocol is less flexible than a self-describing format.

ADR 5: Batched Reads and Vectored Writes

Context: Handling one request per syscall limits throughput under high request rates.

Decision: Read 4KB request batches and flush response headers/value slices through vectored I/O.

Consequence: Syscall cost is amortized across many pipelined requests.

ADR 6: Partial-Write-Safe Vectored Output

Context: A single vectored write is not guaranteed to flush every byte in the provided slice list.

Decision: Track the number of bytes written, advance through completed slices, trim partially written slices, and continue writing until the full response batch is sent.

Consequence: The server preserves response framing correctness while keeping the zero-copy response path.

Safety Notes

The storage engine uses a raw pointer internally to reference the mapped index region.

This avoids self-referential lifetime issues while allowing the read-only storage object to be shared across worker tasks. The pointer is derived from the owned mmap, the mmap length is validated before pointer construction, and the mapped file is never mutated by the engine after startup.

The server response path uses borrowed slices into preallocated response headers and mmap-backed values. Vectored writes are advanced carefully so partial writes do not corrupt the client-visible response stream.

Limitations

Read-only storage engine
No writes, deletes, compaction, or replication
No crash-recovery layer beyond immutable file loading
Local IPC only through Unix Domain Sockets
Benchmark is optimized for local saturated read throughput
Batch latency is reported per 256-request pipeline, not per individual request
Current input format is simple CSV-like key,value and does not support escaping commas inside values
Benchmark workload currently measures 100% existing-key reads

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.vscode		.vscode
perf		perf
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
cycle.sh		cycle.sh
profile.json.gz		profile.json.gz
server.log		server.log
storage.db		storage.db

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Zero-KV

About

Features

Quickstart

Requirements

Installation

Generate a Dataset

Bake the Storage File

Run the Server

Run the Benchmark

Full Lifecycle Script

Performance Metrics

Performance Telemetry

Architecture

Storage Format

Protocol

Architecture Decision Records

ADR 1: Unix Domain Sockets over TCP

ADR 2: Immutable Baked Storage

ADR 3: Memory-Mapped I/O

ADR 4: Fixed-Width Binary Frames

ADR 5: Batched Reads and Vectored Writes

ADR 6: Partial-Write-Safe Vectored Output

Safety Notes

Limitations

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Zero-KV

About

Features

Quickstart

Requirements

Installation

Generate a Dataset

Bake the Storage File

Run the Server

Run the Benchmark

Full Lifecycle Script

Performance Metrics

Performance Telemetry

Architecture

Storage Format

Protocol

Architecture Decision Records

ADR 1: Unix Domain Sockets over TCP

ADR 2: Immutable Baked Storage

ADR 3: Memory-Mapped I/O

ADR 4: Fixed-Width Binary Frames

ADR 5: Batched Reads and Vectored Writes

ADR 6: Partial-Write-Safe Vectored Output

Safety Notes

Limitations

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages