4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -5,7 +5,7 @@
#
# This project contains:
# - common/ : Host-DPU control channel library (CMake)
# - dpu-agent/ : BlueField DPU proxy service (CMake)
# - blue-cache/ : BlueField DPU proxy service (CMake)
# - examples/cpp/ : NIXL C++ example (CMake)
# - examples/standalone: Standalone host test tool (CMake)
# - nixl-plugin/ : NIXL backend plugin source. It is NOT built here directly;
@@ -29,7 +29,7 @@ option(BUILD_EXAMPLES "Build host-side examples (C++ NIXL example + standalone
add_subdirectory(common)

if(BUILD_DPU_AGENT)
add_subdirectory(dpu-agent)
add_subdirectory(blue-cache)
endif()

if(BUILD_EXAMPLES)
22 changes: 11 additions & 11 deletions CONTRIBUTING.md
@@ -5,13 +5,13 @@ Thank you for your interest in BlueCache. This document describes how to build,
## Repository Structure

- `common/` — Shared host-DPU wire protocol (`dma_transfer.h`).
- `nixl-plugin/` — NIXL `DOCA_DMA_PROXY` backend plugin source.
- `dpu-agent/` — BlueField DPU proxy service.
- `nixl-plugin/` — NIXL `BLUE_CACHE` backend plugin source.
- `blue-cache/` — BlueField DPU proxy service.
- `examples/` — Standalone C++ example and LMCache reference architecture.
- `scripts/` — `patch_nixl.sh`, `build_all.sh`, and helpers.
- `docs/` — Architecture and integration documentation.

## Building the DPU Agent
## Building blue-cache

Requirements:

@@ -25,7 +25,7 @@ mkdir build && cd build
export DOCA_DIR=/opt/mellanox/doca
export NIXL_ROOT=/opt/nvidia/nvda_nixl
cmake .. -DBUILD_EXAMPLES=OFF
make -j$(nproc) dpu_dma_copy
make -j$(nproc) blue-cache
```

## Patching NIXL with the Plugin
@@ -36,7 +36,7 @@ The plugin is designed to be injected into a NIXL source tree:
./scripts/patch_nixl.sh /path/to/nixl/source

cd /path/to/nixl/source
meson setup build -Denable_plugins=DOCA_DMA_PROXY
meson setup build -Denable_plugins=BLUE_CACHE
ninja -C build
```

@@ -59,19 +59,19 @@ cd examples/standalone
./scripts/build_host.sh

# On DPU
./build-dpu/dpu_dma_copy -p 0000:03:00.0 -m 256 -q 4
./build-dpu/blue-cache -p 0000:03:00.0 -m 256 -q 4

# On Host
./build-host/gpu_dma_copy -o push -p 0000:ba:00.0 -g 0 -f /tmp/test.bin -s 64
./build-host/gpu_dma_copy -o pull -p 0000:ba:00.0 -g 0 -f /tmp/test.bin -O /tmp/test.out
./build-host/blue-cache-host -o push -p 0000:ba:00.0 -g 0 -f /tmp/test.bin -s 64
./build-host/blue-cache-host -o pull -p 0000:ba:00.0 -g 0 -f /tmp/test.bin -O /tmp/test.out
```

### NIXL C++ example

Build with `-DBUILD_EXAMPLES=ON` and run after the DPU agent is started:

```bash
./build/examples/cpp/nixl_doca_dma_proxy_example 0000:ba:00.0 /tmp/dpu_object.bin
./build/examples/cpp/nixl_blue_cache_example 0000:ba:00.0 /tmp/dpu_object.bin
```

## Upstreaming to NIXL
@@ -82,10 +82,10 @@ When the plugin is ready for upstream NIXL:
2. Run `./scripts/patch_nixl.sh` against a clean NIXL checkout.
3. Review the diff in the NIXL tree.
4. Create a NIXL PR containing:
- Plugin source under `src/plugins/doca_dma_proxy/`
- Plugin source under `src/plugins/blue_cache/`
- Build integration in `meson.build` and `src/plugins/meson.build`
- Static plugin registration in `src/core/nixl_plugin_manager.cpp`
- Tests under `test/unit/plugins/doca_dma_proxy/` (when available)
- Tests under `test/unit/plugins/blue_cache/` (when available)
- Documentation updates

## Code Style
77 changes: 48 additions & 29 deletions README.md
@@ -6,9 +6,9 @@ A complete GPU KV-cache offload solution that moves KV tensors from Host GPU mem

This project provides an end-to-end pipeline for offloading GPU-resident data — primarily LLM KV caches — to storage attached to a local BlueField DPU. It is built from three integrated pieces:

1. **DPU Agent (`dpu-agent/`)** — Runs on the BlueField DPU ARM cores. It imports the remote GPU memory map, executes DOCA DMA operations, and writes incoming data to DPU-side storage backends.
2. **NIXL Plugin (`nixl-plugin/`)** — A host-side NIXL backend named `DOCA_DMA_PROXY`. It registers GPU buffers as `VRAM_SEG`, exports them over PCIe with DOCA DMA, and forwards transfer requests to the DPU agent.
3. **LMCache Integration (`examples/lmcache/`)** — A patch set and configuration example that enables LMCache v0.4.3 to use the `DOCA_DMA_PROXY` backend for transparent KV-cache tiering.
1. **blue-cache (`blue-cache/`)** — The DPU-side agent. It runs on the BlueField DPU ARM cores, imports the remote GPU memory map, executes DOCA DMA operations, and writes incoming data to DPU-side storage backends.
2. **NIXL Plugin (`nixl-plugin/`)** — A host-side NIXL backend named `BLUE_CACHE`. It registers GPU buffers as `VRAM_SEG`, exports them over PCIe with DOCA DMA, and forwards transfer requests to the DPU agent.
3. **LMCache Integration (`examples/lmcache/`)** — A patch set and configuration example that enables LMCache v0.4.3 to use the `BLUE_CACHE` backend for transparent KV-cache tiering.

Together these components let an application such as LMCache express a transfer as `VRAM_SEG ↔ OBJ_SEG` and have the actual PCIe DMA and storage I/O executed by the DPU.

@@ -45,7 +45,7 @@ By using the BlueField DPU's dedicated DOCA DMA engine, this solution:
│ Host │
│ ┌─────────────────────┐ ┌─────────────────────────────┐ │
│ │ LMCache / vLLM │ │ NIXL Agent │ │
│ │ (KV-cache manager) │───►│ + DOCA_DMA_PROXY backend │ │
│ │ (KV-cache manager) │───►│ + BLUE_CACHE backend │ │
│ └─────────────────────┘ │ - registers GPU VRAM │ │
│ │ - exports GPU mmap │ │
│ │ - sends transfer requests │ │
@@ -62,22 +62,25 @@ By using the BlueField DPU's dedicated DOCA DMA engine, this solution:
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ BlueField DPU │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ dpu_dma_copy agent │ │
│ │ blue-cache agent │ │
│ │ ┌───────────────┐ ┌───────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ DOCA DMA │───►│ staging buffer│───►│ NIXL storage backend │ │ │
│ │ │ engine │ │ (DPU DRAM) │ │ (posix / xdfs / xdfs_kv / ...) │ │ │
│ │ └───────────────┘ └───────────────┘ └─────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌───────────────┐ │ │
│ │ │ DPU-local │ │ │
│ │ │ NVMe / OBJ │ │ │
│ │ └───────────────┘ │ │
│ │ ┌──────────────┴──────────────┐ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ DPU-local │ │ Remote Storage │ │ │
│ │ │ (posix) │ │ xdfs / xdfs_kv │ │ │
│ │ └─────────────┘ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
```

### DPU Agent
### blue-cache

The DPU agent is the piece that executes the offload. It runs as a service on the BlueField DPU and is intentionally separate from the NIXL library so it can evolve independently.

@@ -88,7 +91,7 @@ Responsibilities:
- Execute chunked, pipelined DOCA DMA with configurable queue depth.
- Forward received data to a NIXL storage backend running on the DPU, which in turn writes to local files or object storage.
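The chunked, pipelined behaviour with a bounded queue depth can be illustrated with a small simulation. This is only a sketch of the general technique and is not taken from the agent's source: the function names `split_into_chunks` and `simulate_pipeline` are hypothetical, and completion is modelled as strictly first-in, first-out, which a real DMA engine need not guarantee.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple

Chunk = Tuple[int, int]  # (offset, length) in bytes


def split_into_chunks(total_bytes: int, chunk_bytes: int) -> Iterator[Chunk]:
    """Yield (offset, length) pairs that together cover total_bytes."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length


def simulate_pipeline(
    total_bytes: int, chunk_bytes: int, queue_depth: int
) -> Tuple[List[Chunk], int]:
    """Submit chunks while keeping at most queue_depth of them in flight.

    Returns the chunks in completion order and the peak in-flight count.
    Hypothetical model: the oldest in-flight chunk always completes first.
    """
    if queue_depth <= 0:
        raise ValueError("queue_depth must be positive")
    in_flight: Deque[Chunk] = deque()
    completed: List[Chunk] = []
    peak = 0
    for chunk in split_into_chunks(total_bytes, chunk_bytes):
        if len(in_flight) == queue_depth:
            # Queue is full: wait for the oldest chunk before submitting more.
            completed.append(in_flight.popleft())
        in_flight.append(chunk)
        peak = max(peak, len(in_flight))
    # Drain whatever is still in flight once everything has been submitted.
    while in_flight:
        completed.append(in_flight.popleft())
    return completed, peak
```

Calling `simulate_pipeline(10, 4, 2)` splits a 10-byte transfer into chunks of at most 4 bytes and never keeps more than 2 in flight. A deeper queue hides more per-chunk latency, at the cost of more staging memory held at once.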

Build and run instructions are in [`dpu-agent/README.md`](dpu-agent/README.md).
Build and run instructions are in [`blue-cache/README.md`](blue-cache/README.md).

### NIXL Plugin

@@ -105,7 +108,7 @@ Because NIXL loads backends dynamically, the plugin source is injected into a NI

[`examples/lmcache/`](examples/lmcache/) contains:

- `lmcache_integration.patch` — modifications to LMCache v0.4.3 to recognize and use the `DOCA_DMA_PROXY` backend.
- `lmcache_integration.patch` — modifications to LMCache v0.4.3 to recognize and use the `BLUE_CACHE` backend.
- `lmcache-config.yaml` — sample configuration.
- `patch_lmcache.sh` — helper that applies the patch idempotently.

@@ -117,7 +120,7 @@ After patching LMCache, you can configure a storage backend that points to the D
.
├── common/ # Shared host-DPU control channel + wire protocol (dma_transfer.h)
├── nixl-plugin/ # NIXL backend plugin source (patch into NIXL)
├── dpu-agent/ # BlueField DPU proxy service
├── blue-cache/ # BlueField DPU proxy service
├── examples/
│ ├── cpp/ # NIXL C++ example
│ ├── python/ # NIXL Python example
@@ -132,23 +135,21 @@ After patching LMCache, you can configure a storage backend that points to the D

## Quick Start

### 1. Build the DPU Agent
### 1. Build blue-cache

On the BlueField DPU:

```bash
export DOCA_DIR=/opt/mellanox/doca
export NIXL_ROOT=/opt/nvidia/nvda_nixl

mkdir -p build && cd build
cmake .. -DBUILD_EXAMPLES=OFF
make -j$(nproc) dpu_dma_copy
make -j$(nproc) blue-cache
```

Run the agent (TCP fallback mode for the easiest first test):

```bash
./dpu-agent/dpu_dma_copy -p 0000:03:00.0 -m 256 -q 4 -b posix -T
./blue-cache/blue-cache -p 0000:03:00.0 -m 256 -q 4 -b posix -T
```

Omit `-T` to use DOCA Comch mode.
@@ -161,7 +162,7 @@ On the host where NIXL is built:
./scripts/patch_nixl.sh /path/to/nixl/source

cd /path/to/nixl/source
meson setup build -Denable_plugins=DOCA_DMA_PROXY
meson setup build -Denable_plugins=BLUE_CACHE
ninja -C build
```

@@ -172,7 +173,7 @@ The patch script is idempotent; running it multiple times is safe.
```bash
export NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/plugins

python3 examples/python/nixl_doca_dma_proxy_example.py \
python3 examples/python/nixl_blue_cache_example.py \
-o push \
-p 0000:ba:00.0 \
-g 0 \
@@ -192,7 +193,7 @@ This project has been verified against **NIXL v1.1.0**. Other NIXL versions may

- [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) — Host plugin, DPU agent, control plane, and data plane design.
- [`docs/LMCache_INTEGRATION.md`](docs/LMCache_INTEGRATION.md) — KV-cache offload reference architecture.
- [`dpu-agent/README.md`](dpu-agent/README.md) — Build, run, and tune the DPU agent.
- [`blue-cache/README.md`](blue-cache/README.md) — Build, run, and tune the DPU-side agent.
- [`examples/python/README.md`](examples/python/README.md) — Python end-to-end example.
- `examples/standalone/` — Standalone host test tool that does not require NIXL.
- [`CONTRIBUTING.md`](CONTRIBUTING.md) — Build, test, and NIXL upstreaming workflow.
@@ -201,15 +202,33 @@ This project has been verified against **NIXL v1.1.0**. Other NIXL versions may

### NIXL build fails with `fatal error: toml++/toml.hpp: No such file or directory`

NIXL 1.1.0 uses `tomlplusplus` as a required dependency. If the telemetry plugin is enabled, the include path may not be propagated correctly.
NIXL 1.1.0 uses `tomlplusplus` as a required dependency. When the telemetry plugin is enabled, its `doca` backend may miss the `tomlplusplus` include path because `nixl_common_dep` is not listed in its dependencies.

Disable telemetry plugins before building:
**Recommended fix**: patch `src/plugins/telemetry/doca/meson.build` to add `nixl_common_dep`:

```diff
# In src/plugins/telemetry/doca/meson.build
- dependencies: [nixl_infra, absl_log_dep, doca_dep],
+ dependencies: [nixl_infra, nixl_common_dep, absl_log_dep, doca_dep],
```

Then rebuild:

```bash
cd /path/to/nixl/source
meson setup build --wipe -Denable_plugins=BLUE_CACHE
ninja -C build
```

This fix mirrors the upstream NIXL commit [`b98dd59`](https://github.com/ai-dynamo/nixl/commit/b98dd59f1f8854113ef38de5c3054b3e9294f0c9). It keeps telemetry enabled while correctly propagating the required include path.
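If you would rather script the recommended fix than edit the file by hand, the change can be sketched as a small idempotent text rewrite. This assumes the `dependencies:` line in your checkout matches the one shown in the diff above character for character; the helper name `add_common_dep` is hypothetical, and the function leaves the text unchanged when the line is not found.

```python
# Assumption: these two strings match the meson.build line exactly as shown
# in the diff above; differing whitespace in a real checkout means no match.
OLD_LINE = "dependencies: [nixl_infra, absl_log_dep, doca_dep],"
NEW_LINE = "dependencies: [nixl_infra, nixl_common_dep, absl_log_dep, doca_dep],"


def add_common_dep(meson_text: str) -> str:
    """Return meson_text with nixl_common_dep added to the dependency list.

    Already-patched text no longer contains OLD_LINE, so a second call
    returns its input unchanged.
    """
    return meson_text.replace(OLD_LINE, NEW_LINE)


def patch_file(path: str) -> bool:
    """Rewrite the file at path in place; return True if it was modified."""
    with open(path, "r", encoding="utf-8") as handle:
        original = handle.read()
    patched = add_common_dep(original)
    if patched == original:
        return False
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(patched)
    return True
```

`patch_file("src/plugins/telemetry/doca/meson.build")`, run from the NIXL source root, returns `False` both when the fix is already applied and when the expected line is absent, so check the file manually if the build error persists.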

**Fallback**: If you do not need telemetry, disable the telemetry plugins entirely:

```bash
cd /path/to/nixl/source
sed -i "s/^subdir('telemetry')/# subdir('telemetry')/" src/plugins/meson.build

meson setup build --wipe -Denable_plugins=DOCA_DMA_PROXY
meson setup build --wipe -Denable_plugins=BLUE_CACHE
ninja -C build
```

@@ -219,17 +238,17 @@ The C++ examples require CUDA Toolkit. On a machine without CUDA, disable exampl

```bash
cmake .. -DBUILD_EXAMPLES=OFF
make dpu_dma_copy
make blue-cache
```

Or build the DPU agent directly from the `dpu-agent/` directory:
Or build blue-cache directly from the `blue-cache/` directory:

```bash
cd dpu-agent
cd blue-cache
./scripts/build_dpu.sh
```

### `DOCA_DMA_PROXY` plugin not found at runtime
### `BLUE_CACHE` plugin not found at runtime

Set the plugin search path:

@@ -243,7 +262,7 @@ Or in Python/C++ code:
agent.add_plugin_directory("/opt/nvidia/nvda_nixl/lib/plugins")
```

If NIXL was built with `-Dstatic_plugins=DOCA_DMA_PROXY`, the plugin is linked into `libnixl.so` and no search path is needed.
If NIXL was built with `-Dstatic_plugins=BLUE_CACHE`, the plugin is linked into `libnixl.so` and no search path is needed.
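Before changing search paths, it can help to confirm that the plugin's shared library is actually present in the directory you expect. The sketch below makes two assumptions that are not stated in this document: that dynamic plugins are installed as `libplugin_<NAME>.so`, and that `NIXL_PLUGIN_DIR` holds a single directory. The helpers `find_plugin` and `dirs_from_env` are hypothetical and do not call any NIXL API.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def dirs_from_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the directory named by NIXL_PLUGIN_DIR, if it is set."""
    value = env.get("NIXL_PLUGIN_DIR", "")
    return [value] if value else []


def find_plugin(plugin_name: str, search_dirs: List[str]) -> Optional[Path]:
    """Return the first matching plugin library in search_dirs, else None.

    Assumption: the library is named libplugin_<plugin_name>.so.
    """
    filename = f"libplugin_{plugin_name}.so"
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    dirs = dirs_from_env() or ["/opt/nvidia/nvda_nixl/lib/plugins"]
    found = find_plugin("BLUE_CACHE", dirs)
    print(found if found else f"BLUE_CACHE plugin not found in {dirs}")
```

If the script finds nothing, the plugin was probably either not built (check the `-Denable_plugins` value) or installed under a different prefix.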

### `doca_dma.h` not found

16 changes: 8 additions & 8 deletions dpu-agent/CMakeLists.txt → blue-cache/CMakeLists.txt
@@ -4,7 +4,7 @@
# BlueField DPU Agent for BlueCache.
#
# Builds:
# - dpu_dma_copy : DPU-side service (requires DOCA + NIXL)
# - blue-cache : DPU-side service (requires DOCA + NIXL)

cmake_minimum_required(VERSION 3.18)

@@ -66,22 +66,22 @@ endif()

include_directories(${NIXL_INCLUDE_DIR})

add_executable(dpu_dma_copy
src/dpu_dma_copy.c
add_executable(blue-cache
src/blue_cache_agent.c
src/storage_backend.cpp
)
set_source_files_properties(src/dpu_dma_copy.c PROPERTIES LANGUAGE CXX)
target_link_libraries(dpu_dma_copy
doca_dma_proxy_common
set_source_files_properties(src/blue_cache_agent.c PROPERTIES LANGUAGE CXX)
target_link_libraries(blue-cache
blue_cache_common
${DOCA_DMA_LIB}
${DOCA_COMMON_LIB}
${DOCA_COMCH_LIB}
${NIXL_LIBRARY}
${NIXL_BUILD_LIBRARY}
pthread
)
target_compile_options(dpu_dma_copy PRIVATE -Wall -Wextra)
set_target_properties(dpu_dma_copy PROPERTIES
target_compile_options(blue-cache PRIVATE -Wall -Wextra)
set_target_properties(blue-cache PROPERTIES
BUILD_RPATH "${NIXL_ROOT}/lib;${NIXL_ROOT}/lib64"
INSTALL_RPATH "${NIXL_ROOT}/lib;${NIXL_ROOT}/lib64"
)