PLENA Simulation System

This repository contains the multi-level simulator system for PLENA (Programmable Long-context Efficient Neural Accelerator).

Overview

The PLENA Simulator provides three main components:

Transaction-level Simulator: Models PLENA's architectural behavior at a high level, enabling rapid exploration of design choices, memory hierarchies, and long-context LLM inference workflows without the overhead of cycle-accurate RTL simulation.
Analytical Latency Model: Provides fast estimation of PLENA's performance characteristics (TTFT, TPS) based on architectural parameters and instruction latencies for specified workloads.
Utilization Model: Analyzes the utilization of the systolic array based on architectural parameters and instruction latencies, computing attainable vs theoretical FLOPS.
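
As a rough illustration of what "attainable vs theoretical FLOPS" means for a systolic array, the sketch below computes utilization for a single matrix multiply. The array dimensions, clock frequency, and cycle count are hypothetical values chosen for illustration. They are not PLENA parameters, and the repository's utilization model may be structured differently.

```python
def theoretical_flops(rows: int, cols: int, freq_hz: float) -> float:
    """Peak FLOPS of a rows x cols array, counting one MAC as 2 FLOPs per PE per cycle."""
    return 2.0 * rows * cols * freq_hz


def attainable_flops(m: int, k: int, n: int, cycles: int, freq_hz: float) -> float:
    """FLOPS achieved by an (m x k) @ (k x n) matmul that took `cycles` cycles."""
    total_flops = 2.0 * m * k * n
    return total_flops * freq_hz / cycles


def utilization(m: int, k: int, n: int, cycles: int,
                rows: int, cols: int, freq_hz: float = 1e9) -> float:
    """Ratio of attainable to theoretical FLOPS (1.0 means the array is never idle)."""
    return attainable_flops(m, k, n, cycles, freq_hz) / theoretical_flops(rows, cols, freq_hz)


# Hypothetical case: a 128x128x128 matmul on a 64x64 array needs at least
# 512 cycles; here it is assumed to have taken 640 cycles.
print(utilization(128, 128, 128, cycles=640, rows=64, cols=64))
```

The gap between the two numbers comes from cycles in which processing elements sit idle, for example while waiting on memory or while the array fills and drains.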

PLENA Publication

If you use this simulator in your research, please cite the following paper:

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
arXiv:2509.09505

@misc{wu2025combatingmemorywallsoptimization,
  title        = {Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference},
  author       = {Haoran Wu and Can Xiao and Jiayi Nie and Xuan Guo and Binglei Lou and Jeffrey T. H. Wong and Zhiwen Mo and Cheng Zhang and Przemyslaw Forys and Wayne Luk and Hongxiang Fan and Jianyi Cheng and Timothy M. Jones and Rika Antonova and Robert Mullins and Aaron Zhao},
  year         = {2025},
  eprint       = {2509.09505},
  archivePrefix= {arXiv},
  primaryClass = {cs.AR},
  url          = {https://arxiv.org/abs/2509.09505}
}

Setup

There are two ways to get a working environment. Option A (Nix) runs directly on your machine and is best for day-to-day development. Option B (Docker) wraps the same Nix environment in a container for portable, reproducible builds when you don't want to install Nix on the host.

Option A — Nix (native)

Prerequisites:

nix package manager (with flakes enabled)
direnv for environment management

# Install direnv hook in your shell
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc
source ~/.bashrc

Installation:

# Allow direnv to load the environment
direnv allow

# Enter the development environment
nix develop

# Update git submodules
git submodule update --remote --merge

You are now in a shell with the full toolchain (Rust, Python 3.12, clang, cmake, etc.) and can run any of the just commands below directly.

Option B — Docker

The Docker image encapsulates the same Nix environment, so you only need Docker installed (no Nix or direnv on the host). All commands are run from the repository root. Your working tree is bind-mounted into the container at /workspace, so edits on the host are picked up live.

Prerequisites:

Docker Engine with the Compose plugin (docker compose)
(Optional) NVIDIA Container Toolkit for CUDA support

Build the image and open a shell:

just docker-dev
# Equivalent to:
#   docker compose -f docker/docker-compose.yml build dev
#   docker compose -f docker/docker-compose.yml up -d dev
#   docker compose -f docker/docker-compose.yml exec dev bash

Inside the container, initialize submodules once and build the Rust emulator binary (persisted on the host via the bind mount, so this is a one-time step):

git submodule update --init --recursive
(cd transactional_emulator && cargo build --release)

Run commands without an interactive shell:

# Run any command in the dev environment
just docker-run bash -c "just latency llama-3.1-8b"

# Run a just target directly
just docker-test test-linear

Common Docker commands (see docker/README.md for the full list):

| Command | Description |
| --- | --- |
| `just docker-build-dev` | Build the development image |
| `just docker-dev` | Build, start, and enter the dev container |
| `just docker-run <cmd>` | Run a command in the dev environment |
| `just docker-test <target>` | Run a `just` target in Docker |
| `just docker-down` | Stop containers |
| `just docker-clean` | Remove containers and cached volumes |

CUDA support:

docker compose -f docker/docker-compose.yml --profile cuda up -d dev-cuda
docker compose -f docker/docker-compose.yml exec dev-cuda bash

Note: The repository is bind-mounted from the host (owned by your host user) while the container runs as root. The image marks /workspace as a git safe.directory so Nix's flake evaluation doesn't fail with a dubious-ownership error. If you build a custom image, preserve that setting.

Configuration

The simulator and emulator both use plena_settings.toml as the main configuration file for hardware parameters. This file contains:

Hardware dimensions (MLEN, BLEN, VLEN, HLEN)
Memory configuration (HBM, SRAM sizes)
Instruction latencies
Prefetch/writeback amounts

The configuration file supports two modes:

analytic: Used by analytical models (latency and utilization)
transactional: Used by the transaction-level emulator

Set the active mode in the [MODE] section of plena_settings.toml.

Transaction-level Emulation

The transaction-level emulator executes machine code instructions sequentially, modeling PLENA's behavior at a high abstraction level. It includes:

HBM/DRAM off-chip memory simulation
Handwritten PLENA ISA assembly templates for every operator used by LLaMA
Test scripts to verify correctness of assembly templates

The emulator reads hardware configuration from plena_settings.toml (using the transactional mode).
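
The idea of sequential, transaction-level execution can be sketched with a toy interpreter: each instruction is applied in order and charged a fixed latency, with no cycle-by-cycle pipeline modeling. The opcodes, latencies, and data layout below are hypothetical and much simpler than the PLENA ISA; the real emulator is written in Rust and may work differently.

```python
from dataclasses import dataclass

# Hypothetical opcodes and per-instruction latencies in cycles (not the PLENA ISA).
LATENCY = {"LOAD": 100, "ADD": 1, "STORE": 100}


@dataclass
class Instr:
    op: str
    dst: str
    srcs: tuple = ()


def run(program, hbm):
    """Execute instructions in order; return (registers, hbm, total_cycles)."""
    regs = {}
    cycles = 0
    for ins in program:
        if ins.op == "LOAD":      # off-chip memory -> register
            regs[ins.dst] = hbm[ins.srcs[0]]
        elif ins.op == "ADD":     # register + register -> register
            regs[ins.dst] = regs[ins.srcs[0]] + regs[ins.srcs[1]]
        elif ins.op == "STORE":   # register -> off-chip memory
            hbm[ins.dst] = regs[ins.srcs[0]]
        else:
            raise ValueError(f"unknown opcode: {ins.op}")
        cycles += LATENCY[ins.op]  # each transaction costs a fixed latency
    return regs, hbm, cycles


program = [
    Instr("LOAD", "r0", ("a",)),
    Instr("LOAD", "r1", ("b",)),
    Instr("ADD", "r2", ("r0", "r1")),
    Instr("STORE", "c", ("r2",)),
]
regs, hbm, cycles = run(program, {"a": 2, "b": 3})
print(hbm["c"], cycles)
```

Charging a fixed cost per instruction is what keeps this style of model fast compared with cycle-accurate RTL simulation, at the price of ignoring overlap between instructions.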

Running Simulations

Standard mode:

just build-emulator [task]
# Example: just build-emulator linear

Debug mode:

just build-emulator-debug [task]
# Example: just build-emulator-debug linear

Run pre-generated assembly:

just run-generated-asm

Quiet mode (latency and error metrics only):

just run-generated-asm-quiet

Analytical Models

Latency Model

The latency model provides fast performance estimation for PLENA workloads. It computes:

TTFT (Time To First Token): Latency for the prefill phase
TPS (Tokens Per Second): Throughput for the decode phase
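
The two metrics can be related to cycle counts with a small sketch. The cycle counts and clock frequency are hypothetical inputs, and treating TPS as the aggregate across the whole batch is an assumption; the repository's latency model may define or report it differently.

```python
def ttft_seconds(prefill_cycles: int, freq_hz: float) -> float:
    """Time to first token: wall-clock duration of the prefill phase."""
    return prefill_cycles / freq_hz


def tokens_per_second(batch: int, output_tokens: int,
                      decode_cycles: int, freq_hz: float) -> float:
    """Decode throughput, summed over all sequences in the batch (assumed convention)."""
    decode_seconds = decode_cycles / freq_hz
    return batch * output_tokens / decode_seconds


# Hypothetical numbers: 1 GHz clock, batch of 4, 1024 output tokens per sequence.
FREQ_HZ = 1e9
print(ttft_seconds(prefill_cycles=500_000_000, freq_hz=FREQ_HZ))
print(tokens_per_second(batch=4, output_tokens=1024,
                        decode_cycles=20_000_000_000, freq_hz=FREQ_HZ))
```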

Available Commands

List available models:

just latency-list-models

Run with default settings (llama-3.1-8b, batch=4, input=2048, output=1024):

just latency llama-3.1-8b

Run with custom batch size:

just latency-batch llama-3.1-8b 8

Run with full custom parameters:

just latency-full llama-3.1-8b 4 2048 1024
# Format: just latency-full {model} {batch} {input_seq} {output_seq}

Get JSON output:

just latency-json llama-3.1-8b
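
If the JSON output is to be consumed by a script, a wrapper along these lines may be enough. It assumes the command writes only JSON to standard output, which this README does not confirm, and the field name `ttft_s` is purely hypothetical. A stand-in command is used so the sketch runs without the repository.

```python
import json
import subprocess
import sys


def run_json(cmd: list[str]) -> dict:
    """Run a command and parse its standard output as JSON."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Inside the repository the call would presumably be:
#   report = run_json(["just", "latency-json", "llama-3.1-8b"])
# The stand-in below prints a hypothetical payload instead.
report = run_json([sys.executable, "-c", "print('{\"ttft_s\": 0.5}')"])
print(report)
```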

Project Structure

PLENA_Simulator/
├── transactional_emulator/    # Transaction-level simulator (Rust)
├── analytic_models/          # Analytical models (Python)
│   ├── latency/             # Latency estimation model
│   └── utilisation/         # Utilization analysis model
├── compiler/                # Compiler and model definitions
├── PLENA_Tools/             # Supporting tools and utilities (submodule)
├── doc/                     # Documentation and diagrams
├── plena_settings.toml      # Main configuration file
└── justfile                 # Command shortcuts
