
r.A.I.d - Roleplay AI Dungeon

Autonomous AI player characters for Dungeons & Dragons 5th Edition. Locally hosted, fully offline, no cloud services required.

r.A.I.d lets a human Dungeon Master run a full adventuring party where some or all player characters are controlled by independent AI agents. Each character has its own personality, voice, memories, and decision-making. The entire system runs on a single gaming PC. Your campaign data never leaves your machine.



What It Does

Each AI-controlled character operates as an independent agent with its own:

  • Personality and backstory that shapes how it speaks, acts, and makes decisions
  • Character sheet awareness of abilities, spells, inventory, and hit points
  • Private knowledge and secrets that other party members cannot access
  • Long-term memory of past sessions, drawn from a searchable archive of everything the character has experienced
  • Distinct speaking voice generated locally with per-character voice profiles

When a character's turn comes up, the system assembles that character's unique perspective (what they know, what they have seen, what they remember, what they care about) and generates a response. That response includes both in-character dialogue and a structured game action: attack a target, cast a spell, move to a position, investigate an object, and so on.

The system integrates directly with Foundry VTT. Each AI character logs into Foundry as its own user account, tied to its assigned actor. From inside your Foundry session, the AI characters can read and respond to the current game state, send in-character chat messages, execute dice rolls, move tokens on the map, and track their own hit points, spell slots, and equipment.

From the DM's perspective, running a session looks much the same as it would with human players. You narrate the scene, set encounters, and manage the world. The AI party members respond in turn, staying in character, acting on what they know (and only what they know), and making choices consistent with who they are.


Key Features

  • Fully offline - no API keys, no subscriptions, no internet connection needed during play
  • Information hiding - characters only know what they should know. When the rogue scouts ahead alone, only the rogue receives those observations. No metagaming.
  • Persistent memory - characters remember past sessions and develop over time through a three-tier memory system (working, episodic, reflective)
  • Foundry VTT integration - AI characters appear as real players in your Foundry session with full access to game state, dice, chat, and token movement
  • Per-character voices - each character speaks with a distinct voice via local text-to-speech
  • Push-to-talk voice input - speak to your party and the system transcribes your words locally
  • Single GPU footprint - the full stack fits comfortably in 16 GB of VRAM with headroom to spare

Hardware Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU | NVIDIA GPU, 12 GB VRAM | RTX 4080 Super or RTX 4090 (16 GB) |
| CPU | Modern multi-core x86_64 | AMD Ryzen 9 9950X3D or equivalent |
| RAM | 16 GB | 32 GB |
| OS | Windows 10/11 or Linux | Windows 11 |
| CUDA | CUDA 12 + cuDNN 9 | CUDA 12 + cuDNN 9 |

The full stack (LLM, STT, TTS, vector search, orchestrator) uses roughly 7-8 GB of VRAM in the default configuration, leaving ~8-9 GB of headroom on a 16 GB card.


Prerequisites

Install these before setting up r.A.I.d:

1. Python 3.13+

Download from python.org or use your system package manager.

2. Ollama

Install from ollama.com. See the Ollama Setup section below for which models to pull and how to verify everything is working.

3. Foundry VTT (licensed, v12 or v13)

Purchase and install from foundryvtt.com. Install the foundryvtt-rest-api module by ThreeHats from within Foundry's module browser. Enable the module in your world settings and note your API key.

4. Piper TTS (for default voice output)

Install from rhasspy/piper. Download voice models for each character you want to voice.

5. Faster-Whisper (for voice input, optional)

Installed automatically as a Python dependency. On Windows, the easiest CUDA setup is via Purfview's whisper-standalone-win, which bundles all the required NVIDIA libraries.


Ollama Setup

r.A.I.d uses Ollama to run two models locally: one for agent decision-making (the LLM) and one for memory search (the embedding model). Both run on your machine with no cloud calls.

Install Ollama

Download and run the installer from ollama.com. On Windows this installs Ollama as a background service that starts automatically. On Linux:

curl -fsSL https://ollama.com/install.sh | sh

After installation, verify Ollama is running:

curl http://localhost:11434/api/tags

You should see a JSON response (possibly with an empty models list if you haven't pulled anything yet). If you get a connection error, start the service:

# Linux
systemctl start ollama

# Windows -- Ollama runs as a tray application. Launch it from the Start menu.

Pull the required models

r.A.I.d needs two models. Pull them both before your first session:

# 1. LLM -- generates dialogue, actions, and decisions for each character
ollama pull llama3.1:8b

# 2. Embedding model -- converts text into vectors for memory search
ollama pull nomic-embed-text

What these models do:

| Model | Tag in Ollama | Size on disk | Purpose | Runs on |
| --- | --- | --- | --- | --- |
| LLaMA 3.1 8B | llama3.1:8b | ~4.7 GB | Agent inference. Each character's turn sends a prompt to this model and gets back structured JSON with dialogue and actions. | GPU |
| Nomic Embed Text v1.5 | nomic-embed-text | ~275 MB | Memory embeddings. Every observation is converted to a 768-dimensional vector for semantic search. Retrieval queries are also embedded so the system can find relevant past memories. | CPU |

The LLM is the only component that requires a GPU. The embedding model runs on CPU and adds negligible latency (~5-15 ms per embedding).

Verify both models are available

ollama list

You should see both models in the output:

NAME                      ID           SIZE     MODIFIED
llama3.1:8b               <hash>       4.7 GB   ...
nomic-embed-text:latest   <hash>       274 MB   ...

Test the LLM

Send a quick test prompt to confirm inference works:

ollama run llama3.1:8b "Say hello in one sentence."

You should get a short text response within a few seconds. If this hangs or errors, check that your GPU drivers are up to date and that Ollama detected your GPU (ollama ps shows which device is in use).

Test the embedding model

curl http://localhost:11434/api/embed -d '{"model": "nomic-embed-text", "input": "test embedding"}'

You should get a JSON response containing an embeddings array with one entry of 768 floating-point numbers. If you see {"error":"model not found"}, re-run ollama pull nomic-embed-text.

Choosing a different LLM (optional)

The default llama3.1:8b is a good balance of quality and speed on 16 GB cards. If you want to experiment:

| Model | Tag | Size | Notes |
| --- | --- | --- | --- |
| LLaMA 3.1 8B (default) | llama3.1:8b | ~4.7 GB | Recommended. Good instruction following, fits easily in 16 GB VRAM. |
| Mistral 7B | mistral:7b | ~4.1 GB | Slightly smaller. Good at structured output. |
| LLaMA 3.1 8B Q4_K_M | llama3.1:8b-instruct-q4_K_M | ~4.9 GB | Explicit Q4_K_M instruct tag. Marginally larger and more accurate than the default quantization. |
| Gemma 2 9B | gemma2:9b | ~5.4 GB | Larger but still fits 16 GB. Strong at roleplay. |
| LLaMA 3.3 70B Q4 | llama3.3:70b-instruct-q4_K_M | ~40 GB | Requires 48 GB+ VRAM (e.g. dual GPUs or an A6000). Much higher quality output. |

To use a different model, pull it with ollama pull <tag> and update the ollama.model field in config.yaml. Do not change the embedding model unless you also update the vector dimension in the database schema (768 is specific to nomic-embed-text-v1.5).

VRAM budget

With the default configuration, VRAM usage breaks down as follows:

| Component | Allocation |
| --- | --- |
| LLM weights (llama3.1:8b) | ~4.7 GB |
| KV cache | ~2.0-2.5 GB |
| CUDA overhead | ~0.5-1.0 GB |
| Total | ~7-8 GB |
| Remaining (on 16 GB card) | ~8-9 GB |

The embedding model and all other components (STT, TTS, orchestrator, SQLite) run on CPU and use zero VRAM.


Installation

# Clone the repository
git clone https://github.com/Heretyc/RAID.git
cd RAID

# Create and activate virtual environment
python -m venv .venv

# Windows
.venv\Scripts\activate

# Linux / macOS
source .venv/bin/activate

# Install runtime dependencies
python -m pip install -r raid/requirements.txt

# Install the sqlite-vec extension for memory search
python -m pip install sqlite-vec

For development:

# Install dev dependencies (testing, formatting, linting)
python -m pip install -r requirements-dev.txt

Verify the installation

# Confirm Ollama is reachable
python -c "import httpx; print(httpx.get('http://localhost:11434/api/tags').status_code)"

# Confirm sqlite-vec loads
python -c "import sqlite3, sqlite_vec; c = sqlite3.connect(':memory:'); c.enable_load_extension(True); sqlite_vec.load(c); print('sqlite-vec OK')"

# Run the test suite
pytest raid/tests/

Configuration

r.A.I.d uses a YAML configuration file that lives alongside the code at raid/config.yaml. A sample config ships with the repository. Copy and edit it for your setup.

Ollama settings

The ollama section controls which LLM model the agents use and how the orchestrator talks to Ollama:

ollama:
  base_url: "http://localhost:11434"   # Ollama API address
  model: "llama3.1:8b"                 # LLM for agent inference
  timeout_seconds: 30                  # HTTP timeout per request
  max_retries: 2                       # Retry count on server errors

base_url -- The address where Ollama is listening. Default is http://localhost:11434. Change this only if you run Ollama on a different machine or port.

model -- The Ollama model tag used for all agent LLM calls (dialogue, actions, reflections). This must match a model you have already pulled with ollama pull. The default llama3.1:8b works well on 16 GB cards. See Choosing a different LLM above for alternatives.

The embedding model (nomic-embed-text-v1.5) is not configurable in config.yaml. It is hardcoded because changing it would require updating the vector dimension in the database schema.
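The timeout_seconds and max_retries settings above can be approximated by a small retry loop. This is a generic sketch under the stated semantics (max_retries is the number of retries after the first attempt), not the project's actual ollama_client.py; the function names here are invented:

```python
import time

def call_with_retries(fn, max_retries=2, backoff_seconds=0.0):
    """Call fn(), retrying on failure; fn runs at most max_retries + 1 times."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as error:  # a real client would retry only on server errors
            last_error = error
            time.sleep(backoff_seconds * attempt)
    raise last_error

# Simulate a server that fails twice, then succeeds.
attempts = []
def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server error")
    return "ok"

result = call_with_retries(flaky_request, max_retries=2)
# succeeds on the third attempt (1 initial call + 2 retries)
```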

Game session settings

game:
  max_rounds: 100           # Safety limit on rounds per session
  reflection_interval: 10   # Rounds between reflective memory generation
  history_window: 10        # Number of recent events in working memory
  session_name: "goblin_ambush_test"  # Name used for log files

reflection_interval -- Every N rounds, each character generates a reflective summary of their recent experiences (e.g. "I'm starting to distrust the merchant"). Lower values produce more frequent character development at the cost of extra LLM calls. The default of 10 is a good balance.

Token budgets

Controls how much context each agent gets per LLM call. These are soft limits measured in estimated tokens (1 token ~ 4 characters):

token_budgets:
  system_prompt: 1000         # Personality, backstory, character sheet
  private_knowledge: 500      # Secrets only this character knows
  shared_state: 1000          # Current scene, HP, initiative order
  retrieved_memories: 1000    # RAG results from episodic/reflective memory
  conversation_history: 4000  # Recent visible events
  response_space: 500         # Reserved for model output

The total per call is roughly 4,500-8,000 tokens (the budgets above sum to 8,000), which fits within any 8K+ context model.
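The 4-characters-per-token heuristic and the soft budget check can be sketched in a few lines (function names here are illustrative, not the project's API):

```python
TOKEN_BUDGETS = {
    "system_prompt": 1000,
    "private_knowledge": 500,
    "shared_state": 1000,
    "retrieved_memories": 1000,
    "conversation_history": 4000,
    "response_space": 500,
}

def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

def fits_budget(layer: str, text: str) -> bool:
    """Soft check that one context layer stays within its configured budget."""
    return estimate_tokens(text) <= TOKEN_BUDGETS[layer]

total = sum(TOKEN_BUDGETS.values())  # 8000 estimated tokens at the limit
```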

Agent definitions

Each agent is defined with a personality, character sheet, and private knowledge:

agents:
  - id: "warrior"
    name: "Kael Ironbrand"
    role: "pc"
    personality: |
      You are Kael Ironbrand, a gruff human fighter who speaks bluntly...
    character_sheet:
      class: "Fighter"
      level: 5
      hp: 52
      ac: 18
      abilities: {str: 18, dex: 12, con: 16, int: 8, wis: 13, cha: 10}
      proficiencies: ["athletics", "intimidation", "perception"]
      equipment: ["longsword", "shield", "chain_mail", "javelins"]
    private_knowledge: |
      You secretly carry a letter from your dead brother...
    human_controlled: false

  - id: "dm"
    name: "Dungeon Master"
    role: "dm"
    personality: |
      You are the Dungeon Master for a D&D 5e campaign...
    scenario_seed: |
      The party has arrived at the Crossroads Inn at dusk...
    human_controlled: false

Set human_controlled: true on any agent to take manual control of that character via the CLI (or use --human warrior at launch).

Scenario

scenario:
  starting_phase: "exploration"   # exploration, combat, or social
  starting_scene: |
    The Crossroads Inn. A two-story timber building at the junction of
    the North Road and the Eastern Trail. Rain patters against the windows.

Full configuration reference

| Key | Section | Description | Default |
| --- | --- | --- | --- |
| ollama.base_url | ollama | Ollama API URL | http://localhost:11434 |
| ollama.model | ollama | LLM model tag for agent inference | llama3.1:8b |
| ollama.timeout_seconds | ollama | HTTP timeout per Ollama request | 30 |
| ollama.max_retries | ollama | Retry count on server errors | 2 |
| game.max_rounds | game | Maximum rounds before session ends | 100 |
| game.reflection_interval | game | Rounds between reflective memory generation | 10 |
| game.history_window | game | Recent events kept in working memory | 10 |
| game.session_name | game | Name for log files | default_session |
| token_budgets.* | token_budgets | Per-layer context token limits | See above |
| scenario.starting_phase | scenario | Initial game phase | exploration |
| scenario.starting_scene | scenario | Opening scene description | (empty) |

Usage

Start a session

Make sure Ollama is running and both models are pulled (see Ollama Setup).

# Standard launch (package mode)
python -m raid --config raid/config.yaml

# Alternative: run the entry-point module directly
python -m raid.run --config raid/config.yaml

# Alternative: run the script directly
python raid/run.py --config raid/config.yaml

# Dry-run mode (no Ollama needed, uses placeholder responses)
python -m raid --config raid/config.yaml --dry-run

# Override a character as human-controlled
python -m raid --config raid/config.yaml --human warrior

# Debug logging (writes to ~/raid-debug.log)
python -m raid --config raid/config.yaml --debug

# Set log level without writing to file
python -m raid --config raid/config.yaml --log-level DEBUG

All three invocation methods (python -m raid, python -m raid.run, python raid/run.py) are equivalent and accept the same CLI flags.

What happens on launch

  1. r.A.I.d loads the config and connects to Ollama to verify the configured model is available.
  2. It connects to Foundry VTT via the REST API and reads the current world state (if a Foundry bridge is configured; otherwise stubs are used).
  3. Each configured AI character authenticates as its own Foundry user account.
  4. The orchestrator enters the game loop, processing one character at a time in turn order.

Example: a combat round

The DM sets the scene in Foundry and starts combat. On each AI character's turn:

  1. The orchestrator reads the current game state from Foundry (initiative order, positions, HP, conditions).
  2. It assembles the character's context: system prompt, private knowledge, shared game state, relevant memories from the vector database, and recent conversation history.
  3. The LLM generates a structured response:
{
  "dialogue": "Behind me, Elara! I will hold the line!",
  "action": "attack",
  "target": "goblin_warleader",
  "weapon": "warhammer",
  "movement": "move 15 feet to flank the goblin near Elara"
}
  4. The orchestrator executes the action in Foundry: posts the dialogue as an in-character chat message, triggers the attack roll, and updates the token position.
  5. The character's observations from this turn are written to their episodic memory store.
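Before executing anything, the structured response has to be validated. A minimal sketch of such a check is below; the real schema lives in the project's Pydantic models (models.py), and the action set here is illustrative:

```python
import json

REQUIRED_KEYS = {"dialogue", "action"}
# Illustrative action vocabulary, not the project's actual schema.
KNOWN_ACTIONS = {"attack", "cast_spell", "move", "investigate", "talk"}

def parse_turn(raw: str) -> dict:
    """Parse the LLM's JSON turn response, rejecting malformed output."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["action"] not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {data['action']}")
    return data

turn = parse_turn(
    '{"dialogue": "Behind me, Elara!", "action": "attack", '
    '"target": "goblin_warleader", "weapon": "warhammer"}'
)
```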

Development commands

pytest raid/tests/            # Run tests
mypy raid/                    # Type check

Random Party Generator

r.A.I.d includes a built-in tool that generates a fully randomised adventuring party, places them in a tavern with an NPC, and optionally pushes everything into Foundry VTT -- all in one command.

What it generates

  • Player characters with random race (9 options), class (12 options), ability scores (4d6 drop lowest), equipment, proficiencies, spell slots, personality, backstory, and a private secret
  • One tavern NPC (innkeeper, barmaid, or mysterious stranger) with a name, description, and hidden secret for the DM to use
  • A tavern scene with atmosphere text and token positions
  • A complete config.yaml ready to run with run.py
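The 4d6-drop-lowest roll and the seed behavior can be sketched as follows (a standalone illustration of the standard method, not the generator's actual code):

```python
import random

def roll_ability_score(rng: random.Random) -> int:
    """Roll 4d6 and drop the lowest die."""
    dice = sorted(rng.randint(1, 6) for _ in range(4))
    return sum(dice[1:])  # keep the top three dice

def roll_ability_block(seed=None) -> list:
    """Six ability scores; a fixed seed reproduces the same rolls, as with --seed."""
    rng = random.Random(seed)
    return [roll_ability_score(rng) for _ in range(6)]

scores = roll_ability_block(seed=42)
# every score falls in the 3-18 range, and repeating the seed repeats the party
```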

Basic usage

# Generate 3 PCs at level 5 (writes raid/generated_config.yaml)
python raid/populate.py --num-pcs 3 --level 5

# Generate 4 PCs at level 3 with a fixed seed for reproducibility
python raid/populate.py --num-pcs 4 --level 3 --seed 42

# Preview without writing any files
python raid/populate.py --num-pcs 3 --dry-run

# Use your existing config.yaml for Ollama/Foundry settings
python raid/populate.py --num-pcs 3 --config raid/config.yaml

# Write to a custom output path
python raid/populate.py --num-pcs 3 --output raid/my_session.yaml

Running the generated session

# After generating:
python raid/run.py --config raid/generated_config.yaml

# Dry-run to test without Ollama:
python raid/run.py --config raid/generated_config.yaml --dry-run

# Take manual control of one character:
python raid/run.py --config raid/generated_config.yaml --human monk

Example output

============================================================
  The Red Rooster
============================================================

  [monk] Elric Blackwood
    Human Monk (Level 5)
    HP: 28  AC: 14  Background: Acolyte
    STR 18  DEX 19  CON 11  INT 13  WIS 10  CHA 11

  [rogue] Lirael Galanodel
    Elf Rogue (Level 5)
    HP: 28  AC: 14  Background: Sage
    STR 10  DEX 17  CON 11  INT 14  WIS 13  CHA 11

  [sorcerer] Eryn Siannodel
    Elf Sorcerer (Level 5)
    HP: 27  AC: 13  Background: Folk Hero
    STR 14  DEX 16  CON 12  INT 6  WIS 9  CHA 16

  [NPC] Lurg Goresmasher (innkeeper)
    Lurg Goresmasher, a matronly half-orc who treats every patron
    like family

  Scene: The Red Rooster. A cozy two-storey timber building on a
  well-traveled road. A fire crackles in the hearth...
============================================================

Pushing to Foundry VTT

If you have Foundry VTT running with the foundryvtt-rest-api module installed, the generator can create actors, a scene, and tokens directly in your world:

python raid/populate.py --num-pcs 3 --config raid/config.yaml --foundry

This requires an uncommented foundry section in your base config (see Foundry VTT Setup). The tool will:

  1. Connect to Foundry via the WebSocket relay
  2. Create an Actor document for each PC (type character) and the NPC (type npc) with full D&D 5e system data (abilities, HP, AC, movement, biography)
  3. Create a tavern Scene (20x15 grid, 2000x1500 px)
  4. Place tokens for every character on the scene at pre-set tavern positions (PCs around tables, NPC behind the bar)
  5. Activate the scene so it appears for all connected players
  6. Write a config.yaml with all Foundry UUID mappings filled in (agent_actor_map, agent_token_map, scene_id)

After population, launch the session and every AI character's chat messages, dice rolls, and actions will appear in Foundry:

python raid/run.py --config raid/generated_config.yaml

CLI reference

python raid/populate.py [options]

Options:
  --config PATH       Base config for Ollama/Foundry settings (optional)
  --output PATH       Output config path (default: raid/generated_config.yaml)
  --num-pcs N         Number of PCs to generate (default: 3)
  --level N           Character level, 1-10 (default: 5)
  --seed N            Random seed for reproducibility
  --foundry           Push actors, scene, and tokens to Foundry VTT
  --dry-run           Print generated scenario as JSON, write nothing
  --debug             Enable debug logging

Available races and classes

| Races | Classes |
| --- | --- |
| Human, Elf, Dwarf, Halfling, Half-Elf, Tiefling, Half-Orc, Gnome, Dragonborn | Fighter, Rogue, Wizard, Cleric, Ranger, Bard, Paladin, Barbarian, Warlock, Sorcerer, Druid, Monk |

Each race has correct ability bonuses, size, speed, darkvision, and languages. Each class has correct hit dice, primary ability, starting equipment, AC calculation, proficiencies, and spell slots (for levels 1-10).
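The derivations involved are standard 5e arithmetic. As a sketch (function names are illustrative; the rules themselves are from the 5e SRD):

```python
def ability_mod(score: int) -> int:
    """5e ability modifier: floor((score - 10) / 2)."""
    return (score - 10) // 2

def unarmored_ac(dex: int) -> int:
    """No armor: 10 + Dex modifier."""
    return 10 + ability_mod(dex)

def leather_ac(dex: int) -> int:
    """Leather armor: 11 + Dex modifier."""
    return 11 + ability_mod(dex)

def chain_mail_ac() -> int:
    """Chain mail: flat 16, Dex modifier not applied (heavy armor)."""
    return 16
```

A shield adds a further +2, so a chain-mail fighter with a shield lands at AC 18.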


Foundry VTT Setup

r.A.I.d integrates with Foundry VTT through the foundryvtt-rest-api module. RAID runs its own WebSocket server internally -- no external relay or middleware is needed.

Architecture

Foundry VTT                            RAID
┌──────────────┐   WebSocket      ┌──────────────┐
│ foundryvtt-  │ ───────────────> │ FoundryRelay │
│ rest-api     │    connects      │ (ws server)  │
│ module       │ <─────────────── │              │
└──────────────┘   responses      └──────┬───────┘
                                         │
                                  ┌──────┴───────┐
                                  │ FoundryGame  │
                                  │ Bridge       │
                                  │ (chat, dice, │
                                  │  actors,     │
                                  │  tokens)     │
                                  └──────────────┘

The Foundry module connects outbound to RAID's WebSocket server. RAID sends requests (create actors, roll dice, post chat) and receives responses through this single connection.

Step 1: Install the Foundry module

In Foundry VTT, go to Settings > Manage Modules > Install Module and search for foundryvtt-rest-api by ThreeHats. Install and enable it in your world.

Step 2: Configure the module

In the module settings inside Foundry:

  • Set the WebSocket URL to ws://localhost:3015 (or whatever host:port RAID will listen on)
  • Set an API key -- this can be any string, but it must match what you put in RAID's config
  • Note the Client ID (auto-generated, you don't need to change it)

Step 3: Configure RAID

Uncomment and fill in the foundry section of your config.yaml:

foundry:
  api_key: "your-api-key"          # Must match the module setting
  ws_host: "localhost"              # Host for RAID's WebSocket server
  ws_port: 3015                    # Port the Foundry module connects to
  connection_timeout: 30.0          # Seconds to wait for the module
  timeout_seconds: 10
  max_retries: 3
  scene_id: "Scene.your_scene_id"
  agent_actor_map:                  # RAID agent ID -> Foundry actor UUID
    warrior: "Actor.actor_id_for_warrior"
    rogue:   "Actor.actor_id_for_rogue"
    mage:    "Actor.actor_id_for_mage"
    dm:      ""                     # DM has no actor
  agent_token_map:                  # RAID agent ID -> Foundry token UUID
    warrior: "Scene.scene_id.Token.token_id_for_warrior"

You can find actor and token UUIDs in Foundry by right-clicking an actor or token and selecting "Copy UUID".

Step 4: Launch

Start RAID -- it will begin listening for the Foundry module to connect:

python raid/run.py --config raid/config.yaml

Then open your Foundry world in a browser. The module will auto-connect to RAID's WebSocket server. You should see log output confirming the connection.

Using the random party generator with Foundry

The easiest way to set up a Foundry session is to let populate.py create everything for you:

# 1. Make sure your config.yaml has the foundry section uncommented
#    with at least api_key, ws_host, and ws_port filled in

# 2. Generate characters and push to Foundry
python raid/populate.py --num-pcs 3 --config raid/config.yaml --foundry

# 3. Run the session (config has all UUIDs pre-filled)
python raid/run.py --config raid/generated_config.yaml

This creates actors, a scene, and tokens automatically. No manual UUID copying required.

Resilient mode

By default, run.py uses ResilientFoundryBridge which wraps the real bridge in a try/except. If Foundry is unreachable or the module disconnects mid-session, the game continues using stub fallbacks (local dice rolls, logged chat). Foundry being down never crashes the game loop.
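The fallback pattern can be illustrated with a minimal sketch (class names and the post_chat method are invented here, not the real ResilientFoundryBridge API):

```python
class StubBridge:
    """Local fallback: chat is logged in-process instead of sent to Foundry."""
    def post_chat(self, speaker: str, text: str) -> str:
        return f"[local] {speaker}: {text}"

class DownBridge:
    """Stand-in for a real bridge whose Foundry connection has dropped."""
    def post_chat(self, speaker: str, text: str) -> str:
        raise ConnectionError("Foundry unreachable")

class ResilientBridge:
    """Try the real bridge first; fall back to the stub if the call raises."""
    def __init__(self, real, stub):
        self.real, self.stub = real, stub

    def post_chat(self, speaker: str, text: str) -> str:
        try:
            return self.real.post_chat(speaker, text)
        except Exception:
            return self.stub.post_chat(speaker, text)

bridge = ResilientBridge(DownBridge(), StubBridge())
message = bridge.post_chat("Kael", "I will hold the line!")  # never raises
```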


Project Structure

raid/                           # Main application package
    run.py                      # CLI entry point (game sessions)
    populate.py                 # CLI entry point (random party generation)
    generate.py                 # D&D 5e character/scenario generation engine
    orchestrator.py             # Turn-based agent loop, sequential LLM calls
    context_builder.py          # Per-agent prompt assembly with info hiding
    models.py                   # Pydantic schemas (config, actions, game state)
    interfaces.py               # Protocol ABCs (MemoryStore, GameBridge, Speech)
    ollama_client.py            # Synchronous Ollama HTTP client with retries
    stubs.py                    # Stub implementations for standalone testing
    config.yaml                 # Sample session configuration
    requirements.txt            # Runtime Python dependencies
    foundry/                    # Foundry VTT integration
        bridge.py               # GameBridge impl via foundryvtt-rest-api
        relay.py                # Internal WebSocket relay server
    memory/                     # RAG memory subsystem
        store.py                # SqliteMemoryStore (sqlite-vec + FTS5 hybrid)
    tests/                      # pytest-based tests
        conftest.py             # Shared fixtures
        test_models.py          # Model validation tests
        test_context.py         # Information hiding tests
        test_orchestrator.py    # Game loop and phase transition tests
        test_foundry_bridge.py  # Foundry bridge and relay tests
logs/                           # Session JSONL logs (created at runtime)

How It Works

The short version

One LLM instance (LLaMA 3.1 8B via Ollama) handles all characters sequentially. Each character gets a different prompt built from its own personality, knowledge, memories, and view of the game state. A lightweight Python orchestrator manages turn order and communicates with Foundry VTT. Memory is stored in SQLite with the sqlite-vec extension for semantic search and FTS5 for keyword search, merged via Reciprocal Rank Fusion. Voice I/O runs on CPU to keep all GPU memory available for the language model.

Architecture diagram

+--------------------------+      +--------------------------+
|  Ollama                  |      |  Ollama                  |
|  llama3.1:8b (GPU)       |      |  nomic-embed-text (CPU)  |
|  Agent inference         |      |  Memory embeddings       |
+----------+---------------+      +----------+---------------+
           |                                 |
           | Sequential calls,               | Embed on store()
           | one per character turn           | and retrieve()
     +-----+-----+-----+                     |
     |     |     |     |              +-------+--------+
   Warrior Rogue Mage  DM             |                |
     |     |     |     |              v                v
     +-----+-----+-----+      +------+------+  +------+------+
           |                   | sqlite-vec  |  | FTS5        |
+----------+---------------+   | KNN search  |  | BM25 search |
|  Python Orchestrator     |   +------+------+  +------+------+
|  Context assembly        |          |                |
|  Action parsing          |          +--------+-------+
|  Memory read/write       |                   |
+----------+---------------+           RRF merge (k=60)
           |                                   |
+----------+---------------+         +---------+---------+
|  SQLite                  |         |  Retrieved        |
|  chunks table            | <-----> |  memories (top k) |
|  Episodic + reflective   |         +-------------------+
+--------------------------+
           |
+----------+---------------+         +---------------------+
|  Foundry VTT             |         |  Faster-Whisper     |
|  via foundryvtt-rest-api |
|  Game state, dice, chat  |         +---------------------+
+--------------------------+         +---------------------+
                                     |  Piper / Kokoro     |
                                     |  (TTS, CPU)         |
                                     +---------------------+

Information hiding

Characters only know what they should know. This is enforced at the prompt construction level, not as a post-processing filter. Each character's context is assembled from shared pools (current scene, initiative order, publicly visible events) and private pools (character-specific secrets, whispered conversations, solo scouting observations).

When the rogue scouts ahead alone, the rogue's context includes those observations. The fighter's context does not. This happens because the orchestrator never writes those events to the fighter's memory store or includes them in the fighter's prompt. There is no "filter out what they shouldn't see" step because the information is never there to filter.
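The assemble-by-pool idea reduces to a very small function. A minimal sketch, with invented names and event strings:

```python
def build_context(agent_id: str, shared_events: list, private_events: dict) -> list:
    """Assemble one agent's view: the shared pool plus only its own private pool.

    Other agents' private events are never read, so there is nothing to filter.
    """
    return shared_events + private_events.get(agent_id, [])

shared = ["The party enters the Crossroads Inn at dusk."]
private = {"rogue": ["You spot a trapdoor behind the bar."]}

rogue_context = build_context("rogue", shared, private)
warrior_context = build_context("warrior", shared, private)
# the trapdoor observation reaches the rogue's prompt only
```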

Memory tiers

Working memory is the last 5-10 exchanges and the current scene, passed directly in the LLM prompt. Rebuilt every turn.

Episodic memory is every observation the character has made, stored as vector embeddings in sqlite-vec. Retrieved by semantic similarity to the current situation using nomic-embed-text-v1.5. Shared events go to all agents; private events go only to the relevant agent.

Reflective memory is generated every ~10 turns as a summary of recent experiences from the character's perspective. The orchestrator feeds the character's recent episodic memories back through the LLM and asks for a 2-4 sentence first-person reflection. These summaries capture character development ("I'm starting to distrust the merchant") and are stored as high-priority retrievable memories.

Memory retrieval uses hybrid search: both vector similarity (sqlite-vec) and keyword matching (FTS5) run independently, then results are merged using Reciprocal Rank Fusion (RRF). This means a relevant memory is found whether the match is semantic ("locked door with magic" finds "arcane runes on the gate") or lexical (exact keyword hit).
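The RRF merge can be sketched in a few lines. This is a standalone illustration of the standard formula with k=60; the real implementation lives in raid/memory/store.py, and the memory IDs here are invented:

```python
def rrf_merge(rankings: list, k: int = 60, top_n: int = 5) -> list:
    """Merge ranked ID lists with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so items ranked well by both searches win.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["m7", "m2", "m9"]   # sqlite-vec KNN order
keyword_hits = ["m2", "m5", "m7"]  # FTS5 BM25 order
merged = rrf_merge([vector_hits, keyword_hits])
# memories found by both searches (m2, m7) outrank single-list hits
```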


Contributing

r.A.I.d is in active early development. Contributions are welcome.

Before you start

  1. Read AGENT.md. It defines the project's mandatory coding standards, documentation requirements, and architectural rules. It is authoritative.
  2. All Python code targets 3.13+.
  3. Every function, method, and class must have a PEP 257 compliant docstring in reStructuredText style. See AGENT.md for the exact format and when full vs. summary docstrings are required.
  4. All code must include PEP 484 type hints.

Workflow

  1. Fork the repository.
  2. Create a feature branch from main.
  3. Write your code. Write tests. Write docstrings.
  4. Run the full check suite before submitting:
    mypy raid/
    pytest raid/tests/
  5. Open a pull request against main with a clear description of what your change does and why.

What we care about

  • Correctness over speed. Don't skip docs or tests to ship faster.
  • Security. Never hardcode secrets. Validate all external input. See the Security Rules section in AGENT.md.
  • Clarity. If your code needs a wall of comments to explain, it probably needs to be restructured.
  • Consistency. Follow the patterns already in the codebase. If you think a pattern should change, open an issue first.

What we will not accept

  • Code without docstrings or type hints.
  • Changes that weaken security without prior discussion.
  • Additions of heavy multi-agent frameworks (AutoGen, CrewAI, LangGraph). The custom orchestrator is a deliberate architectural choice.
  • Cloud service dependencies. r.A.I.d is offline by design.

License

This project is licensed under the Apache License 2.0. See LICENSE for the full text.

