❄️ brr ❄️

Opinionated research infrastructure tooling. Launch clusters, get SSH access, start building.

Features

- Shared filesystem: Nodes can share $HOME via EFS (AWS) or virtiofs (Nebius).
- Coding tools: Install Claude Code, Codex, or Gemini CLI. Connect with e.g. brr attach aws:cluster claude.
- Autoscaling: Ray-based cluster scaling with cached instances.
- Project-based workflows: Per-repo cluster configs and project-specific dependencies.
- Auto-shutdown: Monitors CPU, GPU, SSH, and network activity. Shuts down idle instances to save costs.
- Dotfiles integration: Take your dev environment (vim, tmux, shell config) to every cluster node.

Prerequisites

uv (for installation)

Quick Start

# Install
uv tool install brr-cli

# Configure (interactive wizard)
brr configure      # or: brr configure nebius

# Launch a GPU instance
brr up aws:l4      # or: brr up nebius:h100

# Connect
brr attach aws:l4                # SSH
brr attach aws:l4 claude         # Claude Code on the cluster
brr vscode aws:l4                # VS Code remote

All templates use provider:name syntax (e.g. aws:l4, aws:dev). Inside a project, project templates are resolved first.

Supported clouds: AWS · Nebius

Projects

For per-repo cluster configs, initialize a project:

cd my-research-repo/
brr init

This creates:

.brr/
  setup.sh          # Project-specific dependencies (shared across providers)
  aws/
    dev.yaml        # Single GPU for development
    cluster.yaml    # CPU head + GPU workers

Templates are Ray cluster YAML — edit them or add your own. Inside a project:

brr up aws:dev          # launches .brr/aws/dev.yaml
brr up aws:cluster      # launches .brr/aws/cluster.yaml
brr attach aws:dev      # SSH into dev cluster
brr down aws:dev        # tear down
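
To illustrate the lookup order, here is a minimal sketch of how a provider:name reference could be resolved: a project file under .brr/ wins, and the built-in template is the fallback. The function name resolve_template and its output format are hypothetical and are not part of brr. Only the .brr/<provider>/<name>.yaml layout comes from the listing above.

```shell
# Illustrative only: a hypothetical resolver, not brr's actual implementation.
# Usage: resolve_template <provider:name> [project_root]
resolve_template() {
  ref="$1"
  root="${2:-.}"

  # Reject anything that is not in provider:name form.
  case "$ref" in
    *:*) ;;
    *) echo "expected provider:name, got: $ref" >&2; return 1 ;;
  esac

  provider="${ref%%:*}"   # text before the first colon
  name="${ref#*:}"        # text after the first colon
  candidate="$root/.brr/$provider/$name.yaml"

  if [ -f "$candidate" ]; then
    # A project template exists, so it takes precedence.
    echo "project $candidate"
  else
    # No project file: fall back to the built-in template of that name.
    echo "builtin $provider:$name"
  fi
}

# Demo against a throwaway project directory.
demo="$(mktemp -d)"
mkdir -p "$demo/.brr/aws"
: > "$demo/.brr/aws/dev.yaml"
resolve_template aws:dev "$demo"   # resolves to the project file
resolve_template aws:l4 "$demo"    # falls back to the built-in template
```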

On first deploy, brr up clones the project repo to ~/code/{repo}/ on the head node.

If your project uses uv, brr init generates templates that use uv run ray start from your project directory. Add cluster dependencies to your project first: uv add 'ray[default]' boto3.

All global config lives in ~/.brr/config.env.

Templates

See docs/templates.md for the full template reference (placeholders, injection, overrides, Nebius fields).

Built-in templates

| Template | Instance | GPU | Workers |
|---|---|---|---|
| `aws:cpu` | t3.2xlarge | - | 0-2 |
| `aws:l4` | g6.4xlarge | 1x L4 | - |
| `aws:h100` | p5.4xlarge | 1x H100 | - |
| `aws:cpu-l4` | t3.2xlarge + g6.4xlarge | 1x L4 | 0-4 |
| `nebius:cpu` | 8vcpu-32gb | - | 0-2 |
| `nebius:h100` | 1gpu-16vcpu-200gb | 1x H100 | - |
| `nebius:cpu-h100s` | 8vcpu-32gb + 8gpu-128vcpu-1600gb | 8x H100 | 0-4 |

Overrides

Override template values inline:

brr up aws:cpu instance_type=t3.xlarge max_workers=4
brr up aws:l4 spot=true
brr up aws:dev region=us-west-2

Preview the rendered config without launching:

brr up aws:dev --dry-run

See available overrides for a template:

brr templates show aws:dev

Multi-provider

Both providers can run simultaneously:

brr up aws:l4
brr up nebius:h100
brr attach nebius:h100
brr down nebius:h100

Customization

Node setup

The built-in setup.sh runs on every node boot. It installs packages, mounts shared storage, sets up Python/Ray, GitHub SSH keys, AI coding tools, dotfiles, and the idle shutdown daemon. It updates automatically when you upgrade brr.

Project-specific dependencies go in .brr/setup.sh (created by brr init), which runs after the global setup.

uv integration

uv is installed to ~/.local/lib/ (via UV_INSTALL_DIR) with a routing wrapper at ~/.local/bin/uv that redirects storage to instance-local disk:

| Environment variable | Value | Purpose |
|---|---|---|
| `UV_CACHE_DIR` | `/tmp/uv` | Download cache (per-instance) |
| `UV_PYTHON_INSTALL_DIR` | `/opt/uv/python` | Managed Python builds (persistent) |
| `UV_PROJECT_ENVIRONMENT` | `/opt/venvs/{project}` | Project venvs (persistent) |

Both the binary and wrapper persist on EFS so new instances reuse them without reinstalling. uv self update updates the binary at ~/.local/lib/uv without touching the wrapper. Only the download cache (/tmp/uv) is per-instance; Python builds and venvs persist at /opt/ so they survive reboots (important for cached node restarts).
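
As a rough sketch of what such a routing wrapper does, the function below prints the three settings from the table for a given project directory. The function name uv_routing_env is hypothetical, and deriving {project} from the directory name is an assumption. The wrapper shipped with brr may compute it differently.

```shell
# Illustrative sketch of the routing idea; the shipped wrapper may differ.
# Assumption: {project} is the name of the project directory.
# Usage: uv_routing_env [project_dir]   (defaults to the current directory)
uv_routing_env() {
  project="$(basename "${1:-$PWD}")"
  echo "UV_CACHE_DIR=/tmp/uv"                         # per-instance download cache
  echo "UV_PYTHON_INSTALL_DIR=/opt/uv/python"         # managed Python builds
  echo "UV_PROJECT_ENVIRONMENT=/opt/venvs/$project"   # project venv location
}

# A wrapper at ~/.local/bin/uv could export these values and then hand off
# to the real binary, for example: exec "$HOME/.local/lib/uv" "$@"
uv_routing_env /home/me/code/my-research-repo
```

Keeping the routing in a wrapper rather than in the binary's location is what lets uv self update replace the binary without losing these settings.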

For uv-managed projects, Ray runs via uv run ray start from the project directory — add ray[default] and your cloud SDK (e.g. boto3) to your project's dependencies. For non-uv clusters, Ray runs from a standalone venv at /opt/brr/venv.

AI coding tools

Install AI coding assistants on every cluster node:

brr configure tools    # select Claude Code, Codex, and/or Gemini CLI

Then connect and start coding:

brr up aws:dev
brr attach aws:dev claude

Dotfiles

Set a dotfiles repo to sync your dev environment to every node:

brr config set DOTFILES_REPO "https://github.com/user/dotfiles"

The repo is cloned to ~/dotfiles and installed via install.sh (if present) or GNU Stow.
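
The choice between the two install paths can be sketched as follows. This is an illustration under assumptions, not brr's actual code: the function name dotfiles_method is hypothetical, and it only reports which method would apply without running anything.

```shell
# Illustrative sketch; brr's actual install step may differ in its details.
# Prints the install method that would apply to a dotfiles checkout.
# Usage: dotfiles_method [checkout_dir]   (defaults to ~/dotfiles)
dotfiles_method() {
  dir="${1:-$HOME/dotfiles}"
  if [ -f "$dir/install.sh" ]; then
    # The repo ships its own installer, so that takes priority.
    echo "install.sh"
  elif command -v stow >/dev/null 2>&1; then
    # No installer in the repo: fall back to GNU Stow if it is available.
    echo "stow"
  else
    echo "none"
  fi
}

dotfiles_method "$HOME/dotfiles"
```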

Idle shutdown

A systemd daemon monitors CPU, GPU, SSH activity, and network throughput. When all signals are idle for the configured timeout, the instance shuts down.

Configure in ~/.brr/config.env:

IDLE_SHUTDOWN_ENABLED="true"
IDLE_SHUTDOWN_TIMEOUT_MIN="30"
IDLE_SHUTDOWN_CPU_THRESHOLD="10"
IDLE_SHUTDOWN_NET_THRESHOLD_KBPS="100"
IDLE_SHUTDOWN_GRACE_MIN="15"

The grace period prevents shutdown during initial setup. Monitor on a node with journalctl -u idle-shutdown -f.
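
The decision rule can be sketched in a few lines of shell. This is a simplified illustration, not the daemon's code: the function names, the GPU threshold, and the reading of the CPU threshold as a whole-number percentage are all assumptions. The other defaults mirror the config.env values above.

```shell
# Illustrative sketch only; the real daemon's logic and units may differ.
# Defaults mirror the config.env values shown above.
IDLE_SHUTDOWN_TIMEOUT_MIN="${IDLE_SHUTDOWN_TIMEOUT_MIN:-30}"
IDLE_SHUTDOWN_CPU_THRESHOLD="${IDLE_SHUTDOWN_CPU_THRESHOLD:-10}"
IDLE_SHUTDOWN_NET_THRESHOLD_KBPS="${IDLE_SHUTDOWN_NET_THRESHOLD_KBPS:-100}"
IDLE_SHUTDOWN_GRACE_MIN="${IDLE_SHUTDOWN_GRACE_MIN:-15}"
GPU_THRESHOLD=5   # assumption: no GPU threshold appears in config.env

# is_idle CPU_PCT GPU_PCT SSH_SESSIONS NET_KBPS  (whole numbers)
# Succeeds only when every signal is below its threshold.
is_idle() {
  [ "$1" -lt "$IDLE_SHUTDOWN_CPU_THRESHOLD" ] &&
  [ "$2" -lt "$GPU_THRESHOLD" ] &&
  [ "$3" -eq 0 ] &&
  [ "$4" -lt "$IDLE_SHUTDOWN_NET_THRESHOLD_KBPS" ]
}

# should_shutdown UPTIME_MIN IDLE_MIN
# Succeeds once the grace period has passed and the node has been
# continuously idle for the full timeout.
should_shutdown() {
  [ "$1" -ge "$IDLE_SHUTDOWN_GRACE_MIN" ] &&
  [ "$2" -ge "$IDLE_SHUTDOWN_TIMEOUT_MIN" ]
}

is_idle 3 0 0 12 && echo "idle"                      # prints "idle"
should_shutdown 12 12 || echo "still within grace"   # prints "still within grace"
```

Requiring all signals to be idle at once means a single active SSH session or a busy GPU is enough to keep the instance running.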

Node caching

By default, Nebius nodes are deleted on scale-down. Unlike AWS, stopped Nebius instances still incur disk charges, so deleting is cheaper.

To keep nodes stopped instead (faster restart, but you pay for disks while idle), enable caching in your template's provider config:

provider:
  cache_stopped_nodes: true

AWS nodes are cached (stopped) by default.

Commands

| Command | Description |
|---|---|
| `brr up TEMPLATE [OVERRIDES...]` | Launch or update a cluster (`aws:l4`, `aws:dev`, or `path.yaml`) |
| `brr up TEMPLATE --dry-run` | Preview rendered config without launching |
| `brr down TEMPLATE` | Stop a cluster (instances preserved for fast restart) |
| `brr down TEMPLATE --delete` | Terminate all instances and remove staging files |
| `brr attach TEMPLATE [COMMAND]` | SSH into head node, optionally run a command (e.g. `claude`) |
| `brr list [--all]` | List clusters (project-scoped by default, `--all` for everything) |
| `brr clean [TEMPLATE]` | Terminate stopped (cached) instances |
| `brr vscode TEMPLATE` | Open VS Code on a running cluster |
| `brr templates list` | List built-in templates |
| `brr templates show TEMPLATE` | Show template config and overrides |
| `brr init` | Initialize a project (interactive provider selection) |
| `brr configure [cloud\|tools\|general]` | Interactive setup (cloud provider, AI tools, settings) |
| `brr config [list\|get\|set\|path]` | View and manage configuration |
| `brr completion [bash\|zsh\|fish]` | Shell completion (`--install` to add to shell rc) |
| `brr nuke [aws\|nebius]` | Tear down all cloud resources |

Cloud Setup

AWS Setup

1. Attach the IAM policy to your IAM user.
2. Install the AWS CLI and run aws configure.
3. (Optional) For GitHub SSH access on clusters, authenticate the GitHub CLI:
```
gh auth login
gh auth refresh -h github.com -s admin:public_key
```
4. Run the setup wizard:
```
brr configure aws
```

Nebius Setup

1. Install the Nebius CLI and run nebius init.

2. Create a service account with editor permissions (the commands below use jq to parse the JSON output):

TENANT_ID="<your-tenant-id>"  # from console.nebius.com → Administration

SA_ID=$(nebius iam service-account create \
  --name brr-cluster --format json | jq -r '.metadata.id')

EDITORS_GROUP_ID=$(nebius iam group get-by-name \
  --name editors --parent-id "$TENANT_ID" --format json | jq -r '.metadata.id')

nebius iam group-membership create \
  --parent-id "$EDITORS_GROUP_ID" --member-id "$SA_ID"

3. Generate credentials:

mkdir -p ~/.nebius
nebius iam auth-public-key generate \
  --service-account-id "$SA_ID" --output ~/.nebius/credentials.json

4. Run the setup wizard:
```
brr configure nebius
```

Acknowledgments

This project started as a fork of aws_wiz by Bes and has been inspired by discussions with colleagues from the Encode: AI for Science Fellowship.
