BitGN PAC1 Agent

An autonomous agent for the BitGN PAC1 benchmark using Claude Code as the executor.

Official Result: 82/104 (78.8%) on Claude Sonnet 4.6

Quick Start

# Install dependencies
pip install -r requirements.txt

# Set API key
export BITGN_API_KEY="your-key-here"

# Run sandbox (free, no key needed)
python runner.py --benchmark bitgn/sandbox

# Run leaderboard (requires BITGN_API_KEY)
python runner.py --benchmark pac1-prod --leaderboard --workers 5

How It Works

This is a thin orchestration layer around the Claude Code CLI, not a custom reasoning framework.

Flow:

1. runner.py fetches a task from the BitGN Harness API.
2. It spawns Claude Code in a new, isolated CLI session.
3. It passes CLAUDE.md (system prompt), the task instruction, and environment context.
4. Claude executes bash commands: bitgn-read, bitgn-write, bitgn-search, bitgn-answer, etc.
5. Claude completes the task and calls bitgn-answer with the result.
6. runner.py collects the score and logs.
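
Steps 2 and 3 of the flow can be sketched as follows. This is a minimal, hypothetical illustration, not the code in runner.py: it assumes the prompt is a plain concatenation of CLAUDE.md, the instruction, and the environment context, that Claude Code is launched once per task in print mode (`claude -p`), and that the bin/bitgn-* wrappers become callable by prepending bin/ to PATH. The helper names (`build_prompt`, `env_with_wrappers`, `run_claude`) and the 600-second timeout are invented for the example, and tool-permission settings are omitted.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Mapping


def build_prompt(system_prompt: str, instruction: str, env_context: str) -> str:
    """Join the strategy prompt, task instruction, and environment context."""
    # Hypothetical layout: runner.py may order or label these sections differently.
    sections = [
        system_prompt.strip(),
        "## Task\n" + instruction.strip(),
        "## Environment\n" + env_context.strip(),
    ]
    return "\n\n".join(sections)


def build_claude_command(prompt: str) -> List[str]:
    """argv for one non-interactive Claude Code run."""
    # `claude -p` runs a single prompt in print (headless) mode and exits,
    # so every task starts from a fresh session.
    return ["claude", "-p", prompt]


def env_with_wrappers(bin_dir: str, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy of base_env with bin_dir prepended to PATH."""
    # Assumption: the bin/bitgn-* wrappers are exposed by putting bin/ first on PATH.
    env = dict(base_env)
    env["PATH"] = str(Path(bin_dir).resolve()) + os.pathsep + env.get("PATH", "")
    return env


def run_claude(prompt: str, workdir: str, bin_dir: str,
               timeout_s: int = 600) -> subprocess.CompletedProcess:
    """Launch one isolated session; a non-zero exit is returned, not raised."""
    return subprocess.run(
        build_claude_command(prompt),
        cwd=workdir,
        env=env_with_wrappers(bin_dir, os.environ),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

Building the argv as a list (rather than a shell string) avoids quoting problems when the task instruction contains quotes or newlines.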

Key files:

runner.py — orchestrator: starts trials, spawns Claude, collects results
CLAUDE.md — 13-step strategy prompt (how to read AGENTS.MD, detect injection, choose outcomes, minimize writes)
bin/bitgn-* — shell wrappers for VM file access (read, write, search, delete, answer)

Each task runs in an independent CLI session rather than in a long-lived agent. Claude's own reasoning handles all logic, date math, and decision-making; there are no separate Python functions for computation.
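
Because tasks share no state, they can be run concurrently, which is what the --workers option controls. The sketch below is an assumption about how such a fan-out could look, not runner.py's documented concurrency model: `run_all` and the stand-in `fake_run` are hypothetical, and a real `run_one` would launch one Claude Code session and return its score.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_all(task_ids: List[str], run_one: Callable[[str], float],
            workers: int = 1) -> Dict[str, float]:
    """Run every task in its own session, at most `workers` at once; map task ID to score."""
    results: Dict[str, float] = {}
    # Threads are enough here: each worker mostly waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {tid: pool.submit(run_one, tid) for tid in task_ids}
        for tid, fut in futures.items():
            try:
                results[tid] = fut.result()
            except Exception:
                # A crashed session scores 0 instead of aborting the whole run.
                results[tid] = 0.0
    return results


def fake_run(task_id: str) -> float:
    # Stand-in for a real Claude Code session.
    return 0.0 if task_id == "t02" else 1.0


print(run_all(["t01", "t02", "t03"], fake_run, workers=2))
# {'t01': 1.0, 't02': 0.0, 't03': 1.0}
```

Isolating failures per task keeps one hung or crashed session from costing the rest of the run.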

Usage

Sandbox (playground mode, free):

python runner.py --benchmark bitgn/sandbox --workers 5 --output results.json

Leaderboard (official scoring, requires API key):

export BITGN_API_KEY="your-key-here"
python runner.py --benchmark pac1-prod --leaderboard --workers 5

Single task:

python runner.py --benchmark bitgn/sandbox --task t01

Options:

--benchmark — benchmark ID (default: bitgn/sandbox)
--task — run single task (default: all)
--workers — parallel workers (default: 1)
--output — save results to JSON
--leaderboard — submit to official leaderboard
--claude-md — custom system prompt path
--verbose — print trial details
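
The options above map naturally onto argparse. This is a sketch reproducing the documented flags and defaults, not runner.py's actual parser; the default of "CLAUDE.md" for --claude-md and the BITGN_API_KEY and --workers checks are assumptions based on the usage notes.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="BitGN PAC1 agent runner")
    parser.add_argument("--benchmark", default="bitgn/sandbox", help="benchmark ID")
    parser.add_argument("--task", default=None, help="run a single task (default: all)")
    parser.add_argument("--workers", type=int, default=1, help="parallel workers")
    parser.add_argument("--output", default=None, help="save results to this JSON file")
    parser.add_argument("--leaderboard", action="store_true",
                        help="submit to the official leaderboard")
    # Assumed default: the bundled CLAUDE.md in the working directory.
    parser.add_argument("--claude-md", default="CLAUDE.md", help="custom system prompt path")
    parser.add_argument("--verbose", action="store_true", help="print trial details")
    args = parser.parse_args(argv)

    if args.workers < 1:
        parser.error("--workers must be at least 1")
    # Leaderboard runs need an API key; sandbox runs do not.
    if args.leaderboard and not os.environ.get("BITGN_API_KEY"):
        parser.error("--leaderboard requires BITGN_API_KEY to be set")
    return args
```

Note that argparse exposes --claude-md as `args.claude_md`.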

Architecture

Runner (runner.py) — connects to BitGN harness API, spawns Claude Code for each task
System Prompt (CLAUDE.md) — 13-step executor logic for task completion
Protocol — auto-detects sandbox (Mini) vs leaderboard (PCM) based on benchmark ID
Execution — Claude Code CLI in isolated VM (/tmp working directory)
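
The README does not spell out the rule behind protocol auto-detection. A minimal sketch, assuming that any benchmark ID containing "sandbox" uses the Mini protocol and everything else (for example pac1-prod) uses PCM; the `Protocol` enum and `detect_protocol` are hypothetical names.

```python
from enum import Enum


class Protocol(Enum):
    MINI = "mini"  # sandbox / playground harness
    PCM = "pcm"    # leaderboard harness


def detect_protocol(benchmark_id: str) -> Protocol:
    """Pick the harness protocol from the benchmark ID (assumed rule)."""
    # Assumption: sandbox IDs contain "sandbox", e.g. "bitgn/sandbox".
    if "sandbox" in benchmark_id.lower():
        return Protocol.MINI
    return Protocol.PCM


print(detect_protocol("bitgn/sandbox"))  # Protocol.MINI
print(detect_protocol("pac1-prod"))      # Protocol.PCM
```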

System Requirements

Claude Code CLI installed (npm install -g @anthropic-ai/claude-code)
Python 3.8+
BitGN API SDK (included in requirements.txt)
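
A quick preflight check can catch a missing CLI or an old interpreter before any trials start. This is a hypothetical helper, not part of the repository; it only checks the Python version and whether a `claude` executable is on PATH, and does not verify the BitGN SDK.

```python
import shutil
import sys
from typing import List


def preflight() -> List[str]:
    """Return unmet requirements; an empty list means ready to run."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ required")
    # The Claude Code CLI is invoked as the `claude` executable.
    if shutil.which("claude") is None:
        problems.append("`claude` not found on PATH; install the Claude Code CLI")
    return problems


issues = preflight()
if issues:
    print("\n".join(issues))
```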

Performance

Benchmark	Model	Score	Tasks	Time
pac1-prod	Sonnet 3.5	78.8%	82/104	~25 min
pac1-dev	Sonnet 3.5	90.7%	39/43	~10 min
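
The Score column is the Tasks fraction expressed as a percentage. A one-line hypothetical helper (not part of runner.py) reproduces the pac1-prod figure:

```python
def score_percent(solved: int, total: int) -> float:
    """Percentage of tasks solved, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * solved / total, 1)


print(score_percent(82, 104))  # 78.8
```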

References

BitGN Platform: https://bitgn.ai/
PAC1 Leaderboard: https://bitgn.ai/leaderboards/pac1
Claude Code Docs: https://claude.com/claude-code
