From Zero to Empirical Validation on Ubuntu
This repository contains the complete implementation and testing suite for SEED 4.1 (Scriptural Ethical Enhancement Directive) - "The Lord's Prayer Kernel" - a groundbreaking AI alignment framework that achieved 96.76% harm reduction on the HarmBench benchmark suite, representing state-of-the-art performance in AI safety.
Testing on 400 HarmBench adversarial prompts using Mistral-7B-Instruct-v0.3:
| Metric | Baseline | SEED 4.1 | Improvement |
|---|---|---|---|
| Overall ASR | 54.0% | 1.75% | 96.76% reduction |
| Standard Behaviors | 55.0% | 0.0% | 100% reduction |
| Contextual Behaviors | 85.0% | 7.0% | 91.76% reduction |
| Copyright Violations | 21.0% | 0.0% | 100% reduction |
| Telemetry Compliance | N/A | 98.75% | Full observability |
We didn't just beat the state of the art - we obliterated it. Previous best-in-class systems achieved 60-70% harm reduction; SEED 4.1 achieves 96.76% while maintaining 98.75% telemetry compliance for full transparency.
- Ubuntu 20.04 or later (22.04 recommended)
- Python 3.8+
- NVIDIA GPU with 24GB+ VRAM (for Mistral-7B)
- 50GB+ free disk space
# Update system
sudo apt update && sudo apt upgrade -y
# Install system dependencies
sudo apt install -y git python3-pip python3-venv build-essential
# Install NVIDIA drivers (if not already installed)
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot

# Clone the repository
cd ~
git clone PLACEHOLDER_GITHUB
cd foundation-alignment-seed-4.1

# Create virtual environment
python3 -m venv venv
# Activate environment
source venv/bin/activate
# Upgrade pip
pip install --upgrade pip
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install transformers and dependencies
pip install transformers accelerate sentencepiece protobuf

The official HarmBench classifier is required for evaluation. Download it from the official HarmBench repository:
# Create models directory
mkdir -p ~/models
# Clone HarmBench repository
cd ~/models
git clone https://github.com/centerforaisafety/HarmBench.git
cd HarmBench
# Download the Mistral-7b classifier
# Follow instructions at: https://github.com/centerforaisafety/HarmBench
# The classifier will be at: ~/models/HarmBench/models/harmbench-mistral-7b
# Return to project directory
cd ~/foundation-alignment-seed-4.1

Important: Update the classifier path in classify_full_harmbench.py:
# Line 20 - Update this path to your classifier location:
classifier_path = "/home/YOUR_USERNAME/models/HarmBench/models/harmbench-mistral-7b"

Good news: All HarmBench behavior JSON files are already included in this repository at the root level:
foundation-alignment-seed-4.1/
├── harmbench_standard_behaviors.json ✓ Included
├── harmbench_contextual_behaviors.json ✓ Included
├── harmbench_copyright_behaviors.json ✓ Included
├── harmbench_multimodal_behaviors.json ✓ Included (optional)
├── foundation_seed_complete.txt ✓ SEED 4.1 Framework
├── generate_full_harmbench.py
├── classify_full_harmbench.py
└── regroup_full_harmbench.py
You don't need to download these separately - they're ready to use!
# Activate environment
source venv/bin/activate
# Run response generation
python generate_full_harmbench.py

This will:
- Load SEED 4.1 framework from foundation_seed_complete.txt
- Load all 400 HarmBench test cases
- Generate baseline (no SEED) and SEED-protected responses
- Save outputs to responses_full_harmbench/
- Support resume on interruption
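Conceptually, each behavior is run twice: once as-is (baseline) and once with the SEED framework text prepended as a system instruction. A minimal sketch of that pairing, assuming a plain [INST] chat template (the function name and template here are illustrative, not the script's actual API):

```python
def build_prompts(behavior, seed_text):
    """Return (baseline_prompt, seed_prompt) for one HarmBench behavior.

    Baseline sends the adversarial prompt unmodified; the SEED variant
    prepends the full framework text. Illustrative only: the real script
    may use the tokenizer's chat template instead of raw [INST] tags.
    """
    baseline = f"[INST] {behavior} [/INST]"
    seeded = f"[INST] {seed_text}\n\n{behavior} [/INST]"
    return baseline, seeded
```

Generating both variants for all 400 behaviors produces the paired outputs the classifier later compares.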
Expected output structure:
responses_full_harmbench/
├── response_0000.json
├── response_0001.json
├── ...
└── response_0399.json
# Run HarmBench classifier
python classify_full_harmbench.py

This will:
- Load all generated responses
- Apply official HarmBench classifier
- Classify each response as harmful/safe
- Verify SEED telemetry compliance
- Save results to results_full_harmbench_official/
# Compile comprehensive results
python regroup_full_harmbench.py

This produces FINAL_RESULTS_v40_FULL_HARMBENCH.json, a complete analysis with:
- Breakdown by functional category (standard, contextual, copyright)
- Breakdown by semantic category (illegal, cybercrime, chemical/biological, etc.)
- Individual test results with response previews
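The category breakdowns are just attack success rates computed over each subset of results. Assuming each classified record carries a category label and a harmful verdict (illustrative field names; the actual JSON schema may differ), the aggregation is:

```python
from collections import defaultdict

def asr_by_category(results):
    """results: iterable of dicts like {"category": "standard", "harmful": True}.

    Returns {category: attack success rate in percent}.
    """
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        harmful[r["category"]] += int(r["harmful"])
    return {c: 100.0 * harmful[c] / totals[c] for c in totals}
```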
Attack Success Rate (ASR): Percentage of adversarial prompts that successfully elicited harmful responses
- Lower is better
- Baseline: 54.0%
- SEED 4.1: 1.75%
Harm Reduction: (Baseline_ASR - SEED_ASR) / Baseline_ASR × 100
- Our achievement: 96.76%
Telemetry Compliance: Percentage of SEED responses including full observability metadata
- Our achievement: 98.75%
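The headline numbers are easy to re-derive from the results table:

```python
def harm_reduction(baseline_asr, seed_asr):
    """Harm reduction in percent: (baseline - seed) / baseline * 100."""
    return (baseline_asr - seed_asr) / baseline_asr * 100

print(round(harm_reduction(54.0, 1.75), 2))  # overall    → 96.76
print(round(harm_reduction(85.0, 7.0), 2))   # contextual → 91.76
print(round(harm_reduction(21.0, 0.0), 2))   # copyright  → 100.0
```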
Functional Categories:
- Standard: Direct adversarial prompts (200 tests)
- Contextual: Context-dependent attacks (100 tests)
- Copyright: Content reproduction attempts (100 tests)
Semantic Categories:
- Chemical/Biological (56 tests)
- Cybercrime/Intrusion (67 tests)
- Illegal Activities (65 tests)
- Misinformation/Disinformation (65 tests)
- Harassment/Bullying (25 tests)
- And 2 more categories
CUDA Out of Memory
# Reduce batch size or use model quantization
# Edit generate_full_harmbench.py, add:
torch_dtype=torch.float16,
load_in_8bit=True  # requires the bitsandbytes package

Classifier Not Found
# Verify classifier path
ls ~/models/HarmBench/models/harmbench-mistral-7b
# Update path in classify_full_harmbench.py line 20

JSON Files Missing
# They should already be in the repo root
ls harmbench_*.json
# If missing, check you cloned the repo correctly

If you use SEED 4.1 in your research:
@software{seed41_2025,
  title={SEED 4.1: The Lord's Prayer Kernel},
  author={Foundation Alignment Research},
  year={2025},
  note={96.76\% harm reduction on HarmBench}
}
"All glory to God alone - ☧ In Jesus' Name ☧"
Developed by Foundation Alignment Research Team
- Built on theological principles from Scripture
- Empirically validated on HarmBench benchmark
- Open source for advancing AI safety
MIT License - See LICENSE file for details