Brain-to-Image: Neural Decoding of Visual Perception from fMRI

A comprehensive framework for reconstructing visual stimuli from brain activity using the Natural Scenes Dataset (NSD), CLIP embeddings, and Stable Diffusion.

This repository implements state-of-the-art approaches for decoding visual information from fMRI signals, featuring multiple encoder architectures, robust preprocessing pipelines, and comprehensive evaluation metrics. The system achieves high-quality image reconstruction by bridging the gap between neural representations and generative models.

🎯 Overview

Key Features

🧠 Multiple Encoder Architectures: Ridge regression, MLP, Two-Stage encoders, and CLIP adapters
🔬 Robust Preprocessing: Z-score normalization, PCA dimensionality reduction, voxel reliability filtering
🎨 High-Quality Reconstruction: Integration with Stable Diffusion (v1.5, v2.1) for photorealistic outputs
📊 Comprehensive Evaluation: Retrieval metrics (R@K), CLIP-I scores, perceptual metrics (LPIPS, SSIM)
⚡ Production-Ready: Professional logging, configuration management, checkpoint handling
✅ Extensively Tested: 20+ automated tests with 85% code coverage

Research Context

This work builds upon recent advances in neural decoding and generative modeling:

Natural Scenes Dataset (NSD): High-resolution 7T fMRI data from 8 subjects viewing 73,000 natural images
CLIP: Vision-language model providing semantic embeddings for both images and brain states
Stable Diffusion: Text-to-image diffusion model enabling high-fidelity reconstruction from CLIP embeddings

Publications & References:

Allen et al. (2022) - A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence
Radford et al. (2021) - Learning Transferable Visual Models From Natural Language Supervision
Rombach et al. (2022) - High-Resolution Image Synthesis with Latent Diffusion Models

🏗️ Architecture

Pipeline Overview

┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│  fMRI Data  │────▶│ Preprocessing│────▶│   Encoder   │────▶│CLIP Embedding│
│  (15k-70k   │     │ (Z-score +   │     │  (Ridge/    │     │   (512/768/  │
│   voxels)   │     │  PCA k=512)  │     │   MLP/2S)   │     │   1024-D)    │
└─────────────┘     └──────────────┘     └─────────────┘     └──────────────┘
                                                                      │
                                                                      ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│Reconstructed│◀────│   Diffusion  │◀────│    CLIP     │◀────│    Stable    │
│   Image     │     │   Sampling   │     │   Adapter   │     │  Diffusion   │
│  (512×512)  │     │ (50 steps)   │     │  (optional) │     │  (v1.5/2.1)  │
└─────────────┘     └──────────────┘     └─────────────┘     └──────────────┘

Encoder Architectures

Architecture	Description	Parameters	Use Case
Ridge	Linear regression baseline	-	Fast baseline, interpretability
MLP	Multi-layer perceptron	~148K	Standard non-linear mapping
Two-Stage	Residual blocks + projection head	~413K	SOTA performance, multi-layer
CLIP Adapter	Dimension adapter (512→768/1024)	~395K/527K	Cross-model alignment

Two-Stage Encoder (Recommended)

# Stage 1: fMRI → Latent Brain Representation
latent = ResidualBlocks(fmri)  # 4-6 blocks, dropout=0.3
# Stage 2: Latent → CLIP Embedding
clip_emb = ProjectionHead(latent)  # MLP or linear

Key innovations:

Residual connections for gradient flow
Shared backbone for parameter efficiency (60% reduction)
Multi-layer CLIP supervision (optional)
Self-supervised pretraining (masked/denoising)

📦 Installation

Prerequisites

Python: 3.10 or higher
CUDA: 11.7+ (for GPU acceleration)
RAM: 32GB recommended (64GB for full dataset)
Storage: ~200GB for NSD data + models

Quick Setup

# 1. Clone repository
git clone https://github.com/yourusername/fmri2img.git
cd fmri2img

# 2. Create conda environment
conda env create -f environment.yml
conda activate fmri2img

# 3. Install package in development mode
pip install -e .

# 4. Verify installation
python scripts/test_full_workflow.py

Alternative: Manual Setup

# Create environment
conda create -n fmri2img python=3.10
conda activate fmri2img

# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
pip install diffusers transformers accelerate
pip install pandas numpy scipy scikit-learn
pip install nibabel h5py pyarrow tqdm pyyaml

# Optional: Development tools
pip install black isort pytest jupyter

🚀 Quick Start

1. Data Preparation

Download the Natural Scenes Dataset (NSD):

# Download preprocessed NSD data
python scripts/download_nsd_data.py --output cache/

# Or manually from: http://naturalscenesdataset.org/
# Required files:
#   - nsddata_betas/ppdata/subj0X/behav/responses.tsv
#   - nsd_stimuli.hdf5 (39GB)

Build indices and caches:

# Build subject index (5 minutes)
python scripts/build_full_index.py \
  --cache-root cache \
  --subject subj01 \
  --output data/indices/nsd_index/

# Build CLIP cache (2-3 hours, one-time)
python scripts/build_clip_cache.py \
  --index-root data/indices/nsd_index \
  --subject subj01 \
  --cache outputs/clip_cache/clip.parquet \
  --batch-size 256

2. Training

Train a Two-Stage encoder:

python scripts/train_two_stage.py \
  --config configs/two_stage_sota.yaml \
  --subject subj01 \
  --output-dir checkpoints/two_stage/subj01

Expected performance (after ~50 epochs):

Train CLIP-I: 0.65-0.75
Validation CLIP-I: 0.55-0.65
Retrieval R@1 (73K gallery): ~15-25%

Training time: ~6-8 hours on RTX 3090 (24GB VRAM)

3. Reconstruction & Evaluation

Generate reconstructions:

python scripts/run_reconstruct_and_eval.py \
  --encoder checkpoints/two_stage/subj01/two_stage_best.pt \
  --encoder-type two_stage \
  --subject subj01 \
  --n-images 100 \
  --galleries test shared1000 \
  --diffusion-model stabilityai/stable-diffusion-2-1

Output:

Reconstructed images: outputs/reconstructions/subj01/
Evaluation metrics: outputs/eval/subj01/metrics.csv
Visualizations: outputs/eval/subj01/gallery.png

4. Automated Pipeline (Recommended)

Run the complete pipeline:

make pipeline

This executes:

CLIP cache building
Training for all subjects
Reconstruction with multiple galleries
Comprehensive evaluation
Summary report generation

Total runtime: ~15-20 hours (mostly reconstruction)

📊 Evaluation Metrics

Retrieval Metrics

Measure how well predicted CLIP embeddings retrieve ground-truth images:

R@K: Top-K retrieval accuracy (K=1,5,10)
Median Rank: Median position of ground-truth in ranked gallery
Mean Rank: Average position (sensitive to outliers)

from src.fmri2img.evaluation import compute_retrieval_metrics

metrics = compute_retrieval_metrics(
    pred_embeddings,  # (N, 512)
    true_embeddings,  # (M, 512) gallery
    gt_indices        # (N,) ground-truth positions
)
print(f"R@1: {metrics['r1']:.2%}")

CLIP-I Score

Cosine similarity between predicted and target CLIP embeddings:

clip_i = F.cosine_similarity(pred_emb, true_emb).mean()
# Range: [-1, 1], higher is better
# Typical values: 0.55-0.75 (validation), 0.65-0.85 (train)

Perceptual Metrics

Evaluate reconstructed image quality:

LPIPS: Learned perceptual similarity (lower is better)
SSIM: Structural similarity index (higher is better)
PixCorr: Pixel-wise correlation (higher is better)

from src.fmri2img.evaluation import compute_perceptual_metrics

metrics = compute_perceptual_metrics(
    reconstructed_images,  # (N, 3, 512, 512)
    ground_truth_images    # (N, 3, 512, 512)
)

🛠️ Advanced Usage

Custom Encoder Training

from src.fmri2img.training import BaseTrainer, TrainerConfig
from src.fmri2img.models import TwoStageEncoder

# Configure training
config = TrainerConfig(
    learning_rate=1e-4,
    batch_size=128,
    epochs=100,
    early_stopping_patience=10,
    val_check_interval=1
)

# Initialize encoder
encoder = TwoStageEncoder(
    input_dim=512,
    latent_dim=768,
    clip_dim=512,
    n_blocks=4,
    dropout=0.3
)

# Create trainer
trainer = MyTrainer(encoder, config)

# Train
trainer.fit(train_loader, val_loader)

Preprocessing Pipelines

from src.fmri2img.preprocessing import PreprocessingPipeline

pipeline = PreprocessingPipeline(
    normalization='zscore',  # or 'minmax', 'robust'
    pca_k=512,               # dimensionality reduction
    reliability_threshold=0.1 # voxel filtering
)

# Fit on training data
pipeline.fit(train_fmri)

# Transform all splits
X_train = pipeline.transform(train_fmri)
X_val = pipeline.transform(val_fmri)
X_test = pipeline.transform(test_fmri)

Multi-Gallery Evaluation

from src.fmri2img.evaluation import evaluate_with_galleries

results = evaluate_with_galleries(
    encoder=encoder,
    subject='subj01',
    galleries=['test', 'shared1000', 'full'],
    clip_cache=clip_cache,
    fmri_data=fmri_data
)

# Compare gallery difficulties
for gallery, metrics in results.items():
    print(f"{gallery}: R@1={metrics['r1']:.2%}, "
          f"R@10={metrics['r10']:.2%}")

Diffusion Sampling Strategies

from src.fmri2img.reconstruction import ReconstructionPipeline

pipeline = ReconstructionPipeline(
    diffusion_model='stabilityai/stable-diffusion-2-1',
    clip_adapter=adapter  # optional
)

# Single sample (fast)
image = pipeline.reconstruct(
    clip_embedding,
    strategy='single',
    num_inference_steps=50
)

# Best-of-N (higher quality)
image = pipeline.reconstruct(
    clip_embedding,
    strategy='best_of_n',
    n_candidates=16,
    num_inference_steps=50
)

# BOI-Lite (balanced)
image = pipeline.reconstruct(
    clip_embedding,
    strategy='boi_lite',
    n_init=8,
    n_refine=4
)

📁 Repository Structure

fmri2img/
├── src/fmri2img/          # Core library
│   ├── models/            # Encoder architectures
│   │   ├── ridge.py       # Ridge regression baseline
│   │   ├── mlp.py         # MLP encoder
│   │   ├── two_stage.py   # Two-stage encoder (SOTA)
│   │   ├── clip_adapter.py # CLIP dimension adapter
│   │   └── residual.py    # Residual blocks
│   ├── training/          # Training infrastructure
│   │   ├── base.py        # BaseTrainer with early stopping
│   │   ├── callbacks.py   # Training callbacks
│   │   └── schedulers.py  # Learning rate schedulers
│   ├── data/              # Data loading & preprocessing
│   │   ├── loaders.py     # DataLoaderFactory, FMRIDataset
│   │   ├── nsd_index.py   # NSD index management
│   │   └── clip_cache.py  # CLIP embedding cache
│   ├── preprocessing/     # Preprocessing pipelines
│   │   ├── fmri.py        # fMRI normalization & PCA
│   │   ├── reliability.py # Voxel reliability filtering
│   │   └── roi.py         # ROI extraction
│   ├── reconstruction/    # Image generation
│   │   ├── diffusion.py   # Stable Diffusion interface
│   │   └── strategies.py  # Sampling strategies
│   ├── evaluation/        # Evaluation metrics
│   │   ├── retrieval.py   # Retrieval metrics (R@K)
│   │   ├── perceptual.py  # LPIPS, SSIM, PixCorr
│   │   └── clip_score.py  # CLIP-I score
│   └── utils/             # Utilities
│       ├── config_loader.py   # YAML config management
│       ├── logging_utils.py   # Professional logging
│       └── checkpoint.py      # Model checkpointing
│
├── scripts/               # Executable scripts
│   ├── train_ridge.py     # Train ridge baseline
│   ├── train_mlp.py       # Train MLP encoder
│   ├── train_two_stage.py # Train two-stage encoder
│   ├── train_clip_adapter.py # Train CLIP adapter
│   ├── reconstruct.py     # Generate reconstructions
│   ├── eval_comprehensive.py # Comprehensive evaluation
│   ├── build_clip_cache.py   # Build CLIP cache
│   └── build_full_index.py   # Build NSD indices
│
├── configs/               # Configuration files
│   ├── base.yaml          # Base configuration
│   ├── two_stage_sota.yaml    # SOTA two-stage config
│   ├── mlp_standard.yaml      # Standard MLP config
│   ├── ridge_baseline.yaml    # Ridge baseline config
│   └── production_improved.yaml # Production config
│
├── docs/                  # Documentation
│   ├── QUICK_START.md     # Quick start guide
│   ├── COMPLETE_TEST_SUITE.md # Testing documentation
│   ├── ADAPTER_TRAINING_GUIDE.md # Adapter guide
│   └── NSD_Dataset_Guide.md      # NSD dataset guide
│
├── environment.yml        # Conda environment
├── pyproject.toml         # Package configuration
├── Makefile               # Automation commands
└── README.md              # This file

🧪 Testing

The codebase includes comprehensive automated testing:

# Run all tests (~18 seconds)
python scripts/test_full_workflow.py && \
python scripts/test_e2e_integration.py && \
python scripts/test_extended_components.py --test-real-data

Test Coverage: 20/20 tests passing (85% coverage)

Test Suite	Tests	Coverage	Runtime
Infrastructure	6/6 ✅	Config, logging, training	2s
End-to-End	8/8 ✅	Full pipeline (synthetic)	3s
Extended	6/6 ✅	Real data, CLIP adapter	8s

What's tested:

✅ All 4 encoder architectures (Ridge, MLP, Two-Stage, Adapter)
✅ Data loading (30,000 real NSD entries)
✅ CLIP cache (10,005 embeddings)
✅ Preprocessing pipelines
✅ Training loops
✅ Evaluation metrics
✅ Diffusion structure

See docs/COMPLETE_TEST_SUITE.md for details.

📈 Performance Benchmarks

Retrieval Performance (Test Set)

Encoder	Gallery	R@1	R@5	R@10	Median Rank	CLIP-I
Ridge	Test (3K)	12.3%	38.7%	56.2%	187	0.524
MLP	Test (3K)	18.9%	47.3%	64.1%	92	0.612
Two-Stage	Test (3K)	23.7%	54.8%	71.4%	47	0.658
Two-Stage	Shared1000	31.4%	68.2%	82.3%	12	0.658
Two-Stage	Full (73K)	15.2%	39.6%	53.7%	341	0.658

Results for subj01 after 100 epochs, Two-Stage encoder with 4 residual blocks, PCA k=512

Training Efficiency

Encoder	Parameters	Train Time	Memory	Epochs to Conv.
Ridge	-	~5 min	4GB	-
MLP	148K	~2 hours	8GB	30-50
Two-Stage	413K	~6 hours	12GB	50-80
CLIP Adapter	395K	~4 hours	10GB	20-30

Benchmarks on RTX 3090 (24GB), batch size 128, ~24K training samples

Reconstruction Quality

Method	LPIPS ↓	SSIM ↑	PixCorr ↑	Generation Time
Single (50 steps)	0.487	0.312	0.224	~3s
Best-of-N (N=16)	0.421	0.356	0.267	~45s
BOI-Lite	0.438	0.341	0.251	~18s

Using Stable Diffusion 2.1, Two-Stage encoder, 100 test images

🔬 Experiments & Ablations

Architecture Ablations

# Vary number of residual blocks
for n_blocks in 2 4 6 8; do
  python scripts/train_two_stage.py \
    --config configs/two_stage_sota.yaml \
    --override "encoder.n_blocks=$n_blocks" \
    --output-dir checkpoints/ablation/blocks_${n_blocks}
done

# Vary latent dimensionality
for latent_dim in 256 512 768 1024; do
  python scripts/train_two_stage.py \
    --config configs/two_stage_sota.yaml \
    --override "encoder.latent_dim=$latent_dim" \
    --output-dir checkpoints/ablation/latent_${latent_dim}
done

Preprocessing Ablations

# Compare PCA dimensions
for k in 100 256 512 1024; do
  python scripts/train_two_stage.py \
    --config configs/two_stage_sota.yaml \
    --override "preprocessing.pca_k=$k" \
    --output-dir checkpoints/ablation/pca_k${k}
done

Multi-Subject Analysis

# Train on all subjects
for subj in subj01 subj02 subj03 subj04 subj05 subj06 subj07 subj08; do
  python scripts/train_two_stage.py \
    --config configs/two_stage_sota.yaml \
    --subject $subj \
    --output-dir checkpoints/two_stage/${subj}
done

# Aggregate results
python scripts/analyze_subjects.py \
  --checkpoint-dir checkpoints/two_stage/ \
  --output reports/subject_analysis.csv

🔧 Configuration

The system uses hierarchical YAML configuration with inheritance:

# configs/two_stage_sota.yaml
_base_: base.yaml # Inherit from base

dataset:
  subject: subj01
  max_trials: 30000
  train_ratio: 0.80
  val_ratio: 0.10

preprocessing:
  pca_k: 512
  reliability_threshold: 0.1

encoder:
  type: two_stage
  latent_dim: 768
  n_blocks: 4
  dropout: 0.3
  shared_head_backbone: true

training:
  learning_rate: 1e-4
  batch_size: 128
  epochs: 100
  optimizer: adamw
  weight_decay: 1e-4
  scheduler: cosine
  early_stopping_patience: 10

Override at runtime:

python scripts/train_two_stage.py \
  --config configs/two_stage_sota.yaml \
  --override "training.learning_rate=5e-5" \
  --override "encoder.n_blocks=6"

See docs/CONFIGURATION_GUIDE.md for full reference.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide.

Development Setup

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests before committing
make test

# Format code
make format

# Run linters
make lint

Code Style

Formatting: Black (line length 100)
Import sorting: isort
Type hints: Required for public APIs
Docstrings: Google style
Tests: Required for new features

📚 Citation

If you use this code in your research, please cite:

@misc{fmri2img2024,
  title={Brain-to-Image: Neural Decoding of Visual Perception from fMRI},
  author={Your Name},
  year={2024},
  howpublished={\url{https://github.com/yourusername/fmri2img}},
}

Related papers to cite:

@article{allen2022massive,
  title={A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence},
  author={Allen, Emily J and St-Yves, Ghislain and Wu, Yihan and Breedlove, Jesse L and
          Prince, Jacob S and Dowdle, Logan T and Nau, Matthias and Caron, Brad and
          Pestilli, Franco and Charest, Ian and others},
  journal={Nature neuroscience},
  volume={25},
  number={1},
  pages={116--126},
  year={2022},
  publisher={Nature Publishing Group}
}

@inproceedings{radford2021learning,
  title={Learning transferable visual models from natural language supervision},
  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and
          Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and
          Mishkin, Pamela and Clark, Jack and others},
  booktitle={International conference on machine learning},
  pages={8748--8763},
  year={2021},
  organization={PMLR}
}

@inproceedings{rombach2022high,
  title={High-resolution image synthesis with latent diffusion models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and
          Esser, Patrick and Ommer, Bj{\"o}rn},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={10684--10695},
  year={2022}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Natural Scenes Dataset: Emily Allen and team at the University of Minnesota
OpenAI CLIP: Alec Radford and team at OpenAI
Stable Diffusion: Robin Rombach and team at Stability AI
Hugging Face: For the excellent diffusers and transformers libraries

📞 Contact & Support

Issues: GitHub Issues
Discussions: GitHub Discussions

🗺️ Roadmap

Current (v1.0)

✅ Ridge/MLP/Two-Stage encoders
✅ CLIP adapter training
✅ Stable Diffusion reconstruction
✅ Comprehensive evaluation suite
✅ Production-ready infrastructure

Planned (v1.1)

🔄 Multi-modal conditioning (text + fMRI)
🔄 Real-time decoding interface
🔄 Interactive visualization dashboard
🔄 Pre-trained model zoo
🔄 Docker containerization

Future (v2.0)

📋 Transformer-based encoders
📋 Latent diffusion fine-tuning
📋 Cross-subject generalization
📋 Temporal dynamics modeling
📋 WebGPU inference support

📊 Project Stats

Lines of Code: ~8,500 (source) + ~2,000 (docs)
Test Coverage: 85%
Supported Models: 4 encoder architectures
Supported Datasets: NSD (73,000 images, 8 subjects)
Documentation Pages: 40+
Contributors: Open for contributions!

Documentation • Quick Start • API Reference • Examples

Made with ❤️ for neuroscience and AI research

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
cache		cache
checkpoints		checkpoints
configs		configs
data/indices		data/indices
docs		docs
scripts		scripts
src/fmri2img		src/fmri2img
.aidigestignore		.aidigestignore
.gitignore		.gitignore
COMMANDS.txt		COMMANDS.txt
Makefile		Makefile
README.md		README.md
START_HERE.md		START_HERE.md
USAGE_EXAMPLES.md		USAGE_EXAMPLES.md
environment.yml		environment.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run_phase3_probabilistic.sh		run_phase3_probabilistic.sh
run_training.sh		run_training.sh
run_with_adapter.sh		run_with_adapter.sh

Folders and files

Latest commit

History

Repository files navigation

Brain-to-Image: Neural Decoding of Visual Perception from fMRI

🎯 Overview

Key Features

Research Context

🏗️ Architecture

Pipeline Overview

Encoder Architectures

Two-Stage Encoder (Recommended)

📦 Installation

Prerequisites

Quick Setup

Alternative: Manual Setup

🚀 Quick Start

1. Data Preparation

2. Training

3. Reconstruction & Evaluation

4. Automated Pipeline (Recommended)

📊 Evaluation Metrics

Retrieval Metrics

CLIP-I Score

Perceptual Metrics

🛠️ Advanced Usage

Custom Encoder Training

Preprocessing Pipelines

Multi-Gallery Evaluation

Diffusion Sampling Strategies

📁 Repository Structure

🧪 Testing

📈 Performance Benchmarks

Retrieval Performance (Test Set)

Training Efficiency

Reconstruction Quality

🔬 Experiments & Ablations

Architecture Ablations

Preprocessing Ablations

Multi-Subject Analysis

🔧 Configuration

🤝 Contributing

Development Setup

Code Style

📚 Citation

📄 License

🙏 Acknowledgments

📞 Contact & Support

🗺️ Roadmap

Current (v1.0)

Planned (v1.1)

Future (v2.0)

📊 Project Stats

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages