This repository contains the official PyTorch implementation for training and inference of MambaEye.
Key features of the architecture include:
- Flexible Image Understanding: Processes multi-resolution images and arbitrary aspect ratios, including partial images.
- Variable-Length Processing: Natively handles sequential inputs of varying lengths.
- Efficient Scaling: Achieves linear memory and computational complexity in the number of patches, powered by Mamba2 layers (constant memory during inference).

TODO:
- Update for the camera-ready version.
- Write a blog post (if # of stars > 100).
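To make the linear-scaling feature above concrete, here is an illustrative back-of-envelope calculation. The patch size of 16 is an assumption for illustration only; the repo's actual patching may differ:

```python
import math

def num_patches(height, width, patch_size=16):
    """Patches needed to cover an image of any size (partial edge patches count)."""
    return math.ceil(height / patch_size) * math.ceil(width / patch_size)

# Arbitrary resolutions and aspect ratios map to different sequence lengths,
# and compute/memory grow linearly with that patch count.
for h, w in [(224, 224), (512, 512), (384, 1024)]:
    print((h, w), num_patches(h, w))
```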
Ensure you have Python 3.12+ installed and a CUDA-capable environment. We strongly recommend creating an isolated environment:

```bash
# Environment setup (first time only)
source scripts/env_setup.sh

# Activate the environment (after setup)
source .venv/bin/activate
```

Note: `mamba-ssm==2.2.4` and `causal-conv1d` may require specific CUDA versions. Check the official Mamba repo if you face compilation issues. We tested this environment on RTX 4090, L40S, A100, H100, and H200 GPUs.
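Before installing, you can quickly verify that your interpreter meets the Python 3.12+ requirement (a trivial check, using only the standard library):

```python
import sys

# The repo requires Python 3.12 or newer.
ok = sys.version_info >= (3, 12)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if ok else 'too old, need 3.12+'}")
```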
Test the environment with a short Python script:

```python
import torch
from mamba_ssm import Mamba2

model = Mamba2(d_model=256, d_state=4, d_conv=2, expand=1).cuda()
x = torch.randn(1, 10, 256).cuda()
y = model(x)  # output has the same shape as x
```

All model weights are uploaded to Hugging Face; you can download them from the links in the table below.
| Model | Params (M) | Trained Sequence Length | Top-1 Accuracy (512x512) | Link |
|---|---|---|---|---|
| MambaEye-Tiny | 5.8 | 1024 | 66.2% | Link |
| MambaEye-Tiny (FT) | 5.8 | 2048 | 67.2% | Link |
| MambaEye-Small | 11.0 | 1024 | 72.7% | Link |
| MambaEye-Small (FT) | 11.0 | 2048 | 73.1% | Link |
| MambaEye-Base | 21.3 | 1024 | 73.5% | Link |
| MambaEye-Base (FT) | 21.3 | 2048 | 75.0% | Link |
To evaluate our model at different sequence lengths and resolutions (as reported in the paper), you can use the provided inference scripts.

For a single image, use the following command:

```bash
python eval.py \
    image_path=/path/to/image.jpg \
    ckpt_path=/path/to/checkpoint.ckpt \
    scan_pattern=random \
    resize_mode=none
```
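As a rough intuition for the `scan_pattern=random` option, the sketch below shuffles the visiting order of patches in a grid. This is an illustration only, not the repo's actual implementation (which lives in `mambaeye/scan.py`):

```python
import random

def random_scan(n_rows, n_cols, seed=0):
    """Illustrative random scan: visit every patch of an n_rows x n_cols grid
    exactly once, in a shuffled (but seed-reproducible) order."""
    order = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    random.Random(seed).shuffle(order)
    return order

scan = random_scan(4, 4)
print(len(scan), scan[:3])  # 16 patches, visited in shuffled order
```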
```bash
# Use official model weights
python eval.py \
    image_path=/path/to/image.jpg \
    model_name=small-ft
```

Download the ImageNet val dataset and organize it in the standard PyTorch format described in the Training section.
```bash
# Using a local checkpoint
python eval.py \
    dataset.val.img_dir=/path/to/val \
    ckpt_path=/path/to/checkpoint.ckpt \
    scan_pattern=random \
    resize_mode=none

# Or automatically download and use an official model by its alias
# (Options: tiny, tiny-ft, small, small-ft, base, base-ft)
python eval.py \
    dataset.val.img_dir=/path/to/val \
    model_name=base-ft
```

Organize your ImageNet dataset in the standard PyTorch format:
```
data/imagenet/
    train/
        n01440764/
            n01440764_10026.JPEG
            ...
    val/
        n01440764/
            ILSVRC2012_val_00000293.JPEG
            ...
```
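The layout above can be sanity-checked with a short standard-library script. `check_imagenet_layout` is a hypothetical helper, not part of the repo; it only counts class folders and `.JPEG` files per split:

```python
import tempfile
from pathlib import Path

def check_imagenet_layout(root):
    """Sanity-check the standard PyTorch ImageFolder layout:
    root/{train,val}/<wnid>/*.JPEG. Returns {split: (num_classes, num_images)}."""
    root = Path(root)
    report = {}
    for split in ("train", "val"):
        class_dirs = [d for d in (root / split).iterdir() if d.is_dir()]
        n_images = sum(1 for d in class_dirs for _ in d.glob("*.JPEG"))
        report[split] = (len(class_dirs), n_images)
    return report

# Demo on a tiny synthetic tree (standing in for data/imagenet/):
with tempfile.TemporaryDirectory() as tmp:
    for split, name in [("train", "n01440764_10026"),
                        ("val", "ILSVRC2012_val_00000293")]:
        d = Path(tmp) / split / "n01440764"
        d.mkdir(parents=True)
        (d / f"{name}.JPEG").touch()
    report = check_imagenet_layout(tmp)
    print(report)  # {'train': (1, 1), 'val': (1, 1)}
```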
We use PyTorch Lightning for training. All model and data settings are configured using Hydra, with YAML files located in `configs/`.

Important: Before training, please ensure your environment configurations are set appropriately. You can either modify the YAML files directly or override them via the command line:

- Dataset Path: Edit `configs/dataset/default.yaml` and set `dataset.train.img_dir` and `dataset.val.img_dir` to point to your local ImageNet directories.
- GPU Settings: Adjust `configs/trainer/default.yaml` and set `dataloader.train.batch_size` and `trainer.accumulate_grad_batches` depending on your GPU memory.
- W&B Logging: Add `wandb.entity=YOUR_ENTITY wandb.project=YOUR_PROJECT` to enable Weights & Biases logging (otherwise it falls back to CSV logging).
```bash
# Example: Training a 48-layer base model on ImageNet
python train.py model=base_48layers
```

To finetune an existing checkpoint:

```bash
python train.py \
    model=base_48layers \
    fine_tuning=true \
    ckpt_path=/path/to/checkpoint.ckpt  # or mambaeye_small.pt
```

```
MambaEye/
├── assets/                     # Asset files
├── configs/                    # YAML configuration files
├── mambaeye/                   # Core module
│   ├── __init__.py
│   ├── dataset.py              # Dataset loading for ImageNet
│   ├── loss.py                 # Custom loss functions
│   ├── mambaeye_pl.py          # PyTorch Lightning module definitions
│   ├── model.py                # Core MambaEye SSM architecture
│   ├── positional_encoding.py  # Positional encoding
│   └── scan.py                 # Scan pattern generation
├── scripts/                    # Utility scripts
├── eval.py                     # Standard inference script
├── train.py                    # Training script for MambaEye
├── requirements.txt            # Dependency requirements
└── README.md                   # This file
```
If you find this code or our paper useful for your research, please cite our CVPR 2026 Findings track paper:
```bibtex
@inproceedings{mambaeye2026,
  title={MambaEye: A Size-Agnostic Visual Encoder with Causal Sequential Processing},
  author={Changho Choi and Minho Kim and Jinkyu Kim},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year={2026},
  note={Accepted}
}
```
