This repository contains the official PyTorch implementation for training and inference of MambaEye.
Key features of the architecture include:
- Flexible Image Understanding: Processes multi-resolution images and arbitrary aspect ratios, including partial images.
- Variable-Length Processing: Natively handles sequential inputs of varying lengths.
- Efficient Scaling: Achieves linear memory and computational complexity in the number of patches, powered by Mamba2 layers (constant memory during inference).

TODO:
- Update for the camera-ready version.
- Write a blog post (if # of stars > 100).
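To make the linear-scaling feature above concrete, here is an illustrative back-of-envelope calculation. The patch size of 16 is an assumption for illustration only; the repo's actual patching may differ:

```python
import math

def num_patches(height, width, patch_size=16):
    """Patches needed to cover an image of any size (partial edge patches count)."""
    return math.ceil(height / patch_size) * math.ceil(width / patch_size)

# Arbitrary resolutions and aspect ratios map to different sequence lengths,
# and compute/memory grow linearly with that patch count.
for h, w in [(224, 224), (512, 512), (384, 1024)]:
    print((h, w), num_patches(h, w))
```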
Ensure you have Python 3.12+ installed and a CUDA-capable environment. We strongly recommend creating an isolated environment:

```bash
# Environment setup (first time only)
source scripts/env_setup.sh

# Activate the environment (after setup)
source .venv/bin/activate
```

Note: `mamba-ssm==2.2.4` and `causal-conv1d` may require specific CUDA versions. Check the official Mamba repo if you face compilation issues. We tested this environment on RTX 4090, L40S, A100, H100, and H200 GPUs.
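Before installing, you can quickly verify that your interpreter meets the Python 3.12+ requirement (a trivial check, using only the standard library):

```python
import sys

# The repo requires Python 3.12 or newer.
ok = sys.version_info >= (3, 12)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if ok else 'too old, need 3.12+'}")
```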
Test the environment with a short Python script:

```python
import torch
from mamba_ssm import Mamba2

model = Mamba2(d_model=256, d_state=4, d_conv=2, expand=1).cuda()
x = torch.randn(1, 10, 256).cuda()
y = model(x)  # output has the same shape as x
```

All model weights are uploaded to Hugging Face; you can download them from the links in the table below.
| Model | Params (M) | Trained Sequence Length | Top-1 Accuracy (512x512) | Link |
|---|---|---|---|---|
| MambaEye-Tiny | 5.8 | 1024 | 66.2% | Link |
| MambaEye-Tiny (FT) | 5.8 | 2048 | 67.2% | Link |
| MambaEye-Small | 11.0 | 1024 | 72.7% | Link |
| MambaEye-Small (FT) | 11.0 | 2048 | 73.1% | Link |
| MambaEye-Base | 21.3 | 1024 | 73.5% | Link |
| MambaEye-Base (FT) | 21.3 | 2048 | 75.0% | Link |
To evaluate our model at different sequence lengths and resolutions (as reported in the paper), you can use the provided inference scripts.

For a single image, use the following command:

```bash
python eval.py \
    image_path=/path/to/image.jpg \
    ckpt_path=/path/to/checkpoint.ckpt \
    scan_pattern=random \
    resize_mode=none
```
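As a rough intuition for the `scan_pattern=random` option, the sketch below shuffles the visiting order of patches in a grid. This is an illustration only, not the repo's actual implementation (which lives in `mambaeye/scan.py`):

```python
import random

def random_scan(n_rows, n_cols, seed=0):
    """Illustrative random scan: visit every patch of an n_rows x n_cols grid
    exactly once, in a shuffled (but seed-reproducible) order."""
    order = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    random.Random(seed).shuffle(order)
    return order

scan = random_scan(4, 4)
print(len(scan), scan[:3])  # 16 patches, visited in shuffled order
```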
```bash
# Use official model weights
python eval.py \
    image_path=/path/to/image.jpg \
    model_name=small-ft
```

Download the ImageNet val dataset and organize it in the standard PyTorch format described in the Training section.
```bash
# Using a local checkpoint
python eval.py \
    dataset.val.img_dir=/path/to/val \
    ckpt_path=/path/to/checkpoint.ckpt \
    scan_pattern=random \
    resize_mode=none

# Or automatically download and use an official model by its alias
# (Options: tiny, tiny-ft, small, small-ft, base, base-ft)
python eval.py \
    dataset.val.img_dir=/path/to/val \
    model_name=base-ft
```

Organize your ImageNet dataset in the standard PyTorch format:
```
data/imagenet/
    train/
        n01440764/
            n01440764_10026.JPEG
            ...
    val/
        n01440764/
            ILSVRC2012_val_00000293.JPEG
            ...
```
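The layout above can be sanity-checked with a short standard-library script. `check_imagenet_layout` is a hypothetical helper, not part of the repo; it only counts class folders and `.JPEG` files per split:

```python
import tempfile
from pathlib import Path

def check_imagenet_layout(root):
    """Sanity-check the standard PyTorch ImageFolder layout:
    root/{train,val}/<wnid>/*.JPEG. Returns {split: (num_classes, num_images)}."""
    root = Path(root)
    report = {}
    for split in ("train", "val"):
        class_dirs = [d for d in (root / split).iterdir() if d.is_dir()]
        n_images = sum(1 for d in class_dirs for _ in d.glob("*.JPEG"))
        report[split] = (len(class_dirs), n_images)
    return report

# Demo on a tiny synthetic tree (standing in for data/imagenet/):
with tempfile.TemporaryDirectory() as tmp:
    for split, name in [("train", "n01440764_10026"),
                        ("val", "ILSVRC2012_val_00000293")]:
        d = Path(tmp) / split / "n01440764"
        d.mkdir(parents=True)
        (d / f"{name}.JPEG").touch()
    report = check_imagenet_layout(tmp)
    print(report)  # {'train': (1, 1), 'val': (1, 1)}
```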
We use PyTorch Lightning for training. All model and data settings are configured using Hydra, with YAML files located in `configs/`.

Important: Before training, please ensure your environment configurations are set appropriately. You can either modify the YAML files directly or override them via the command line:

- Dataset Path: Edit `configs/dataset/default.yaml` and set `dataset.train.img_dir` and `dataset.val.img_dir` to point to your local ImageNet directories.
- GPU Settings: Adjust `configs/trainer/default.yaml` and set `dataloader.train.batch_size` and `trainer.accumulate_grad_batches` depending on your GPU memory.
- W&B Logging: Add `wandb.entity=YOUR_ENTITY wandb.project=YOUR_PROJECT` to enable Weights & Biases logging (otherwise it falls back to CSV logging).
```bash
# Example: Training a 48-layer base model on ImageNet
python train.py model=base_48layers
```

To finetune an existing checkpoint:

```bash
python train.py \
    model=base_48layers \
    fine_tuning=true \
    ckpt_path=/path/to/checkpoint.ckpt  # or mambaeye_small.pt
```

```
MambaEye/
├── assets/                     # Asset files
├── configs/                    # YAML configuration files
├── mambaeye/                   # Core module
│   ├── __init__.py
│   ├── dataset.py              # Dataset loading for ImageNet
│   ├── loss.py                 # Custom loss functions
│   ├── mambaeye_pl.py          # PyTorch Lightning module definitions
│   ├── model.py                # Core MambaEye SSM architecture
│   ├── positional_encoding.py  # Positional encoding
│   └── scan.py                 # Scan pattern generation
├── scripts/                    # Utility scripts
├── eval.py                     # Standard inference script
├── train.py                    # Training script for MambaEye
├── requirements.txt            # Dependency requirements
└── README.md                   # This file
```
If you find this code or our paper useful for your research, please cite our CVPR 2026 Findings track paper:
```bibtex
@inproceedings{mambaeye2026,
  title={MambaEye: A Size-Agnostic Visual Encoder with Causal Sequential Processing},
  author={Changho Choi and Minho Kim and Jinkyu Kim},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year={2026},
  note={Accepted}
}
```
