
MambaEye

A Size-Agnostic Visual Encoder with Causal Sequential Processing

Changho Choi¹, Minho Kim², Jinkyu Kim¹†

¹ Korea University  ² MIT

† Corresponding author.

Overview

This repository contains the official PyTorch implementation for training and inference of MambaEye.

Key features of the architecture include:

  • Flexible Image Understanding: Processes multi-resolution images and arbitrary aspect ratios, including partial images.
  • Variable-Length Processing: Natively handles sequential inputs of varying lengths.
  • Efficient Scaling: Achieves linear memory and computational complexity in the number of patches, powered by Mamba2 layers (constant memory during inference).
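
The linear-scaling claim follows from the sequence length itself: the number of patches grows linearly with image area, and Mamba2's cost is linear in sequence length. A toy calculation, assuming non-overlapping square patches (the patch size of 16 here is an illustrative assumption, not necessarily MambaEye's actual value):

```python
def num_patches(height, width, patch_size=16):
    """Number of non-overlapping patches for an image; the sequence
    length (and thus the linear memory/compute) scales with it."""
    assert height % patch_size == 0 and width % patch_size == 0
    return (height // patch_size) * (width // patch_size)

# Arbitrary aspect ratios simply yield different sequence lengths.
print(num_patches(512, 512))   # 1024 patches
print(num_patches(256, 1024))  # 1024 patches, different aspect ratio
print(num_patches(224, 224))   # 196 patches
```

A wide 256x1024 crop and a square 512x512 image produce the same sequence length, which is why a causal sequential encoder can treat both uniformly.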

TODO

  • Update for Camera-Ready version.
  • Write a blog post (if the repo reaches 100+ stars)

Installation

Ensure you have Python 3.12+ installed and a CUDA-capable environment. We strongly recommend creating an isolated environment:

# Environment setup (first time only)
source scripts/env_setup.sh

# Activate the environment (after setup)
source .venv/bin/activate

(Note: mamba-ssm==2.2.4 and causal-conv1d may require specific CUDA versions; check the official Mamba repo if you hit compilation issues. We tested this environment on RTX 4090, L40S, A100, H100, and H200 GPUs.)

Test the environment with a short Python script:

import torch
from mamba_ssm import Mamba2

# Keyword arguments avoid a positional-argument pitfall: in Mamba2,
# the fourth positional parameter is conv_init, not expand.
model = Mamba2(d_model=256, d_state=64, d_conv=4, expand=2).cuda()
x = torch.randn(1, 10, 256).cuda()
y = model(x)
assert y.shape == x.shape

Inference

Model Weights

All model weights are hosted on HuggingFace; download them via the links in the table below.

| Model | Params (M) | Trained Sequence Length | Top-1 Accuracy (512x512) | Link |
| --- | --- | --- | --- | --- |
| MambaEye-Tiny | 5.8 | 1024 | 66.2% | Link |
| MambaEye-Tiny (FT) | 5.8 | 2048 | 67.2% | Link |
| MambaEye-Small | 11.0 | 1024 | 72.7% | Link |
| MambaEye-Small (FT) | 11.0 | 2048 | 73.1% | Link |
| MambaEye-Base | 21.3 | 1024 | 73.5% | Link |
| MambaEye-Base (FT) | 21.3 | 2048 | 75.0% | Link |

Inference Command

To evaluate our model at different sequence lengths and resolutions (as reported in the paper), you can use the provided inference scripts:

Single Image

For a single image, you can use the following command:

python eval.py \
    image_path=/path/to/image.jpg \
    ckpt_path=/path/to/checkpoint.ckpt \
    scan_pattern=random \
    resize_mode=none

# Use official model weights
python eval.py \
    image_path=/path/to/image.jpg \
    model_name=small-ft
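
The scan_pattern=random option suggests patches are visited in a randomized order before being fed to the causal encoder. As a minimal sketch (this is not the repository's scan.py, just an illustration of the idea), a reproducible random permutation over patch indices:

```python
import random

def random_scan_order(num_patches, seed=0):
    """Return a reproducible random visiting order over patch indices,
    one simple way a 'random' scan pattern could be generated."""
    order = list(range(num_patches))
    random.Random(seed).shuffle(order)
    return order

order = random_scan_order(16, seed=0)
assert sorted(order) == list(range(16))  # a permutation: every patch visited exactly once
```

Seeding makes the scan order deterministic across runs, which matters when comparing evaluations at fixed sequence lengths.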

ImageNet Validation Set

Download the ImageNet validation set and organize it in the standard PyTorch format, as described in the Training section.

# Using a local checkpoint
python eval.py \
    dataset.val.img_dir=/path/to/val \
    ckpt_path=/path/to/checkpoint.ckpt \
    scan_pattern=random \
    resize_mode=none

# Or automatically download and use an official model by its alias
# (Options: tiny, tiny-ft, small, small-ft, base, base-ft)
python eval.py \
    dataset.val.img_dir=/path/to/val \
    model_name=base-ft
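
The Top-1 accuracy reported in the table above is simply the fraction of validation images whose highest-scoring class matches the label; a library-free sketch of the metric:

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction equals the label.
    `logits` is a list of per-class score lists, `labels` the true class indices."""
    correct = sum(
        max(range(len(scores)), key=scores.__getitem__) == label
        for scores, label in zip(logits, labels)
    )
    return correct / len(labels)

logits = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.2, 0.3, 0.5]]
labels = [1, 0, 0]
print(top1_accuracy(logits, labels))  # 2 of 3 correct -> 0.666...
```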

Training

Data Preparation

Organize your ImageNet dataset in the standard PyTorch format:

data/imagenet/
  train/
    n01440764/
      n01440764_10026.JPEG
      ...
  val/
    n01440764/
      ILSVRC2012_val_00000293.JPEG
      ...
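
A quick sanity check that a directory follows the class-subfolder layout above (a hypothetical helper, not part of the repo):

```python
from pathlib import Path
import tempfile

def check_imagenet_layout(root):
    """Return (num_classes, num_images) for a train/ or val/ directory
    laid out as root/<wnid>/<image>.JPEG, raising if the structure is off."""
    root = Path(root)
    class_dirs = [d for d in root.iterdir() if d.is_dir()]
    if not class_dirs:
        raise ValueError(f"no class subdirectories under {root}")
    num_images = sum(1 for d in class_dirs for f in d.iterdir() if f.is_file())
    return len(class_dirs), num_images

# Demonstrate on a throwaway directory mimicking the layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "n01440764").mkdir()
    (Path(tmp) / "n01440764" / "n01440764_10026.JPEG").touch()
    print(check_imagenet_layout(tmp))  # (1, 1)
```

This mirrors what torchvision-style ImageFolder loaders expect: one subdirectory per class, image files inside.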

Training Command

We use PyTorch Lightning for training. All model layers and data settings are configured using Hydra, with YAML files located in configs/.

Important

Before training, please ensure your environment configurations are set appropriately. You can either modify the YAML files directly or override them via the command line:

# Example: Training a 48-layer base model on ImageNet
python train.py model=base_48layers

To finetune an existing checkpoint:

python train.py \
  model=base_48layers \
  fine_tuning=true \
  ckpt_path=/path/to/checkpoint.ckpt # or mambaeye_small.pt
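
Hydra overrides like ckpt_path=... and dataset.val.img_dir=... set (possibly nested) keys in the YAML config from the command line. A toy illustration of the dotted-key mechanics only (not Hydra's actual implementation; values stay plain strings here, whereas Hydra also type-converts):

```python
def apply_override(cfg, override):
    """Apply one 'a.b.c=value' style override to a nested dict in place."""
    key, _, value = override.partition("=")
    parts = key.split(".")
    node = cfg
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # walk/create intermediate levels
    node[parts[-1]] = value
    return cfg

cfg = {"model": "base_48layers", "dataset": {"val": {"img_dir": None}}}
apply_override(cfg, "dataset.val.img_dir=/path/to/val")
apply_override(cfg, "fine_tuning=true")
print(cfg["dataset"]["val"]["img_dir"])  # /path/to/val
```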

Project Structure

MambaEye/
├── assets/                       # Asset files
├── configs/                      # YAML configuration files
├── mambaeye/                     # Core module
│   ├── __init__.py
│   ├── dataset.py                # Dataset loading for ImageNet
│   ├── loss.py                   # Custom loss functions
│   ├── mambaeye_pl.py            # PyTorch Lightning module definitions
│   ├── model.py                  # Core MambaEye SSM architecture
│   ├── positional_encoding.py    # Positional encoding
│   └── scan.py                   # Scan pattern generation
├── scripts/                      # Utility scripts
├── eval.py                       # Standard inference script
├── train.py                      # Training script for MambaEye
├── requirements.txt              # Dependency requirements
└── README.md                     # This file

Citation

If you find this code or our paper useful for your research, please cite our CVPR 2026 Findings track paper:

@inproceedings{mambaeye2026,
  title={MambaEye: A Size-Agnostic Visual Encoder with Causal Sequential Processing},
  author={Changho Choi and Minho Kim and Jinkyu Kim},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year={2026},
  note={Accepted}
}

License

MIT License
