COMPOSE

Hypergraph Cover Optimization for
Multi-view 3D Human Pose Estimation

Tony Danjun Wang · Tolga Birdal · Nassir Navab · Lennart Bastian

Technical University of Munich · Munich Center for Machine Learning · Imperial College London

Figure: From pairwise to higher-order. COMPOSE filters hyperedges rather than pairwise edges (consistent relations in green), resolving cross-view association as a single hypergraph cover before triangulating 3D poses.

COMPOSE is training-free: no 3D supervision, only 2D detections and camera calibration. It estimates 3D human poses from sparse, calibrated multi-view camera systems by casting cross-view association as a weighted exact-cover optimization over a hypergraph of person hypotheses. Rather than stitching pairwise matches together with post-hoc consistency checks, it unifies them into a single combinatorial objective — solved exactly with Integer Linear Programming or approximately (and fast) with loopy Belief Propagation — then triangulates the matched correspondences into 3D skeletons.
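To make the exact-cover idea concrete, here is a minimal toy sketch in plain Python. It is not COMPOSE's solver: it enumerates subsets by brute force, which only works for a handful of hyperedges. The costs, the singleton "unmatched" hyperedges, and the additive form of the `lam` penalty are all assumptions made for illustration.

```python
from itertools import combinations

# Toy detections as (view, detection_index). Person A is seen in three views,
# person B only in views 0 and 1.
detections = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]

# Candidate hyperedges: a set of detections (at most one per view) plus a cost.
# Costs are made-up stand-ins for cue scores (lower = more consistent).
hyperedges = [
    (frozenset({(0, 0), (1, 0), (2, 0)}), 0.2),  # person A in all three views
    (frozenset({(0, 1), (1, 1)}), 0.3),          # person B in views 0 and 1
    (frozenset({(0, 0), (1, 1)}), 2.5),          # inconsistent pairing
    (frozenset({(0, 1), (1, 0), (2, 0)}), 3.0),  # inconsistent triple
]
# Singleton hyperedges let any detection stay unmatched at a fixed cost.
hyperedges += [(frozenset({d}), 1.0) for d in detections]


def solve_exact_cover(detections, hyperedges, lam=0.0):
    """Return (objective, chosen) minimizing total cost + lam * number chosen."""
    universe = set(detections)
    best = (float("inf"), ())
    for k in range(1, len(hyperedges) + 1):
        for subset in combinations(hyperedges, k):
            covered = [d for nodes, _ in subset for d in nodes]
            # Exact cover: every detection lies in exactly one chosen hyperedge.
            if len(covered) != len(universe) or set(covered) != universe:
                continue
            objective = sum(cost for _, cost in subset) + lam * k
            if objective < best[0]:
                best = (objective, subset)
    return best


objective, chosen = solve_exact_cover(detections, hyperedges, lam=0.1)
for nodes, cost in chosen:
    print(sorted(nodes), cost)
```

On this toy input the cheapest cover picks the consistent triple and the consistent pair, because every alternative either pays for an inconsistent hyperedge or leaves detections as costly singletons.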

Installation

Important

Tested with Python 3.11 + CUDA 12.8 + PyTorch 2.7. Git LFS is required for the bundled 2D detections, and the Panoptic data scripts need ffmpeg and wget (or curl) on your PATH.

# Package manager: uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/wngTn/COMPOSE && cd COMPOSE
git lfs install && git lfs pull   # pulls the bundled 2D detections
uv sync                           # creates the env and installs the `compose` package

Quickstart

The Panoptic test-split detections are bundled (data/detections/panoptic_test.pkl), so the pipeline runs end-to-end once the Panoptic images are in place (see Data):

# Estimate 3D poses on the Panoptic test split
# (prints the run directory + ready-to-paste next-step commands when it finishes)
python tools/generate.py --config experiments/panoptic.yaml --split test

# Evaluate a completed run against ground truth
python tools/evaluate.py --run-dir ./output/panoptic/<run_dir>

# Visualize a run (omit --run-dir to pick the most recent run)
python tools/visualize.py --run-dir ./output/panoptic/<run_dir> --dataset panoptic

--config selects the dataset YAML: panoptic, shelf, or campus. mm_or ships as an optional, unpublished extra config that needs its own detections. --matching-solver is ilp (exact), bp (fast), or greedy (baseline). A CUDA GPU is recommended for triangulation and the BP solver.

Tip

Swap the solver, toggle video, or override any top-level config field ad hoc (cue parameters live in the YAML cues: block):

python tools/generate.py --config experiments/panoptic.yaml --split test --matching-solver bp --save-video --set lam=3.0 pixel_threshold=1024
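As a rough illustration of what KEY=VALUE overrides amount to, the sketch below turns such pairs into a dict of typed values. The `parse_overrides` helper is hypothetical and is not taken from tools/generate.py; how the real CLI coerces types is not documented here, so treat the `ast.literal_eval` fallback as an assumption.

```python
import ast


def parse_overrides(pairs):
    """Turn ["lam=3.0", "pixel_threshold=1024"] into a dict of typed values."""
    overrides = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        try:
            # Handles numbers, booleans, lists, and quoted strings.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (e.g. a bare word like bp) stays a string.
            value = raw
        overrides[key] = value
    return overrides


print(parse_overrides(["lam=3.0", "pixel_threshold=1024"]))
# -> {'lam': 3.0, 'pixel_threshold': 1024}
```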

Data

Evaluated on CMU Panoptic, Shelf, and Campus. Only the Panoptic test split runs out-of-the-box — its 2D detections are bundled; the other datasets need their own images and a detections pickle (see below). Download the Panoptic test images (several GB of HD video; the script defaults to the 4 test sequences and the 5 CMU0 cameras):

bash scripts/data/panoptic/0_get_data.sh   # download CMU Panoptic (test sequences, 5 cameras)
bash scripts/data/panoptic/1_extract.sh    # extract HD frames (needs ffmpeg)

Note

COMPOSE consumes per-frame, per-camera 2D detections (keypoints + bounding boxes, stored as a pickle) alongside each dataset's images and camera calibration. To use another dataset or your own data, place a detections pickle under data/ and point the split's detections_path in experiments/<dataset>.yaml at it.

Expected directory layout

data/
├── detections/
│   └── panoptic_test.pkl            # bundled via Git LFS
└── panoptic/
    └── <sequence>/
        ├── hdImgs/                  # 00_03/ 00_06/ 00_12/ 00_13/ 00_23/
        ├── hdPose3d_stage1_coco19/  # 3D ground truth
        └── calibration_<sequence>.json

Using your own 2D detections

A detections pickle is db[sequence][frame]["2D"][camera] -> dict, where camera is a "<node>_<panel>" string for Panoptic (e.g. "00_03") or an integer index otherwise. Each per-camera dict holds:

- keypoints_xys: (M, J, 3) float, M detections and J joints, each [x, y, confidence]
- bbox_xywhs: (M, 5) float, each [x, y, w, h, score]
- (optional) <name>_features: (M, D) float, per-detection embeddings for a photometric cue (e.g. reid_features, dinov2_features)

J and the joint order follow the dataset's convention (compose/skeleton.py). Point the split's detections_path at your pickle. The scripts/data/<dataset>/ stage scripts produce these from raw images (detect → filter → optionally extract features).
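The layout above can be sketched and shape-checked as follows. This is a minimal example under stated assumptions: arrays are NumPy arrays, the sequence name `example_sequence` and the integer frame key `0` are placeholders, J = 19 is chosen only for illustration, and `check_camera_entry` is a hypothetical helper, not part of the compose package.

```python
import io
import pickle

import numpy as np

M, J, D = 2, 19, 8  # detections, joints, embedding size (illustrative values)


def make_camera_entry(rng):
    """Build one per-camera dict with random content in the documented shapes."""
    return {
        "keypoints_xys": rng.random((M, J, 3)).astype(np.float32),  # [x, y, confidence]
        "bbox_xywhs": rng.random((M, 5)).astype(np.float32),        # [x, y, w, h, score]
        "reid_features": rng.random((M, D)).astype(np.float32),     # optional embeddings
    }


def check_camera_entry(entry):
    """Validate shapes and return (M, J) for one per-camera dict."""
    kps, boxes = entry["keypoints_xys"], entry["bbox_xywhs"]
    assert kps.ndim == 3 and kps.shape[2] == 3, "keypoints_xys must be (M, J, 3)"
    assert boxes.shape == (kps.shape[0], 5), "bbox_xywhs must be (M, 5)"
    for name, feats in entry.items():
        if name.endswith("_features"):
            assert feats.ndim == 2 and feats.shape[0] == kps.shape[0], name
    return kps.shape[0], kps.shape[1]


rng = np.random.default_rng(0)
# db[sequence][frame]["2D"][camera] -> dict, with a Panoptic-style camera key.
db = {"example_sequence": {0: {"2D": {"00_03": make_camera_entry(rng)}}}}

# Round-trip through pickle in memory; write to a file under data/ in practice.
buffer = io.BytesIO()
pickle.dump(db, buffer)
restored = pickle.loads(buffer.getvalue())
print(check_camera_entry(restored["example_sequence"][0]["2D"]["00_03"]))
```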

Sanity-check a dataset's camera calibration with python tools/verify.py --dataset panoptic.

Configuration

Each dataset has one YAML in experiments/ with shared base parameters and a splits: section.

cues:
  - kind: geometric
    name: reprojection
    score_type: reprojection   # reprojection | epipolar | sampson | trifocal
    pixel_threshold: 1024.0    # px²; -1 disables geometric pre-filtering

Precedence, from lowest to highest, is defaults < YAML base < YAML split < CLI flags; PipelineConfig.from_yaml merges all four layers. New cues are added by registering a class in compose.cues.CUE_REGISTRY.
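The precedence order can be pictured as a layered lookup. The sketch below uses plain dicts and the standard library; it does not reproduce PipelineConfig.from_yaml, and the keys and default values shown are illustrative assumptions.

```python
from collections import ChainMap

# Four layers, lowest to highest priority (all values are made up).
defaults = {"matching_solver": "ilp", "lam": 1.0, "save_video": False}
yaml_base = {"lam": 2.0, "cameras": "CMU0"}
yaml_split = {"detections_path": "data/detections/panoptic_test.pkl"}
cli_flags = {"matching_solver": "bp", "lam": 3.0}

# ChainMap resolves keys left to right, so the highest-priority layer goes first.
merged = dict(ChainMap(cli_flags, yaml_split, yaml_base, defaults))

print(merged["lam"], merged["matching_solver"], merged["cameras"])
# -> 3.0 bp CMU0
```

A key set on the command line wins over both YAML layers, and a key that appears in only one layer passes through unchanged.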

Pipeline & key CLI flags

Each frame is processed by compose.pipeline.process_frame:

1. Hypergraph construction: grow a $V$-partite hypergraph of person hypotheses hierarchically over the valid 2D detections (level-by-level pruning, no $O(N^V)$ enumeration).
2. Cue scoring: rank every surviving hyperedge with a configurable cue chain (geometric, with optional photometric refinement).
3. Correspondence matching: select a cover via ILP (exact), Belief Propagation (approximate), or a greedy baseline.
4. Triangulation: algebraic SVD triangulation, Hungarian re-matching of leftover detections, and a final re-triangulation with per-joint worst-view dropout.
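For the triangulation stage, the algebraic step can be illustrated with a standard linear (DLT) triangulation of a single joint, solved by SVD. This is a textbook sketch in NumPy rather than COMPOSE's implementation: the cameras and the 3D point are synthetic, and the re-matching and worst-view dropout steps are not shown.

```python
import numpy as np


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)               # shape (2 * views, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                       # right singular vector of the smallest singular value
    return X[:3] / X[3]              # dehomogenize


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Three synthetic cameras sharing intrinsics K, shifted along the x axis.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
projections = [
    K @ np.hstack([np.eye(3), np.array([[tx], [0.0], [5.0]])])
    for tx in (-1.0, 0.0, 1.0)
]

X_true = np.array([0.3, -0.2, 1.0])
points_2d = [project(P, X_true) for P in projections]
X_hat = triangulate_point(projections, points_2d)
print(np.round(X_hat, 6))
```

With noise-free observations the recovered point matches the input up to floating-point error; with real detections the same system is solved in a least-squares sense.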

| flag | meaning |
| --- | --- |
| `--config` | dataset YAML (required) |
| `--split` | dataset split (only `test` is shipped) |
| `--matching-solver` | `ilp` · `bp` · `greedy` |
| `--cameras` | camera preset (e.g. `CMU0`, `Shelf5`) |
| `--lam` | matching penalty per chosen hyperedge |
| `--pixel-threshold` | geometric pre-filter (px²) |
| `--save-video` / `--no-save-video` | toggle MP4 output |
| `--set KEY=VALUE ...` | override any top-level config field |

Python API

The installable compose package exposes a small, stable API:

from compose import PipelineConfig, load_dataset, build_cues, process_frame

config = PipelineConfig.from_yaml("experiments/panoptic.yaml", split="test")
dataset = load_dataset(config)
cues = build_cues(config.cues, num_views=config.V, device="cuda")
# process_frame(...) runs the full per-frame pipeline and returns the 3D prediction.

Citation

@article{wang2026compose,
  title   = {COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation},
  author  = {Wang, Tony Danjun and Birdal, Tolga and Navab, Nassir and Bastian, Lennart},
  journal = {arXiv preprint arXiv:2601.09698},
  year    = {2026},
}

License

Released under the MIT License.
