COMPOSE

Hypergraph Cover Optimization for
Multi-view 3D Human Pose Estimation

Tony Danjun Wang  ·  Tolga Birdal  ·  Nassir Navab  ·  Lennart Bastian

Technical University of Munich  ·  Munich Center for Machine Learning  ·  Imperial College London

Python 3.11 PyTorch 2.7 CUDA 12.8 MIT License

COMPOSE replaces pairwise correspondence filtering with higher-order hyperedge filtering: geometrically consistent cross-view relations form a single hypergraph cover that is then triangulated into 3D poses.

From pairwise to higher-order: COMPOSE filters hyperedges rather than pairwise edges (consistent relations in green), resolving cross-view association as a single hypergraph cover before triangulating 3D poses.


COMPOSE is training-free: it needs no 3D supervision, only 2D detections and camera calibration. It estimates 3D human poses from sparse, calibrated multi-view camera systems by casting cross-view association as a weighted exact-cover optimization over a hypergraph of person hypotheses. Rather than stitching pairwise matches together with post-hoc consistency checks, it unifies them into a single combinatorial objective, solved either exactly with Integer Linear Programming or approximately (and fast) with loopy Belief Propagation, and then triangulates the matched correspondences into 3D skeletons.
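To make the exact-cover idea concrete, here is a toy sketch. It is not the COMPOSE solver: it brute-forces a five-detection instance instead of using ILP or BP, and the costs, the singleton cost, and the way `lam` is added to each chosen hyperedge are assumptions made for illustration only.

```python
from itertools import combinations

# Toy instance: each detection is a (view, index) pair. All numbers are made up.
detections = [("A", 0), ("A", 1), ("B", 0), ("B", 1), ("C", 0)]

# Candidate hyperedges: a set of detections (at most one per view) with a cost.
# Lower cost means the grouped detections agree better geometrically.
hyperedges = {
    frozenset({("A", 0), ("B", 0), ("C", 0)}): 0.5,
    frozenset({("A", 0), ("B", 0)}): 0.4,
    frozenset({("A", 1), ("B", 1)}): 0.3,
    frozenset({("A", 0), ("B", 1)}): 4.0,
    frozenset({("A", 1), ("B", 0)}): 4.0,
}

# Singleton hyperedges let a detection stay unmatched, so a cover always exists.
SINGLETON_COST = 2.0  # hypothetical value
for det in detections:
    hyperedges[frozenset({det})] = SINGLETON_COST


def best_exact_cover(detections, hyperedges, lam):
    """Brute-force the cheapest set of hyperedges that covers every
    detection exactly once. `lam` is charged once per chosen hyperedge."""
    universe = frozenset(detections)
    edges = list(hyperedges)
    best_cost, best_cover = float("inf"), None
    for k in range(1, len(detections) + 1):
        for subset in combinations(edges, k):
            # Sizes summing to |universe| plus full coverage implies disjointness.
            if sum(len(e) for e in subset) != len(universe):
                continue
            if frozenset().union(*subset) != universe:
                continue
            cost = sum(hyperedges[e] + lam for e in subset)
            if cost < best_cost:
                best_cost, best_cover = cost, subset
    return best_cost, best_cover


cost, cover = best_exact_cover(detections, hyperedges, lam=1.0)
for edge in cover:
    print(sorted(edge))
```

In this toy instance the three-view hyperedge wins over the cheaper two-view one, because choosing the pair would force an extra singleton for the third view. Brute force is only viable at this scale, which is why an ILP or BP solver is needed for real frames.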

Installation

Important

Tested with Python 3.11 + CUDA 12.8 + PyTorch 2.7. Git LFS is required for the bundled 2D detections, and the Panoptic data scripts need ffmpeg and wget (or curl) on your PATH.

# Package manager: uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/wngTn/COMPOSE && cd COMPOSE
git lfs install && git lfs pull   # pulls the bundled 2D detections
uv sync                           # creates the env and installs the `compose` package

Quickstart

The Panoptic test-split detections are bundled (data/detections/panoptic_test.pkl), so the pipeline runs end-to-end once the Panoptic images are in place (see Data):

# Estimate 3D poses on the Panoptic test split
# (prints the run directory + ready-to-paste next-step commands when it finishes)
python tools/generate.py --config experiments/panoptic.yaml --split test

# Evaluate a completed run against ground truth
python tools/evaluate.py --run-dir ./output/panoptic/<run_dir>

# Visualize a run (omit --run-dir to pick the most recent run)
python tools/visualize.py --run-dir ./output/panoptic/<run_dir> --dataset panoptic

--config selects the dataset: panoptic, shelf, or campus (mm_or ships as an optional, unpublished extra config that needs its own detections). --matching-solver selects ilp (exact), bp (fast), or greedy (baseline). A CUDA GPU is recommended for triangulation and the BP solver.

Tip

Swap the solver, toggle video, or override any top-level config field ad hoc (cue parameters live in the YAML cues: block):

python tools/generate.py --config experiments/panoptic.yaml --split test --matching-solver bp --save-video --set lam=3.0 pixel_threshold=1024

Data

Evaluated on CMU Panoptic, Shelf, and Campus. Only the Panoptic test split runs out-of-the-box — its 2D detections are bundled; the other datasets need their own images and a detections pickle (see below). Download the Panoptic test images (several GB of HD video; the script defaults to the 4 test sequences and the 5 CMU0 cameras):

bash scripts/data/panoptic/0_get_data.sh   # download CMU Panoptic (test sequences, 5 cameras)
bash scripts/data/panoptic/1_extract.sh    # extract HD frames (needs ffmpeg)

Note

COMPOSE consumes per-frame, per-camera 2D detections (keypoints + bounding boxes, stored as a pickle) alongside each dataset's images and camera calibration. To use another dataset or your own data, place a detections pickle under data/ and point the split's detections_path in experiments/<dataset>.yaml at it.

Expected directory layout

data/
├── detections/
│   └── panoptic_test.pkl            # bundled via Git LFS
└── panoptic/
    └── <sequence>/
        ├── hdImgs/                  # 00_03/ 00_06/ 00_12/ 00_13/ 00_23/
        ├── hdPose3d_stage1_coco19/  # 3D ground truth
        └── calibration_<sequence>.json

Using your own 2D detections

A detections pickle is db[sequence][frame]["2D"][camera] -> dict, where camera is a "<node>_<panel>" string for Panoptic (e.g. "00_03") or an integer index otherwise. Each per-camera dict holds:

  • keypoints_xys: (M, J, 3) float — M detections, J joints, [x, y, confidence]
  • bbox_xywhs: (M, 5) float — [x, y, w, h, score]
  • (optional) <name>_features: (M, D) float — per-detection embeddings for a photometric cue (e.g. reid_features, dinov2_features)

J and the joint order follow the dataset's convention (compose/skeleton.py). Point the split's detections_path at your pickle. The scripts/data/<dataset>/ stage scripts produce these from raw images (detect → filter → optionally extract features).
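The layout described above can be sketched as follows. This is a minimal example with random placeholder values, assuming NumPy arrays, an integer frame key, and 19 joints; the sequence name and the helpers `make_camera_entry` and `check_detections` are hypothetical and not part of the `compose` package.

```python
import pickle

import numpy as np

M, J = 2, 19  # 2 detections per camera; J = 19 is a placeholder joint count


def make_camera_entry(num_dets, num_joints, rng):
    """Build one per-camera dict with random placeholder values."""
    return {
        # (M, J, 3): [x, y, confidence] per joint
        "keypoints_xys": rng.random((num_dets, num_joints, 3)).astype(np.float32),
        # (M, 5): [x, y, w, h, score] per detection
        "bbox_xywhs": rng.random((num_dets, 5)).astype(np.float32),
    }


def check_detections(db):
    """Raise AssertionError if an entry does not match the documented shapes."""
    for frames in db.values():
        for entry in frames.values():
            for det in entry["2D"].values():
                kps, boxes = det["keypoints_xys"], det["bbox_xywhs"]
                assert kps.ndim == 3 and kps.shape[2] == 3
                assert boxes.shape == (kps.shape[0], 5)


rng = np.random.default_rng(0)
cameras = ("00_03", "00_06")  # Panoptic-style "<node>_<panel>" camera keys
# db[sequence][frame]["2D"][camera] -> dict
db = {
    "my_sequence": {
        0: {"2D": {cam: make_camera_entry(M, J, rng) for cam in cameras}},
    }
}
check_detections(db)

# Round-trip through pickle in memory; write the bytes to a file under data/
# to obtain the pickle that detections_path should point at.
payload = pickle.dumps(db)
restored = pickle.loads(payload)
print(restored["my_sequence"][0]["2D"]["00_03"]["keypoints_xys"].shape)
```

Running a check like this before a full run is a cheap way to catch a shape mismatch, for example a bounding-box array that is missing its score column.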

Sanity-check a dataset's camera calibration with python tools/verify.py --dataset panoptic.

Configuration

Each dataset has one YAML in experiments/ with shared base parameters and a splits: section.

cues:
  - kind: geometric
    name: reprojection
    score_type: reprojection   # reprojection | epipolar | sampson | trifocal
    pixel_threshold: 1024.0    # px²; -1 disables geometric pre-filtering

Precedence is defaults < YAML base < YAML split < CLI flags, all merged by PipelineConfig.from_yaml. New cues are added by registering a class in compose.cues.CUE_REGISTRY.
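The precedence order can be pictured as a layered dictionary merge in which later layers win. The sketch below only illustrates that ordering; it is not the code inside PipelineConfig.from_yaml, and the function `merge_config` and all values are hypothetical.

```python
def merge_config(defaults, yaml_base, yaml_split, cli_overrides):
    """Merge four config layers; later layers override earlier ones.

    A value of None means "not set at this layer" and is skipped.
    """
    merged = {}
    for layer in (defaults, yaml_base, yaml_split, cli_overrides):
        merged.update({key: value for key, value in layer.items() if value is not None})
    return merged


# Hypothetical values for illustration only.
defaults = {"matching_solver": "ilp", "lam": 1.0, "pixel_threshold": 1024.0}
yaml_base = {"lam": 2.0}
yaml_split = {"pixel_threshold": 512.0}
cli_overrides = {"matching_solver": "bp", "lam": None}  # --lam not passed

config = merge_config(defaults, yaml_base, yaml_split, cli_overrides)
print(config)
```

Here the solver comes from the CLI, `lam` from the YAML base, and `pixel_threshold` from the split, which matches the defaults < YAML base < YAML split < CLI ordering.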

Pipeline & key CLI flags

Each frame is processed by compose.pipeline.process_frame:

  1. Hypergraph construction: grow a $V$-partite hypergraph of person hypotheses hierarchically over the valid 2D detections (level-by-level pruning, no $O(N^V)$ enumeration).
  2. Cue scoring: rank every surviving hyperedge with a configurable cue chain (geometric, with optional photometric refinement).
  3. Correspondence matching: select a cover via ILP (exact), Belief Propagation (approximate), or a greedy baseline.
  4. Triangulation: algebraic SVD triangulation, Hungarian re-matching of leftover detections, and a final re-triangulation with per-joint worst-view dropout.
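As an illustration of the triangulation step, the sketch below shows the textbook algebraic (DLT) triangulation of a single joint via SVD, together with a squared reprojection error in px². It is a generic NumPy version under synthetic cameras, not the code in the `compose` package; the function names and camera parameters are assumptions, and the Hungarian re-matching and worst-view dropout are not shown.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]


def triangulate_dlt(projections, points_2d):
    """Algebraic triangulation: stack two linear constraints per view and
    take the right singular vector with the smallest singular value."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)            # shape (2 * num_views, 4)
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]                  # homogeneous solution
    return X_h[:3] / X_h[3]


def reprojection_error_sq(P, X, xy):
    """Squared pixel distance between the reprojected point and the 2D observation."""
    return float(np.sum((project(P, X) - xy) ** 2))


# Synthetic setup: shared intrinsics, three cameras shifted relative to each other.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])


def make_projection(t):
    """P = K [I | t] for an axis-aligned camera with translation t."""
    return K @ np.hstack([np.eye(3), np.asarray(t, dtype=float).reshape(3, 1)])


projections = [make_projection(t) for t in ([0, 0, 0], [-1, 0, 0], [0, -1, 0])]
X_true = np.array([0.2, -0.1, 5.0])
points_2d = [project(P, X_true) for P in projections]

X_hat = triangulate_dlt(projections, points_2d)
errors = [reprojection_error_sq(P, X_hat, xy) for P, xy in zip(projections, points_2d)]
print(X_hat, max(errors))
```

With noise-free observations the recovered point matches the true one up to floating-point error. If a threshold is applied to a squared pixel distance like the one computed here, a value of 1024 px² corresponds to a 32 px radius.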

| Flag | Meaning |
| --- | --- |
| --config | dataset YAML (required) |
| --split | dataset split (only test is shipped) |
| --matching-solver | ilp · bp · greedy |
| --cameras | camera preset (e.g. CMU0, Shelf5) |
| --lam | matching penalty per chosen hyperedge |
| --pixel-threshold | geometric pre-filter (px²) |
| --save-video / --no-save-video | toggle MP4 output |
| --set KEY=VALUE ... | override any top-level config field |

Python API

The installable compose package exposes a small, stable API:

from compose import PipelineConfig, load_dataset, build_cues, process_frame

config = PipelineConfig.from_yaml("experiments/panoptic.yaml", split="test")
dataset = load_dataset(config)
cues = build_cues(config.cues, num_views=config.V, device="cuda")
# process_frame(...) runs the full per-frame pipeline and returns the 3D prediction.

Citation

@article{wang2026compose,
  title   = {COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation},
  author  = {Wang, Tony Danjun and Birdal, Tolga and Navab, Nassir and Bastian, Lennart},
  journal = {arXiv preprint arXiv:2601.09698},
  year    = {2026},
}

License

Released under the MIT License.
