🏹 ARTEMIS

Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

Tong Wang¹,², Siwen Wang², Yaolei Qi¹, Jinxing Zhou², Yuting He³, Guanyu Yang¹, Yutong Xie²

¹ Southeast University, China
² Mohamed bin Zayed University of Artificial Intelligence, UAE
³ Case Western Reserve University, USA

Guanyu Yang and Yutong Xie are the corresponding authors. This work was done while Tong Wang was visiting MBZUAI.

🚀 News

[2026/06] ARTEMIS is now available on arXiv.
[2026/06] Pseudo-label data, prediction maps, and model snapshots are released for reproducibility.
[2026/06] ARTEMIS repository page is initialized for the submission-stage manuscript.

📌 Abstract

Imperfectly supervised video polyp segmentation (VPS) aims to learn dense and temporally consistent masks from inexpensive supervision, including weak annotations such as points and scribbles, as well as semi-supervision with only a small subset of densely labeled frames. Although SAM2 can convert sparse or partial annotations into dense masks, direct pseudo labeling remains limited by geometry-degraded masks, underused temporal propagation, and reliability-blind supervision.

We propose ARTEMIS, a unified framework for imperfectly supervised VPS driven by agent-guided, reliability-aware temporal mask evolution. ARTEMIS first initializes coarse masks from the available supervision, then uses a debate-and-judge vision-language agent to select reliable temporal anchors under weak supervision. These anchors are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Finally, ARTEMIS trains the segmenter with temporal reliability-aware robust learning, which combines reliability-guided reference selection, a Reference Prototype Transport Module (RPTM), and a reliability-aware robust loss. Experiments on SUN-SEG and CVC-ClinicDB-612 under scribble, point, and limited-label settings demonstrate state-of-the-art performance.

✨ Highlights

Unified imperfect supervision. ARTEMIS handles weakly supervised and semi-supervised VPS in one complete-then-learn framework.
Agent-guided anchor selection. A debate-and-judge vision-language agent identifies reliable temporal anchors from noisy SAM2-generated masks.
Bidirectional temporal mask evolution. Reliable anchors are propagated forward and backward with SAM2 to complete sparse or missing annotations.
Reliability-aware robust learning. Reliability-guided reference selection, the Reference Prototype Transport Module (RPTM), and a robust loss suppress residual pseudo-label noise while preserving difficult samples.

🏗️ Framework Overview

ARTEMIS follows a two-stage pipeline.

Stage 1: Agent-guided bidirectional mask evolution. Available point, scribble, or sparse dense labels are converted into temporally consistent pseudo masks through reliable anchor selection and SAM2-based propagation.
Stage 2: Temporal reliability-aware robust learning. The final segmenter is trained with reliability-guided reference selection, reference prototype transport, and reliability-aware robust supervision.

Stage 1: Reliable temporal anchors are selected and propagated bidirectionally to evolve pseudo masks.
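
To make the bidirectional idea in Stage 1 concrete, the sketch below only plans which anchor could reach each frame: the previous anchor by forward propagation and the next anchor by backward propagation. It does not call SAM2 and is not the released ARTEMIS code. The function name `nearest_anchors` and the assumption that anchors are given as frame indices are hypothetical.

```python
from bisect import bisect_left


def nearest_anchors(num_frames, anchors):
    """Return, per frame, (previous_anchor, next_anchor) frame indices.

    Hypothetical helper: an anchor frame maps to itself in both slots, and a
    missing neighbour on either side is reported as None.
    """
    anchors = sorted(set(anchors))
    plan = []
    for t in range(num_frames):
        i = bisect_left(anchors, t)
        if i < len(anchors) and anchors[i] == t:
            plan.append((t, t))  # the frame is itself a reliable anchor
            continue
        prev_a = anchors[i - 1] if i > 0 else None          # source for forward propagation
        next_a = anchors[i] if i < len(anchors) else None   # source for backward propagation
        plan.append((prev_a, next_a))
    return plan


if __name__ == "__main__":
    for frame, sources in enumerate(nearest_anchors(6, [1, 4])):
        print(frame, sources)
```

With anchors at frames 1 and 4 in a 6-frame clip, frame 0 can only be reached backward from anchor 1, frames 2 and 3 can be reached from both sides, and frame 5 can only be reached forward from anchor 4.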

Stage 2: Reliable reference identity is transported across frames and noisy supervision is down-weighted.
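
The down-weighting of noisy supervision in Stage 2 can be illustrated with a generic reliability-weighted binary cross-entropy. This is a minimal sketch under stated assumptions, not the loss used in the paper: the README does not specify the robust loss, and the function name, the per-pixel reliability scores in [0, 1], and the flattened input layout are all hypothetical.

```python
import math


def reliability_weighted_bce(probs, pseudo_labels, reliability, eps=1e-7):
    """Weighted mean of per-pixel BCE, with reliability scores as weights.

    probs:         predicted foreground probabilities (flattened)
    pseudo_labels: 0/1 pseudo labels (flattened)
    reliability:   assumed per-pixel weights in [0, 1]; 0 ignores the pixel
    """
    total, weight_sum = 0.0, 0.0
    for p, y, r in zip(probs, pseudo_labels, reliability):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += r * bce
        weight_sum += r
    # Normalise by the total weight so the loss scale does not depend on
    # how many pixels were considered reliable.
    return total / weight_sum if weight_sum > 0 else 0.0
```

A pixel whose pseudo label disagrees with a confident prediction contributes a large loss term; giving it a low reliability score shrinks that term instead of discarding the whole frame.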

📊 Main Results

We evaluate ARTEMIS on SUN-SEG under weakly supervised and semi-supervised settings. The tables below report the complete SUN-SEG results of ARTEMIS across Easy/Hard and Seen/Unseen splits. Full comparisons with competing methods and ablation studies are provided in the paper.

🧪 Weakly Supervised SUN-SEG Results

Supervision	Split	Sα↑	Eφ↑	Fβ↑	Dice↑	IoU↑	MAE↓
Scribble	Easy-Seen	89.7	92.4	84.1	85.2	78.3	3.4
Scribble	Easy-Unseen	78.6	79.7	65.9	66.7	58.7	4.3
Scribble	Hard-Seen	84.5	87.8	77.3	78.8	71.3	7.4
Scribble	Hard-Unseen	79.6	81.6	67.5	68.7	60.9	4.8
Point	Easy-Seen	86.3	88.7	65.4	81.2	73.2	6.8
Point	Easy-Unseen	77.0	76.9	52.9	62.8	53.9	8.0
Point	Hard-Seen	81.5	84.0	59.2	74.8	66.0	10.0
Point	Hard-Unseen	77.0	78.6	52.7	64.4	55.5	8.0

🧬 Semi-supervised SUN-SEG Results

Supervision	Split	Sα↑	Eφ↑	Fβ↑	Dice↑	IoU↑	MAE↓
1/8 labeled	Easy-Seen	90.7	93.6	85.2	86.5	79.9	2.7
1/8 labeled	Easy-Unseen	79.4	81.1	67.1	68.2	60.4	4.8
1/8 labeled	Hard-Seen	86.2	90.1	78.2	80.0	72.2	4.5
1/8 labeled	Hard-Unseen	80.8	83.9	69.4	70.9	63.1	4.6
1/16 labeled	Easy-Seen	89.9	92.4	84.3	85.5	78.9	3.4
1/16 labeled	Easy-Unseen	78.3	79.8	66.0	66.9	58.9	4.9
1/16 labeled	Hard-Seen	84.5	87.9	77.4	79.1	71.5	7.6
1/16 labeled	Hard-Unseen	79.2	81.4	67.1	68.2	60.2	5.0

All values are percentages (%). Higher is better for every metric except MAE, where lower is better.
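
For readers unfamiliar with the region metrics in the tables, the sketch below computes Dice, IoU, and MAE for a single prediction using their standard definitions. It is an illustration only: the evaluation toolbox behind the reported numbers may differ (for example in thresholding or averaging), the structure metrics Sα, Eφ, and Fβ are not covered, and the function name and the 0.5 threshold are assumptions.

```python
def dice_iou_mae(pred, gt, threshold=0.5):
    """Dice, IoU, and MAE for one flattened prediction.

    pred: foreground probabilities in [0, 1]
    gt:   binary ground-truth labels (0 or 1)
    Returns fractions in [0, 1]; multiply by 100 to compare with the tables.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    binary = [1 if p >= threshold else 0 for p in pred]
    inter = sum(b * g for b, g in zip(binary, gt))
    pred_area, gt_area = sum(binary), sum(gt)
    union = pred_area + gt_area - inter
    # Two empty masks agree perfectly, so both overlap scores default to 1.
    dice = 2 * inter / (pred_area + gt_area) if pred_area + gt_area else 1.0
    iou = inter / union if union else 1.0
    # MAE is computed on the raw probabilities, not the thresholded mask.
    mae = sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
    return dice, iou, mae
```

For example, `dice_iou_mae([0.9, 0.8, 0.1, 0.6], [1, 1, 0, 0])` gives a Dice of about 0.80, an IoU of about 0.67, and an MAE of about 0.25.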

🖼️ Qualitative Results

Qualitative comparison under scribble supervision. ARTEMIS better preserves blurry, drifting, and small polyp regions.

Qualitative comparison under the 1/8 labeled training data setting. ARTEMIS reduces over-segmentation and under-segmentation for background-like polyps.

📂 Download Resources

We provide the processed pseudo-label data, prediction maps, and trained model snapshots via OneDrive. These resources are released for reproducibility and comparison while the source code remains unavailable during peer review.

🗂️ Dataset and Pseudo Labels

Resource	Description	Link
Dataset	Dataset package used by ARTEMIS experiments	Download
Scribble / Point	Weak annotation files	Download
Point2Mask	Coarse masks generated from point prompts	Download
Mask2Mask Forward	Forward-evolved pseudo masks	Download
Mask2Mask Forward Logit	Forward propagation logits	Download
Mask2Mask Backward	Backward-evolved pseudo masks	Download
Mask2Mask Backward Logit	Backward propagation logits	Download
Reliable Reference Cache	Cached reliability-guided reference information	Download
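
Since forward and backward propagation logits are released separately, one simple way to experiment with them is to fuse the two directions per pixel. The sketch below keeps whichever logit is more confident (larger magnitude) and thresholds it at zero. This is an assumed baseline, not the fusion rule used by ARTEMIS, and it says nothing about the file format of the released logits: the function name and the flattened-list inputs are hypothetical.

```python
def fuse_bidirectional_logits(forward_logits, backward_logits):
    """Fuse two flattened logit maps into one binary mask.

    For each pixel, the direction with the larger absolute logit is treated
    as more confident; a positive chosen logit means foreground.
    """
    if len(forward_logits) != len(backward_logits):
        raise ValueError("logit maps must have the same number of pixels")
    fused = []
    for f, b in zip(forward_logits, backward_logits):
        chosen = f if abs(f) >= abs(b) else b  # ties go to the forward pass
        fused.append(1 if chosen > 0 else 0)
    return fused


if __name__ == "__main__":
    print(fuse_bidirectional_logits([2.0, -0.5, 0.3, -3.0],
                                    [-1.0, 1.5, -0.1, 0.2]))
```

Averaging the two logit maps before thresholding is an equally reasonable baseline when the two directions are expected to be similarly reliable.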

🖼️ Prediction Maps

Resource	Link
All Predictions	Download
ARTEMIS Scribble Predictions	Download
ARTEMIS Point Predictions	Download
ARTEMIS 1/8 Data Predictions	Download
ARTEMIS 1/16 Data Predictions	Download

🧩 Model Snapshots

Resource	Link
All Snapshots	Download
ARTEMIS Scribble Snapshot	Download
ARTEMIS Point Snapshot	Download
ARTEMIS 1/8 Data Snapshot	Download
ARTEMIS 1/16 Data Snapshot	Download

🛠️ Code Status

This repository is currently maintained for the submission-stage manuscript. Source code, training scripts, and testing scripts are not released during peer review.

The full code will be made available after acceptance.

📖 Citation

If you find ARTEMIS useful, please consider citing our work.

@article{wang2026artemis,
  title={ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation},
  author={Wang, Tong and Wang, Siwen and Qi, Yaolei and Zhou, Jinxing and He, Yuting and Yang, Guanyu and Xie, Yutong},
  journal={arXiv preprint arXiv:2606.20161},
  year={2026}
}

📬 Contact

For questions about the paper or future code release, please contact tongwangnj@qq.com.
