
# SAM 3: Segment Anything with Concepts

## Overview

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Compared to its predecessor SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase or exemplars.

X-AnyLabeling supports SAM 3 in two deployment modes:

| Mode | Backend | Visual Prompting | Speed |
| --- | --- | --- | --- |
| Server-side | X-AnyLabeling-Server (PyTorch) | Supported | Fast |
| Client-side | Local ONNX | Not supported | Slow |

## Server-side

### Installation

Please refer to X-AnyLabeling-Server for download, installation, and server setup instructions.

### Usage

Launch the X-AnyLabeling client and press `Ctrl+A`, or click the **AI** button in the left menu bar, to open the auto-labeling panel. In the model dropdown list, select **Remote-Server**, then choose **Segment Anything 3**.

#### Text Prompting

sam3-text-prompt.mp4
1. Enter object names in the text field (e.g., `person`, `car`, `bicycle`)
2. Separate multiple classes with periods or commas, e.g. `person.car.bicycle` or `dog,cat,tree` (see the parsing sketch below)
3. Click **Send** to initiate detection
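
The separator handling can be illustrated with a tiny, hypothetical parser (a sketch of the prompt format only, not X-AnyLabeling's internal code):

```python
import re

def parse_text_prompt(prompt: str) -> list[str]:
    # Split on '.' or ',' and drop empty fragments, so both
    # "person.car.bicycle" and "dog,cat,tree" yield clean class lists.
    return [part.strip() for part in re.split(r"[.,]", prompt) if part.strip()]

print(parse_text_prompt("person.car.bicycle"))  # ['person', 'car', 'bicycle']
print(parse_text_prompt("dog, cat, tree"))      # ['dog', 'cat', 'tree']
```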

#### Visual Prompting

sam3-visual-prompt.mp4
1. Click **+Rect** or **-Rect** to activate drawing mode
2. Draw bounding boxes around target objects or regions of interest, using **+Rect** for positive prompts and **-Rect** for negative prompts (see the sketch after these steps)
3. Add multiple prompts for different object instances
4. Click **Run Rect** to process the visual cues
5. Click **Finish** (or press `f`) to complete the object, then enter the label category and confirm; use **Clear** to remove all visual prompts
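
Conceptually, each rectangle amounts to a labeled box, in the spirit of SAM-style prompt encodings. The layout below is a hypothetical sketch for intuition, not the client-server protocol:

```python
from dataclasses import dataclass

@dataclass
class BoxPrompt:
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coords
    positive: bool                          # True for +Rect, False for -Rect

prompts = [
    BoxPrompt(box=(120.0, 80.0, 340.0, 260.0), positive=True),   # include this region
    BoxPrompt(box=(200.0, 150.0, 240.0, 190.0), positive=False), # exclude this sub-region
]
```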

## Client-side

The client-side path runs the full SAM 3 pipeline locally with ONNX Runtime; no server is required. All three ONNX models (image encoder, language encoder, decoder) are loaded directly inside X-AnyLabeling.
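
For orientation, loading the three graphs with ONNX Runtime looks roughly like the sketch below; the actual provider selection and tensor names live inside X-AnyLabeling and may differ:

```python
import onnxruntime as ort

# Prefer GPU when available, fall back to CPU.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Each large .onnx file pulls in its .onnx.data sidecar automatically,
# provided the sidecar sits next to the .onnx file.
image_encoder = ort.InferenceSession("sam3_image_encoder.onnx", providers=providers)
language_encoder = ort.InferenceSession("sam3_language_encoder.onnx", providers=providers)
decoder = ort.InferenceSession("sam3_decoder.onnx", providers=providers)
```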

> [!WARNING]
> Known limitations of the client-side mode:
>
> - **Slow inference:** The full ViT-H image encoder runs on CPU/GPU via ONNX. Encoding a single image typically takes several seconds, noticeably slower than the server-side PyTorch path.
> - **Text prompts only:** Visual prompting (points, boxes) is not supported in this mode; only text-based grounding is available.

### Installation

See the installation guide (English | Chinese) for environment setup details.

When you select the model for the first time, X-AnyLabeling will automatically download the six required files (three .onnx files plus their .onnx.data sidecars) to your local cache.

If your network connection is unstable, you can download them manually from the Model Zoo and place them in a local directory, then update the path fields in the config file as described in Model Configuration below.

> [!NOTE]
> If you prefer to export the ONNX files yourself from the original PyTorch weights, see `tools/onnx_exporter/export_sam3_onnx.py` for the full setup instructions.

### Model Configuration

The default config is at `anylabeling/configs/auto_labeling/sam3_vit_h.yaml`. If you place the model files in a custom directory, update the six path fields accordingly:

```yaml
encoder_model_path: /path/to/sam3_image_encoder.onnx
encoder_model_data_path: /path/to/sam3_image_encoder.onnx.data
language_encoder_path: /path/to/sam3_language_encoder.onnx
language_encoder_data_path: /path/to/sam3_language_encoder.onnx.data
decoder_model_path: /path/to/sam3_decoder.onnx
decoder_model_data_path: /path/to/sam3_decoder.onnx.data
```

You can also tune the following inference parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| `conf_threshold` | 0.5 | Minimum score to keep a predicted mask |
| `epsilon` | 0.001 | Polygon approximation precision (smaller values yield finer contours) |
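
To make `epsilon` concrete: mask-to-polygon conversion is commonly implemented with OpenCV contour simplification, where the tolerance scales with the contour perimeter. The following is an illustrative sketch under that assumption, not X-AnyLabeling's exact post-processing:

```python
import cv2
import numpy as np

def mask_to_polygon(mask: np.ndarray, epsilon: float = 0.001) -> np.ndarray:
    # Approximate the largest contour of a binary mask with a polygon.
    # Smaller epsilon keeps more vertices, i.e. finer contours.
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    largest = max(contours, key=cv2.contourArea)
    tolerance = epsilon * cv2.arcLength(largest, True)  # True = closed contour
    return cv2.approxPolyDP(largest, tolerance, True)
```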

### Usage

Launch X-AnyLabeling and press `Ctrl+A`, or click the **AI** button in the left menu bar, to open the auto-labeling panel. In the model dropdown list, select **Segment Anything 3 (ViT-H)**.

#### Text Prompting

1. Enter one or more object names in the text field (e.g., `person`, `truck`)
2. Separate multiple classes with commas or periods: `person,truck` or `person.truck`
3. Select the desired output mode: **Polygon**, **Rectangle**, or **Rotation**
4. Adjust **Confidence** and **Mask Fineness** as needed
5. Click **Send** to run inference

> [!NOTE]
> The image embedding is cached after the first run on each image, so repeated queries on the same image skip the encoder and only re-run the language encoder and decoder.
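
The pattern is ordinary memoization of an expensive encoder call. A runnable toy version (with a stand-in encoder and an assumed embedding shape):

```python
from functools import lru_cache

import numpy as np

def run_image_encoder(image_path: str) -> np.ndarray:
    # Stand-in for the ViT-H ONNX encoder call (hypothetical).
    print(f"encoding {image_path} ...")  # printed only on cache misses
    return np.zeros((1, 256, 64, 64), dtype=np.float32)  # assumed shape

@lru_cache(maxsize=8)
def get_image_embedding(image_path: str) -> np.ndarray:
    return run_image_encoder(image_path)

get_image_embedding("frame_001.jpg")  # slow: runs the encoder
get_image_embedding("frame_001.jpg")  # fast: served from the cache
```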

#### Batch Processing

Press `Ctrl+M` to apply the current text prompt across all images in the folder.

> [!TIP]
> Toggle **Replace** (On/Off) to control whether results overwrite existing annotations or are appended alongside them.