Rex-Omni Example

Introduction

Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that redefines object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.

Here, we'll show you how to use Rex-Omni in X-AnyLabeling to perform a range of vision tasks.

Let's get started!

Installation

You'll need to get X-AnyLabeling-Server up and running first. Check out the installation guide for the details. Make sure you're running at least v0.0.5 of the server and v3.3.6 of the X-AnyLabeling client; otherwise, you might run into compatibility issues.
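The version requirement above can be checked with a small helper. This is only a sketch: the function names are hypothetical, and how you obtain the installed version strings is left to you. It compares versions numerically, since a plain string comparison would wrongly rank v3.3.10 below v3.3.6.

```python
# Minimum versions stated in this guide.
MIN_SERVER = "v0.0.5"
MIN_CLIENT = "v3.3.6"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'v3.3.6' or '3.3.6' into (3, 3, 6) for numeric comparison."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical installed versions, for illustration only.
print(meets_minimum("v0.0.5", MIN_SERVER))   # True
print(meets_minimum("v3.3.10", MIN_CLIENT))  # True
print(meets_minimum("v3.3.5", MIN_CLIENT))   # False
```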

Once that's done, head over to configs/models.yaml and enable rexomni. There's an example config you can reference if you're not sure how to set it up.

You can tweak the settings in rexomni.yaml to fit your needs. By default, the backend is set to "transformers". For faster inference, we recommend setting backend: "vllm" and attn_implementation: "flash_attention_2".

Note

To use these acceleration options, you need to install compatible versions of the vllm and flash-attn packages separately.
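Before switching the backend, it can help to confirm that the optional packages are importable in the server's Python environment. The sketch below uses the standard library's importlib.util.find_spec. The helper names are hypothetical, and the returned dictionary only mirrors the two settings mentioned above (backend and attn_implementation); it is not the full rexomni.yaml schema, and it does not verify that the installed versions are compatible.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in this environment."""
    return find_spec(module_name) is not None


def suggest_settings(has_vllm: bool, has_flash_attn: bool) -> dict:
    """Pick values for the two settings named in this guide (illustrative only)."""
    # Start from the documented default backend.
    settings = {"backend": "transformers"}
    if has_vllm:
        settings["backend"] = "vllm"
    if has_flash_attn:
        settings["attn_implementation"] = "flash_attention_2"
    return settings


# The flash-attn package is imported under the module name `flash_attn`.
print(suggest_settings(is_installed("vllm"), is_installed("flash_attn")))
```

Run this with the same interpreter that runs the server, then copy the suggested values into rexomni.yaml by hand.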

Getting Started

Rex-Omni supports multiple vision tasks. Select the desired task from the task dropdown in the X-AnyLabeling interface.

Object Detection

The Detection task detects objects and returns bounding boxes based on text prompts.

Usage:

Select the "Detection" task from the task dropdown
Enter object categories in the text input field (use dots to separate multiple classes, e.g., person.car.bicycle)
Click the "Send" button to run inference
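If your class names live in a list or a label file, the dot-separated prompt described above can be assembled programmatically and pasted into the text field. This is a minimal sketch; build_detection_prompt is a hypothetical helper and not part of X-AnyLabeling. It rejects names that contain a dot, on the assumption that such a name would be read as two separate classes.

```python
def build_detection_prompt(categories: list[str]) -> str:
    """Join category names with dots, e.g. ['person', 'car'] -> 'person.car'."""
    cleaned = []
    for name in categories:
        name = name.strip()
        if not name:
            continue  # skip empty entries
        if "." in name:
            raise ValueError(f"category {name!r} contains the separator '.'")
        if name not in cleaned:
            cleaned.append(name)  # keep the first occurrence, preserve order
    return ".".join(cleaned)


print(build_detection_prompt(["person", " car ", "bicycle", "car"]))  # person.car.bicycle
```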

Keypoint Detection

Rex-Omni supports two keypoint detection modes:

Keypoint (Person/Hand)

Detects person or hand keypoints with skeleton visualization. No text prompt is required.

Usage:

Select the "Keypoint (Person/Hand)" task from the task dropdown
Click the "Run" button to detect person keypoints

Keypoint (Animal)

Detects animal keypoints with skeleton visualization. Requires a text prompt specifying the animal category.

Usage:

Select the "Keypoint (Animal)" task from the task dropdown
Enter a single animal category (e.g., cat, dog, horse), or enter "animal" to detect all animal types
Click the "Send" button to run inference

Optical Character Recognition (OCR)

Rex-Omni provides four OCR modes with different output formats:

OCR Box (Word Level)

Word-level text detection and recognition in bounding boxes.

Usage:

Select the "OCR Box (Word Level)" task from the task dropdown
Click the "Run" button to detect and recognize text at word level

OCR Box (Text Line Level)

Line-level text detection and recognition in bounding boxes.

Usage:

Select the "OCR Box (Text Line Level)" task from the task dropdown
Click the "Run" button to detect and recognize text at line level

OCR Polygon (Word Level)

Word-level text detection and recognition in polygon shapes.

Usage:

Select the "OCR Polygon (Word Level)" task from the task dropdown
Click the "Run" button to detect and recognize text at word level with polygon shapes

OCR Polygon (Text Line Level)

Line-level text detection and recognition in polygon shapes.

Usage:

Select the "OCR Polygon (Text Line Level)" task from the task dropdown
Click the "Run" button to detect and recognize text at line level with polygon shapes

Pointing

The Pointing task points to objects based on text descriptions.

Usage:

Select the "Pointing" task from the task dropdown
Enter an object description in the text input field
Click the "Send" button to run inference

Visual Prompting

The Visual Prompting task finds similar objects based on visual examples (reference boxes).

Note

This task requires interactive rectangle prompts and does not support batch processing.

Usage:

Select the "Visual Prompting" task from the task dropdown
Click the "Add Positive Rect" button to add reference bounding boxes
Draw rectangles around example objects you want to find
Click the "Run Rect" button to detect similar objects
Use "Clear" to remove all prompts or "Finish Object" to complete the current annotation

Tip

All tasks except Visual Prompting support batch processing. You can run inference on the entire dataset of the current task with a single click using the batch processing feature in X-AnyLabeling.
