High-performance unified API for real-time person detection and pose estimation.
PoseDetector combines YOLO12 object detection with RTMW pose estimation in a single, optimized interface. It is designed for speed and ease of use, with convenience methods for common web elements (canvas, video, image, file).
Models are loaded automatically from HuggingFace if not specified.
## Installation

```bash
npm install rtmlib-ts
```

## Quick Start

```typescript
import { PoseDetector } from 'rtmlib-ts';

// Initialize with default models from HuggingFace
const detector = new PoseDetector();
await detector.init();

const canvas = document.getElementById('canvas') as HTMLCanvasElement;
const people = await detector.detectFromCanvas(canvas);
```

### With Custom Models

```typescript
import { PoseDetector } from 'rtmlib-ts';

const detector = new PoseDetector({
  detModel: 'https://huggingface.co/demon2233/rtmlib-ts/resolve/main/yolo/yolov12n.onnx',
  poseModel: 'https://huggingface.co/demon2233/rtmlib-ts/resolve/main/rtmpose/end2end.onnx',
});
await detector.init();

const canvas = document.getElementById('canvas') as HTMLCanvasElement;
const people = await detector.detectFromCanvas(canvas);
```

### From Video

```typescript
const video = document.getElementById('video') as HTMLVideoElement;
const people = await detector.detectFromVideo(video);
```

### From Image

```typescript
const img = document.getElementById('image') as HTMLImageElement;
const people = await detector.detectFromImage(img);
```

### From File Upload

```typescript
const fileInput = document.getElementById('file') as HTMLInputElement;
fileInput.addEventListener('change', async (e) => {
  const file = (e.target as HTMLInputElement).files?.[0];
  if (file) {
    const people = await detector.detectFromFile(file);
  }
});
```

### From Webcam

```typescript
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const video = document.querySelector('video')!;
video.srcObject = stream;
await video.play();

// Detect continuously, one frame per animation tick
async function loop() {
  const people = await detector.detectFromVideo(video);
  requestAnimationFrame(loop);
}
loop();
```

## API Reference

### Constructor

```typescript
new PoseDetector(config?: PoseDetectorConfig)
```

**Configuration Options:**
| Option | Type | Default | Description |
|---|---|---|---|
| `detModel` | `string` | optional | Path to YOLO12 detection model |
| `poseModel` | `string` | optional | Path to RTMW pose model |
| `detInputSize` | `[number, number]` | `[416, 416]` | Detection input size |
| `poseInputSize` | `[number, number]` | `[384, 288]` | Pose input size |
| `detConfidence` | `number` | `0.5` | Detection confidence threshold |
| `nmsThreshold` | `number` | `0.45` | NMS IoU threshold |
| `poseConfidence` | `number` | `0.3` | Keypoint visibility threshold |
| `backend` | `'wasm' \| 'webgpu'` | `'wasm'` | Execution backend |
| `cache` | `boolean` | `true` | Enable model caching |
If `detModel` and `poseModel` are not specified, the following default models are used:

- Detector: `https://huggingface.co/demon2233/rtmlib-ts/resolve/main/yolo/yolov12n.onnx`
- Pose: `https://huggingface.co/demon2233/rtmlib-ts/resolve/main/rtmpose/end2end.onnx`
### init()

Initializes both the detection and pose models.

```typescript
await detector.init();
```

### detectFromCanvas()

Detect poses from an HTMLCanvasElement.

```typescript
async detectFromCanvas(canvas: HTMLCanvasElement): Promise<Person[]>
```

### detectFromVideo()

Detect poses from an HTMLVideoElement (for real-time video processing).

```typescript
async detectFromVideo(
  video: HTMLVideoElement,
  targetCanvas?: HTMLCanvasElement
): Promise<Person[]>
```

### detectFromImage()

Detect poses from an HTMLImageElement.

```typescript
async detectFromImage(
  image: HTMLImageElement,
  targetCanvas?: HTMLCanvasElement
): Promise<Person[]>
```

### detectFromFile()

Detect poses from a File object (for file uploads).

```typescript
async detectFromFile(
  file: File,
  targetCanvas?: HTMLCanvasElement
): Promise<Person[]>
```

### detectFromBlob()

Detect poses from a Blob (for camera capture or downloads).

```typescript
async detectFromBlob(
  blob: Blob,
  targetCanvas?: HTMLCanvasElement
): Promise<Person[]>
```

### detect()

Low-level method for raw image data.

```typescript
async detect(
  imageData: Uint8Array,
  width: number,
  height: number
): Promise<Person[]>
```

### dispose()

Releases resources and models.
```typescript
detector.dispose();
```

## Types

### Person

```typescript
interface Person {
  bbox: {
    x1: number;
    y1: number;
    x2: number;
    y2: number;
    confidence: number;
  };
  keypoints: Keypoint[];
  scores: number[];
}
```

### Keypoint

```typescript
interface Keypoint {
  x: number;
  y: number;
  score: number;
  visible: boolean;
  name: string;
}
```

**Keypoint Names (COCO17):**

0. nose
1. left_eye
2. right_eye
3. left_ear
4. right_ear
5. left_shoulder
6. right_shoulder
7. left_elbow
8. right_elbow
9. left_wrist
10. right_wrist
11. left_hip
12. right_hip
13. left_knee
14. right_knee
15. left_ankle
16. right_ankle
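Because keypoints carry their COCO17 names, a specific joint can be looked up directly. A minimal sketch (the `Keypoint` shape is as defined above; the `getKeypoint` helper is illustrative, not part of the library):

```typescript
interface Keypoint {
  x: number;
  y: number;
  score: number;
  visible: boolean;
  name: string;
}

// Illustrative helper: return a keypoint by COCO17 name, skipping invisible ones.
function getKeypoint(keypoints: Keypoint[], name: string): Keypoint | undefined {
  return keypoints.find(kp => kp.name === name && kp.visible);
}

// Usage (assuming `people` came from one of the detect* methods):
// const wrist = getKeypoint(people[0].keypoints, 'left_wrist');
```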
Performance statistics are attached to the results:

```typescript
interface PoseStats {
  personCount: number;
  detTime: number;   // Detection time (ms)
  poseTime: number;  // Pose estimation time (ms)
  totalTime: number; // Total processing time (ms)
}
```

Access via `(people as any).stats`.
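The per-frame stats lend themselves to a rolling FPS readout. A sketch (the `makeFpsMeter` helper is illustrative, not part of the library; `PoseStats` is as defined above):

```typescript
interface PoseStats {
  personCount: number;
  detTime: number;
  poseTime: number;
  totalTime: number;
}

// Illustrative helper: rolling FPS estimate over the last `windowSize` frames.
function makeFpsMeter(windowSize = 30) {
  const times: number[] = [];
  return (stats: PoseStats): number => {
    times.push(stats.totalTime);
    if (times.length > windowSize) times.shift();
    const avg = times.reduce((a, b) => a + b, 0) / times.length;
    return 1000 / avg;
  };
}

// Usage per frame:
// const fps = makeFpsMeter();
// console.log(`${fps((people as any).stats).toFixed(1)} FPS`);
```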
## Performance Tips

### Use the WebGPU Backend

```typescript
const detector = new PoseDetector({
  detModel: 'https://huggingface.co/demon2233/rtmlib-ts/resolve/main/yolo/yolov12n.onnx',
  poseModel: 'https://huggingface.co/demon2233/rtmlib-ts/resolve/main/rtmpose/end2end.onnx',
  backend: 'webgpu', // Faster than WASM
});
```

### Tune Input Sizes

Smaller input sizes = faster inference:
```typescript
// Fast (lower accuracy)
const detector = new PoseDetector({
  detInputSize: [416, 416],
  poseInputSize: [256, 192],
});
```

```typescript
// Balanced
const detector = new PoseDetector({
  detInputSize: [640, 640],
  poseInputSize: [384, 288],
});
```

### Raise Confidence Thresholds

Higher thresholds = fewer detections but faster:
```typescript
const detector = new PoseDetector({
  detConfidence: 0.6,  // Skip low-confidence detections
  poseConfidence: 0.4, // Only show confident keypoints
});
```

### Reuse the Detector

```typescript
// ❌ Don't create a new detector for each frame
// ✅ Create one instance and reuse it
const detector = new PoseDetector(config);
await detector.init();
for (const frame of videoFrames) {
  const people = await detector.detect(frame.data, frame.width, frame.height);
}
```

### Batch Processing

```typescript
// Process images sequentially with the same detector
const detector = new PoseDetector(config);
await detector.init();
const results = [];
for (const img of images) {
  results.push(await detector.detect(img.data, img.width, img.height));
}
```

## Complete Example

```typescript
import { PoseDetector } from 'rtmlib-ts';

async function main() {
  // Initialize
  const detector = new PoseDetector({
    detModel: 'https://huggingface.co/demon2233/rtmlib-ts/resolve/main/yolo/yolov12n.onnx',
    poseModel: 'https://huggingface.co/demon2233/rtmlib-ts/resolve/main/rtmpose/end2end.onnx',
    detInputSize: [640, 640],
    poseInputSize: [384, 288],
    detConfidence: 0.5,
    nmsThreshold: 0.45,
    poseConfidence: 0.3,
    backend: 'wasm',
  });
  await detector.init();

  // Load image
  const response = await fetch('image.jpg');
  const blob = await response.blob();
  const imageBitmap = await createImageBitmap(blob);

  const canvas = document.createElement('canvas');
  canvas.width = imageBitmap.width;
  canvas.height = imageBitmap.height;
  const ctx = canvas.getContext('2d')!;
  ctx.drawImage(imageBitmap, 0, 0);

  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const data = new Uint8Array(imageData.data);

  // Detect
  const people = await detector.detect(data, canvas.width, canvas.height);

  // Print stats
  const stats = (people as any).stats;
  console.log(`Detected ${stats.personCount} people in ${stats.totalTime}ms`);
  console.log(`  Detection: ${stats.detTime}ms`);
  console.log(`  Pose: ${stats.poseTime}ms`);

  // Draw results
  people.forEach((person, i) => {
    // Draw bounding box
    ctx.strokeStyle = `hsl(${i * 60}, 80%, 50%)`;
    ctx.lineWidth = 2;
    ctx.strokeRect(
      person.bbox.x1,
      person.bbox.y1,
      person.bbox.x2 - person.bbox.x1,
      person.bbox.y2 - person.bbox.y1
    );

    // Draw keypoints
    person.keypoints.forEach(kp => {
      if (!kp.visible) return;
      ctx.fillStyle = '#00ff00';
      ctx.beginPath();
      ctx.arc(kp.x, kp.y, 4, 0, Math.PI * 2);
      ctx.fill();
    });
  });

  // Cleanup
  detector.dispose();
}

main();
```

## Browser Support

| Browser | Version | Backend |
|---|---|---|
| Chrome | 94+ | WASM, WebGPU |
| Edge | 94+ | WASM, WebGPU |
| Firefox | 95+ | WASM |
| Safari | 16.4+ | WASM |
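Because WebGPU is only available in some of the browsers above, the backend can be chosen at runtime. A sketch using the standard `navigator.gpu` feature-detection hook (the `pickBackend` helper is illustrative, not part of the library):

```typescript
// Illustrative helper: prefer WebGPU when the browser exposes it, else fall back to WASM.
function pickBackend(): 'wasm' | 'webgpu' {
  return typeof navigator !== 'undefined' && 'gpu' in navigator ? 'webgpu' : 'wasm';
}

// const detector = new PoseDetector({ backend: pickBackend() });
```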
## Benchmarks

Typical inference times on an M1 MacBook Pro:
| Configuration | Detection | Pose (per person) | Total (3 people) |
|---|---|---|---|
| WASM, 640×640 | 80ms | 25ms | 155ms |
| WASM, 416×416 | 40ms | 15ms | 85ms |
| WebGPU, 640×640 | 30ms | 10ms | 60ms |
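The "Total" column follows from one detection pass plus one pose pass per person (e.g. 80 + 3 × 25 = 155 ms for the first row), which gives a quick way to budget latency for other crowd sizes:

```typescript
// Estimated frame latency: one detection pass plus one pose pass per detected person.
function estimateTotalMs(detMs: number, poseMsPerPerson: number, personCount: number): number {
  return detMs + poseMsPerPerson * personCount;
}
```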
## Troubleshooting

### Models fail to load

- Ensure models are served over HTTP (not the `file://` protocol)
- Use a local server: `python -m http.server 8080`
- Check CORS headers
### Slow performance

- Switch to the WebGPU backend if available
- Reduce input sizes
- Increase confidence thresholds
- Process every Nth frame instead of every frame
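The every-Nth-frame tip can be wrapped in a tiny helper that reuses the previous result on skipped frames. A sketch (the `makeFrameSkipper` helper is illustrative; `detector` and `video` are as in the earlier examples):

```typescript
// Illustrative helper: run `work` on every Nth call, otherwise return the last result.
function makeFrameSkipper<T>(n: number, work: () => Promise<T>) {
  let count = 0;
  let last: T | undefined;
  return async (): Promise<T | undefined> => {
    if (count++ % n === 0) {
      last = await work();
    }
    return last;
  };
}

// Per animation frame:
// const step = makeFrameSkipper(3, () => detector.detectFromVideo(video));
// const people = await step();
```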
### No people detected

- Lower the `detConfidence` threshold
- Ensure the person is visible and reasonably sized
- Check the image format (RGB, not grayscale)

## License

Apache 2.0