This repo contains an end-to-end benchmark for wearable event detection on XRF V2 (2025), with a practical sensor setup: earbuds + smart glasses.
- Defines a deploy-oriented event task: Mobility Transition Presence
- Trains and evaluates lightweight models on profile-restricted sensors
- Calibrates operating points under a false-positive budget (FP/hour)
- Produces machine-readable artifacts for every run
- Positive labels: `Walking (24)`, `Standing Up (26)`, `Lying Down (27)`
- Input: time-series windows from XRF V2 modalities (`airpods`, split IMU receivers, optional Wi-Fi)
- Output: event triggers + profile-level metrics + calibrated threshold/cooldown
- Main decision metric: `sample_presence` F1 with `FP/hour <= 10`
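The positive-label definition above can be sketched as a binary mapping. This is a minimal illustration, not the repo's code: the helper name `to_presence_labels` is hypothetical, and it assumes labels arrive as one integer class id per sample.

```python
# Class ids taken from the task definition: Walking, Standing Up, Lying Down.
POSITIVE_LABEL_IDS = frozenset({24, 26, 27})


def to_presence_labels(class_ids):
    """Map per-sample class ids to binary presence labels (1 = positive event)."""
    return [1 if class_id in POSITIVE_LABEL_IDS else 0 for class_id in class_ids]


if __name__ == "__main__":
    # Ids 3 and 5 stand in for arbitrary non-target activities.
    print(to_presence_labels([3, 24, 24, 26, 5, 27]))
```

Everything outside the three listed ids collapses into the negative class, which is what keeps the task narrow.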
The scope is intentionally narrow: this is a reliability-first event track, not a broad activity suite.
| Profile | Sensors | Purpose |
|---|---|---|
| `earbuds_glasses` | `airpods` + `imu_gl` | default profile |
| `glasses_only` | `imu_gl` | fallback profile |
| `all_imu` | all IMU streams | diagnostic upper bound |
| `wifi_all` | Wi-Fi + all IMU | non-product upper bound |
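One way to read the profile table is as a mapping from profile name to the streams a model may see. The sketch below is illustrative only: `PROFILE_SENSORS` and `select_streams` are hypothetical names, the `wifi` key is an assumed stream name, and "all IMU streams" is assumed to mean the five split IMU receivers.

```python
# Hypothetical profile -> stream mapping mirroring the table above.
ALL_IMU = ("imu_gl", "imu_lh", "imu_rh", "imu_lp", "imu_rp")

PROFILE_SENSORS = {
    "earbuds_glasses": ("airpods", "imu_gl"),
    "glasses_only": ("imu_gl",),
    "all_imu": ALL_IMU,
    "wifi_all": ("wifi",) + ALL_IMU,  # "wifi" is an assumed key name
}


def select_streams(sample, profile):
    """Return only the streams the given profile is allowed to use."""
    wanted = PROFILE_SENSORS[profile]
    missing = [name for name in wanted if name not in sample]
    if missing:
        raise KeyError(f"profile {profile!r} needs missing streams: {missing}")
    return {name: sample[name] for name in wanted}
```

Restricting inputs at this level means a model evaluated under `glasses_only` cannot accidentally read earbud data.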
pip install -e ".[dev]"
xrfv2-edge-tal event-train --config configs/event_presence_mobility.yaml --adapter dummy
xrfv2-edge-tal event-eval --config configs/event_presence_mobility.yaml --adapter dummy --checkpoint runs/<train_run_id>/checkpoints/last.npz --profiles earbuds_glasses,glasses_only

XRF V2 is not redistributed in this repository.
Expected local layout:
data/raw/xrfv2_kaggle/
  train_data.h5
  train_label.json
  test_data.h5
  test_label.json
  info.json
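Before launching a real run, a quick check that the expected files are in place can save a failed job. This is a small standalone sketch, not part of the `xrfv2-edge-tal` CLI; the function name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the expected layout above.
EXPECTED_FILES = (
    "train_data.h5",
    "train_label.json",
    "test_data.h5",
    "test_label.json",
    "info.json",
)


def missing_files(data_root):
    """Return the expected file names that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("data/raw/xrfv2_kaggle")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```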
Canonical handling in this repo:
- `imu` is exposed as `imu_gl`, `imu_lh`, `imu_rh`, `imu_lp`, `imu_rp`
- `airpods` is reduced to 6 channels (acc + rot)
Reference runs:
- train: `runs/20260227_021605_0bc9e9f1`
- eval: `runs/20260227_030049_5a32e2cf`
- calibrate: `runs/20260227_030614_5a32e2cf`
| Profile | F1 (%) | Precision (%) | Recall (%) | FP/hour | Threshold | Cooldown(s) |
|---|---|---|---|---|---|---|
| `earbuds_glasses` | 61.17 | 80.14 | 49.46 | 6.77 | 0.835 | 0.0 |
| `glasses_only` | 58.16 | 84.14 | 44.33 | 5.67 | 0.900 | 0.0 |
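The Threshold and Cooldown columns describe an operating point. A minimal sketch of how such a pair could turn per-window scores into triggers is shown here; it assumes a trigger fires when a score is at or above the threshold and at least `cooldown_s` seconds have passed since the previous trigger. The repo's actual trigger logic may differ, and `detect_events` is a hypothetical name.

```python
def detect_events(scores, timestamps, threshold, cooldown_s):
    """Return the timestamps (seconds) at which a trigger fires."""
    triggers = []
    last_fire = None
    for t, score in zip(timestamps, scores):
        if score < threshold:
            continue  # below the operating threshold
        if last_fire is not None and (t - last_fire) < cooldown_s:
            continue  # still inside the cooldown window
        triggers.append(t)
        last_fire = t
    return triggers


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.95, 0.5, 0.85, 0.9]
    times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    print(detect_events(scores, times, threshold=0.835, cooldown_s=0.0))
    print(detect_events(scores, times, threshold=0.835, cooldown_s=2.5))
```

With a cooldown of 0.0, as in both calibrated rows, every above-threshold window fires; a positive cooldown suppresses repeats that follow a trigger too closely.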
| Profile | Sample F1 (%) | Precision (%) | Recall (%) | FP/hour |
|---|---|---|---|---|
| `earbuds_glasses` | 61.17 | 80.14 | 49.46 | 6.77 |
| `glasses_only` | 71.81 | 75.20 | 68.71 | 12.51 |
`glasses_only` scores higher on raw F1 but exceeds the FP/hour budget at the default threshold (12.51 > 10).
The calibrated table (the first of the two tables above, with Threshold and Cooldown columns) is the deploy decision table.
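The idea behind budget-based calibration can be sketched as a threshold sweep: discard every threshold whose FP/hour exceeds the budget, then keep the one with the best F1. This is an illustrative sample-level version under stated assumptions, not the `event-calibrate` implementation: `calibrate_threshold` is a hypothetical name, false positives are counted per sample, and cooldown is not searched.

```python
def calibrate_threshold(scores, labels, hours, fp_hour_budget=10.0):
    """Pick the threshold with the best F1 among those within the FP/hour budget.

    scores: per-sample probabilities; labels: 0/1 ground truth;
    hours: total duration of the evaluated data.
    Returns None if no candidate threshold satisfies the budget.
    """
    best = None
    for thr in sorted(set(scores)):
        tp = fp = fn = 0
        for score, label in zip(scores, labels):
            predicted = score >= thr
            if predicted and label:
                tp += 1
            elif predicted:
                fp += 1
            elif label:
                fn += 1
        fp_per_hour = fp / hours
        if fp_per_hour > fp_hour_budget:
            continue  # violates the false-positive budget
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if best is None or f1 > best["f1"]:
            best = {"threshold": thr, "f1": f1, "fp_per_hour": fp_per_hour}
    return best


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 1]
    print(calibrate_threshold(scores, labels, hours=0.25, fp_hour_budget=10.0))
```

This is why a profile with a higher raw F1 can still lose: its best-F1 threshold may sit outside the budget, so the calibrated choice is a stricter threshold with lower recall.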
Additional signal (earbuds_glasses): within_segment F1 = 40.52, onset_strict F1 = 0.13.
Detailed ledger: docs/event/results_latest.md.
python scripts/reproduce_full_run.py \
--config configs/event_presence_mobility.yaml \
--adapter xrfv2 \
--data-root data/raw/xrfv2_kaggle \
--seed 42 \
--train-profile earbuds_glasses \
--profiles earbuds_glasses,glasses_only \
--train-device auto \
--eval-device auto

Generated manifests:
- `runs/repro_full_latest.json`
- `runs/<calibrate_run_id>/repro_manifest.json`
For closest numeric repeatability across machines, use --train-device cpu --eval-device cpu.
xrfv2-edge-tal inspect --adapter xrfv2 --data-root data/raw/xrfv2_kaggle --list-modalities --show-shapes
xrfv2-edge-tal event-train \
--config configs/event_presence_mobility.yaml \
--adapter xrfv2 \
--data-root data/raw/xrfv2_kaggle \
--profile earbuds_glasses \
--override train.max_train_samples=0
xrfv2-edge-tal event-eval \
--config configs/event_presence_mobility.yaml \
--adapter xrfv2 \
--data-root data/raw/xrfv2_kaggle \
--checkpoint runs/<train_run_id>/checkpoints/last.npz \
--profiles earbuds_glasses,glasses_only \
--override eval.max_eval_samples=0
xrfv2-edge-tal event-calibrate \
--config configs/event_presence_mobility.yaml \
--adapter xrfv2 \
--data-root data/raw/xrfv2_kaggle \
--checkpoint runs/<train_run_id>/checkpoints/last.npz \
--profiles earbuds_glasses,glasses_only \
--metric-mode sample_presence \
--fp-hour-budget 10 \
--override eval.max_eval_samples=0

Solid:
- Reproducible run trail with structured artifacts
- Clear profile separation (default + fallback + upper bounds)
- Budget-based calibration instead of raw threshold reporting
Still improving:
- `onset_strict` is low for this sensor/task setup
- recall can be higher at the chosen FP budget
- richer semantic events remain harder with wearable-only inputs
- deploy spec: `docs/event/mobility_transition_spec.md`
- dataset notes: `docs/dataset_xrfv2.md`
- artifact contract: `docs/artifact_contract.md`