Commit d0f53db

Refactor pose analysis and bone rotation prompts
Simplified and clarified `step_1_prompt_analyse_image_and_get_pose_description.md`:
- Removed redundant comments and examples.
- Expanded `OUTPUT` section with a `Notes` field and increased word limit.
- Improved `LATERALITY LOCK` with conflict handling instructions.

Enhanced `step_2_prompt_generate_bone_rotations.md`:
- Added detailed workflows for torso, head/neck, arms, legs, and self-checks.
- Clarified `INPUT` and `OUTPUT` sections for realistic and consistent results.
- Updated `JOINT RANGE CONSTRAINTS` and added `COMMON ERRORS TO AVOID`.
- Improved internal workflows for parsing pose text and mapping biomechanical terms.

General formatting and consistency improvements across both files.
1 parent 1bc3448 commit d0f53db

2 files changed

Lines changed: 139 additions & 53 deletions

src/DesktopApp/ImageToPose.Desktop/Assets/step_1_prompt_analyse_image_and_get_pose_description.md

Lines changed: 2 additions & 16 deletions
@@ -1,17 +1,3 @@
1-
# Before pasting into chat, delete the following comments section, including this line.
2-
# You need to fill out the USER-SET ANCHORS section as shown in the examples to ensure the model can best describe the pose correctly.
3-
# Fill USER-SET ANCHORS and they become ground truth
4-
# Examples:
5-
# Examples (collapsed by field):
6-
# Left hand=<at 10 o’clock on wheel; clasped with right before chest; clasped with right behind back; cradling infant; crossed over right biceps; forward swing; grabbing bar; holding book; holding crutch; holding leash; holding mic; holding suitcase; in coat pocket (occluded); in left pocket (occluded); lead glove forward; occluded (likely on hip); occluded behind torso (likely on hip); on abdomen; on backpack strap; on floor; on forearm support; on guitar neck (fretboard); on handlebar; on hip; on keyboard; on pushrim; on railing; on thigh (partly occluded); palms together overhead; pressed together with right at sternum; raised overhead (open palm); rear rack; resting on knee; resting on left cheek; resting on stair rail; side extension (T); tossing ball; wave at shoulder height>
7-
# Right hand=<adjusting scarf; at side; back swing; chin support; clasped with left; clasped with left behind back; crossed over abdomen; crossed under left arm; forward reach; free; free swing; gesturing outward; grabbing bar; guarding chin; handlebar; holding cup; holding phone at chest; holding tote; mid-swing; on backpack strap; on brake hood; on forearm support; on gear lever; on hip; on lap; on mouse; on phone taking selfie; on pillow; on right thigh; on thigh; over strings; palms together overhead; pointing at screen; pressed together with left; racket back; reaching overhead to shelf; supporting back; unknown (fully occluded by bag)>
8-
# Left foot=<on right thigh (tree pose)>
9-
# Right foot=<occluded; off frame; on floor>
10-
# Facing=<frontal; left; left-profile; prone; right-profile; supine; up-stairs; ¾ left; ¾ right>
11-
# Notes=<air squat; ankles dorsiflexed; arms folded; bicycle, seat-weighted; boxing stance; camera slightly above; dance arabesque, left leg back; dog outside frame; head tilt right; kneeling on right knee; lightly forward head; lying on back; mark right hand uncertain; mild spine extension; mirror present—do NOT flip laterality; mountain pose reach; on stage; phone blocks wrist view; plank position, feet unseen; power stance, feet wide; pull-up hang, feet off frame; road bike drops; running stride; seated at desk; seated driver; seated think pose; seated, elbow on table; shirt hides wrist; shoulders retracted; standing guitarist; standing relaxed; standing, slight forward lean; tennis serve prep; tree pose, left foot on right thigh; visor shadow may hide wrist angles; walking toward camera; walking with carry; weight biased to right leg; wheelchair, slight trunk flexion; winter wear covers neck>
12-
# Facing=uncertain (likely left)
13-
# Keep Facing to one of: frontal, left-profile, right-profile, ¾ left, ¾ right, supine, prone, or a brief context like up-stairs.
14-
# Use occluded, partly occluded, or unknown where visibility is limited.
151
ROLE (system)
162
You are a vision-and-kinematics analyst.
173

@@ -28,8 +14,8 @@ LATERALITY LOCK (MANDATORY)
2814
3) If [USER_ANCHORS] is present and seems to conflict with the image, write Check=FLAG and still use USER_ANCHORS, adding a one-clause note of the conflict.
2915

3016
OUTPUT (exactly two blocks)
31-
[ANCHORS] Left hand=?, Right hand=?, Left foot=?, Right foot=?, Facing=?, Check=PASS|FLAG
32-
[POSE] One paragraph (4–6 sentences, ≤140 words) covering: camera/view; global stance & lean; head/neck (yaw/pitch/tilt); torso orientation/lean/side-bend; each shoulder/arm (flexion/abduction/protraction), elbow bend, forearm rotation; hands (gesture + palm orientation); hips/legs/knees/feet/ankles; brief occlusions/uncertainties.
17+
[ANCHORS] Left hand=?, Right hand=?, Left foot=?, Right foot=?, Facing=?, Notes=?, Check=PASS|FLAG
18+
[POSE] One paragraph (4–6 sentences, ≤150 words) covering: camera/view; global stance & lean; head/neck (yaw/pitch/tilt); torso orientation/lean/side-bend; each shoulder/arm (flexion/abduction/protraction), elbow bend, forearm rotation; hands (gesture + palm orientation); hips/legs/knees/feet/ankles; brief occlusions/uncertainties.
3319

3420
RULES
3521
- Use the subject’s anatomical left/right only.
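
For readers who consume the step 1 output programmatically, the following is a minimal sketch of how the two-block format could be checked. It is not part of the commit: `parse_anchors` and `pose_within_limit` are hypothetical helper names, and the sketch assumes every anchor field is written as `key=value` and that words are counted by splitting on whitespace.

```python
import re

# Field names taken from the [ANCHORS] line in the updated prompt.
ANCHOR_KEYS = ("Left hand", "Right hand", "Left foot", "Right foot",
               "Facing", "Notes", "Check")


def parse_anchors(line: str) -> dict[str, str]:
    """Split an [ANCHORS] line into a field -> value mapping (hypothetical helper)."""
    prefix = "[ANCHORS]"
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError("line does not start with [ANCHORS]")
    body = text[len(prefix):].strip()

    # Split only where a known key is followed by '=', so commas inside a
    # value (for example in Notes) do not break the parse.
    keys = "|".join(re.escape(k) for k in ANCHOR_KEYS)
    matches = list(re.finditer(rf"(?:^|,\s*)({keys})=", body))

    fields = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        fields[m.group(1)] = body[m.end():end].strip()

    missing = [k for k in ANCHOR_KEYS if k not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if fields["Check"] not in ("PASS", "FLAG"):
        raise ValueError("Check must be PASS or FLAG")
    return fields


def pose_within_limit(pose_paragraph: str, max_words: int = 150) -> bool:
    """Whitespace-based word count against the prompt's 150-word cap."""
    return len(pose_paragraph.split()) <= max_words
```

Splitting on known keys rather than on every comma is a design choice for this sketch: free-text fields such as `Notes` may themselves contain commas.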

src/DesktopApp/ImageToPose.Desktop/Assets/step_2_prompt_generate_bone_rotations.md

Lines changed: 137 additions & 37 deletions
@@ -9,9 +9,9 @@ Given a precise pose text and, optionally, a reference photo to resolve ambiguit
99
- Rotate bones directly (no controllers). Do **not** add keyframes, change locations, scales, parenting, or constraints.
1010

1111
# INPUT
12-
1) POSE TEXT (PRIMARY authoritative): Between the markers below, the user inserts a concise pose paragraph. This is the main source of truth.
12+
1) POSE TEXT (PRIMARY authoritative): Between the markers below, the user inserts a concise pose paragraph. This is the main source of truth.
1313
POSE_TEXT_START
14-
(Insert pose paragraph here — anatomical left/right, global stance/lean, head/neck yaw/pitch/tilt, torso orientation/lean/side-bend, shoulders/arms/elbows/forearms/hands, hips/legs/knees/feet, occlusions/uncertainties.)
14+
{USER_POSE_DESCRIPTION_HERE}
1515
POSE_TEXT_END
1616

1717
2) OPTIONAL PHOTO: A single image of the same pose. Use only to **clarify** details that the POSE TEXT leaves ambiguous.
@@ -21,6 +21,7 @@ POSE_TEXT_END
2121
- a single top-level assignment: `POSE_DEGREES = { ... }`
2222
- No other text, comments, imports, printing, or helper code.
2323
- Each bone value is a list `[X, Y, Z]` in **degrees**. Use integers or floats.
24+
- **CRITICAL**: Avoid round numbers (0, 10, 20, 90) unless genuinely appropriate. Real poses use values like 18, -41, 72, 25.
2425
- If uncertain, put `[0, 0, 0]` (prefer 0 over guessing).
2526

2627
# LATERALITY LOCK (MANDATORY)
@@ -69,47 +70,146 @@ Legs/Feet
6970
# T-POSE COMPENSATION CRITICAL
7071
The MPFB rig's T-pose has inherent bends that must be compensated:
7172
- Elbows are slightly bent in T-pose: To achieve real T pose with a straight arm, set `lowerarm_*.X = -45` and adjust `upperarm_*.Z` (left: +45, right: -45)
73+
- **TRIGGERS for applying compensation**: "straight arm", "extended arm", "nearly straight", "elbow locked", "arm trailing behind", "arm hanging down", "arm at rest"
7274
- These compensations are ONLY applied when the pose requires straight/extended arms
7375

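
The trigger list and compensation values above can be sketched in code. This is an illustration only, not part of the commit: the helper names are hypothetical, the bone names `lowerarm_l`/`lowerarm_r` are inferred from the `lowerarm_*` wildcard, and "adjust `upperarm_*.Z`" is read here as adding the offset to the existing Z value, which is one possible interpretation of the prompt.

```python
import copy

# Trigger phrases copied from the prompt's TRIGGERS bullet.
STRAIGHT_ARM_TRIGGERS = (
    "straight arm", "extended arm", "nearly straight", "elbow locked",
    "arm trailing behind", "arm hanging down", "arm at rest",
)


def needs_tpose_compensation(arm_description: str) -> bool:
    """True if the description contains any straight-arm trigger phrase."""
    text = arm_description.lower()
    return any(trigger in text for trigger in STRAIGHT_ARM_TRIGGERS)


def apply_tpose_compensation(pose: dict, side: str) -> dict:
    """Return a copy of `pose` with straight-arm compensation for side 'l' or 'r'.

    Assumption: the upper-arm Z adjustment is additive (+45 left, -45 right).
    """
    if side not in ("l", "r"):
        raise ValueError("side must be 'l' or 'r'")
    out = copy.deepcopy(pose)
    z_offset = 45 if side == "l" else -45
    # Straight elbow: the rig's built-in bend is cancelled with X = -45.
    out[f"lowerarm_{side}"] = [-45, 0, 0]
    upper = out.setdefault(f"upperarm_{side}", [0, 0, 0])
    upper[2] += z_offset
    return out
```

The function copies its input so the uncompensated estimate stays available for comparison.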
74-
# ANCHOR-DRIVEN ESTIMATION WORKFLOW (INTERNAL — do not output)
75-
1) Parse POSE TEXT into joints: global stance/lean; pelvis/torso; left/right shoulders/arms/elbows/forearms/hands; hips/knees/feet; head/neck.
76-
2) Determine facing/orientation; align verbs to anchors (e.g., "leans forward" → +X on spine; "turns head right" → **negative** Y if +Y is "turn left"). Flip signs to comply with anchors.
77-
3) Arms/legs: respect hinge-only joints (forearms, calves) → only X varies; set Y/Z=0.
78-
4) If POSE TEXT is ambiguous, consult the image to refine magnitudes/signs, but do not contradict anchors.
79-
5) Prefer small magnitudes when uncertain. Symmetry checks: contrapposto often implies pelvis/head counter-rotation.
80-
6) Clamp to plausible human ranges; if still unsure, use 0.
81-
7) Torso Analysis: Distribute forward/backward lean across spine segments. Pelvis often counter-balances upper body lean.
82-
8) Arm Straightness: If pose describes "straight arm" or "fully extended arm", apply T-pose compensation.
83-
84-
# SIGN MAP (INTERNAL—do not output). Convert biomechanical words to your axis signs:
85-
- Hip flexion ⇒ thigh_ X = negative* (leg forward).
86-
- Hip extension ⇒ thigh_ X = positive* (leg back).
87-
- Shoulder flexion (reach forward) ⇒ upperarm_ X = positive*.
88-
- Shoulder abduction/elevation ⇒ upperarm_ Y = positive*.
89-
- Lower/bring arm down: right: upperarm_r Z positive lowers; left: upperarm_l Z positive raises (per your meanings); use Z for small height trims.
76+
# ANCHOR-DRIVEN ESTIMATION WORKFLOW (INTERNAL – do not output)
77+
78+
## PHASE 1: TORSO ORIENTATION (CRITICAL)
79+
1. **Determine camera position** relative to subject (front, back, left side, right side, 3/4 view)
80+
2. **Identify torso facing direction** relative to camera:
81+
- If "views her from the right" or "right side" → subject faces away from camera (toward left)
82+
- If "views her from the left" or "left side" → subject faces toward camera (toward right)
83+
- If "front view" → subject faces camera
84+
3. **Map facing to Y-axis rotations**:
85+
- Subject turning LEFT (from subject's POV) = POSITIVE Y on spine/neck/head
86+
- Subject turning RIGHT (from subject's POV) = NEGATIVE Y on spine/neck/head
87+
- **DOUBLE CHECK**: Re-read camera position and verify Y signs match
88+
4. **Distribute spine rotations**:
89+
- Torso lean (forward/back) → use X-axis across spine_01, spine_02, spine_03
90+
- Torso twist/turn (left/right) → use Y-axis across spine_01, spine_02, spine_03
91+
- **NEVER leave all spine segments at [0,0,0]** unless truly standing perfectly straight
92+
- Example distribution: If leaning forward 15° total, use spine_01:X=10, spine_02:X=0, spine_03:X=-15 (can vary)
93+
- Example twist: If turning left 45°, use spine_02:Y=15, spine_03:Y=20
94+
95+
## PHASE 2: HEAD & NECK
96+
1. Parse head orientation relative to torso
97+
2. Apply Y-axis turn using **same sign convention** as torso
98+
3. Add X-axis for chin up/down
99+
4. Add Z-axis tilt only if explicitly mentioned
100+
101+
## PHASE 3: ARMS (THINK IN 3D)
102+
**CRITICAL**: Arms almost always need multi-axis rotations. Follow this checklist for each arm:
103+
104+
### For each arm, ask:
105+
1. **Height of hand** (high/low relative to shoulder):
106+
- Left arm: Use upperarm_l.Z (positive=up)
107+
- Right arm: Use upperarm_r.Y (positive=up) and upperarm_r.Z (negative=down)
108+
- Typical values: -45° to +135° for shoulder elevation
109+
110+
2. **Forward/back position of hand**:
111+
- Use upperarm.X (positive=forward)
112+
- Typical range: -30° to +90°
113+
114+
3. **Elbow bend angle**:
115+
- Straight/extended → lowerarm.X = -45 (T-pose compensation)
116+
- Slightly bent (~20°) → lowerarm.X = 15 to 25
117+
- 90° bend → lowerarm.X = 45 to 55
118+
- Fully bent → lowerarm.X = 90 to 135
119+
- **NEVER use negative values for bent elbows** (except -45 for straight)
120+
121+
4. **Hand orientation**:
122+
- Palm facing: Use hand.Y (clockwise turn)
123+
- Palm up/down: Use hand.Z (left=up, right=down)
124+
- Wrist bend: Use hand.X
125+
- **DON'T leave at [0,0,0]** if pose describes grip, gesture, or specific hand position
126+
127+
5. **Clavicle adjustment**:
128+
- Shoulder protracted (forward) → clavicle.X positive
129+
- Shoulder elevated (shrugged) → clavicle.Y/Z (varies by side)
130+
- Typical range: -15° to +30°
131+
132+
### Common arm pose patterns:
133+
- **Reaching forward horizontally**: upperarm.X=60-80, upperarm.Y/Z=small, lowerarm.X=-45 to 35
134+
- **Reaching up to head/face**: upperarm.X=small, upperarm.Y=40-60 (right) or upperarm.Z=45-90 (left), lowerarm.X=45-90
135+
- **Arm at side relaxed**: upperarm=[0,0,±45], lowerarm.X=-45
136+
- **Arm behind body**: upperarm.X=-20 to -40, lowerarm.X=-45
137+
138+
## PHASE 4: LEGS
139+
1. **Hip flexion** (thigh forward):
140+
- Negative X = leg forward
141+
- Typical range: -135° (high knee) to -10° (slight forward)
142+
- **Don't over-flex**: -90° is extreme, most poses use -30° to -60°
143+
144+
2. **Hip abduction** (leg sideways):
145+
- Positive Z = leg moves outward
146+
- Typical range: -10° to +30°
147+
- **Don't ignore this**: Seated poses often have 15-25° abduction
148+
149+
3. **Knee bend**:
150+
- 0° = straight (locked knee)
151+
- Positive X = bent
152+
- Typical values: 5° (nearly straight), 30° (slight), 60° (moderate), 90-100° (sharp), 135° (full)
153+
- **Match thigh flexion**: If thigh=-80°, calf should be 70-100° to keep foot reasonable
154+
155+
4. **Foot/ankle**:
156+
- Small adjustments for plantarflexion (X) and rotation (Y/Z)
157+
- Usually -10° to +15°
158+
159+
## PHASE 5: SELF-CHECK (MANDATORY BEFORE OUTPUT)
160+
Run these checks on your estimated dictionary:
161+
162+
1. **Spine Check**: At least ONE spine segment has non-zero value (unless perfect T-pose)
163+
2. **Torso-Head Coherence**: If spine turns left (+Y), head should also turn left or be neutral
164+
3. **Camera-Face Alignment**:
165+
- Right side camera view + "facing camera" → head/spine should turn left (+Y)
166+
- Left side camera view + "facing camera" → head/spine should turn right (-Y)
167+
4. **Multi-Axis Arms**: Each upperarm should have at least TWO non-zero axes (unless truly T-pose)
168+
5. **Elbow Sign Check**:
169+
- Straight arm → lowerarm = -45
170+
- Bent arm → lowerarm = POSITIVE (never negative except -45)
171+
6. **Hand Orientation**: If pose describes hand position/grip → hand should NOT be [0,0,0]
172+
7. **Leg Abduction**: If seated, straddling, or wide stance → thigh.Z should be non-zero
173+
8. **Magnitude Reality**: Values should vary (avoid all 0, 10, 20, 90 patterns)
174+
9. **Hinge Compliance**: lowerarm and calf have Y=0, Z=0 always
175+
10. **Range Validation**: Check values against limits below
90176

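
Several of the Phase 5 checks are mechanical enough to run outside the model. The sketch below covers checks 1, 4, 5, and 9 only; it is not part of the commit, `self_check` is a hypothetical name, the side-suffixed bone names (`lowerarm_l`, `calf_r`, and so on) are inferred from the wildcards in the prompt, and the "unless T-pose" exceptions are deliberately not modelled.

```python
ZERO = [0, 0, 0]


def self_check(pose: dict) -> list[str]:
    """Return human-readable problems found in an estimated pose dictionary."""
    problems = []

    # Check 1: at least one spine segment should be non-zero.
    spine = [pose.get(f"spine_0{i}", ZERO) for i in (1, 2, 3)]
    if all(value == 0 for segment in spine for value in segment):
        problems.append("spine: all segments are [0, 0, 0]")

    for side in ("l", "r"):
        # Check 4: each upper arm should use at least two axes.
        upper = pose.get(f"upperarm_{side}", ZERO)
        if sum(1 for value in upper if value != 0) < 2:
            problems.append(f"upperarm_{side}: fewer than two non-zero axes")

        # Check 5: a negative elbow X is only valid as the -45 compensation.
        elbow_x = pose.get(f"lowerarm_{side}", ZERO)[0]
        if elbow_x < 0 and elbow_x != -45:
            problems.append(f"lowerarm_{side}: negative X other than -45")

        # Check 9: hinge joints rotate on X only.
        for bone in (f"lowerarm_{side}", f"calf_{side}"):
            _, y, z = pose.get(bone, ZERO)
            if y != 0 or z != 0:
                problems.append(f"{bone}: hinge joint has non-zero Y or Z")

    return problems
```

An empty list means these four checks passed; the remaining checks depend on the pose text and still need the model or a human reviewer.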
91177
# JOINT RANGE CONSTRAINTS
92178
Respect these anatomical limits unless pose explicitly exceeds them:
93-
- Spine forward lean: `spine_*.X` typically ≤ 30° each
94-
- Head rotation: `head.Y` from -80° to +80° (right/left), typically ≤ ±30°
95-
- Shoulder elevation: `upperarm_l.Z` from -45° to +135°, `upperarm_r.Z` from +45° to -135° where ±45° is the lowest position and ±135° is the highest
96-
- Elbow extension: `lowerarm_*.X` from -45° to 135° (-45° = straight, 135° = fully bent)
97-
- Hip rotation: `thigh_*.X` from -135° to +30° (forward/back)
98-
- Knee extension: `calf_*.X` from 0° to +135° (0° = straight, 135° = fully bent)
99-
100-
# SELF-CHECK GATES (must pass before output)
101-
- **Anchor Sign Check**: Each non-zero angle's sign matches the anchor meanings above.
102-
- **Left/Right Consistency**: No inadvertent mirroring; bone names match anatomical sides in POSE TEXT.
103-
- **Hinge Compliance**: `lowerarm_*` and `calf_*` have non-zero X only; their Y/Z = 0.
104-
- **Ball Joints**: `ball_*` remain `[0,0,0]` unless the rig truly requires toe bend (default to 0).
105-
- **Global Coherence**: Pelvis/torso/head directions make sense together (e.g., if pelvis Y=+ left turn, head may counter-rotate per POSE TEXT).
106-
- **Shoulder Elevation Guardrail**: Unless POSE TEXT explicitly says "abducted/raised/elevated," constrain clavicle_ Y ∈ [-5°, +5°]* and upperarm_ Y ∈ [-5°, +5°]**.
107-
- **Arm Straightness Check**: If pose implies straight arms, verify `lowerarm_*.X = -45` and corresponding `upperarm_*.Z` adjustment
108-
- **T-pose Compensation**: Remember T-pose has inherent elbow bend requiring compensation for straight arms
109-
- **Range Validation**: All joint angles within anatomical limits specified in JOINT RANGE CONSTRAINTS
179+
- Spine forward/back lean: `spine_*.X` typically -30° to +30° each
180+
- Spine turn: `spine_*.Y` typically 0° to +45° for visible rotation
181+
- Head rotation: `head.Y` from -80° to +80° (right/left)
182+
- Shoulder elevation:
183+
- `upperarm_l.Z` from -45° (lowest) to +135° (overhead)
184+
- `upperarm_r.Y` from -45° to +135°, `upperarm_r.Z` from +45° (lowest) to -135° (overhead)
185+
- Elbow: `lowerarm_*.X` from -45° (straight) to 135° (fully bent)
186+
- Hip flexion: `thigh_*.X` from -135° (knee to chest) to +30° (leg back)
187+
- Hip abduction: `thigh_*.Z` typically -10° to +30°
188+
- Knee: `calf_*.X` from 0° (straight) to +135° (fully bent)
189+
190+
# COMMON ERRORS TO AVOID
191+
1. ❌ Setting all spine segments to [0,0,0]
192+
2. ❌ Using only one axis for arm rotations
193+
3. ❌ Getting head turn direction backwards (re-check camera position)
194+
4. ❌ Forgetting T-pose compensation for straight arms
195+
5. ❌ Leaving hands at [0,0,0] when they have specific grips
196+
6. ❌ Over-flexing legs (don't default to -90° hip flexion)
197+
7. ❌ Ignoring hip abduction (Z-axis) in wide or seated poses
198+
8. ❌ Using negative lowerarm values for bent elbows
199+
9. ❌ Using only round numbers (0, 10, 20, 90)
200+
10. ❌ Forgetting clavicle rotations when shoulders are protracted/elevated
110201

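
The limits in JOINT RANGE CONSTRAINTS lend themselves to a table-driven check. The following sketch is not part of the commit: `out_of_range` and the `LIMITS` table are hypothetical, the table encodes only a subset of the listed ranges, and because the prompt allows poses that explicitly exceed the limits, the results are best treated as warnings.

```python
from fnmatch import fnmatch

AXIS_INDEX = {"X": 0, "Y": 1, "Z": 2}

# (bone pattern, axis, minimum, maximum) in degrees, taken from the
# JOINT RANGE CONSTRAINTS section of the updated prompt.
LIMITS = [
    ("spine_*", "X", -30, 30),
    ("head", "Y", -80, 80),
    ("upperarm_l", "Z", -45, 135),
    ("upperarm_r", "Y", -45, 135),
    ("upperarm_r", "Z", -135, 45),
    ("lowerarm_*", "X", -45, 135),
    ("thigh_*", "X", -135, 30),
    ("thigh_*", "Z", -10, 30),
    ("calf_*", "X", 0, 135),
]


def out_of_range(pose: dict) -> list[tuple[str, str, float]]:
    """Return (bone, axis, value) for every angle outside its listed range."""
    violations = []
    for bone, angles in pose.items():
        for pattern, axis, low, high in LIMITS:
            if fnmatch(bone, pattern):
                value = angles[AXIS_INDEX[axis]]
                if not low <= value <= high:
                    violations.append((bone, axis, value))
    return violations
```

For example, `out_of_range({"calf_l": [-5, 0, 0]})` reports the knee, since `calf_*.X` is listed as 0° to +135°.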
111202
# OUTPUT RULES (HARD)
112203
- Respond with **one** fenced code block only, language tag `python`.
113204
- Inside it, output exactly:
114-
POSE_DEGREES = { ... } # with the keys from the BONE LIST above
115-
- No extra text before/after, no comments, no prints, no placeholders.
205+
```python
206+
POSE_DEGREES = {
207+
"pelvis": [X, Y, Z],
208+
# ... all 22 bones
209+
}
210+
```
211+
- No extra text before/after, no comments inside the dict, no prints, no placeholders.
212+
- Use realistic angle values (not just 0, 10, 20, 90).
213+
- Every bone must be present with a [X, Y, Z] list.
214+
215+
Generate the pose following the enhanced workflow above.
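
On the consuming side, the OUTPUT RULES could be enforced with a small parser. The sketch below is not part of the commit: `extract_pose_degrees` and `missing_bones` are hypothetical names, and the expected bone list is passed in by the caller because the full 22-bone list is not shown in this diff. It uses `ast.literal_eval` rather than `exec`, so the model's response is never executed.

```python
import ast
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block


def extract_pose_degrees(response: str) -> dict:
    """Parse a model response that should contain one fenced python block."""
    pattern = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)
    blocks = pattern.findall(response)
    if len(blocks) != 1:
        raise ValueError("expected exactly one python code block")

    module = ast.parse(blocks[0])
    if len(module.body) != 1 or not isinstance(module.body[0], ast.Assign):
        raise ValueError("expected a single top-level assignment")
    assign = module.body[0]
    target = assign.targets[0]
    if (len(assign.targets) != 1 or not isinstance(target, ast.Name)
            or target.id != "POSE_DEGREES"):
        raise ValueError("assignment target must be POSE_DEGREES")

    # literal_eval accepts only literals, so arbitrary code is rejected.
    pose = ast.literal_eval(assign.value)
    if not isinstance(pose, dict):
        raise ValueError("POSE_DEGREES must be a dict")
    for bone, angles in pose.items():
        valid = (isinstance(angles, list) and len(angles) == 3
                 and all(isinstance(a, (int, float)) and not isinstance(a, bool)
                         for a in angles))
        if not valid:
            raise ValueError(f"{bone}: expected [X, Y, Z] numbers, got {angles!r}")
    return pose


def missing_bones(pose: dict, expected: list[str]) -> list[str]:
    """Bones from the caller-supplied list that are absent from the pose."""
    return [bone for bone in expected if bone not in pose]
```

Note that `ast.parse` discards comments, so this sketch does not detect a violation of the "no comments inside the dict" rule.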
