Interactive animations explaining core concepts in robotics and embodied intelligence.
Live site: arpitg1304.github.io/embodied-ai-visuals
| Animation | Category | Description |
|---|---|---|
| VLA Model Explainer | Perception | Step-by-step walkthrough of Vision-Language-Action models — from camera input to robot action output |
| Sim-to-Real Gap Explainer | Learning | Why sim-trained policies fail in the real world and how domain randomization bridges the gap |
| Reward Shaping — Sparse vs Dense | Learning | How reward function design shapes learning — sparse rewards, dense gradients, and potential-based shaping |
| Video Action Models & Latent Space | Learning | How video-conditioned policies use temporal context and latent space predictions to generate robot actions |
| Diffusion Policy | Learning | How denoising diffusion refines random noise into smooth action trajectories, handling multimodal demonstrations |
| Flow Matching | Learning | How flow matching learns straight-line velocity fields to transport noise into action distributions — a faster, simpler alternative to diffusion |
| World Models — Predict Before You Act | Learning | How robots imagine multiple futures in latent space, score each outcome, and pick the best action before moving |
| Learning Physics from Video | Learning | How watching billions of internet videos teaches robots gravity, collisions, and object permanence — no physics engine required |
| Action Chunking — Predict Trajectories, Not Steps | Learning | Why modern robot policies predict K actions at once — the secret behind smooth motion in ACT, Diffusion Policy, and π0 |
| SLAM — Mapping the Unknown | Perception | How robots simultaneously build a map and figure out where they are — the chicken-and-egg problem at the heart of navigation |
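As a taste of the sparse-vs-dense idea from the Reward Shaping animation, here is a minimal JavaScript sketch of potential-based shaping on a 1-D reach task. Everything here (`GOAL`, `phi`, the reward functions) is invented for illustration — it is not code from the animations:

```javascript
// 1-D reach task: the agent moves along integer positions toward GOAL.
const GOAL = 10;

// Sparse reward: signal only when the goal is reached — hard to learn from.
function sparseReward(pos) {
  return pos === GOAL ? 1 : 0;
}

// Potential function: closer to the goal = higher potential.
const phi = (pos) => -Math.abs(GOAL - pos);

// Potential-based shaping, r' = r + gamma * phi(s') - phi(s),
// adds a dense gradient without changing the optimal policy.
function shapedReward(pos, nextPos, gamma = 0.99) {
  return sparseReward(nextPos) + gamma * phi(nextPos) - phi(pos);
}
```

With shaping, a step toward the goal scores higher than a step away from it on every transition, so the agent gets feedback long before it first reaches the goal.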
- No dependencies — pure HTML/CSS/JS, zero build step
- Dark themed — easy on the eyes
- Embeddable — copy iframe embed code for any animation to use in your blog or slides
- Mobile friendly — responsive layout that works on any device
- Auto-deploy — push to `main` and GitHub Actions deploys to Pages
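An embed might look like the following. This is a hypothetical example — copy the actual snippet from the animation's page; the file name in `src` is a guess:

```html
<iframe
  src="https://arpitg1304.github.io/embodied-ai-visuals/animations/diffusion_policy.html"
  width="800" height="500" style="border: 0;" loading="lazy"
  title="Diffusion Policy animation"></iframe>
```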
Planned animations, roughly ordered by pedagogical flow. Contributions welcome!
- Visual Encoders Compared — CNN vs ViT vs DINOv2: how each architecture turns pixels into features, and why foundation vision models changed robotics
- Point Cloud Processing — From raw depth sensor → voxel grid → PointNet features. Show how 3D understanding feeds into grasp planning
- Spatial Action Maps — How pixel-space affordance maps let robots decide where to act directly from images
- Closed-Loop World Model Control — The real-time observe → predict → act → re-observe cycle — how continuous re-planning handles the unexpected
- MPC vs Learned Policies — Model Predictive Control re-plans every step; a learned policy runs open-loop. Animated side-by-side on the same task
- Inverse Kinematics Explained — Given a target end-effector pose, how the robot solves for joint angles — Jacobian, gradient descent, singularities
- Task and Motion Planning (TAMP) — High-level symbolic plan ("pick → place → stack") grounded into continuous motion trajectories
- Behavior Trees vs FSMs — Two paradigms for structuring robot decision-making, animated with a pick-and-place example
- Imitation Learning Pipeline — Human demo → trajectory encoding → policy distillation. Show how a few demonstrations become a generalizable skill
- Hindsight Experience Replay — Failed trajectories relabeled with achieved goals — turning failures into training signal
- Curriculum Learning for Manipulation — Progressively harder tasks: reach → touch → grasp → lift → stack
- Multi-Robot Task Allocation — How a team of robots divides tasks using auction-based or graph-based coordination
- Human-Robot Handoff — Timing, grip force negotiation, and intent prediction during object handovers
- Safe RL with Constraints — Reward maximization plus constraint satisfaction — how robots learn to be both capable and safe
- Failure Detection & Recovery — How robots monitor execution, detect anomalies, and trigger recovery behaviors in real-time
- Copy the template: `cp animations/_template.html animations/your_name.html`
- Build your animation using the CSS variable contract for consistent theming
- Register it in the `ANIMATIONS` array in `index.html`
- Push — the site updates automatically
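The registration step might look roughly like this. The field names below are assumptions for illustration — the real schema lives in `index.html`:

```javascript
// Hypothetical shape of an ANIMATIONS entry — check index.html
// for the actual field names before registering yours.
const ANIMATIONS = [
  {
    title: "Your Animation Title",
    category: "Learning", // or "Perception"
    file: "animations/your_name.html",
    description: "One-line summary shown in the gallery",
  },
];
```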
See CONTRIBUTING.md for the full guide.
Serve locally with `python3 -m http.server 8000`, then open http://localhost:8000.
