Add isaaclab_arena_openpi package for pi0 droid eval#663
Conversation
b4206e0 to
e11720c
Compare
There was a problem hiding this comment.
Review: isaaclab_arena_openpi Package
Thanks for adding this openpi integration! The overall architecture is clean and follows the established patterns from isaaclab_arena_gr00t. I've reviewed the implementation and have a few observations:
✅ Strengths
-
Clean separation of concerns - Wire-format constants and variant table in
pi0_droid_config.py, execution logic in the policy module. This makes maintenance straightforward when upstream changes. -
Well-documented - The README thoroughly explains the layout, installation, variant table, and rationale for not bundling an environment. The inline docstrings in the policy are also helpful.
-
Comprehensive unit tests - 13 tests covering wire format, chunk caching, binarisation, clipping, reconnect logic, and edge cases like short server responses. The stubbing approach is clever and allows testing without heavy dependencies.
-
Defensive chunking - The cache-and-advance pattern with explicit flush on reconnect is a sensible approach for a single-env scenario.
🔍 Observations / Questions
1. Broad exception handling in reconnect logic
_call_server_with_retry catches OSError in addition to the websocket-specific exceptions:
retryable_exceptions = (
websockets.exceptions.ConnectionClosedError,
websockets.exceptions.ConnectionClosedOK,
OSError,
)OSError is quite broad and could potentially mask unrelated errors (e.g., file permission issues if any file I/O is involved). Consider narrowing to specific socket errors like ConnectionRefusedError and ConnectionResetError, or at least logging which specific exception triggered the retry.
2. Missing validation for numeric edge cases
_unpack_pi0_response validates shape but doesn't check for NaN/Inf values which could propagate silently to the action space:
def _unpack_pi0_response(self, server_response: dict[str, Any]) -> np.ndarray:
action_chunk = np.asarray(server_response[WIRE_ACTIONS])
# ... shape validation onlyA quick np.isfinite check could catch malformed server responses early.
3. Test coverage gap: empty task description
set_task_description raises ValueError for empty/None input, but this path isn't covered by tests:
def set_task_description(self, task_description: str | None) -> str:
if not task_description:
raise ValueError(...)Consider adding a quick test for this.
4. Minor: Hardcoded gripper binarisation threshold
The 0.5 threshold is noted as matching upstream checkpoint behavior, but if a future checkpoint changes this, it would require a code change. Could be added to Pi0Variant for extensibility, though this may be over-engineering for now.
📝 Note on PR Description
The PR description mentions changes to PolicyBase.from_args and PolicyBase.add_args_to_parser with derived defaults, but those changes don't appear in this diff. Is that intentional for a follow-up, or did they get dropped?
Overall this is solid work for a draft. Once the end-to-end tests against a live openpi server are validated, this should be good to go. 👍
a10bde1 to
ebc820a
Compare
Adds a thin client for the upstream openpi WebSocket policy server
(pi05_droid_jointpos and friends), packaged the same way
isaaclab_arena_gr00t is: own pyproject, no environments, no embodiments,
reuses arena's pick_and_place_maple_table by name with
droid_abs_joint_pos. Touches no files under isaaclab_arena/.
- Pi0DroidRemotePolicy connects via openpi_client.WebsocketClientPolicy,
caches one action chunk + step counter (single-env only — openpi
server is single-obs), reconnects on dropped websockets up to
MAX_RECONNECT_ATTEMPTS, and binarises the gripper after each chunk.
- Variant table (pi05 / pi0 / pi0_fast) selects action_chunk_length and
optional arm-action clipping. pi0_fast clips to ±1.0 (it emits joint
deltas); pi05 leaves arm dims untouched.
- pi0_droid_config.py centralises wire-format constants, the variant
table, and the runtime args dataclass. The policy module holds only
execution logic + the CLI plumbing PolicyBase requires.
- docker/Dockerfile + build.sh extend isaaclab_arena:latest with
openpi-client (pinned via OPENPI_COMMIT) and this package. Pragmatic
dev-mode overlay; canonical external-repo pattern (FROM
nvcr.io/nvidia/isaac-sim, install Isaac Lab + arena + this) is
referenced in the README.
- 13 unit tests with stubbed openpi_client cover wire format, obs
extraction, gripper binarisation, variant clipping, single-env
enforcement, and retry behaviour.
Run via:
/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
--policy_type \
isaaclab_arena_openpi.policy.pi0_droid_remote_policy.Pi0DroidRemotePolicy \
--policy_variant pi05 \
--remote_host <pi0-server> --remote_port 8000 \
--enable_cameras --num_episodes 5 --num_envs 1 \
pick_and_place_maple_table --embodiment droid_abs_joint_pos
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
ebc820a to
36a4c72
Compare
There was a problem hiding this comment.
Updated Review (Post-Cleanup #2)
Thanks for addressing the S3 concern! The latest commits simplify documentation and fix the pi0_fast bucket issue.
✅ Changes Since Last Review (31ee8a4 → 265b989)
-
Fixed pi0_fast bucket - Changed from
s3://openpi-assets-simeval/...togs://openpi-assets/checkpoints/polaris/pi0_fast_droid_jointpos_polaris. Now all variants use GCS (gs://), no mixed storage backends! 🎉 -
README simplification - Removed verbose explanation about checkpoint selection rationale (cleaner docs)
-
Config/policy comment cleanup - Trimmed module docstrings and inline comments in
pi0_droid_config.pyandpi0_droid_remote_policy.py(less noise, same functionality) -
Minor code style - Changed
raise ValueError→assertfor variant validation
⚠️ One Small Note
The raise ValueError → assert change in pi0_droid_remote_policy.py means the check can be optimized away with python -O. If this is intentional (trusting callers), fine. If the validation should always run, consider keeping ValueError.
✅ Previous Review Highlights (Still Valid)
- Clean action pipeline with gripper binarisation delegated to arena's config
- Docker server build workflow for reproducibility
- Polaris config variants with proper GCS checkpoints
✅ Overall
All previous concerns addressed. Cleanup looks good. Ready to merge! 👍
📌 Reviewed at: 265b989
The previous suite mixed contract tests with low-signal checks: horizon constants compared against themselves, shape-only round-trips, and a stray invariant assertion on MAX_RECONNECT_ATTEMPTS. Drop those and keep only: * test_pack_request_uses_pi0_wire_keys — the wire-format contract with the openpi droid server. * test_get_action_caches_chunk_and_advances_index — the core cache and index-advancement behavior of get_action. * test_call_server_with_retry_reconnects_on_drop — retry happy path. * test_call_server_with_retry_gives_up_after_max_attempts — retry exhaustion. 11 tests -> 4. Same monkeypatch strategy on WebsocketClientPolicy so we still exercise the real openpi_client. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Adds isaaclab_arena_openpi/docker/build_server.sh: clones upstream Physical-Intelligence/openpi at a pinned commit, builds with upstream's own scripts/docker/serve_policy.Dockerfile, and appends a COPY step so the resulting image is self-contained (no volume mount at runtime). Tags as isaaclab_arena_openpi-server:<short-sha> and :latest locally. With --push, retags and pushes to nvcr.io/nvstaging/isaac-amr/<image> to match arena's existing push_to_ngc.sh convention. Pins OPENPI_COMMIT in docker/Dockerfile.isaaclab_arena to the same commit so the openpi-client baked into the arena image and the server speak the same wire format. Adapted from robolab/docker/build_openpi_server.sh. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
The previous quickstart asked users to clone openpi on the host and run uv from there. Replace with the docker server flow now that build_server.sh exists: * Step 1 builds the openpi-server image via build_server.sh. * Step 2 docker-runs the server with explicit --policy.config and --policy.dir flags so users see the full invocation. * Step 3 is unchanged (arena client). Document the working hybrid that grasps reliably: upstream's polaris config (declares the Pi0 architecture) plus xuningy's training-output bucket weights (load both params and norm stats, the norm stats from --policy.dir override the config-baked path). Same architecture as the polaris bucket weights, but trained without polaris cotrain data. Split the variants table into separate --policy.config and --policy.dir columns so the distinction is explicit. pi0_fast row flagged as untested (needs s3fs in the image; configs reference s3://). Signed-off-by: Clemens Volk <cvolk@nvidia.com>
* Rename build_server.sh to build_openpi_server.sh so the script is recognizable out of context. * Trim header on build_openpi_server.sh to one descriptive line plus usage. * Trim openpi-client comment in arena's Dockerfile to one line. * Drop the eval_jobs_configs JSON and its README section; users can drive single rollouts with policy_runner.py for now. * Drop the unit-test module docstring. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
* Convert if-raise ValueError on policy_variant to assert per AGENTS.md
style guide.
* Reword task_description assertion message to state the prerequisite
rather than imply a runtime fix ("set via --language_instruction or
on the task definition").
* Drop three redundant private-method docstrings that restated the
method name (_fetch_action_chunk, _extract_droid_observation,
_pack_pi0_request) plus the stray "# Unreachable" comment.
* Restore "Fixed by upstream pi0 droid checkpoints" qualifier on
ACTION_DIM / TARGET_IMAGE_SIZE to signal these are not tunable.
* README cleanup: drop the empirical-perf paragraph from the
config-vs-weights block, fix the pi0_fast row to point at the
upstream polaris GCS bucket instead of an s3:// path with a
speculative s3fs caveat.
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
| RUN /isaac-sim/python.sh -m pip install av | ||
|
|
||
| # openpi remote-client. Pin OPENPI_COMMIT to the commit the server runs at. | ||
| ARG OPENPI_COMMIT=c23745b5ad24e98f66967ea795a07b2588ed6c79 |
There was a problem hiding this comment.
For now I put the openpi dependency on the client into the arena base docker. We could in the future layer this instead ontop as a new container
Greptile SummaryAdds
Confidence Score: 4/5Safe to merge; the new package is self-contained and does not modify existing evaluation paths. The implementation is clean and well-tested. The only notable items are a docstring that overstates the reconnect count by one and no explicit close/cleanup path for the WebSocket client resource. Neither affects correctness during a normal eval run. pi0_droid_remote_policy.py — the retry docstring and missing WebSocket cleanup are worth a quick look before this pattern is copied to future remote policies. Important Files Changed
Sequence DiagramsequenceDiagram
participant Runner as policy_runner.py
participant Policy as Pi0DroidRemotePolicy
participant Client as WebsocketClientPolicy
participant Server as openpi server (Docker)
Runner->>Policy: set_task_description(instruction)
loop Every open_loop_horizon steps
Runner->>Policy: get_action(env, observation)
Policy->>Policy: extract DROID obs (cameras + proprio)
Policy->>Policy: pack pi0 wire-format request
Policy->>Client: infer(request)
Client-->>Server: WebSocket send
Server-->>Client: action chunk (H x 8)
Client-->>Policy: actions ndarray
Policy->>Policy: cache chunk[:open_loop_horizon]
Policy-->>Runner: action[0] as torch.Tensor (1x8)
end
loop Cached steps (open_loop_horizon - 1 remaining)
Runner->>Policy: get_action(env, observation)
Policy-->>Runner: cached action[i] as torch.Tensor (1x8)
end
note over Client,Server: On ConnectionClosedError/OSError: reconnect up to MAX_RECONNECT_ATTEMPTS-1 times, flush cache on each reconnect
Reviews (1): Last reviewed commit: "Address review feedback: trim docstrings..." | Re-trigger Greptile |
| def _call_server_with_retry(self, server_request: dict[str, Any]) -> dict[str, Any]: | ||
| """Send the request, reconnecting up to ``MAX_RECONNECT_ATTEMPTS`` times. |
There was a problem hiding this comment.
The docstring claims to reconnect "up to
MAX_RECONNECT_ATTEMPTS times" but the loop structure only reconnects MAX_RECONNECT_ATTEMPTS - 1 (2) times — the loop has 3 iterations, but the last iteration raises immediately on failure without reconnecting. A caller or operator who reads the docstring and tunes MAX_RECONNECT_ATTEMPTS will get one fewer reconnect than expected.
| def _call_server_with_retry(self, server_request: dict[str, Any]) -> dict[str, Any]: | |
| """Send the request, reconnecting up to ``MAX_RECONNECT_ATTEMPTS`` times. | |
| def _call_server_with_retry(self, server_request: dict[str, Any]) -> dict[str, Any]: | |
| """Send the request, making up to ``MAX_RECONNECT_ATTEMPTS`` total attempts | |
| (i.e. reconnecting up to ``MAX_RECONNECT_ATTEMPTS - 1`` times). |
| self._websocket_client = websocket_client_policy.WebsocketClientPolicy( | ||
| host=self._remote_host, port=self._remote_port | ||
| ) | ||
| print("[Pi0DroidRemotePolicy] Connected.") |
There was a problem hiding this comment.
Missing WebSocket cleanup on policy teardown
_websocket_client is created in __init__ but there is no close() or __del__ override to shut the connection down. If the runner exits or a new policy is instantiated in the same process, the underlying TCP connection and async event-loop resources held by WebsocketClientPolicy are leaked. Consider adding a close() (or __del__) that calls the equivalent cleanup method the upstream client exposes so the resource is freed when the policy is discarded.
| GIT_LFS_SKIP_SMUDGE=1 git clone --quiet "$OPENPI_REPO" "$TMPDIR/openpi" | ||
| cd "$TMPDIR/openpi" | ||
| git checkout "$COMMIT" |
There was a problem hiding this comment.
Full-history clone is slower than necessary
git clone without --filter=blob:none fetches the complete object graph including every historical blob. Consider adding --filter=blob:none (partial clone — fetches tree and commit objects only, blobs fetched on demand) to shorten the one-time build step without changing correctness.
| IMAGE_NAME="${IMAGE_NAME:-isaaclab_arena_openpi-server}" | ||
| NGC_PATH="${NGC_PATH:-nvcr.io/nvstaging/isaac-amr/${IMAGE_NAME}}" | ||
| OPENPI_REPO="https://github.com/Physical-Intelligence/openpi" | ||
| DEFAULT_COMMIT="c23745b5ad24e98f66967ea795a07b2588ed6c79" |
There was a problem hiding this comment.
Can you unify this spec defining the src of openpi with Dockerfile? I wonder if having a shared .env could work.
| @@ -0,0 +1,99 @@ | |||
| # isaaclab_arena_openpi | |||
There was a problem hiding this comment.
Can it be doc rst instead of readme?
| set -euo pipefail | ||
|
|
||
| IMAGE_NAME="${IMAGE_NAME:-isaaclab_arena_openpi-server}" | ||
| NGC_PATH="${NGC_PATH:-nvcr.io/nvstaging/isaac-amr/${IMAGE_NAME}}" |
There was a problem hiding this comment.
External users may not have write access to this path
| trap 'rm -rf "$TMPDIR"' EXIT | ||
|
|
||
| echo "Cloning openpi at ${COMMIT} ..." | ||
| GIT_LFS_SKIP_SMUDGE=1 git clone --quiet "$OPENPI_REPO" "$TMPDIR/openpi" |
There was a problem hiding this comment.
I wonder if the clone could be optional as users may already have a PI src repo and just want to build server docker (if Pi did not ship it).
| docker build \ | ||
| --network=host \ | ||
| -f "$TMPDIR/Dockerfile" \ | ||
| -t "${IMAGE_NAME}:${SHORT_HASH}" \ | ||
| -t "${IMAGE_NAME}:latest" \ | ||
| . | ||
|
|
||
| echo "Built ${IMAGE_NAME}:${SHORT_HASH} (also tagged :latest)" | ||
|
|
||
| if [ "$PUSH" = true ]; then | ||
| echo "Pushing to ${NGC_PATH}:${SHORT_HASH}" | ||
| docker tag "${IMAGE_NAME}:${SHORT_HASH}" "${NGC_PATH}:${SHORT_HASH}" | ||
| docker push "${NGC_PATH}:${SHORT_HASH}" | ||
| echo "Pushed ${NGC_PATH}:${SHORT_HASH}" | ||
| fi |
There was a problem hiding this comment.
If we imagine every policy builds their own docker, then can we make the docker build & push more universal?
Not bind to Pi server docker.
I personally think it's a valid assumption.
| self._next_chunk_step += 1 | ||
| return torch.from_numpy(next_action_np).to(dtype=torch.float32, device=self.device).unsqueeze(0) | ||
|
|
||
| def reset(self, env_ids: torch.Tensor | None = None) -> None: |
There was a problem hiding this comment.
Can you add a TODO for env_id reset when parallel is supported?
| ) | ||
| ) | ||
|
|
||
| def get_action(self, env: gym.Env, observation: dict[str, Any]) -> torch.Tensor: |
There was a problem hiding this comment.
Can you add a TODO for extending to parallel envs?
| self._next_chunk_step = 0 | ||
|
|
||
| def _fetch_action_chunk(self, observation: dict[str, Any]) -> np.ndarray: | ||
| droid_obs = self._extract_droid_observation(observation) |
There was a problem hiding this comment.
Does it have to droid dependent? Can you extend to other embodiments?
Can the dict key be a dataclass/cfg?
|
|
||
| def _extract_droid_observation(self, observation: dict[str, Any]) -> dict[str, np.ndarray]: | ||
| cam = observation["camera_obs"] | ||
| proprio = observation["policy"] |
There was a problem hiding this comment.
We kinda enforce "camera_obs" as the key when inserting camera to embodiment. But how about "policy"? Is it fixed?
| ) | ||
|
|
||
|
|
||
| class Pi0DroidRemotePolicy(PolicyBase): |
There was a problem hiding this comment.
I don't like the idea of limiting the class to only robot specific closedloop policy.
It's okay that you have only tested with Droid, but when building the policy class, it shall be embodiment agnostic, i.e. Pi0RemotePolicy.
You can provide embodiment specific cfg as the data member.
xyao-nv
left a comment
There was a problem hiding this comment.
Sorry I have to request changes -- can policy class be embodiment agostic?
| remote_port=args.remote_port, | ||
| ) | ||
| ) | ||
|
|
There was a problem hiding this comment.
When closing the connection (esp it's used in eval_runner), what's the expected components needed? Gr00t side I have added closing socket and cleaning context to prevent stalling memory.
| "prompt": language_instruction, | ||
| } | ||
|
|
||
| def _call_server_with_retry(self, server_request: dict[str, Any]) -> dict[str, Any]: |
There was a problem hiding this comment.
I wonder if this func could be in the parent class RemotePolicy. Retry with max attempts will always be needed.
| name = "pi0_droid_remote" | ||
| config_class = Pi0DroidRemotePolicyArgs | ||
|
|
||
| def __init__(self, config: Pi0DroidRemotePolicyArgs) -> None: |
There was a problem hiding this comment.
Do you need action_scheduler_cls? This MR seems to only contain obs -> action_chunk. Can you add a TODO for action_chunk to action
| ACTION_DIM = 8 | ||
| TARGET_IMAGE_SIZE = (224, 224) | ||
|
|
||
| # How many actions to replay before refetching a new chunk from the server. | ||
| OPEN_LOOP_HORIZON_BY_VARIANT: dict[str, int] = { | ||
| "pi05": 15, | ||
| "pi0": 10, | ||
| "pi0_fast": 10, | ||
| } | ||
| DEFAULT_VARIANT = "pi05" | ||
|
|
||
| ARENA_EXTERNAL_CAMERA_KEY = "external_camera_rgb" | ||
| ARENA_WRIST_CAMERA_KEY = "wrist_camera_rgb" |
There was a problem hiding this comment.
Shall we do a closedloop-embodiment dataclass? Hydra?
So we can unify how cfg are supposed to set, you have py, and g00t has yaml.
Summary
Adds
isaaclab_arena_openpi/, a new top-level package that lets arena evaluate the upstream openpi pi0/pi05/pi0_fast droid policies. For convenience ships the current openpi server as a dockerized container fixed to current main.What's in the package
policy/pi0_droid_remote_policy.py—Pi0DroidRemotePolicy(PolicyBasesubclass). Connects to an openpi server, fetches one action chunk peropen_loop_horizonsteps, reconnects on dropped websockets up toMAX_RECONNECT_ATTEMPTS. --Single-env only (openpi server is single-obs, asserted at runtime).policy/pi0_droid_config.py— wire-format constants, the variant horizon table, andPi0DroidRemotePolicyArgs.docker/build_openpi_server.sh— clones upstreamPhysical-Intelligence/openpiat a pinned commit, builds the server image with upstream's ownserve_policy.Dockerfile. Optionally pushes to ngc for use on osmo/ nightly etc.Usage