Merged
14 changes: 7 additions & 7 deletions .agents/skills/nemoclaw-user-configure-inference/SKILL.md
@@ -242,7 +242,7 @@ $ NEMOCLAW_PROVIDER=anthropicCompatible \
nemoclaw onboard --non-interactive
```

## vLLM (Experimental)
## vLLM

When vLLM is already running on `localhost:8000`, NemoClaw can detect it automatically and query the `/v1/models` endpoint to determine the loaded model.
On supported Linux hosts with NVIDIA GPUs, the onboard wizard can also install or start a managed vLLM container for you.
@@ -254,7 +254,8 @@ $ nemoclaw onboard
```

If vLLM is already running, NemoClaw detects the running model and validates the endpoint.
If vLLM is not running and your host matches a managed profile, set `NEMOCLAW_EXPERIMENTAL=1`, rerun `nemoclaw onboard`, and select the **Install vLLM** or **Start vLLM** entry.
If vLLM is not running and your host matches a DGX Spark or DGX Station managed profile, NemoClaw shows the **Install vLLM** or **Start vLLM** entry by default.
Generic Linux NVIDIA GPU hosts still require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm` before the managed entry appears.
NemoClaw pulls the vLLM image, downloads model weights into `~/.cache/huggingface`, starts the `nemoclaw-vllm` container on `localhost:8000`, and prints progress markers while the model loads.
The first run can take 10 to 30 minutes.
Later runs reuse the cached image and model weights.
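
A minimal sketch of the detection step described above, assuming vLLM's OpenAI-compatible server listens on `localhost:8000` and that `curl` is installed. The helper names `vllm_models_url` and `probe_vllm` are hypothetical and are not part of the NemoClaw CLI; the check NemoClaw performs internally may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of NemoClaw.

# Build the model-listing URL for a host and port (defaults match the docs above).
vllm_models_url() {
  printf 'http://%s:%s/v1/models\n' "${1:-localhost}" "${2:-8000}"
}

# Print the JSON model list and return 0 if a server answers; return non-zero otherwise.
probe_vllm() {
  curl --fail --silent --show-error --max-time 5 "$(vllm_models_url "$@")"
}
```

Running `probe_vllm && echo "vLLM is reachable"` before `nemoclaw onboard` gives a quick indication of whether the already-running path or the managed install/start path is likely to apply.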
@@ -281,11 +282,11 @@ $ NEMOCLAW_PROVIDER=vllm \
nemoclaw onboard --non-interactive
```

Install or start managed vLLM when a supported profile is detected:
Install or start managed vLLM when a supported profile is detected.
On DGX Spark and DGX Station, `NEMOCLAW_PROVIDER=install-vllm` is enough for non-interactive runs; add `NEMOCLAW_EXPERIMENTAL=1` on generic Linux NVIDIA GPU hosts.

```console
$ NEMOCLAW_EXPERIMENTAL=1 \
NEMOCLAW_PROVIDER=install-vllm \
$ NEMOCLAW_PROVIDER=install-vllm \
nemoclaw onboard --non-interactive
```
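
The host-dependent rule above can be sketched as a small wrapper, assuming the caller already knows which kind of host it is on. The function name `managed_vllm_env` and the profile labels `dgx-spark`, `dgx-station`, and `generic-linux` are hypothetical labels chosen for this example; they are not identifiers that NemoClaw defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the profile labels are illustrative, not NemoClaw identifiers.

# Print the environment assignments needed for a non-interactive managed vLLM run.
managed_vllm_env() {
  case "$1" in
    dgx-spark|dgx-station)
      # Managed vLLM is offered by default, so the provider selection is enough.
      echo "NEMOCLAW_PROVIDER=install-vllm" ;;
    generic-linux)
      # Generic Linux NVIDIA GPU hosts also need the experimental opt-in.
      echo "NEMOCLAW_EXPERIMENTAL=1 NEMOCLAW_PROVIDER=install-vllm" ;;
    *)
      echo "unsupported host profile: $1" >&2
      return 1 ;;
  esac
}
```

One possible use is `env $(managed_vllm_env generic-linux) nemoclaw onboard --non-interactive`, which relies on word splitting and works here only because none of the values contain spaces.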

@@ -312,8 +313,7 @@ Gated models require a Hugging Face token; export it before onboarding so NemoCl

```console
$ export HF_TOKEN=<your-hf-token>
$ NEMOCLAW_EXPERIMENTAL=1 \
NEMOCLAW_PROVIDER=install-vllm \
$ NEMOCLAW_PROVIDER=install-vllm \
NEMOCLAW_VLLM_MODEL=deepseek-r1-distill-70b \
nemoclaw onboard --non-interactive
```
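
Because the token must be exported before onboarding, a script that wraps the command above may want to fail early when it is missing. The guard below is a sketch under that assumption; `require_hf_token` is a hypothetical name, and NemoClaw may already report a missing token in its own way.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard; not part of NemoClaw.

# Return non-zero with a message on stderr when HF_TOKEN is unset or empty.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export it before onboarding a gated model." >&2
    return 1
  fi
}
```

A wrapper can then chain the check, for example `require_hf_token && NEMOCLAW_PROVIDER=install-vllm NEMOCLAW_VLLM_MODEL=deepseek-r1-distill-70b nemoclaw onboard --non-interactive`.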
@@ -31,15 +31,15 @@ NemoClaw uses provider-specific local tokens for those routes, and rebuilds of l
| Local Ollama | Caveated | Local Ollama API | Available when Ollama is installed or running on the host |
| Local NVIDIA NIM | Experimental | Local OpenAI-compatible | Requires `NEMOCLAW_EXPERIMENTAL=1` and a NIM-capable GPU |
| Local vLLM (already running) | Caveated | Local OpenAI-compatible | Appears in the onboarding menu when NemoClaw detects a server already on `localhost:8000`. No flag required. |
| Local vLLM (managed install/start) | Experimental | Local OpenAI-compatible | Requires `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls/starts a vLLM container on a supported NVIDIA GPU host. |
| Local vLLM (managed install/start) | Caveated | Local OpenAI-compatible | Appears by default on DGX Spark and DGX Station. Generic Linux NVIDIA GPU hosts require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls/starts a vLLM container on a supported NVIDIA GPU host. |

## Provider Options

The onboard wizard presents the following provider options by default.
The first six are always available.
Ollama appears when it is installed or running on the host.
Experimental local vLLM appears when NemoClaw detects a running vLLM server.
The managed install/start vLLM entry appears when you opt in and NemoClaw detects a supported NVIDIA GPU host profile.
Local vLLM appears when NemoClaw detects a running vLLM server.
The managed install/start vLLM entry appears by default on DGX Spark and DGX Station, and appears on generic Linux NVIDIA GPU hosts after opt-in.

| Option | Description | Curated models |
|--------|-------------|----------------|
@@ -103,15 +103,16 @@ NemoClaw probes only that interpreter and aborts with the failure reason if it d
Relative command names such as `python3.12` are rejected; use `command -v python3.12` to find the absolute path.
If `python -m venv` itself fails for a probe-clean interpreter (for example, a corrupt ensurepip seed), NemoClaw retries with the next healthy candidate when no pin is set; with a pin set, the failure stops onboarding so you can fix or repoint the pinned python.

## Experimental Options
## Caveated Local Options

The following local inference options are experimental.
Local NIM and managed vLLM install/start require `NEMOCLAW_EXPERIMENTAL=1`; an already-running vLLM server appears directly in the onboarding selection list.
The following local inference options are caveated.
Local NIM and generic Linux managed vLLM install/start require `NEMOCLAW_EXPERIMENTAL=1`; DGX Spark and DGX Station managed vLLM entries appear by default.
An already-running vLLM server appears directly in the onboarding selection list.

| Option | Condition | Notes |
|--------|-----------|-------|
| Local NVIDIA NIM | NIM-capable GPU detected | Pulls and manages a NIM container. |
| Local vLLM | vLLM running on `localhost:8000`, or a supported DGX Spark, DGX Station, or Linux NVIDIA GPU profile | Auto-detects the loaded model when vLLM is already running. Can install or start a managed vLLM container for supported profiles after experimental opt-in. |
| Local vLLM | vLLM running on `localhost:8000`, or a supported DGX Spark, DGX Station, or Linux NVIDIA GPU profile | Auto-detects the loaded model when vLLM is already running. Can install or start a managed vLLM container by default on DGX Spark/Station and after opt-in on generic Linux NVIDIA GPU hosts. |

For setup instructions, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill).
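
The visibility rules for the managed install/start entry can be read as a small decision function. The sketch below assumes the two opt-in variables behave as the provider status table describes (either one is sufficient on generic Linux NVIDIA GPU hosts); `managed_vllm_visible` and the profile labels are hypothetical and do not mirror NemoClaw's internal logic.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper; profile labels are illustrative only.

# Return 0 when the managed vLLM entry would be expected to appear for the profile.
# Reads NEMOCLAW_EXPERIMENTAL and NEMOCLAW_PROVIDER from the current environment.
managed_vllm_visible() {
  case "$1" in
    dgx-spark|dgx-station)
      return 0 ;;  # offered by default
    generic-linux)
      # Hidden unless one of the documented opt-ins is set.
      [ "${NEMOCLAW_EXPERIMENTAL:-}" = "1" ] || [ "${NEMOCLAW_PROVIDER:-}" = "install-vllm" ] ;;
    *)
      return 1 ;;  # no managed profile
  esac
}
```

For example, `managed_vllm_visible generic-linux || echo "set NEMOCLAW_EXPERIMENTAL=1 first"` prints the hint only when neither opt-in is present.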

@@ -438,13 +438,13 @@ Different inference providers have different trust and cost profiles.

### Experimental Providers

The `NEMOCLAW_EXPERIMENTAL=1` environment variable gates local NVIDIA NIM and the managed vLLM install/start path. An already-running vLLM server on `localhost:8000` is offered in the menu without a flag, because selecting it is an explicit user action.
The `NEMOCLAW_EXPERIMENTAL=1` environment variable gates local NVIDIA NIM and generic Linux managed vLLM install/start. DGX Spark and DGX Station managed vLLM entries are offered by default, and an already-running vLLM server on `localhost:8000` is offered in the menu without a flag, because selecting either is an explicit user action.

| Aspect | Detail |
|---|---|
| Default | Local NVIDIA NIM and managed vLLM install/start are hidden. Already-running vLLM on `localhost:8000` is offered when detected. |
| What you can change | Set `NEMOCLAW_EXPERIMENTAL=1` before running `nemoclaw onboard` to surface Local NIM and managed vLLM. To request only the managed vLLM path non-interactively, set `NEMOCLAW_PROVIDER=install-vllm`. |
| Risk if relaxed | NemoClaw has not fully validated these providers. NIM requires a NIM-capable GPU. The managed vLLM path pulls a container image and starts it on a supported NVIDIA GPU host. Misconfiguration can cause failed inference or unexpected behavior. |
| Default | Local NVIDIA NIM and generic Linux managed vLLM install/start are hidden. DGX Spark and DGX Station managed vLLM entries, plus already-running vLLM on `localhost:8000`, are offered when detected. |
| What you can change | Set `NEMOCLAW_EXPERIMENTAL=1` before running `nemoclaw onboard` to surface Local NIM and generic Linux managed vLLM. To request only the managed vLLM path non-interactively, set `NEMOCLAW_PROVIDER=install-vllm`. |
| Risk if selected | NemoClaw has not fully validated these providers. NIM requires a NIM-capable GPU. The managed vLLM path pulls a container image and starts it on a supported NVIDIA GPU host. Misconfiguration can cause failed inference or unexpected behavior. |
| Recommendation | Use experimental providers only for evaluation. Do not rely on them for always-on assistants. |

## Posture Profiles
4 changes: 2 additions & 2 deletions .agents/skills/nemoclaw-user-get-started/SKILL.md
@@ -219,11 +219,11 @@ $ NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY=<your-key> nemoclaw onboard --non-inte
The router listens on the host at port `4000`.
The sandbox still calls `https://inference.local/v1`, so do not point in-sandbox tools at the host router port directly.

**Experimental: Local NIM and Local vLLM:**
**Local NIM and Local vLLM:**

- **Local NVIDIA NIM** appears when `NEMOCLAW_EXPERIMENTAL=1` is set and the host has a NIM-capable GPU. NemoClaw pulls and manages a NIM container.
- **Local vLLM (already running)** appears whenever NemoClaw detects a vLLM server on `localhost:8000`. No flag is required for the menu entry. NemoClaw auto-detects the loaded model.
- **Local vLLM (managed install/start)** requires `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls and starts a vLLM container on supported DGX Spark, DGX Station, and Linux NVIDIA GPU hosts.
- **Local vLLM (managed install/start)** appears by default on DGX Spark and DGX Station. Generic Linux NVIDIA GPU hosts require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls and starts a vLLM container on supported hosts.

For setup, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill).

@@ -83,7 +83,7 @@ NemoClaw v0.0.40 improves onboarding reliability, local inference setup, and san
- The Docker-driver gateway startup check waits for the gateway port to accept TCP connections before it reports the gateway as healthy, and startup failures now include child process exit details.
- Local Ollama setup requires the authenticated reverse proxy token on every native Ollama API route, including `GET /api/tags`.
- The Linux Ollama install path preflights `zstd` before running the official installer and explains why each sudo-backed setup step needs elevated privileges.
- The onboarding provider menu offers an already-running local vLLM server directly when `localhost:8000` responds, while managed vLLM install and start options remain behind the experimental opt-in.
- The onboarding provider menu offers an already-running local vLLM server directly when `localhost:8000` responds. Managed vLLM install and start options now appear by default on DGX Spark and DGX Station, while generic Linux NVIDIA GPU hosts remain behind the experimental opt-in.
- Policy tier defaults are filtered by active agent support, so presets such as Brave Search are not reapplied to agents that do not support that integration.
- `nemoclaw <name> connect` checks dashboard forward reachability with a TCP probe before it reports a forward as stale.
- Sandbox startup captures a known-good OpenClaw config baseline and restores it on restart if `/sandbox/.openclaw/openclaw.json` becomes empty.
2 changes: 1 addition & 1 deletion .agents/skills/nemoclaw-user-reference/SKILL.md
@@ -6,7 +6,7 @@ description: "Describes the NemoClaw plugin and blueprint architecture and how t
<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# Architecture
# Architecture Details

## References

@@ -1,6 +1,6 @@
<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->
# Architecture
# Architecture Details

NemoClaw combines a host CLI, a TypeScript plugin that runs with OpenClaw inside the sandbox, and a versioned YAML blueprint that defines the sandbox image, policies, and inference profiles applied through OpenShell.

4 changes: 2 additions & 2 deletions ci/platform-matrix.json
@@ -141,9 +141,9 @@
},
{
"name": "Local vLLM (managed install/start)",
"status": "experimental",
"status": "caveated",
"endpoint_type": "Local OpenAI-compatible",
"notes": "Requires `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls/starts a vLLM container on a supported NVIDIA GPU host."
"notes": "Appears by default on DGX Spark and DGX Station. Generic Linux NVIDIA GPU hosts require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls/starts a vLLM container on a supported NVIDIA GPU host."
}
]
}
2 changes: 1 addition & 1 deletion docs/about/release-notes.mdx
@@ -90,7 +90,7 @@ NemoClaw v0.0.40 improves onboarding reliability, local inference setup, and san
- The Docker-driver gateway startup check waits for the gateway port to accept TCP connections before it reports the gateway as healthy, and startup failures now include child process exit details.
- Local Ollama setup requires the authenticated reverse proxy token on every native Ollama API route, including `GET /api/tags`.
- The Linux Ollama install path preflights `zstd` before running the official installer and explains why each sudo-backed setup step needs elevated privileges.
- The onboarding provider menu offers an already-running local vLLM server directly when `localhost:8000` responds, while managed vLLM install and start options remain behind the experimental opt-in.
- The onboarding provider menu offers an already-running local vLLM server directly when `localhost:8000` responds. Managed vLLM install and start options now appear by default on DGX Spark and DGX Station, while generic Linux NVIDIA GPU hosts remain behind the experimental opt-in.
- Policy tier defaults are filtered by active agent support, so presets such as Brave Search are not reapplied to agents that do not support that integration.
- `nemoclaw <name> connect` checks dashboard forward reachability with a TCP probe before it reports a forward as stale.
- Sandbox startup captures a known-good OpenClaw config baseline and restores it on restart if `/sandbox/.openclaw/openclaw.json` becomes empty.
4 changes: 2 additions & 2 deletions docs/get-started/quickstart.mdx
@@ -232,11 +232,11 @@ The sandbox still calls `https://inference.local/v1`, so do not point in-sandbox

</Accordion>

<Accordion title="Experimental: Local NIM and Local vLLM">
<Accordion title="Local NIM and Local vLLM">

- **Local NVIDIA NIM** appears when `NEMOCLAW_EXPERIMENTAL=1` is set and the host has a NIM-capable GPU. NemoClaw pulls and manages a NIM container.
- **Local vLLM (already running)** appears whenever NemoClaw detects a vLLM server on `localhost:8000`. No flag is required for the menu entry. NemoClaw auto-detects the loaded model.
- **Local vLLM (managed install/start)** requires `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls and starts a vLLM container on supported DGX Spark, DGX Station, and Linux NVIDIA GPU hosts.
- **Local vLLM (managed install/start)** appears by default on DGX Spark and DGX Station. Generic Linux NVIDIA GPU hosts require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls and starts a vLLM container on supported hosts.

For setup, refer to [Use a Local Inference Server](/inference/use-local-inference).
</Accordion>
15 changes: 8 additions & 7 deletions docs/inference/inference-options.mdx
@@ -39,16 +39,16 @@ NemoClaw uses provider-specific local tokens for those routes, and rebuilds of l
| Local Ollama | Caveated | Local Ollama API | Available when Ollama is installed or running on the host |
| Local NVIDIA NIM | Experimental | Local OpenAI-compatible | Requires `NEMOCLAW_EXPERIMENTAL=1` and a NIM-capable GPU |
| Local vLLM (already running) | Caveated | Local OpenAI-compatible | Appears in the onboarding menu when NemoClaw detects a server already on `localhost:8000`. No flag required. |
| Local vLLM (managed install/start) | Experimental | Local OpenAI-compatible | Requires `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls/starts a vLLM container on a supported NVIDIA GPU host. |
| Local vLLM (managed install/start) | Caveated | Local OpenAI-compatible | Appears by default on DGX Spark and DGX Station. Generic Linux NVIDIA GPU hosts require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls/starts a vLLM container on a supported NVIDIA GPU host. |
{/* provider-status:end */}

## Provider Options

The onboard wizard presents the following provider options by default.
The first six are always available.
Ollama appears when it is installed or running on the host.
Experimental local vLLM appears when NemoClaw detects a running vLLM server.
The managed install/start vLLM entry appears when you opt in and NemoClaw detects a supported NVIDIA GPU host profile.
Local vLLM appears when NemoClaw detects a running vLLM server.
The managed install/start vLLM entry appears by default on DGX Spark and DGX Station, and appears on generic Linux NVIDIA GPU hosts after opt-in.

| Option | Description | Curated models |
|--------|-------------|----------------|
@@ -112,15 +112,16 @@ NemoClaw probes only that interpreter and aborts with the failure reason if it d
Relative command names such as `python3.12` are rejected; use `command -v python3.12` to find the absolute path.
If `python -m venv` itself fails for a probe-clean interpreter (for example, a corrupt ensurepip seed), NemoClaw retries with the next healthy candidate when no pin is set; with a pin set, the failure stops onboarding so you can fix or repoint the pinned python.

## Experimental Options
## Caveated Local Options

The following local inference options are experimental.
Local NIM and managed vLLM install/start require `NEMOCLAW_EXPERIMENTAL=1`; an already-running vLLM server appears directly in the onboarding selection list.
The following local inference options are caveated.
Local NIM and generic Linux managed vLLM install/start require `NEMOCLAW_EXPERIMENTAL=1`; DGX Spark and DGX Station managed vLLM entries appear by default.
An already-running vLLM server appears directly in the onboarding selection list.

| Option | Condition | Notes |
|--------|-----------|-------|
| Local NVIDIA NIM | NIM-capable GPU detected | Pulls and manages a NIM container. |
| Local vLLM | vLLM running on `localhost:8000`, or a supported DGX Spark, DGX Station, or Linux NVIDIA GPU profile | Auto-detects the loaded model when vLLM is already running. Can install or start a managed vLLM container for supported profiles after experimental opt-in. |
| Local vLLM | vLLM running on `localhost:8000`, or a supported DGX Spark, DGX Station, or Linux NVIDIA GPU profile | Auto-detects the loaded model when vLLM is already running. Can install or start a managed vLLM container by default on DGX Spark/Station and after opt-in on generic Linux NVIDIA GPU hosts. |

For setup instructions, refer to [Use a Local Inference Server](/inference/use-local-inference).
