61 changes: 51 additions & 10 deletions architecture/06-cli.md
@@ -6,15 +6,16 @@ The Cog CLI is a Go binary that provides commands for the full model lifecycle:

## Commands Overview

| Command | Job To Be Done |
| ----------- | ------------------------------------- |
| `cog init` | Bootstrap a new model project |
| `cog build` | Create a container image |
| `cog run` | Run a prediction in a container |
| `cog exec` | Run arbitrary commands in a container |
| `cog serve` | Start HTTP server in a container |
| `cog push` | Deploy to Replicate |
| `cog login` | Authenticate with Replicate |
| Command | Job To Be Done |
| ---------------- | ------------------------------------- |
| `cog init` | Bootstrap a new model project |
| `cog build` | Create a container image |
| `cog run` | Run a prediction in a container |
| `cog exec` | Run arbitrary commands in a container |
| `cog serve` | Start HTTP server in a container |
| `cog playground` | Browser UI to talk to a running model |
| `cog push` | Deploy to Replicate |
| `cog login` | Authenticate with Replicate |

## Development Commands

@@ -98,6 +99,44 @@ Builds the image (if needed) and starts a container running the [Container Runti

**Code**: `pkg/cli/serve.go`

---

### cog playground

**Job**: Open a browser UI for talking to a running model.

```bash
cog serve -p 8393 # terminal 1: start the model API
cog playground # terminal 2: opens the UI in your browser
```

Unlike the other commands, `cog playground` doesn't build an image or run model code. It starts a small Go web server that serves a schema-driven browser UI -- a Postman-like tool for Cog models -- and reverse-proxies requests to a _separate_ running model API, typically one started with `cog serve`. The UI reflects the model's [Schema](./02-schema.md) from `/openapi.json` and lets you run sync, streaming (SSE), and async predictions with either a generated form or raw JSON input.
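To make "schema-driven" concrete, here is a minimal sketch in Python (the playground's actual UI is plain JS) of how form fields could be derived from `/openapi.json`. It assumes the inputs live under `components.schemas.Input.properties` and are ordered by an `x-order` key; the function names and the widget mapping are hypothetical, and `$ref`/`allOf` indirection is ignored for brevity.

```python
def widget_for(spec: dict) -> str:
    """Pick a widget kind for one input property (hypothetical mapping)."""
    if "enum" in spec:
        return "enum"
    if spec.get("format") == "uri":
        return "file"
    if spec.get("format") == "password":
        return "secret"
    by_type = {
        "string": "text",
        "integer": "number",
        "number": "number",
        "boolean": "boolean",
        "array": "list",
    }
    # Anything unrecognised falls back to a raw JSON editor.
    return by_type.get(spec.get("type", ""), "json")


def form_fields(openapi: dict) -> list[tuple[str, str]]:
    """Return (input name, widget kind) pairs in display order."""
    props = openapi["components"]["schemas"]["Input"].get("properties", {})
    ordered = sorted(props.items(), key=lambda item: item[1].get("x-order", 0))
    return [(name, widget_for(spec)) for name, spec in ordered]


# Illustrative schema fragment, not taken from a real model.
sample = {"components": {"schemas": {"Input": {"properties": {
    "steps": {"type": "integer", "x-order": 1},
    "prompt": {"type": "string", "x-order": 0},
    "image": {"type": "string", "format": "uri", "x-order": 2},
}}}}}
fields = form_fields(sample)
```

Deriving the form from the schema on each load is what would let the UI follow a model whose inputs change, without the playground knowing anything about the model ahead of time.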

The browser only ever talks to the playground's own origin, which forwards to the target API chosen at runtime (via an `X-Cog-Target` header). Proxying sidesteps CORS -- the model API sets none -- and keeps SSE streaming intact. Async predictions have no GET-by-id endpoint, so the server also hosts a webhook sink and relays delivered events back to the browser over its own SSE stream.

```mermaid
flowchart LR
Browser["Browser UI"]

subgraph Playground["cog playground (host)"]
Static["Static UI assets<br/>(go:embed)"]
Proxy["Reverse proxy<br/>/proxy/*"]
Sink["Webhook sink + relay<br/>/webhook/{token} → /events"]
end

Model["Target model API<br/>(e.g. cog serve)"]

Browser -->|"load UI"| Static
Browser -->|"schema, predictions, SSE"| Proxy
Proxy -->|"X-Cog-Target"| Model
Model -->|"webhook (async)"| Sink
Sink -->|"SSE events"| Browser
```
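The routing step in the diagram can be sketched in a few lines. This is an illustration in Python (the playground itself is a Go server), assuming the proxy strips a `/proxy` prefix and falls back to a default target when the `X-Cog-Target` header is absent. The name `resolve_upstream` and the validation rules are hypothetical and not taken from `pkg/cli/playground.go`.

```python
from urllib.parse import urlsplit

DEFAULT_TARGET = "http://localhost:8393"  # the documented --target default
PROXY_PREFIX = "/proxy"                   # assumed from the /proxy/* route in the diagram


def resolve_upstream(path: str, headers: dict[str, str]) -> str:
    """Map a request on the playground origin to a URL on the target API.

    Hypothetical helper: the real server may validate or rewrite differently.
    """
    if not path.startswith(PROXY_PREFIX + "/"):
        raise ValueError(f"not a proxied path: {path!r}")

    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    target = lowered.get("x-cog-target", DEFAULT_TARGET).rstrip("/")

    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"invalid target: {target!r}")

    # Drop the /proxy prefix and forward the rest of the path unchanged.
    return target + path[len(PROXY_PREFIX):]
```

Because the browser only sees the playground's origin, switching models is just a matter of sending a different header value; no page reload or CORS configuration is involved.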

The UI is plain HTML/JS (no build step); JSON is edited and displayed with a vendored Ace editor. Assets are compiled into the binary with `go:embed`.

**Code**: `pkg/cli/playground.go` (server, reverse proxy, webhook sink); `pkg/cli/playground/` (embedded UI assets)

## Build Commands

### cog build
@@ -219,9 +258,11 @@ pkg/cli/
├── predict.go # prediction execution and legacy cog predict
├── exec.go # cog exec
├── serve.go # cog serve
├── playground.go # cog playground (UI server, reverse proxy, webhook sink)
├── push.go # cog push
├── login.go # cog login
└── init.go # cog init
├── init.go # cog init
└── playground/ # embedded playground UI assets (go:embed)
```

Commands delegate to packages under `pkg/`:
43 changes: 43 additions & 0 deletions docs/cli.md
@@ -175,6 +175,49 @@ cog login [flags]
--token-stdin Pass login token on stdin instead of opening a browser. You can find your Replicate login token at https://replicate.com/auth/token
```

## `cog playground`

Open a browser playground for talking to a running model.

Starts a local web server that serves a schema-driven UI (a Postman-like tool
for Cog models). Point it at any running Cog HTTP API -- for example one started
with 'cog serve' -- and the playground reflects that model's inputs and outputs
from its OpenAPI schema in real time.

Requests are reverse-proxied through this server, so the target API does not
need to set CORS headers. The server also hosts a webhook sink so async
predictions can be observed in the browser.

Async/webhook testing against a containerized model requires the webhook URL to
be reachable from inside the container. On Docker Desktop the default
'host.docker.internal' works once the server listens on a reachable interface
(e.g. --host 0.0.0.0).

```
cog playground [flags]
```

**Examples**

```
# Start a model API in one terminal
cog serve -p 8393

# Open the playground pointing at it
cog playground --target http://localhost:8393
```

**Options**

```
-h, --help help for playground
--host string Address to bind (use 0.0.0.0 to receive webhooks from containers) (default "127.0.0.1")
--no-open Do not open the browser automatically
-p, --port int Port to listen on (0 picks a free port)
--target string Default target model API URL (default "http://localhost:8393")
--webhook-host string Hostname the model uses to reach this server for webhooks (default "host.docker.internal")
```

## `cog push`

Build a Docker image from cog.yaml and push it to a container registry.
112 changes: 112 additions & 0 deletions docs/llms.txt

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

65 changes: 65 additions & 0 deletions docs/playground.md
@@ -0,0 +1,65 @@
# Playground

`cog playground` opens a browser UI for talking to a running Cog model — a Postman-like tool that reflects your model's inputs and outputs from its OpenAPI schema and lets you run predictions interactively.

It doesn't build an image or run your model. Point it at a model API you're already running — typically [`cog serve`](cli.md#cog-serve) — and it proxies requests to that API.

## Quick start

Start your model's HTTP server in one terminal:

```sh
cog serve -p 8393
```

Open the playground in another:

```sh
cog playground --target http://localhost:8393
```

This serves the UI on a local port and opens it in your browser. You can change the target API from the UI at any time.

## What you can do

- **Schema-driven form.** Inputs render from the model's `/openapi.json` as the appropriate widgets (text, number, boolean, enum, list, file, secret). Optional fields without a default start unchecked so they're omitted.
- **Form or JSON.** Toggle between the generated form and a JSON editor; the two stay in sync.
- **Files by upload or URL.** File inputs accept an uploaded file (sent as a data URI) or a URL, with an inline preview for images, audio, and video.
- **Sync, streaming, or async.** Run modes appear based on what the model supports — streaming (SSE) when the predictor uses `@cog.streaming`, and async via webhooks.
- **Rendered or raw output.** View the rendered result (media, text, JSON) or switch to **Raw** to see exactly what arrived over the wire. A Copy button grabs the whole payload.
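The "sent as a data URI" step for uploaded files can be illustrated with a short Python sketch. The helper name `to_data_uri` and the `image` input name are hypothetical; the playground performs the equivalent encoding in the browser.

```python
import base64
import mimetypes


def to_data_uri(filename: str, content: bytes) -> str:
    """Encode file bytes as a base64 data URI, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # generic fallback for unknown types
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# The resulting string goes wherever the model expects a file input.
payload = {"input": {"image": to_data_uri("cat.png", b"hi")}}
```

Base64 inflates the payload by roughly a third, so for large files passing a URL is usually the lighter option.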

## Options

| Flag | Description |
| ---------------- | ---------------------------------------------------------------------------------------------- |
| `--target` | Default model API URL (also changeable in the UI). Defaults to `http://localhost:8393`. |
| `-p, --port` | Port to listen on. `0` (default) picks a free port. |
| `--host` | Address to bind (default `127.0.0.1`). Use `0.0.0.0` to receive webhooks from a containerized model. |
| `--webhook-host` | Hostname the model uses to reach the playground for webhooks (default `host.docker.internal`). |
| `--no-open` | Don't open the browser automatically. |

## CORS and webhooks

Requests are reverse-proxied through the playground, so the model API doesn't need to send any CORS headers.

[Async predictions](http.md#webhooks) are observed via webhooks (there's no status-polling endpoint), so the playground hosts a webhook sink and relays events to the browser. For this to work against a model running in a container, the playground must be reachable from inside the container:

```sh
cog playground --host 0.0.0.0 --webhook-host host.docker.internal
```
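For reference, an async prediction request of the kind the playground issues might look like the following Python sketch. It assumes the Cog HTTP API's `Prefer: respond-async` header and `webhook` body field; the port `9000` and the token in the webhook URL are made-up placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_async_request(target: str, inputs: dict, webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) an async prediction request with a webhook."""
    body = json.dumps({"input": inputs, "webhook": webhook_url}).encode("utf-8")
    return urllib.request.Request(
        url=target.rstrip("/") + "/predictions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Prefer": "respond-async",  # ask the server to return immediately
        },
    )


req = build_async_request(
    "http://localhost:8393",
    {"prompt": "hello"},
    # Hypothetical sink URL: the host must resolve from inside the container.
    "http://host.docker.internal:9000/webhook/example-token",
)
```

The webhook URL is resolved by the model, not the browser, which is why the hostname has to be reachable from inside the container.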

> [!NOTE]
> Sync and streaming predictions work without any of this — the webhook setup is only needed for async runs.

## Remote models

If your model runs on another machine, forward its port over SSH and point the playground at it:

```sh
ssh -L 8393:localhost:5000 user@remote
cog playground --target http://localhost:8393
```

Sync and streaming work over the tunnel. For async/webhooks, run the playground on the remote machine next to the model, and forward only the UI port instead.

See the [CLI reference](cli.md#cog-playground) for the full list of flags.
4 changes: 2 additions & 2 deletions examples/streaming-text/README.md
@@ -9,7 +9,7 @@ This example shows how a Cog runner can yield text chunks as a model generates t
From this directory:

```sh
cog predict -i prompt="Write a short haiku about databases"
cog run -i prompt="Write a short haiku about databases"
```

This returns the final accumulated output after the prediction completes.
@@ -46,6 +46,6 @@ data: {"id":"streaming-demo","status":"succeeded",...}

## How it works

`predict.py` defines `run() -> Iterator[str]`. Each `yield` becomes one streamed output chunk. The example uses Hugging Face `TextIteratorStreamer` to receive generated text from `model.generate()` while generation is still running.
`run.py` defines `run() -> Iterator[str]`. Each `yield` becomes one streamed output chunk. The example uses Hugging Face `TextIteratorStreamer` to receive generated text from `model.generate()` while generation is still running.

The normal prediction response still contains the accumulated output for compatibility. Requesting `Accept: text/event-stream` is useful when clients want to display tokens as they arrive.
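On the client side, consuming the stream amounts to splitting the `text/event-stream` body into events. The sketch below is a simplified parser in Python; the event names `output` and `done` in the sample are illustrative assumptions rather than the example's actual wire format, and `id:`/`retry:` fields are ignored.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Strip the single optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


# Hand-written sample stream; a real client would iterate over the response body.
sample = [
    "event: output\n",
    'data: "Rows"\n',
    "\n",
    "event: done\n",
    'data: {"status": "succeeded"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
final = json.loads(events[-1][1])
```

Each yielded pair can be rendered as soon as it arrives, which is what lets a client display tokens while generation is still running.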
2 changes: 1 addition & 1 deletion examples/streaming-text/requirements.txt
@@ -1,3 +1,3 @@
torch==2.12.0
torch==2.8.0
transformers==5.0.0rc3
accelerate==1.6.0
4 changes: 2 additions & 2 deletions examples/streaming-text/run.py
@@ -12,12 +12,12 @@
class Runner(BaseRunner):
def setup(self) -> None:
self.device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if self.device == "cuda" else torch.float32
dtype = torch.bfloat16 if self.device == "cuda" else torch.float32

self.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
self.model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME,
torch_dtype=dtype,
dtype=dtype,
).to(self.device)
self.model.eval()

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -11,6 +11,7 @@ nav:
- Prediction API: python.md
- Training API: training.md
- HTTP API: http.md
- Playground: playground.md
- CLI: cli.md
- Environment variables: environment.md
- Private registry: private-package-registry.md