Support router as replica with pipelines by Bihan · Pull Request #3721 · dstackai/dstack

Bihan · 2026-03-31T12:32:47Z

The design document for this PR is here.

r4victor · 2026-04-08T06:19:13Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+
+
+class ServiceRouterWorkerSyncFetcher(Fetcher[ServiceRouterWorkerSyncPipelineItem]):
+    @sentry_utils.instrument_named_task("pipeline_tasks.ServiceRouterWorkerSyncFetcher.fetch")


I recently added @sentry_utils.instrument_pipeline_task – use it to avoid hardcoding the pipeline_tasks prefix.
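A minimal sketch of the idea behind this suggestion. Both decorators below are simplified stand-ins written for illustration; they are not the actual sentry_utils helpers, whose signatures are not shown in this thread. The point is only that a pipeline-specific decorator can own the prefix so call sites do not repeat it.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, List

# Stand-in for whatever the real instrumentation records (hypothetical).
recorded_task_names: List[str] = []


def instrument_named_task(name: str) -> Callable:
    """Stand-in decorator: records the full task name on every call."""

    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            recorded_task_names.append(name)
            return await func(*args, **kwargs)

        return wrapper

    return decorator


def instrument_pipeline_task(name: str) -> Callable:
    """Stand-in decorator: adds the shared prefix in exactly one place."""
    return instrument_named_task(f"pipeline_tasks.{name}")


@instrument_pipeline_task("ServiceRouterWorkerSyncFetcher.fetch")
async def fetch() -> str:
    return "fetched"


result = asyncio.run(fetch())
```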

r4victor · 2026-04-08T06:28:33Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+            run_model = sync_row.run
+            if run_model is None:
+                await session.delete(sync_row)
+                await session.commit()
+                return


How can run_model be None here?

I was thinking about the case where the run row is hard-deleted, so that sync_row.run becomes None. If this is not possible, we can delete this block.

But you defined run_id as non-optional with ondelete="CASCADE" - how can it be possible?

You are right. Maybe I should delete this block.
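The argument about a non-optional run_id with ondelete="CASCADE" can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and simplified table definitions that are assumptions, not the real dstack models. With a NOT NULL foreign key and cascade delete, hard-deleting the run removes the sync row too, so a sync row without a run should not be observable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE service_router_worker_sync (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL REFERENCES runs(id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO runs (id) VALUES (1)")
conn.execute("INSERT INTO service_router_worker_sync (id, run_id) VALUES (10, 1)")

# Hard-deleting the run cascades to the dependent sync row.
conn.execute("DELETE FROM runs WHERE id = 1")
remaining_sync_rows = conn.execute(
    "SELECT COUNT(*) FROM service_router_worker_sync"
).fetchone()[0]
```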

r4victor · 2026-04-08T06:34:42Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+                .options(
+                    selectinload(RunModel.project),
+                    selectinload(RunModel.jobs).selectinload(JobModel.project),
+                    selectinload(RunModel.jobs)
+                    .selectinload(JobModel.instance)
+                    .selectinload(InstanceModel.project),
+                )
+            )


This is potentially a very inefficient select: a run can have thousands of job submissions. Select only the jobs that the processing needs, i.e. only the router replica job. Also, every selectinload here issues a separate query, and I'm not sure that's justified. joinedload may be better suited for a one-to-one relationship. Also, try to avoid loading all of the models' columns and use load_only to select only the necessary ones.

Please check if below proposed query addresses the concerns

Avoid loading thousands of job submissions: no longer load RunModel.jobs unconditionally. The selectinload(RunModel.jobs.and_(...)) restricts the loaded jobs to only RUNNING + registered replicas, which are the only ones sync_router_workers_for_run_model() can use (router job selection and worker list building both ignore non‑running / unregistered jobs).

selectinload is intentional: RunModel.jobs is a one‑to‑many collection; using joinedload would duplicate the RunModel row per job.

joinedload for one-to-one/many-to-one: RunModel.project, JobModel.project, JobModel.instance, and InstanceModel.project are loaded with joinedload because these are scalar relationships from the run, job, and instance.

Use load_only: this limits the loaded columns to those required by sync_router_workers_for_run_model(run_for_sync) and _get_service_replica_client(job_model).

res = await session.execute(
    select(RunModel)
    .where(RunModel.id == item.run_id)
    .options(
        load_only(RunModel.id, RunModel.run_spec),
        selectinload(
            RunModel.jobs.and_(
                JobModel.status == JobStatus.RUNNING,
                JobModel.registered == true(),
            )
        )
        .load_only(
            JobModel.id,
            JobModel.status,
            JobModel.registered,
            JobModel.job_spec_data,
            JobModel.job_provisioning_data,
            JobModel.job_runtime_data,
        )
        .options(
            joinedload(JobModel.project).load_only(ProjectModel.id, ProjectModel.ssh_private_key),
            joinedload(JobModel.instance)
            .load_only(InstanceModel.id, InstanceModel.remote_connection_info)
            .joinedload(InstanceModel.project)
            .load_only(ProjectModel.id, ProjectModel.ssh_private_key),
        ),
    )
)

looks good, at least at a glance
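The row-shape argument in this thread (a join repeats the parent row per child, while a filtered second query returns only the needed children) can be sketched in plain SQL. The example uses the built-in sqlite3 module with simplified, assumed tables; it is not the SQL that SQLAlchemy emits for the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (id INTEGER PRIMARY KEY, run_spec TEXT);
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        registered INTEGER NOT NULL
    );
    INSERT INTO runs VALUES (1, 'spec');
    INSERT INTO jobs VALUES
        (1, 1, 'running', 1),
        (2, 1, 'running', 1),
        (3, 1, 'running', 0),
        (4, 1, 'done', 0);
    """
)

# Join-style loading: the run's columns are repeated once per job row.
joined_rows = conn.execute(
    "SELECT runs.id, runs.run_spec, jobs.id "
    "FROM runs JOIN jobs ON jobs.run_id = runs.id"
).fetchall()

# Two-query loading with a filter: one row for the run,
# then only the running and registered jobs.
run_rows = conn.execute("SELECT id, run_spec FROM runs WHERE id = 1").fetchall()
job_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM jobs "
        "WHERE run_id IN (1) AND status = 'running' AND registered = 1 "
        "ORDER BY id"
    )
]
```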

r4victor · 2026-04-08T06:39:31Z

src/dstack/_internal/server/services/runs/router_worker_sync.py

+    router_jobs = [
+        j
+        for j in run_model.jobs
+        if job_belongs_to_group(j, group_name) and j.status == JobStatus.RUNNING
+    ]
+    if not router_jobs or not is_replica_registered(router_jobs):
+        return None
+    return router_jobs[0]


Can there be multiple router jobs? If so, how does that work?

For the first iteration, I suggest restricting the router replica group to count: 1 via configuration validation. The current sync logic effectively assumes a single active router job. We can extend this later to support multiple router replicas for HA.

it's worth a comment!
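A sketch of what the suggested validation and the requested comment might look like. The Job dataclass, the function names, and the error message are hypothetical simplifications; the real code uses job_belongs_to_group and is_replica_registered, whose exact semantics are not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    group_name: str
    status: str
    registered: bool


def validate_router_group_count(count: int) -> None:
    # Hypothetical validation: the first iteration supports one router replica.
    if count != 1:
        raise ValueError("a replica group with a router must have count: 1")


def pick_router_job(jobs: List[Job], group_name: str) -> Optional[Job]:
    candidates = [
        j
        for j in jobs
        if j.group_name == group_name and j.status == "running" and j.registered
    ]
    # The sync logic assumes a single active router job (count: 1 is enforced
    # at configuration time), so taking the first candidate is deliberate.
    # Multiple router replicas for HA would need a different selection.
    return candidates[0] if candidates else None


jobs = [
    Job("router", "running", True),
    Job("workers", "running", True),
    Job("workers", "running", False),
]
router_job = pick_router_job(jobs, "router")
missing_router_job = pick_router_job(jobs, "unknown-group")
```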

r4victor · 2026-04-08T06:43:05Z

src/dstack/_internal/server/services/runs/__init__.py

+def run_spec_has_router_replica_group(run_spec: RunSpec) -> bool:
+    if run_spec.configuration.type != "service":
+        return False
+    cfg = run_spec.configuration
+    if not isinstance(cfg, ServiceConfiguration):
+        return False
+    return any(g.router is not None for g in cfg.replica_groups)
+
+
+async def ensure_service_router_worker_sync_row(


Why put these router-specific functions at the top of the runs services?

I kept it there because they are used by run lifecycle. Should I shift them to src/dstack/_internal/server/services/router_worker_sync.py?

I mean at least they should not be at the top of the file.

r4victor · 2026-04-08T06:45:29Z

src/dstack/_internal/server/services/runs/__init__.py

                            ],
                        )
                    global_replica_num += 1
+            await ensure_service_router_worker_sync_row(session, run_model, run_spec)


I think in-place update supports replicas. What happens if a user adds a router replica in an in-place update if ensure_service_router_worker_sync_row() gets called only on submit_run()?

Thanks for pointing that out. I need to call ensure_service_router_worker_sync_row after this.

"What happens if a user adds a router replica in in-place update"

@Bihan, is this use case expected to work at all? I think it won't work with the current implementation, because adding a router replica means that only this replica should receive requests, which means that other existing replicas should be unregistered from the gateway, which doesn't seem to be implemented.

Similarly, due to the need to register or unregister existing replicas, I assume that the following use cases won't work as expected:

Removing a router replica group.

Adding the router property to an existing replica group.

Removing the router property from an existing replica group.

If supporting these use cases requires additional effort, I suggest forbidding them for now (see _check_can_update_configuration). In that case, call ensure_service_router_worker_sync_row only here and simplify its implementation.
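One way to forbid all four cases at once is to compare the set of router-enabled replica groups before and after the update. The sketch below is hypothetical: it models a configuration as a mapping from group name to an optional router type and does not reflect how _check_can_update_configuration is actually implemented.

```python
from typing import Dict, Optional

# Hypothetical simplified view of a configuration: group name -> router type or None.
ReplicaGroups = Dict[str, Optional[str]]


def router_groups(groups: ReplicaGroups) -> Dict[str, str]:
    """Return only the groups that define a router."""
    return {name: router for name, router in groups.items() if router is not None}


def check_router_unchanged(current: ReplicaGroups, new: ReplicaGroups) -> None:
    # Adding or removing a router group, or adding or removing the router
    # property on an existing group, all change this mapping.
    if router_groups(current) != router_groups(new):
        raise ValueError(
            "Changing router replica groups in an in-place update is not supported"
        )


current = {"router": "sglang", "workers": None}

# Scaling or adding a non-router group leaves the router mapping untouched.
check_router_unchanged(current, {"router": "sglang", "workers": None, "extra": None})


def is_rejected(new: ReplicaGroups) -> bool:
    try:
        check_router_unchanged(current, new)
    except ValueError:
        return True
    return False
```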

r4victor · 2026-04-08T06:46:34Z

src/dstack/_internal/server/services/runs/__init__.py

+    if not run_spec_has_router_replica_group(run_spec):
+        return
+    res = await session.execute(
+        select(ServiceRouterWorkerSyncModel.id).where(
+            ServiceRouterWorkerSyncModel.run_id == run_model.id
+        )
+    )
+    if res.scalar_one_or_none() is not None:
+        return


How can it be that ServiceRouterWorkerSyncModel already exists for a run if ensure_service_router_worker_sync_row is called only on run submit?

r4victor · 2026-04-08T06:48:48Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+                return
+            run_model = sync_row.run
+            if run_model is None:
+                await session.delete(sync_row)


We generally use soft deletes in the dstack server for easier debugging and to keep historical data. Assuming there will be very few ServiceRouterWorkerSyncModel rows (one per service replica router), I'd also soft-delete it for consistency.
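A minimal illustration of the soft-delete pattern, using the built-in sqlite3 module. The deleted column name and the table layout are assumptions made for the sketch and may differ from the real model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_router_worker_sync ("
    "id INTEGER PRIMARY KEY, run_id INTEGER NOT NULL, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO service_router_worker_sync (id, run_id) VALUES (10, 1)")

# Soft delete: mark the row instead of removing it.
conn.execute("UPDATE service_router_worker_sync SET deleted = 1 WHERE id = 10")

# Processing queries skip soft-deleted rows...
active_rows = conn.execute(
    "SELECT COUNT(*) FROM service_router_worker_sync WHERE deleted = 0"
).fetchone()[0]
# ...while the row itself stays available for debugging.
total_rows = conn.execute(
    "SELECT COUNT(*) FROM service_router_worker_sync"
).fetchone()[0]
```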

r4victor · 2026-04-08T06:50:11Z

src/dstack/_internal/server/models.py

    )


+class ServiceRouterWorkerSyncModel(PipelineModelMixin, BaseModel):


Let's put it somewhere at the end of the file so that "core" models come first.

r4victor · 2026-04-08T06:52:14Z

src/dstack/_internal/server/services/jobs/job_replica_http_client.py

@@ -0,0 +1,49 @@
+"""SSH-tunneled async HTTP client to a job's service port (same path as probes)."""


put this file in jobs services?

r4victor · 2026-04-08T06:53:05Z

src/dstack/_internal/server/services/runs/router_worker_sync.py

@@ -0,0 +1,345 @@
+"""Reconcile SGLang router /workers with dstack's registered worker replicas (async, SSH-tunneled)."""


put this file in runs services

r4victor

Did a quick review of the pipeline code. Haven't looked into the worker sync logic.


jvstme · 2026-04-09T23:22:41Z

src/dstack/_internal/server/services/runs/router_worker_sync.py

+async def _stream_response_body_bytes(resp: Response, max_bytes: int) -> bytes:
+    buf = bytearray()
+    async for chunk in resp.aiter_bytes():
+        buf.extend(chunk)
+        if len(buf) > max_bytes:
+            raise _ResponseTooLargeError()
+    return bytes(buf)


(nit) We have the join_byte_stream_checked function that appears to do the same thing


jvstme · 2026-04-14T19:31:50Z

docs/blog/posts/pd-disaggregation.md

(nit) In my view, blog posts should generally remain unchanged, as they are timestamped and serve a historical purpose. As a reader, I wouldn't expect their content to change significantly over time.

I would keep the blog post as is, but add a note at the top indicating that gateway routers are deprecated, along with a reference to the relevant replica-group routers docs or examples.

jvstme · 2026-04-14T19:35:42Z

docs/docs/concepts/gateways.md

+!!! note "Deprecation"
+    Configuring the SGLang router in a gateway will be deprecated in a future release.


(nit) I'd put this at the top of the Router section, so that users don't have to read all of it before they find out that it's irrelevant.

Or even remove the section.

Also:

"will be deprecated in a future release"

More like "is deprecated and will be disallowed in a future release"?

jvstme · 2026-04-14T19:46:26Z

examples/inference/sglang/README.md


-<!-- TODO: Gateway creation using fleets is coming to simplify this. -->
+!!! note "Gateway-based routing (deprecated)"
+    If you create a gateway with the [`sglang` router](https://dstack.ai/docs/concepts/gateways/#sglang), you can also run SGLang with PD disaggregation. This method will be deprecated in the future in favor of running the router as a replica.


(nit)

"will be deprecated in the future"

More like "is deprecated and will be disallowed in the future"?

jvstme · 2026-04-14T19:55:07Z

examples/inference/sglang/README.md

+#### SSH fleet

-For example, if you run services on the `kubernetes` backend, make sure to also create the gateway in the same backend:
+Create an [SSH fleet](https://dstack.ai/docs/concepts/fleets/#apply-a-configuration) that includes one CPU host for the router and one or more GPU hosts for the workers. Make sure the CPU and GPU hosts are in the same network.


(nit) Does it have to be an SSH fleet specifically? I thought elastic (nodes: 0..) cloud and kubernetes fleets could work too — just don't specify any resource constraints in the fleet, and dstack will automatically provision the correct instances (both CPU and GPU, in the same fleet) based on the resources specified in replicas in the run configuration.

Only some backends won't work — like Nebius, which requires all instances in the cluster to be homogeneous.

There are more references to SSH fleets in the docs updated in this PR.

jvstme · 2026-04-14T20:03:32Z

examples/inference/sglang/README.md

+fleets: [pd-disagg]

-# Custom probe is required for PD disaggregation
+# Custom probe is required for PD disaggregation.


(nit) By the way, is it still required? I thought sync_router_workers_for_run_model can gracefully handle the router or workers not being ready, and perform the registration eventually, once they become ready

jvstme · 2026-04-14T20:48:39Z

src/dstack/_internal/core/models/configurations.py

+    def validate_replica_group_router_mutex(cls, values):
+        """
+        When a replica group sets `router:`, service-level `router` must be omitted.


jvstme · 2026-04-14T21:08:12Z

...al/server/migrations/versions/2026/03_29_1200_e7f4a91b2c3d_add_service_router_worker_sync.py

+    op.create_index(
+        op.f("ix_service_router_worker_sync_pipeline_fetch_q"),
+        "service_router_worker_sync",
+        [sa.literal_column("last_processed_at ASC")],
+        unique=False,
+    )


Isn't this missing the sqlite_where and postgresql_where that are present in models.py?

Not sure why they weren't added automatically. I'd suggest re-generating the migration (using Alembic, in case that's not what you were using previously).
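The sqlite_where and postgresql_where arguments turn the index into a partial index, i.e. one with a WHERE predicate. The effect can be sketched with the built-in sqlite3 module. The predicate deleted = 0 and the simplified table are placeholders chosen for illustration; the actual predicate lives in models.py and is not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_router_worker_sync ("
    "id INTEGER PRIMARY KEY, last_processed_at TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
# Partial index: only rows matching the WHERE predicate are indexed.
# `deleted = 0` is a placeholder predicate, not the one from models.py.
conn.execute(
    "CREATE INDEX ix_service_router_worker_sync_pipeline_fetch_q "
    "ON service_router_worker_sync (last_processed_at ASC) WHERE deleted = 0"
)
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'index' "
    "AND name = 'ix_service_router_worker_sync_pipeline_fetch_q'"
).fetchone()[0]
```

A migration that omits the predicate creates a full index instead, so the database schema and the model definition would drift apart.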

jvstme · 2026-04-14T21:22:34Z

src/dstack/_internal/server/services/proxy/services/service_proxy.py

+    if service.router is not None and service.router.type == RouterType.SGLANG:
+        path_for_match = path if path.startswith("/") else f"/{path}"
+        if not _is_whitelisted_path(path_for_match, _SGLANG_WHITELISTED_PATHS):
+            raise ProxyError("Path is not allowed for this service", status.HTTP_404_NOT_FOUND)


(nit) 403 Forbidden for consistency with the gateway and better semantics?

jvstme · 2026-04-14T21:26:13Z

src/dstack/_internal/server/services/proxy/services/service_proxy.py

+_SGLANG_WHITELISTED_PATHS = (
+    "/generate",
+    "/v1/",
+    "/chat/completions",
+)


(nit) Duplicates the list from the gateway. Consider importing the constant from some common place, like proxy/lib/const.py
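A sketch combining the two suggestions on this file: one shared constant and a 403 response for disallowed paths. The prefix-matching rule in is_whitelisted_path is an assumption about how _is_whitelisted_path behaves, and the module location of the constant is left open.

```python
from http import HTTPStatus
from typing import Tuple

# Would live in one shared module and be imported by both the gateway
# and the in-server proxy (location is hypothetical).
SGLANG_WHITELISTED_PATHS: Tuple[str, ...] = (
    "/generate",
    "/v1/",
    "/chat/completions",
)


def is_whitelisted_path(path: str, whitelist: Tuple[str, ...]) -> bool:
    # Assumption: a path is allowed when it starts with any whitelisted prefix.
    return path.startswith(whitelist)


def check_path(path: str) -> HTTPStatus:
    # Normalize to a leading slash, as in the diff above.
    normalized = path if path.startswith("/") else f"/{path}"
    if not is_whitelisted_path(normalized, SGLANG_WHITELISTED_PATHS):
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK
```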

jvstme · 2026-04-14T22:21:30Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+                await session.execute(
+                    update(ServiceRouterWorkerSyncModel)
+                    .where(
+                        ServiceRouterWorkerSyncModel.id == item.id,
+                        ServiceRouterWorkerSyncModel.lock_token == item.lock_token,
+                    )
+                    .values(**early_cleanup_update_map)
+                )
+                await session.commit()
+                return


(nit) Missing the log_lock_token_changed_after_processing call in case the update fails.

To avoid such discrepancies, consider refactoring, so that there is only one place that performs the update (currently there are three in the same method)
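The suggested refactoring can be sketched as a single helper that performs the lock-token-guarded update and reports a lost lock in one place. The example uses the built-in sqlite3 module; the helper name, the table layout, and the log message are hypothetical, and the real code would call log_lock_token_changed_after_processing instead of logging directly.

```python
import logging
import sqlite3
from typing import Any, Dict

logger = logging.getLogger(__name__)


def update_if_lock_held(
    conn: sqlite3.Connection, item_id: int, lock_token: str, update_map: Dict[str, Any]
) -> bool:
    """Apply update_map only if the row still carries our lock token."""
    # Column names come from trusted code, never from user input.
    assignments = ", ".join(f"{column} = ?" for column in update_map)
    cursor = conn.execute(
        f"UPDATE service_router_worker_sync SET {assignments} "
        "WHERE id = ? AND lock_token = ?",
        [*update_map.values(), item_id, lock_token],
    )
    conn.commit()
    if cursor.rowcount == 0:
        # The only place that reports a changed lock token.
        logger.warning("Lock token changed after processing item %s", item_id)
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_router_worker_sync ("
    "id INTEGER PRIMARY KEY, lock_token TEXT, last_processed_at TEXT)"
)
conn.execute("INSERT INTO service_router_worker_sync VALUES (1, 'token-a', NULL)")

applied = update_if_lock_held(conn, 1, "token-a", {"last_processed_at": "2026-04-14"})
stale = update_if_lock_held(conn, 1, "token-b", {"last_processed_at": "2026-04-15"})
stored = conn.execute(
    "SELECT last_processed_at FROM service_router_worker_sync WHERE id = 1"
).fetchone()[0]
```

Every exit path of the processing method (early cleanup, success, failure) would then build its own update map and call this one helper.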

Bihan force-pushed the support_router_replica_with_pipelines branch from 2fe5e14 to bafd2d9 Compare April 1, 2026 07:22

Bihan requested review from jvstme and r4victor April 7, 2026 10:33

r4victor reviewed Apr 8, 2026

r4victor requested changes Apr 8, 2026

Bihan force-pushed the support_router_replica_with_pipelines branch from e155d17 to 7b268cb Compare April 9, 2026 10:36

jvstme reviewed Apr 10, 2026

jvstme reviewed Apr 12, 2026

Bihan Rana added 8 commits April 13, 2026 13:03

Resolve Merge Conflict

2e46b95

Resolve pyright test

14bab7a

Resolve tests

f99bdd2

Optimize ServiceRouterWorkerSyncWorkerProcess select query

35120a3

Use soft_delete for ServiceRouterWorkerSyncModel

8481bd3

Remove worker registration to gateway in PD

f04999e

Resolve review comments and add ServiceRouterWorkerSyncPipeline test

b349f2c

Resolve Migration Conflict

8fe01e5

Bihan force-pushed the support_router_replica_with_pipelines branch from 3bc04df to 8fe01e5 Compare April 13, 2026 07:33

Bihan Rana added 2 commits April 13, 2026 21:11

Resolve review comments

c5a6716

Resolve Comments

37a1c5a

Bihan Rana added 3 commits April 14, 2026 11:54

Resolve all review comments

397cf98

Resolve all review comments

cbb13f0

Update docs for router as replica

59d246b

Bihan changed the title ~~[Draft PR] Support router as replica with pipelines~~ Support router as replica with pipelines Apr 14, 2026

jvstme reviewed Apr 14, 2026
