
Fix: Improve HTTP API structure and async handler usage (#569)#2063

Open
thchann wants to merge 1 commit into roboflow:main from thchann:fix-569-async-handlers

Conversation

@thchann
Contributor

@thchann thchann commented Mar 2, 2026

What does this PR do?

  • Refactors HttpInterface into modular FastAPI routers under inference/core/interfaces/http/routes/ (inference, models, workflows, stream, core_models, legacy, info, health).
  • Fixes incorrect async usage: blocking handlers are now sync + with_route_exceptions, while only truly async code (e.g. stream manager, WebRTC worker) remains async + with_route_exceptions_async.
  • Keeps the external HTTP API surface (paths, methods, response models, flags) unchanged; this is a structural/maintenance refactor plus async/sync correctness for the original HTTP API redesign ticket Improve API structure + put non-async handlers properly #569.
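
The sync/async split described above can be sketched as follows. The decorator names `with_route_exceptions` and `with_route_exceptions_async` come from this PR, but their bodies and the handlers below are illustrative stand-ins, not the project's actual implementation:

```python
import asyncio
import functools

# Stand-in for with_route_exceptions: wraps a plain (sync) handler.
# FastAPI runs `def` endpoints in a threadpool, so blocking work here
# does not stall the event loop.
def with_route_exceptions(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ValueError as error:
            return {"status": "error", "message": str(error)}
    return wrapper

# Stand-in for with_route_exceptions_async: wraps a coroutine handler.
# Only genuinely awaitable work (stream manager, WebRTC worker) should
# keep this form.
def with_route_exceptions_async(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except ValueError as error:
            return {"status": "error", "message": str(error)}
    return wrapper

@with_route_exceptions
def blocking_handler():
    # e.g. a model forward pass: CPU/GPU-bound, no awaits inside.
    return {"status": "ok"}

@with_route_exceptions_async
async def streaming_handler():
    # e.g. an RPC to the stream manager: actually awaitable.
    await asyncio.sleep(0)
    return {"status": "ok"}

print(blocking_handler())
print(asyncio.run(streaming_handler()))
```

The key point is that an `async def` endpoint whose body only does blocking work runs on the event loop and stalls every other request, while the same body as a plain `def` is dispatched to a threadpool.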

Type of Change

  • Refactoring (no functional changes)

Testing

  • I have tested this change locally

  • I have added/updated tests for this change

  • Ran: pytest -m "not slow" tests/

Test details:

  • Verified that:
    • tests/inference/unit_tests/core/interfaces/http/test_remote_processing_time_middleware.py imports the refactored http_api and passes.
    • No HTTP route tests fail due to the router extraction or async/sync changes (any remaining issues are unrelated env/dependency problems).

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

@PawelPeczek-Roboflow
Collaborator

Thanks for the contribution - since we are really busy this week, we are postponing the review

@PawelPeczek-Roboflow
Collaborator

Review is ongoing - we've identified a couple of issues; it seems like @dkosowski87 has some feedback here

Contributor

@dkosowski87 dkosowski87 left a comment


Nice clean up, thanks 🚀 I left some comments; after fixing those we will be able to merge this.


from fastapi import APIRouter, HTTPException, Query

rom inference.core.version import __version__
Contributor


Suggested change
rom inference.core.version import __version__
from inference.core.version import __version__

from inference.core.interfaces.http.error_handlers import with_route_exceptions
from inference.core.interfaces.http.orjson_utils import orjson_response
from inference.core.managers.base import ModelManager
from inference.core.utils.model_alias import resolve_roboflow_model_alias
Contributor


This was moved to inference/models/aliases.py


@router.post(
"/infer/semantic_segmentation",
response_model=Union[InstanceSegmentationInferenceResponse, StubResponse],
Contributor


Suggested change
response_model=Union[InstanceSegmentationInferenceResponse, StubResponse],
response_model=Union[SemanticSegmentationInferenceResponse, StubResponse],

if DEPTH_ESTIMATION_ENABLED:

@router.post(
"/infer/depth-estimation",
Contributor


Probably we could set up a different route, as this one will be shadowed by the one in the inference module.
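
FastAPI resolves routes first-match-wins across included routers, so when two routers register the same path, the one included first handles every request. A self-contained sketch of that behavior (the dispatch table below is a stand-in for illustration, not FastAPI internals):

```python
# Minimal sketch of first-match-wins route resolution.
routes = []

def register(path, handler):
    routes.append((path, handler))

def dispatch(path):
    # Scan registrations in order; the first matching path wins.
    for registered_path, handler in routes:
        if registered_path == path:
            return handler()
    raise LookupError(path)

# The inference router is included first, so its depth-estimation
# route shadows the duplicate registered later.
register("/infer/depth-estimation", lambda: "inference router")
register("/infer/depth-estimation", lambda: "core_models router")

print(dispatch("/infer/depth-estimation"))
```

This is why the duplicated `/infer/depth-estimation` registration is dead code: renaming one of the paths (or removing the duplicate) avoids the silent shadowing.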

router = APIRouter()

def process_workflow_inference_request(
workflow_request,
Contributor


Missing type hint
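
A hedged sketch of adding the hint; `WorkflowInferenceRequest` is a placeholder class defined locally here, not necessarily the project's real request entity:

```python
from dataclasses import dataclass
from typing import get_type_hints

# Placeholder request model: the real project would use its own request
# entity here; this class name is an assumption for illustration only.
@dataclass
class WorkflowInferenceRequest:
    workflow_id: str

def process_workflow_inference_request(
    workflow_request: WorkflowInferenceRequest,  # hint added per review
) -> dict:
    return {"workflow_id": workflow_request.workflow_id}

# The annotation is now introspectable by readers and tooling alike.
print(get_type_hints(process_workflow_inference_request)["workflow_request"].__name__)
```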

countinference=countinference,
service_secret=service_secret,
)
app.include_router(create_inference_router(model_manager=self.model_manager))

if LMM_ENABLED or MOONDREAM2_ENABLED:
Contributor


This part should also go to the inference module. It is nested under if not LAMBDA and not GCP_SERVERLESS:

max_concurrent_steps=WORKFLOWS_MAX_CONCURRENT_STEPS,
prevent_local_images_loading=True,
)
return WorkflowValidationStatus(status="ok")

if WEBRTC_WORKER_ENABLED:
Contributor


We could probably extract this one also.

"""Health endpoint for Kubernetes liveness probe."""
return {"status": "healthy"}

return router No newline at end of file
Contributor


Always add a new line at the end of the file

),
countinference: Optional[bool] = None,
service_secret: Optional[str] = None,
):
Contributor


Maybe this is a result of working on a previous version of the repo, but this is missing:

                    if not SAM3_FINE_TUNED_MODELS_ENABLED:
                        if not inference_request.model_id.startswith("sam3/"):
                            raise HTTPException(
                                status_code=501,
                                detail="Fine-tuned SAM3 models are not supported on this deployment. Please use a workflow or self-host the server.",
                            )

Let's update the branch and see if it will be brought back.

):
"""
Runs the YOLO-World zero-shot object detection model.
@app.get(
Contributor


The notebook routes could also go to a separate module


3 participants