feat: async batch inference benchmark for Crusoe Managed Inference #60

Open
Sakshi3027 wants to merge 1 commit into crusoecloud:main from Sakshi3027:feat/async-batch-crusoe

@Sakshi3027

What this adds

Async batch inference with a sequential vs parallel vs batched benchmark on Crusoe Managed Inference.

| Mode | Time | Speedup |
|------|------|---------|
| Sequential | 3.96s | baseline |
| Parallel | 0.57s | 7.0x faster |
| Batched (4 at a time) | 1.29s | 3.1x faster |

Benchmark: 8 prompts on Llama-3.3-70B-Instruct.

Why it's useful

Production LLM applications rarely run one prompt at a time: evaluation pipelines, batch scoring, and multi-user systems all need concurrent inference. This shows how to use asyncio.gather with ChatCrusoe to run prompts concurrently (about 7x faster than sequential in the benchmark above), and how to cap the batch size for rate-limit-sensitive workloads.
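For reviewers who want to see the three modes side by side, here is a minimal sketch of the pattern. It is not the code in batch.py: `fake_llm_call` is a hypothetical stand-in that only sleeps, so the example runs without an API key or network access. In the real script the awaited call would presumably be an async method on ChatCrusoe, which is an assumption here.

```python
import asyncio
import time


# Hypothetical stand-in for an async model call; it only sleeps to
# simulate network latency. Not part of ChatCrusoe or batch.py.
async def fake_llm_call(prompt: str, latency: float = 0.05) -> str:
    await asyncio.sleep(latency)
    return f"response to: {prompt}"


async def run_sequential(prompts):
    # One request at a time: total time is roughly the sum of latencies.
    return [await fake_llm_call(p) for p in prompts]


async def run_parallel(prompts):
    # All requests in flight at once; gather preserves input order.
    return list(await asyncio.gather(*(fake_llm_call(p) for p in prompts)))


async def run_batched(prompts, batch_size=4):
    # At most batch_size requests in flight; each batch finishes
    # before the next one starts.
    results = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        results.extend(await asyncio.gather(*(fake_llm_call(p) for p in batch)))
    return results


async def timed(runner, prompts):
    # Returns (results, elapsed seconds) for one mode.
    start = time.perf_counter()
    results = await runner(prompts)
    return results, time.perf_counter() - start


async def main():
    prompts = [f"prompt {i}" for i in range(8)]
    modes = [
        ("Sequential", run_sequential),
        ("Parallel", run_parallel),
        ("Batched", run_batched),
    ]
    for name, runner in modes:
        results, elapsed = await timed(runner, prompts)
        print(f"{name:<12} {elapsed:.2f}s ({len(results)} responses)")


if __name__ == "__main__":
    asyncio.run(main())
```

Batching trades some speed for a hard cap on concurrent requests, which is why the batched mode sits between sequential and fully parallel in the table above.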

Testing

Tested locally with Groq as a drop-in replacement for the Crusoe endpoint. The numbers above come from that local run.

To run on Crusoe:
export CRUSOE_API_KEY="your-api-key"
python batch.py

Related contributions
