feat: multi-model comparison benchmark for Crusoe Managed Inference by Sakshi3027 · Pull Request #61 · crusoecloud/solutions-library

Sakshi3027 · 2026-06-14T16:45:30Z

What this adds

A benchmark that runs multiple Crusoe models across reasoning, code generation,
and summarization tasks concurrently, producing a latency and throughput leaderboard.
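
For reviewers who want a picture of what a latency and throughput leaderboard could look like, here is a minimal sketch. It is not taken from compare.py: the record fields (`latency_s`, `completion_tokens`), the `build_leaderboard` helper, the placeholder model names, and the numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-call records; the values are invented for illustration
# and are not measurements from any real model.
records = [
    {"model": "model-a", "task": "reasoning", "latency_s": 2.0, "completion_tokens": 200},
    {"model": "model-a", "task": "summarization", "latency_s": 1.0, "completion_tokens": 150},
    {"model": "model-b", "task": "reasoning", "latency_s": 4.0, "completion_tokens": 200},
    {"model": "model-b", "task": "summarization", "latency_s": 1.0, "completion_tokens": 50},
]


def build_leaderboard(records):
    """Aggregate per-call records into one row per model, fastest throughput first."""
    by_model = defaultdict(list)
    for record in records:
        by_model[record["model"]].append(record)

    rows = []
    for model, calls in by_model.items():
        total_tokens = sum(c["completion_tokens"] for c in calls)
        total_time_s = sum(c["latency_s"] for c in calls)
        rows.append(
            {
                "model": model,
                "mean_latency_s": mean(c["latency_s"] for c in calls),
                # Throughput here is generated tokens per second of request time.
                "tokens_per_s": total_tokens / total_time_s,
            }
        )
    return sorted(rows, key=lambda row: row["tokens_per_s"], reverse=True)


for row in build_leaderboard(records):
    print(f'{row["model"]:<10} {row["mean_latency_s"]:.2f}s  {row["tokens_per_s"]:.1f} tok/s')
```

Sorting by throughput is only one possible ranking; sorting by mean latency, or reporting per-task rows, may suit some workloads better.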

Models compared:

meta-llama/Llama-3.3-70B-Instruct
deepseek-ai/DeepSeek-V3-0324
Qwen/Qwen3-235B-A22B

Tasks:

Reasoning (multi-step math)
Code generation (Sieve of Eratosthenes with type hints)
Summarization (SQL vs NoSQL in 3 bullets)

All 9 combinations (3 models × 3 tasks) run concurrently via asyncio.gather,
so total benchmark time is roughly that of the slowest single call rather than the sum of all nine.
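
The concurrency pattern can be sketched as follows. This is an illustration under stated assumptions, not the code in compare.py: `fake_call` is a hypothetical stand-in that sleeps instead of calling an API, the task labels are shorthand, and the delays are invented. It shows wall-clock time tracking the slowest call rather than the sum.

```python
import asyncio
import itertools
import time

MODELS = [
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324",
    "Qwen/Qwen3-235B-A22B",
]
TASKS = ["reasoning", "code_generation", "summarization"]


async def fake_call(model: str, task: str, delay_s: float) -> dict:
    """Hypothetical stand-in for one inference request; sleeps instead of hitting an API."""
    start = time.perf_counter()
    await asyncio.sleep(delay_s)
    return {"model": model, "task": task, "latency_s": time.perf_counter() - start}


async def run_all() -> tuple[list[dict], float]:
    combos = list(itertools.product(MODELS, TASKS))  # 3 models x 3 tasks = 9 pairs
    # Invented latencies from 0.05s to 0.21s, one per (model, task) pair.
    delays = [0.05 + 0.02 * i for i in range(len(combos))]
    start = time.perf_counter()
    # gather schedules every coroutine at once and returns results in input order.
    results = await asyncio.gather(
        *(fake_call(model, task, delay) for (model, task), delay in zip(combos, delays))
    )
    return list(results), time.perf_counter() - start


results, wall_s = asyncio.run(run_all())
slowest_s = max(r["latency_s"] for r in results)
sum_s = sum(r["latency_s"] for r in results)
print(f"wall={wall_s:.2f}s slowest={slowest_s:.2f}s sum={sum_s:.2f}s")
```

With a real provider, rate limits or connection caps may serialize some requests, in which case total time would exceed the slowest single call.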

Why it's useful

Teams evaluating which Crusoe model to use for their workload need a
quick way to compare quality and speed across tasks. This gives them
a runnable starting point they can extend with their own prompts.

Testing

Tested locally against Groq (3 Llama variants) as a drop-in replacement for the Crusoe endpoint.

To run on Crusoe:
export CRUSOE_API_KEY="your-api-key"
python compare.py

Related contributions

feat: LangGraph multi-node research agent on Crusoe Managed Inference (#55)
feat: MLflow experiment tracking for Crusoe Managed Inference (#56)
feat: RAG pipeline with Qdrant and sentence-transformers on Crusoe Managed Inference (#57)
feat: structured output and tool calling examples for Crusoe Managed Inference (#58)
feat: real-time streaming output examples for Crusoe Managed Inference (#59)
feat: async batch inference benchmark for Crusoe Managed Inference (#60)

feat: add multi-model comparison benchmark for Crusoe Managed Inference

04c9332

Sakshi3027 requested review from chinmaybaikar, datadoc24, smazigh and youngjeong46 as code owners June 14, 2026 16:45

Sakshi3027 wants to merge 1 commit into crusoecloud:main from Sakshi3027:feat/model-comparison-crusoe
