# swebench
Here are 4 public repositories matching this topic...
Toolkit for measuring Claude Code and Codex performance over time against a baseline, using the SWEbench-lite dataset. **No API key required for Max or Pro subscribers.**
Updated Nov 22, 2025 - Python
Wrapper of common LLM evaluation frameworks
evaluation artificial-intelligence llm lm-evaluation-harness vllm lighteval openai-compatible swebench
Updated Apr 2, 2026 - Python
🚀 Generate front-end code from design mockups using a powerful integration of Gemini and Claude within a user-friendly command system.
Updated May 14, 2026 - Python