Automated quality evaluation for RL agent discoveries using LLM-as-a-Judge, rule-based validation, and closed-loop reward feedback
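The combination of LLM-as-a-Judge scoring, rule-based validation, and closed-loop reward feedback could look roughly like the minimal Python sketch below. Every name here (`judge_llm`, `rule_checks`, `compute_reward`) is a hypothetical placeholder, not the repository's actual API.

```python
from typing import Callable

# Hypothetical judge callable: takes a prompt string, returns a 0-10 score as text.
JudgeLLM = Callable[[str], str]

def rule_checks(discovery: str) -> bool:
    """Cheap rule-based validation applied before the LLM judge (illustrative rules only)."""
    return bool(discovery.strip()) and len(discovery) < 4000 and "TODO" not in discovery

def judge_score(discovery: str, judge: JudgeLLM) -> float:
    """Ask a separate judge LLM to rate the discovery; normalize and clamp to [0, 1]."""
    prompt = (
        "Rate the quality of this RL agent discovery from 0 to 10.\n"
        f"Discovery:\n{discovery}\n"
        "Answer with a single number."
    )
    try:
        raw = float(judge(prompt).strip())
    except ValueError:
        raw = 0.0
    return max(0.0, min(raw / 10.0, 1.0))

def compute_reward(discovery: str, judge: JudgeLLM) -> float:
    """Closed-loop reward: rule failures short-circuit to 0, otherwise use the judge score."""
    if not rule_checks(discovery):
        return 0.0
    return judge_score(discovery, judge)

if __name__ == "__main__":
    fake_judge: JudgeLLM = lambda prompt: "8"  # stand-in for a real judge-model call
    print(compute_reward("Agent found a shortcut policy in level 3.", fake_judge))
```

The rule gate runs first so that obviously invalid discoveries never consume a judge call; only outputs that pass it receive a graded reward that can be fed back to the agent.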
End-to-end AI evals orchestration platform for comparing LLM outputs across providers with transcription, structured logging, human review, and Supabase-backed decision tracking.
LLM-as-a-Judge evaluation pipeline — score and compare LLM outputs with a separate judge LLM
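A separate-judge comparison of two candidate outputs could be sketched as follows; `call_judge` is a hypothetical stand-in for whatever provider client the pipeline actually uses.

```python
from typing import Callable

# Hypothetical stand-in for a provider call (e.g. an HTTP client wrapping a judge model).
CallJudge = Callable[[str], str]

def compare_outputs(prompt: str, output_a: str, output_b: str, call_judge: CallJudge) -> str:
    """Ask a separate judge LLM which candidate answers the prompt better.

    Returns 'A', 'B', or 'tie'.
    """
    judge_prompt = (
        "You are an impartial judge. Given the user prompt and two candidate answers, "
        "reply with exactly one token: A, B, or tie.\n\n"
        f"Prompt: {prompt}\n\n"
        f"Answer A:\n{output_a}\n\n"
        f"Answer B:\n{output_b}\n"
    )
    verdict = call_judge(judge_prompt).strip().upper()
    if verdict in {"A", "B"}:
        return verdict
    return "tie"  # treat anything unparseable as a tie rather than guessing

if __name__ == "__main__":
    stub_judge: CallJudge = lambda _: "B"  # replace with a real judge-model call
    print(compare_outputs(
        "Summarize RL in one line.",
        "RL is learning.",
        "RL is learning to act by maximizing cumulative reward.",
        stub_judge,
    ))
```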
Production-grade Safe GenAI Agent Orchestrator with intent routing, hallucination guard, tool orchestration, evaluation pipeline, and multi-screen Next.js dashboard.