focused-dot-io/07-evaluation-testing

Q&A Agent Evaluation and Testing

This repository contains a customer support Q&A agent and an evaluation suite that demonstrates LangSmith-based testing patterns. The suite includes deterministic evaluators (keyword coverage, tool usage, hallucination detection), LLM-as-judge evaluators (correctness, tone), trajectory evaluation via agentevals, and regression detection across experiment runs.
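As a rough illustration of two of the patterns named above, the sketch below shows a deterministic keyword-coverage evaluator and a simple regression check between two experiment runs. It is not taken from evals.py: the function names, the `answer` and `expected_keywords` fields, and the 0.05 tolerance are all assumptions made for this example, and the repository's own evaluators may be structured differently.

```python
def keyword_coverage(outputs: dict, reference_outputs: dict) -> dict:
    """Score the fraction of expected keywords found in the agent's answer.

    The "answer" and "expected_keywords" field names are hypothetical.
    """
    answer = outputs.get("answer", "").lower()
    keywords = reference_outputs.get("expected_keywords", [])
    if not keywords:
        # Nothing to check, so treat the example as fully covered.
        return {"key": "keyword_coverage", "score": 1.0}
    hits = [kw for kw in keywords if kw.lower() in answer]
    return {"key": "keyword_coverage", "score": len(hits) / len(keywords)}


def detect_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,  # assumed threshold, tune per metric
) -> dict[str, float]:
    """Return metrics whose candidate score fell more than `tolerance`
    below the baseline, mapped to the (negative) score change."""
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is not None and new_score < base_score - tolerance:
            regressions[metric] = round(new_score - base_score, 4)
    return regressions


if __name__ == "__main__":
    result = keyword_coverage(
        {"answer": "Refunds are processed within 5 business days."},
        {"expected_keywords": ["refund", "days", "email"]},
    )
    print(result)
    print(detect_regressions(
        {"keyword_coverage": 0.9, "tone": 0.8},
        {"keyword_coverage": 0.7, "tone": 0.79},
    ))
```

Both functions are pure and need no API keys, so they can be unit-tested in isolation before being wired into a LangSmith experiment.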

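Prerequisites

Python 3 with the venv module, plus an Anthropic API key and a LangSmith API key (see Environment Variables).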

Setup

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | Yes | Anthropic API key for Claude |
| LANGSMITH_API_KEY | Yes | LangSmith API key for tracing and evals |
| LANGSMITH_TRACING | Yes | Set to true to enable tracing |
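A small pre-flight check can catch a missing variable before the agent or the evaluations start. The sketch below is an assumption about how such a check might look and is not part of the repository; the helper name `missing_env_vars` is hypothetical, and it reads `os.environ` directly rather than loading the .env file.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variables listed as required in the table above.
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_TRACING")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty.

    Hypothetical helper; pass a mapping to test it without
    touching the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("All required environment variables are set.")
```

If the project loads .env at runtime, this check would need to run after that load step.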

Running

python qa_agent.py

To run evaluations:

python evals.py

Article

Evaluation and Testing for LangGraph Agents
