A customer support Q&A agent with a comprehensive evaluation suite demonstrating LangSmith-powered testing patterns. Includes deterministic evaluators (keyword coverage, tool usage, hallucination detection), LLM-as-judge evaluators (correctness, tone), trajectory evaluation via agentevals, and regression detection across experiment runs.
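Deterministic evaluators score an answer without calling a model. The sketch below shows one way a keyword-coverage check could work: it scores an answer by the fraction of expected keywords it mentions. The function name `keyword_coverage`, its arguments, and the `{"key": ..., "score": ...}` return shape are assumptions for illustration, not this project's actual code. Check the LangSmith custom-evaluator documentation for the exact signature `evaluate()` expects.

```python
import re


def keyword_coverage(answer: str, expected_keywords: list[str]) -> dict:
    """Hypothetical deterministic evaluator: fraction of expected keywords found.

    Matching is case-insensitive and on whole words/phrases, so "refund"
    does not match "refunded". The returned dict shape is an assumption
    modeled on LangSmith-style feedback, not taken from this project.
    """
    if not expected_keywords:
        # Nothing to check; treat as full coverage.
        return {"key": "keyword_coverage", "score": 1.0}
    text = answer.lower()
    hits = [
        kw
        for kw in expected_keywords
        if re.search(rf"\b{re.escape(kw.lower())}\b", text)
    ]
    return {"key": "keyword_coverage", "score": len(hits) / len(expected_keywords)}


if __name__ == "__main__":
    result = keyword_coverage(
        "You can request a refund within 30 days from the Orders page.",
        ["refund", "30 days", "receipt"],
    )
    print(result)  # {'key': 'keyword_coverage', 'score': 0.6666666666666666}
```

Whole-word matching keeps scores from being inflated by substrings. A real suite might also normalize synonyms, which this sketch does not attempt.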
Requirements:

- Python 3.11+
- Anthropic API key
- LangSmith API key
Setup:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys
```

Set the following variables in `.env`:

| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | Anthropic API key for Claude |
| `LANGSMITH_API_KEY` | Yes | LangSmith API key for tracing and evals |
| `LANGSMITH_TRACING` | Yes | Set to `true` to enable tracing |
To run the agent:

```bash
python qa_agent.py
```

To run evaluations:

```bash
python evals.py
```
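The evaluation suite includes regression detection across experiment runs, but this README does not show how runs are compared. The sketch below is one hypothetical approach. It compares each evaluator's mean score in a new experiment against a baseline and flags drops larger than a tolerance. The function name `find_regressions`, the input format (evaluator key mapped to mean score), and the 0.05 default tolerance are all assumptions, not taken from `evals.py`.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Hypothetical check: metrics whose mean score dropped by more than `tolerance`.

    Both inputs map an evaluator key (e.g. "keyword_coverage") to its mean
    score over a dataset, where higher is better. A metric missing from the
    candidate is treated as scoring 0.0.
    """
    regressions = {}
    for key, base_score in baseline.items():
        new_score = candidate.get(key, 0.0)
        if base_score - new_score > tolerance:
            regressions[key] = (base_score, new_score)
    return regressions


if __name__ == "__main__":
    baseline = {"keyword_coverage": 0.90, "tone": 0.80, "correctness": 0.85}
    candidate = {"keyword_coverage": 0.88, "tone": 0.70, "correctness": 0.86}
    print(find_regressions(baseline, candidate))  # {'tone': (0.8, 0.7)}
```

A fixed tolerance absorbs small run-to-run noise from LLM-as-judge scores. A stricter setup might instead compare per-example results or use a statistical test.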