A prototype chat assistant for professors reviewing AI-generated thesis grades. Built with FastAPI + LangGraph.
The assistant is explanation-first: it answers questions about stored grading artifacts inside a single thesis conversation. It does not re-grade, finalise grades, or make pass/fail recommendations. Every response includes a grounding status, version ID, and citations back to the grading data.
Demo: here
- Python 3.11+
- uv (package manager; see install instructions below)
No API key is required. The app runs fully in deterministic mode without one. Set ANTHROPIC_API_KEY or OPENAI_API_KEY to enable LLM-generated answers.
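The mode selection described above can be sketched as follows. This is a minimal illustration, not the project's `build_llm_client()`: the helper name `select_llm_mode` is hypothetical, and the precedence of Anthropic over OpenAI when both keys are set is an assumption.

```python
import os
from typing import Mapping, Optional


def select_llm_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return which answer generator would be used for a given environment.

    Hypothetical helper: the precedence order (Anthropic first) is an
    assumption, not something the README specifies.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No key configured: the app stays in deterministic mode.
    return "deterministic"
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.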
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
git clone <repo-url>
cd GradeLens
uv sync # creates .venv and installs all production dependencies
uv sync --extra dev # also installs pytest
cp .env.example .env
# Optional: set ANTHROPIC_API_KEY or OPENAI_API_KEY in .env

Both the backend and frontend must be running for the full UI experience.
uv run uvicorn app.main:app --reload

API available at http://localhost:8000
Interactive docs at http://localhost:8000/docs
uv run streamlit run frontend/app.py

Opens at http://localhost:8501
Three-panel layout: sidebar (thesis selector, stage scores), centre (chat), right (citations/evidence).
uv run python -m app.evals.eval_runner

Runs 10 representative professor questions through the full graph and checks 42 properties across all intent types. Exits with code 1 on any failure. Run this after changes to router.py, nodes.py, or llm_client.py.
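The exit-code behaviour of an eval script like this can be sketched as below. This is a simplified illustration under stated assumptions, not the contents of `eval_runner.py`: `EvalCase`, `run_evals`, and `exit_code` are hypothetical names, and `ask` stands in for whatever invokes the graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    question: str
    # Maps a property name to a predicate over the response dict.
    checks: Dict[str, Callable[[dict], bool]]


def run_evals(cases: List[EvalCase], ask: Callable[[str], dict]) -> List[str]:
    """Run every case and return one description per failed check."""
    failures = []
    for case in cases:
        response = ask(case.question)
        for name, predicate in case.checks.items():
            if not predicate(response):
                failures.append(f"{case.question!r}: {name}")
    return failures


def exit_code(cases: List[EvalCase], ask: Callable[[str], dict]) -> int:
    """Return 1 if any check failed, else 0 (suitable for sys.exit)."""
    failures = run_evals(cases, ask)
    for failure in failures:
        print("FAIL", failure)
    return 1 if failures else 0
```

A caller would finish with `sys.exit(exit_code(cases, ask))` so that CI or a pre-commit hook sees a non-zero status on any regression.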
uv run pytest tests/ -v

GradeLens/
├── app/
│   ├── agent/
│   │   ├── graph.py                  # LangGraph StateGraph definition
│   │   ├── nodes.py                  # All graph node functions
│   │   ├── router.py                 # Deterministic intent classifier (keyword/regex)
│   │   ├── llm_client.py             # AnthropicClient / OpenAIClient / build_llm_client()
│   │   └── state.py                  # AgentState TypedDict + initial_state()
│   ├── api/
│   │   └── routes.py                 # FastAPI route handlers
│   ├── data/                         # JSON fixtures (loaded at startup, no DB writes)
│   │   ├── theses.json
│   │   └── grades/
│   │       ├── thesis_001.json       # 2 grade versions + evidence + override event
│   │       └── thesis_002.json
│   ├── evals/
│   │   ├── eval_runner.py            # Offline eval script (10 cases, 42 checks)
│   │   └── deterministic_client.py   # Offline answer generator (evals/tests only)
│   ├── models/
│   │   └── schemas.py                # Pydantic models (GradeVersion, Citation, RoutingDecision, …)
│   ├── storage/
│   │   ├── data_store.py             # In-memory DataStore loaded from JSON fixtures
│   │   └── database.py               # SQLAlchemy async SQLite for conversations + audit log
│   └── main.py                       # FastAPI app factory + startup
├── frontend/
│   └── app.py                        # Streamlit UI
├── tests/
│   └── test_routing_model.py         # Unit tests for intent classification + fetch nodes
├── CLAUDE.md                         # Guidance for Claude Code
├── pyproject.toml
└── .env.example
POST /theses/{id}/chat
        │
        ▼
LangGraph StateGraph
        │
┌──────────────────────────────┐
│  resolve_version             │  loads Thesis + active GradeVersion
│  classify_intent             │  → grade_explanation | version_comparison |
│                              │    rubric_lookup | override_request | unsupported
│  fetch_explain_context       │  focal stage(s) + evidence (grade_explanation)
│  fetch_rubric_context        │  focal stage(s) + evidence (rubric_lookup)
│  fetch_comparison_context    │  version diff + override metadata
│  call_override_service       │  stub override handler; bypasses generation
│  generate_answer             │  LLM synthesizer (deterministic offline fallback)
│  validate_grounding          │  version ref · score consistency · citations · isolation
│  safe_fallback               │  if blocked: replace with raw structured data
│  finalize                    │  promotes to final_answer
│  persist                     │  writes to SQLite (messages + audit log)
└──────────────────────────────┘
        │
        ▼
ChatResponse: answer · grounding_status · citations · warnings · intent
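The node order in the diagram can be sketched in plain Python. This is an illustration only and does not use LangGraph: `plan_path` is a hypothetical helper, and the assumption that the override stub skips validation as well as generation is not confirmed by the diagram.

```python
from typing import List

# Intent -> node that runs right after classify_intent (from the diagram).
ROUTES = {
    "grade_explanation": "fetch_explain_context",
    "rubric_lookup": "fetch_rubric_context",
    "version_comparison": "fetch_comparison_context",
    "override_request": "call_override_service",
    "unsupported": "generate_answer",
}


def plan_path(intent: str, grounding_blocked: bool = False) -> List[str]:
    """Return the ordered node names a request would visit."""
    path = ["resolve_version", "classify_intent"]
    step = ROUTES[intent]
    path.append(step)
    if step == "call_override_service":
        # Assumption: the stub bypasses generation and validation.
        return path + ["finalize", "persist"]
    if step != "generate_answer":
        path.append("generate_answer")
    path.append("validate_grounding")
    if grounding_blocked:
        # A blocked answer is replaced with raw structured data.
        path.append("safe_fallback")
    return path + ["finalize", "persist"]
```

For example, `plan_path("rubric_lookup", grounding_blocked=True)` includes `safe_fallback` between `validate_grounding` and `finalize`.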
| Decision | Rationale |
|---|---|
| LangGraph StateGraph | Routing logic is visible in graph structure rather than buried in if/else chains |
| Deterministic keyword router | No LLM needed for intent classification; unsupported sub-asks are tracked explicitly on the RoutingDecision |
| Grounding validator as a dedicated node | Cannot be skipped; when validation blocks an answer, safe_fallback replaces it with raw structured data, so ungrounded text is not returned to the professor |
| DeterministicClient in app/evals/ | Offline answer generation used for evals and tests; never imported in production code |
| In-memory DataStore + SQLite split | Immutable grading fixtures stay fast and simple; only mutable conversation history hits the DB |
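The four grounding checks named in the architecture (version reference, score consistency, citations, isolation) could look roughly like this. It is a sketch under assumptions, not the project's `validate_grounding` node: the function names, the `N/M` score format, and the reading of "isolation" as "citations must belong to the current thesis" are all illustrative guesses.

```python
import re
from typing import Dict, List, Set


def check_grounding(
    answer: str,
    version_id: str,
    stored_scores: Set[str],
    citations: List[Dict[str, str]],
    known_reference_ids: Set[str],
) -> List[str]:
    """Return a list of warnings; an empty list means the answer is grounded."""
    warnings = []
    if version_id not in answer:
        warnings.append("version_ref: answer does not name the grade version")
    # Every "N/M" score quoted in the answer must match stored grading data.
    for score in re.findall(r"\b\d+/\d+\b", answer):
        if score not in stored_scores:
            warnings.append(f"score_consistency: {score} is not a stored score")
    if not citations:
        warnings.append("citations: answer has no citations")
    for citation in citations:
        # Assumed meaning of isolation: no references from other theses.
        if citation["reference_id"] not in known_reference_ids:
            warnings.append(f"isolation: unknown reference {citation['reference_id']}")
    return warnings


def grounding_status(warnings: List[str]) -> str:
    """Map warnings to the status that decides whether safe_fallback runs."""
    return "blocked" if warnings else "grounded"
```

Returning warnings rather than a boolean lets the same list populate the `warnings` field of the response.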
| Intent | Trigger | Route |
|---|---|---|
| `grade_explanation` | Stage aliases ("methodology"), "why", "explain", "scored", evidence keywords | `fetch_explain_context` |
| `version_comparison` | "compare", "what changed", "before/after the override", version refs | `fetch_comparison_context` |
| `rubric_lookup` | "rubric", "grading criteria", "how is it graded" | `fetch_rubric_context` |
| `override_request` | "want to change the score", "increase/decrease", "override stage" | `call_override_service` |
| `unsupported` | Pass/fail recommendations, personal grading opinions | `generate_answer` (refusal) |
override_request is evaluated before unsupported so a professor asking to change a score is never treated as an opinion query.
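The ordering rule above can be sketched with an ordered list of regex rules. This is a minimal illustration, not the contents of `router.py`: the patterns are simplified guesses based on the trigger table, and falling back to `unsupported` when nothing matches is an assumption.

```python
import re
from typing import List, Tuple

# Order matters: override_request is checked before unsupported.
RULES: List[Tuple[str, List[str]]] = [
    ("override_request", [
        r"\b(change|increase|decrease)\b.*\bscore\b",
        r"\boverride stage\b",
    ]),
    ("unsupported", [r"\bpersonally\b", r"\bpass or fail\b"]),
    ("version_comparison", [
        r"\bcompare\b", r"\bwhat changed\b",
        r"\b(before|after) the override\b", r"\bversion \d+\b",
    ]),
    ("rubric_lookup", [
        r"\brubric\b", r"\bcriteria\b", r"\bhow is it graded\b",
    ]),
    ("grade_explanation", [
        r"\bwhy\b", r"\bexplain\b", r"\bscored\b",
        r"\bevidence\b", r"\bmethodology\b",
    ]),
]


def classify_intent(message: str) -> str:
    """Return the first intent whose patterns match the message."""
    text = message.lower()
    for intent, patterns in RULES:
        if any(re.search(pattern, text) for pattern in patterns):
            return intent
    # Assumption: messages that match nothing are treated as out of scope.
    return "unsupported"
```

With this ordering, a message such as "Should I change the score? What would you personally do?" is classified as `override_request`, even though it also contains an opinion keyword.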
One thesis: "Deep Learning Approaches to Climate Prediction"
Two grade versions:
- `gv_001` (superseded): original pipeline output, Stage 3 scored 9/20
- `gv_002` (active): after Prof. Chen override, Stage 3 scored 14/20
Stage 3 (Methodology) is the intentional weak point, with linked evidence for realistic retrieval demos.
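A version comparison over this fixture reduces to a per-stage score diff. The sketch below uses only the Stage 3 scores stated above; the function name `diff_stage_scores` and the `stage_3` key are hypothetical and do not reflect the actual fixture schema.

```python
from typing import Dict, Optional


def diff_stage_scores(
    old: Dict[str, int], new: Dict[str, int]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Return only the stages whose score differs between two versions."""
    changes = {}
    for stage in sorted(set(old) | set(new)):
        before, after = old.get(stage), new.get(stage)
        if before != after:
            # delta is None when a stage exists in only one version.
            delta = None if before is None or after is None else after - before
            changes[stage] = {"old": before, "new": after, "delta": delta}
    return changes


# Stage 3 scores from the demo fixture: 9/20 in gv_001, 14/20 in gv_002.
gv_001_scores = {"stage_3": 9}
gv_002_scores = {"stage_3": 14}
stage_changes = diff_stage_scores(gv_001_scores, gv_002_scores)
# stage_changes == {"stage_3": {"old": 9, "new": 14, "delta": 5}}
```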
# Health check
curl http://localhost:8000/health
# Active grade for a thesis
curl http://localhost:8000/theses/thesis_001/active-grade | python -m json.tool
# List grade versions
curl http://localhost:8000/theses/thesis_001/grade-versions | python -m json.tool
# Chat — score explanation
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "Why was methodology scored low?", "professor_id": "prof_chen"}'
# Chat — evidence lookup
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "Show me the evidence supporting the methodology assessment.", "professor_id": "prof_chen"}'
# Chat — version comparison
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "What changed between version 1 and version 2?", "professor_id": "prof_chen"}'
# Chat — rubric lookup
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "What criteria are used to grade the methodology stage?", "professor_id": "prof_chen"}'
# Chat — override request
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "I want to change the methodology score.", "professor_id": "prof_chen"}'
# Chat — unsupported (out-of-scope)
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "What final grade would you personally give this thesis?", "professor_id": "prof_chen"}'
# Resume a conversation
curl -X POST http://localhost:8000/theses/thesis_001/chat \
-H "Content-Type: application/json" \
-d '{"message": "What about stage 4?", "professor_id": "prof_chen", "conversation_id": "conv_001"}'
# Read conversation history
curl http://localhost:8000/conversations/conv_001 | python -m json.tool
# Compare versions directly
curl -X POST http://localhost:8000/theses/thesis_001/compare-versions \
-H "Content-Type: application/json" \
-d '{"old_version_id": "gv_001", "new_version_id": "gv_002"}'

| Field | Description |
|---|---|
| `answer` | The grounded explanation |
| `grade_version_id_used` | Always the active version unless comparison |
| `grounding_status` | grounded · blocked · allowed_override |
| `citations` | List of `{source_type, reference_id, label, excerpt}`; source types: `stage`, `evidence`, `version` |
| `warnings` | Grounding issues found (empty if clean) |
| `intent` | Classified question type |
| `retrieved_context_summary` | Short description of what was fetched |
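A client consuming this response could model it as follows. The sketch uses stdlib dataclasses to stay self-contained, whereas the project defines its models with Pydantic in `schemas.py`; the class layout and the `parse_chat_response` helper are assumptions based only on the field table above.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    source_type: str  # "stage", "evidence", or "version"
    reference_id: str
    label: str
    excerpt: str


@dataclass
class ChatResponse:
    answer: str
    grade_version_id_used: str
    grounding_status: str  # "grounded", "blocked", or "allowed_override"
    intent: str
    retrieved_context_summary: str
    citations: List[Citation] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def parse_chat_response(raw: str) -> ChatResponse:
    """Parse the JSON body of a chat response into typed objects."""
    data = json.loads(raw)
    return ChatResponse(
        answer=data["answer"],
        grade_version_id_used=data["grade_version_id_used"],
        grounding_status=data["grounding_status"],
        intent=data["intent"],
        retrieved_context_summary=data["retrieved_context_summary"],
        citations=[Citation(**c) for c in data.get("citations", [])],
        warnings=list(data.get("warnings", [])),
    )
```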
- No real PDF parsing — thesis text is hardcoded fixture data
- No authentication: `professor_id` is a free string in requests
- The `persist` node runs DB writes in a background thread (workaround for LangGraph sync nodes in async FastAPI; sufficient for a prototype)
- LangGraph's built-in checkpointer is not used: conversation history is written by a custom `persist` node directly to SQLite via SQLAlchemy
- `call_override_service` is a stub: it reads existing override history but does not write new overrides
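The background-thread workaround mentioned in the list above can be sketched as follows. This is one common way to call async code from a sync function while an event loop is already running; it is not the project's actual `persist` implementation. The helper name is hypothetical, and blocking until the write finishes is an assumption.

```python
import asyncio
import threading
from typing import Any, Awaitable, Callable


def run_coroutine_in_thread(
    coro_factory: Callable[[], Awaitable[Any]], timeout: float = 5.0
) -> Any:
    """Run an async function to completion on a separate thread.

    A sync graph node cannot call asyncio.run() on a thread whose event
    loop is already running, so the coroutine gets its own thread and loop.
    """
    outcome = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro_factory())
        except Exception as exc:  # surfaced to the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError("background write did not finish in time")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Note that `join` blocks the calling thread, so under load this pattern stalls the server's event loop; that trade-off fits the "sufficient for a prototype" caveat.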
- Add vector embeddings for semantic evidence retrieval (e.g. `sentence-transformers`)
- Wire LangGraph's built-in checkpointer to the SQLite session factory
- Add professor authentication (JWT or institution SSO)
- Implement a real PDF parsing pipeline for thesis ingestion
- Implement the override service write path (currently stub-only)
- Add a "helpful / not grounded" feedback button (writes to audit log)
- Expose the LangGraph graph visualisation at `/graph` for debugging