GradeLens — AI Grading Assistant

A prototype chat assistant for professors reviewing AI-generated thesis grades. Built with FastAPI + LangGraph.

The assistant is explanation-first: it answers questions about stored grading artifacts inside a single thesis conversation. It does not re-grade, finalise grades, or make pass/fail recommendations. Every response includes a grounding status, version ID, and citations back to the grading data.

Demo: here

Requirements

  • Python 3.11+
  • UV (package manager — see install instructions below)

No API key is required. The app runs fully in deterministic mode without one. Set ANTHROPIC_API_KEY or OPENAI_API_KEY to enable LLM-generated answers.


Setup

Install UV

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install dependencies

git clone <repo-url>
cd GradeLens

uv sync               # creates .venv and installs all production dependencies
uv sync --extra dev   # also installs pytest

Configure environment

cp .env.example .env
# Optional: set ANTHROPIC_API_KEY or OPENAI_API_KEY in .env

Running the app

Both the backend and frontend must be running for the full UI experience.

Backend (FastAPI)

uv run uvicorn app.main:app --reload

API available at http://localhost:8000
Interactive docs at http://localhost:8000/docs

Frontend (Streamlit)

uv run streamlit run frontend/app.py

Opens at http://localhost:8501
Three-panel layout: sidebar (thesis selector, stage scores), centre (chat), right (citations/evidence).

Offline evals — no server or API key needed

uv run python -m app.evals.eval_runner

Runs 10 representative professor questions through the full graph and checks 42 properties across all intent types. Exits with code 1 on any failure. Run this after changes to router.py, nodes.py, or llm_client.py.
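The eval-runner pattern can be sketched as a small loop: run each case through the graph, check a list of properties, and return a non-zero exit code on any failure. This is a hypothetical miniature, not the project's actual runner; `run_case` and the case list are stand-ins for the real graph invocation and the 10 stored cases.

```python
# Hypothetical sketch of the eval-runner pattern: run each case, check
# properties, and report failure via the return value (the real runner
# exits with code 1 on any failed check).

def run_case(question: str) -> dict:
    # Stand-in for invoking the full LangGraph pipeline.
    return {"grounding_status": "grounded", "citations": ["c1"]}

CASES = [
    "Why was methodology scored low?",
    "What changed between version 1 and version 2?",
]

def main() -> int:
    failures = 0
    for question in CASES:
        result = run_case(question)
        checks = [
            result["grounding_status"] in {"grounded", "blocked", "allowed_override"},
            bool(result["citations"]),
        ]
        failures += checks.count(False)
    return 1 if failures else 0
```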

Tests

uv run pytest tests/ -v

Project structure

GradeLens/
├── app/
│   ├── agent/
│   │   ├── graph.py          # LangGraph StateGraph definition
│   │   ├── nodes.py          # All graph node functions
│   │   ├── router.py         # Deterministic intent classifier (keyword/regex)
│   │   ├── llm_client.py     # AnthropicClient / OpenAIClient / build_llm_client()
│   │   └── state.py          # AgentState TypedDict + initial_state()
│   ├── api/
│   │   └── routes.py         # FastAPI route handlers
│   ├── data/                 # JSON fixtures (loaded at startup — no DB writes)
│   │   ├── theses.json
│   │   └── grades/
│   │       ├── thesis_001.json   # 2 grade versions + evidence + override event
│   │       └── thesis_002.json
│   ├── evals/
│   │   ├── eval_runner.py    # Offline eval script (10 cases, 42 checks)
│   │   └── deterministic_client.py  # Offline answer generator (evals/tests only)
│   ├── models/
│   │   └── schemas.py        # Pydantic models (GradeVersion, Citation, RoutingDecision, …)
│   ├── storage/
│   │   ├── data_store.py     # In-memory DataStore loaded from JSON fixtures
│   │   └── database.py       # SQLAlchemy async SQLite for conversations + audit log
│   └── main.py               # FastAPI app factory + startup
├── frontend/
│   └── app.py                # Streamlit UI
├── tests/
│   └── test_routing_model.py # Unit tests for intent classification + fetch nodes
├── CLAUDE.md                 # Guidance for Claude Code
├── pyproject.toml
└── .env.example

Architecture

POST /theses/{id}/chat
        │
        ▼
  LangGraph StateGraph
        │
  ┌──────────────────────────────┐
  │ resolve_version              │  loads Thesis + active GradeVersion
  │ classify_intent              │  → grade_explanation | version_comparison |
  │                              │    rubric_lookup | override_request | unsupported
  │ fetch_explain_context        │  focal stage(s) + evidence  (grade_explanation)
  │ fetch_rubric_context         │  focal stage(s) + evidence  (rubric_lookup)
  │ fetch_comparison_context     │  version diff + override metadata
  │ call_override_service        │  stub override handler — bypasses generation
  │ generate_answer              │  LLM synthesizer (deterministic offline fallback)
  │ validate_grounding           │  version ref · score consistency · citations · isolation
  │ safe_fallback                │  if blocked: replace with raw structured data
  │ finalize                     │  promotes to final_answer
  │ persist                      │  writes to SQLite (messages + audit log)
  └──────────────────────────────┘
        │
        ▼
  ChatResponse: answer · grounding_status · citations · warnings · intent
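Stripped of the framework, the flow above is a sequence of state-transforming functions, each taking and returning a state dict (mirroring how LangGraph nodes update AgentState). This is an illustrative, framework-free sketch; the real nodes live in app/agent/nodes.py and do considerably more.

```python
# Framework-free sketch of the node pipeline above. Node behaviour here is
# illustrative, not the project's actual logic.

def resolve_version(state: dict) -> dict:
    # Stand-in for loading the Thesis + active GradeVersion.
    state["grade_version_id"] = state.get("active_version", "gv_002")
    return state

def classify_intent(state: dict) -> dict:
    state["intent"] = ("grade_explanation"
                       if "why" in state["message"].lower() else "unsupported")
    return state

def generate_answer(state: dict) -> dict:
    state["answer"] = f"[{state['grade_version_id']}] stub answer for {state['intent']}"
    return state

def validate_grounding(state: dict) -> dict:
    # Blocked unless the answer references the active version id.
    ok = state["grade_version_id"] in state["answer"]
    state["grounding_status"] = "grounded" if ok else "blocked"
    return state

def run_pipeline(message: str) -> dict:
    state = {"message": message}
    for node in (resolve_version, classify_intent, generate_answer, validate_grounding):
        state = node(state)
    return state
```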

Key design decisions

  • LangGraph StateGraph: routing logic is visible in the graph structure rather than buried in if/else chains
  • Deterministic keyword router: no LLM needed for intent classification; unsupported sub-asks are tracked explicitly on the RoutingDecision
  • Grounding validator as a dedicated node: cannot be skipped; safe_fallback is mandatory on blocked, so the assistant never hallucinates
  • DeterministicClient in app/evals/: offline answer generation used for evals and tests; never imported in production code
  • In-memory DataStore + SQLite split: immutable grading fixtures stay fast and simple; only mutable conversation history hits the DB
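The grounding checks (version reference, score consistency, citations) can be sketched as a pure function. This is a hypothetical illustration of the checks named above; the real logic lives in the validate_grounding node and the field names here are assumptions.

```python
import re

# Hypothetical sketch of the grounding checks: version reference,
# citation presence, and score consistency against stored stage scores.

def check_grounding(answer: str, version_id: str,
                    stage_scores: dict[str, int],
                    citations: list[dict]) -> tuple[str, list[str]]:
    warnings = []
    if version_id not in answer:
        warnings.append("answer does not reference the grade version")
    if not citations:
        warnings.append("no citations attached")
    # Any "N/20" mentioned in the answer must match a stored stage score.
    for match in re.finditer(r"(\d+)/20", answer):
        if int(match.group(1)) not in stage_scores.values():
            warnings.append(f"score {match.group(0)} not found in stored grades")
    status = "grounded" if not warnings else "blocked"
    return status, warnings
```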

Intent types

  • grade_explanation: stage aliases ("methodology"), "why", "explain", "scored", evidence keywords → fetch_explain_context
  • version_comparison: "compare", "what changed", "before/after the override", version refs → fetch_comparison_context
  • rubric_lookup: "rubric", "grading criteria", "how is it graded" → fetch_rubric_context
  • override_request: "want to change the score", "increase/decrease", "override stage" → call_override_service
  • unsupported: pass/fail recommendations, personal grading opinions → generate_answer (refusal)

override_request is evaluated before unsupported so a professor asking to change a score is never treated as an opinion query.
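A keyword router of this kind can be sketched as an ordered list of regex patterns checked first-match-wins. The patterns and ordering below are illustrative only (the real rules live in app/agent/router.py), but they show why override_request must be tested before the opinion/unsupported patterns.

```python
import re

# Illustrative sketch of a deterministic keyword router. Patterns are
# assumptions, not the project's actual rules; first match wins.
INTENT_PATTERNS = [
    # Checked first so "change the score" is never read as an opinion query.
    ("override_request",
     re.compile(r"\b(change|increase|decrease)\b.*\bscore\b|\boverride\b")),
    ("version_comparison",
     re.compile(r"\bcompare\b|\bwhat changed\b|\bversion\b")),
    ("rubric_lookup",
     re.compile(r"\brubric\b|\bgrading criteria\b|\bhow is it graded\b")),
    ("grade_explanation",
     re.compile(r"\bwhy\b|\bexplain\b|\bscored\b|\bmethodology\b|\bevidence\b")),
    ("unsupported",
     re.compile(r"\bpass\b|\bfail\b|\bpersonally\b|\bopinion\b")),
]

def classify_intent(message: str) -> str:
    text = message.lower()
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(text):
            return intent
    return "unsupported"
```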


Sample data

The primary sample thesis, "Deep Learning Approaches to Climate Prediction" (thesis_001), has two grade versions:

  • gv_001 (superseded) — original pipeline output, Stage 3 scored 9/20
  • gv_002 (active) — after Prof. Chen override, Stage 3 scored 14/20

Stage 3 (Methodology) is the intentional weak point, with linked evidence for realistic retrieval demos.
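The relationship between the two versions can be illustrated with a miniature fixture and a diff helper in the spirit of the /compare-versions endpoint. The field names here are hypothetical; the real fixture is app/data/grades/thesis_001.json.

```python
# Hypothetical miniature of the two grade-version fixtures described above.
# Field names are illustrative, not the real fixture schema.
VERSIONS = {
    "gv_001": {"status": "superseded", "stages": {"stage_3": 9}},
    "gv_002": {"status": "active", "stages": {"stage_3": 14}},
}

def diff_versions(old_id: str, new_id: str) -> dict[str, tuple[int, int]]:
    # Return {stage: (old_score, new_score)} for every stage that changed.
    old, new = VERSIONS[old_id]["stages"], VERSIONS[new_id]["stages"]
    return {s: (old[s], new[s]) for s in old if old[s] != new.get(s)}
```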


Example API calls

# Health check
curl http://localhost:8000/health

# Active grade for a thesis
curl http://localhost:8000/theses/thesis_001/active-grade | python -m json.tool

# List grade versions
curl http://localhost:8000/theses/thesis_001/grade-versions | python -m json.tool

# Chat — score explanation
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Why was methodology scored low?", "professor_id": "prof_chen"}'

# Chat — evidence lookup
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Show me the evidence supporting the methodology assessment.", "professor_id": "prof_chen"}'

# Chat — version comparison
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What changed between version 1 and version 2?", "professor_id": "prof_chen"}'

# Chat — rubric lookup
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What criteria are used to grade the methodology stage?", "professor_id": "prof_chen"}'

# Chat — override request
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "I want to change the methodology score.", "professor_id": "prof_chen"}'

# Chat — unsupported (out-of-scope)
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What final grade would you personally give this thesis?", "professor_id": "prof_chen"}'

# Resume a conversation
curl -X POST http://localhost:8000/theses/thesis_001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What about stage 4?", "professor_id": "prof_chen", "conversation_id": "conv_001"}'

# Read conversation history
curl http://localhost:8000/conversations/conv_001 | python -m json.tool

# Compare versions directly
curl -X POST http://localhost:8000/theses/thesis_001/compare-versions \
  -H "Content-Type: application/json" \
  -d '{"old_version_id": "gv_001", "new_version_id": "gv_002"}'

Chat response fields

  • answer: the grounded explanation
  • grade_version_id_used: the active version, except for comparison queries
  • grounding_status: grounded · blocked · allowed_override
  • citations: list of {source_type, reference_id, label, excerpt}; source types are stage, evidence, version
  • warnings: grounding issues found (empty if clean)
  • intent: classified question type
  • retrieved_context_summary: short description of what was fetched
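The response shape above can be sketched as plain dataclasses. The real models are Pydantic (app/models/schemas.py); the field types and defaults below are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Dataclass sketch of the chat response fields listed above. The project
# uses Pydantic models; types and defaults here are assumptions.

@dataclass
class Citation:
    source_type: str   # "stage" | "evidence" | "version"
    reference_id: str
    label: str
    excerpt: str

@dataclass
class ChatResponse:
    answer: str
    grade_version_id_used: str
    grounding_status: str  # "grounded" | "blocked" | "allowed_override"
    citations: list[Citation] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    intent: str = "unsupported"
    retrieved_context_summary: str = ""
```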

Known limitations

  • No real PDF parsing — thesis text is hardcoded fixture data
  • No authentication — professor_id is a free string in requests
  • The persist node runs DB writes in a background thread (workaround for LangGraph sync nodes in async FastAPI — sufficient for prototype)
  • LangGraph's built-in checkpointer is not used — conversation history is written by a custom persist node directly to SQLite via SQLAlchemy
  • call_override_service is a stub — it reads existing override history but does not write new overrides
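The background-thread workaround mentioned above can be sketched with asyncio.to_thread, which dispatches a blocking call to a worker thread so the async endpoint stays responsive. This is a hypothetical illustration; write_message stands in for the real SQLAlchemy write.

```python
import asyncio

# Sketch of the persist workaround: a synchronous DB write dispatched to a
# worker thread from async code. write_message is a stand-in for the real
# SQLite/SQLAlchemy call.

def write_message(conversation_id: str, text: str) -> str:
    # Blocking write (placeholder).
    return f"{conversation_id}: persisted {len(text)} chars"

async def persist(conversation_id: str, text: str) -> str:
    # asyncio.to_thread keeps the event loop free while the write runs.
    return await asyncio.to_thread(write_message, conversation_id, text)
```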

Potential next steps

  1. Add vector embeddings for semantic evidence retrieval (e.g. sentence-transformers)
  2. Wire LangGraph's built-in checkpointer to the SQLite session factory
  3. Add professor authentication (JWT or institution SSO)
  4. Implement a real PDF parsing pipeline for thesis ingestion
  5. Implement the override service write path (currently stub-only)
  6. Add a "helpful / not grounded" feedback button (writes to audit log)
  7. Expose the LangGraph graph visualisation at /graph for debugging

About

An evidence-grounded grading assistant that helps professors interrogate AI-generated thesis evaluations, inspect supporting passages, and compare rationale across grading versions.
