This project is a public-facing RAG application for the Chicago FY2026 budget documents. It lets users ask plain-English questions about the Annual Appropriation Ordinance and the Grant Details Ordinance and returns answers with page-level citations and direct links back to the source PDFs.
The system is designed around three goals:
- retrieval quality for long civic PDFs
- transparent source grounding with page citations
- practical deployment for public use
At a high level, the app:
- extracts text from the two source PDFs
- chunks and indexes the content with page metadata
- blends BM25 and optional vector retrieval
- optionally reranks results with a cross-encoder
- generates cited answers using OpenAI, Bedrock, or Ollama
- lets users open exact source pages in a built-in viewer
- supports exporting answers to Markdown, JSON, and CSV
- includes evaluation tooling and tuning scripts for retrieval quality
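One common way to blend BM25 and vector retrieval is a weighted sum of per-retriever scores after min-max normalization. The sketch below illustrates that idea under stated assumptions: the function names, the min-max scheme, and the alpha weight are hypothetical and not taken from engine.py, which may combine scores differently (for example with reciprocal rank fusion).

```python
"""Hypothetical sketch: hybrid BM25 + vector score blending."""
from __future__ import annotations


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: treat them as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend_scores(
    bm25: dict[str, float],
    vector: dict[str, float] | None,
    alpha: float = 0.5,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs.

    alpha (an assumed parameter) weights the vector side. When vector is
    None or empty (vector retrieval disabled), ranking falls back to BM25.
    """
    bm25_n = min_max_normalize(bm25)
    if not vector:
        combined = bm25_n
    else:
        vec_n = min_max_normalize(vector)
        combined = {
            # A chunk missing from one retriever contributes 0 from that side.
            doc_id: (1 - alpha) * bm25_n.get(doc_id, 0.0) + alpha * vec_n.get(doc_id, 0.0)
            for doc_id in bm25_n.keys() | vec_n.keys()
        }
    # Sort by descending score, breaking ties by id for a stable order.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Setting alpha to 0 reproduces a BM25-only ranking. Normalizing first matters because raw BM25 scores are unbounded while cosine similarities are not, so an unnormalized sum would let BM25 dominate.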
Source documents:
- chicago_Annual_Appropriation_Ordinance_2026.pdf
- chicago_Grant_Details_Ordinance_2026.pdf
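Page-level citations only work if every indexed chunk remembers which PDF and page it came from. A minimal sketch of page-aware chunking, assuming text has already been extracted per page: the Chunk type, the word-window sizes, and the /pdf/ route are hypothetical, and the real chunker in engine.py may split on tokens or sentences instead.

```python
"""Hypothetical sketch: chunking that keeps page metadata for citations."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # PDF filename, used to build the citation link
    page: int    # 1-based page number cited in answers
    text: str


def chunk_pages(source: str, pages: list[str], max_words: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split each page into overlapping word windows that never cross a page boundary."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    step = max_words - overlap
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), step):
            window = words[start:start + max_words]
            chunks.append(Chunk(source, page_no, " ".join(window)))
            if start + max_words >= len(words):
                break  # this window already reached the end of the page
    return chunks


def citation_url(chunk: Chunk) -> str:
    # "#page=N" is a PDF open parameter commonly honoured by browser PDF viewers;
    # the /pdf/ path prefix is a hypothetical route, not the app's real one.
    return f"/pdf/{chunk.source}#page={chunk.page}"
```

Keeping windows inside a single page trades a little context at page breaks for citations that always point at exactly one page.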
Repository layout:
- app.py: FastAPI app, routes, SEO pages, export endpoints, analytics hooks, runtime controls
- src/chicago_budget_rag/engine.py: indexing, retrieval, reranking, answer generation, provider abstraction
- templates/: HTML templates for the search and guide pages, the site-disabled page, and the rate-limit page
- static/styles.css: shared styling for the web UI
- build_index.py: offline index build entrypoint
- query_rag.py: CLI query tool
- eval_rag.py: retrieval evaluation and tuning harness
- eval/questions.sample.json: starter benchmark set
- docker-compose.yml: containerized runtime configuration
- Dockerfile: image build definition
- .env.openai.example: OpenAI-based runtime template
- .env.bedrock.example: Bedrock-based runtime template for Lightsail or other AWS hosts
Launch guides:
- Local launch: docs/local/README.md
- AWS launch: docs/aws/README.md
This project is released under the MIT License. See LICENSE.
Routing, indexing, and runtime behavior:
- POST / redirects to canonical GET /search?q=... URLs so that search pages are crawlable.
- Curated search pages and guide pages are indexable; arbitrary search pages are served as noindex,follow.
- robots.txt and sitemap.xml are served by the app.
- The public site can be disabled with SITE_ENABLED=false while health checks stay alive.
- Query export is available through the UI and GET /export.
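The redirect-to-canonical and indexing rules above can be sketched with the standard library alone. This is an illustration under assumptions: the normalization (whitespace collapse plus lowercasing), the CURATED_QUERIES set, and the helper names are hypothetical; the app's actual rules live in app.py.

```python
"""Hypothetical sketch: canonical search URLs and robots directives."""
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical placeholder; the real curated list is defined by the app.
CURATED_QUERIES = {"police department budget", "library funding"}


def normalize_query(q: str) -> str:
    """Collapse whitespace and lowercase (assumed rules) so equivalent queries share one URL."""
    return " ".join(q.split()).lower()


def canonical_search_url(q: str) -> str:
    """Target URL for the POST / -> GET /search redirect."""
    return "/search?" + urlencode({"q": normalize_query(q)})


def robots_meta(q: str) -> str:
    """Curated searches are indexable; arbitrary searches are noindex,follow."""
    return "index,follow" if normalize_query(q) in CURATED_QUERIES else "noindex,follow"
```

In a FastAPI handler the URL would typically be returned as RedirectResponse(url, status_code=303); a 303 See Other tells the browser to follow up with a GET, which is what turns a form POST into a shareable, crawlable search URL.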
The app supports:
- OpenAI
- AWS Bedrock
- Ollama
Provider selection and model configuration are driven by environment variables; .env.openai.example and .env.bedrock.example are the starting templates.
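Environment-driven provider selection might look roughly like the following. LLM_PROVIDER, LLM_MODEL, the openai default, and the ProviderConfig type are hypothetical names used only for illustration; the variables the app actually reads are the ones listed in the .env example templates.

```python
"""Hypothetical sketch: choosing an LLM provider from environment variables."""
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

SUPPORTED_PROVIDERS = ("openai", "bedrock", "ollama")


@dataclass(frozen=True)
class ProviderConfig:
    provider: str
    model: str


def load_provider_config(env: Mapping[str, str] | None = None) -> ProviderConfig:
    """Read provider settings; LLM_PROVIDER and LLM_MODEL are assumed variable names."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER {provider!r}; expected one of {SUPPORTED_PROVIDERS}")
    model = env.get("LLM_MODEL", "").strip()
    if not model:
        # Fail fast at startup rather than on the first user query.
        raise ValueError("LLM_MODEL must be set")
    return ProviderConfig(provider, model)
```

Passing a plain dict instead of os.environ keeps the function easy to test.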
For search-engine discoverability, the app includes:
- canonical search result pages
- guide landing pages
- robots.txt
- sitemap.xml
- Open Graph and Twitter metadata
- JSON-LD for home, search, and guide pages
- internal linking through guides and curated searches
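sitemap.xml follows the sitemaps.org protocol: a urlset element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace with one url/loc entry per page. A minimal standard-library sketch, assuming a hypothetical build_sitemap helper that is fed the home, guide, and curated search paths (how the app actually collects those paths is not shown here):

```python
"""Hypothetical sketch: rendering sitemap.xml with the standard library."""
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: list[str]) -> str:
    """Return a sitemap XML document listing base_url + path for each path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text, as the protocol requires.
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```

Only indexable pages (guides and curated searches) belong in the list; arbitrary noindex search pages should be left out.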
Use eval_rag.py to measure hit rate and MRR across a benchmark set and tune BM25/vector blend settings before shipping retrieval changes.
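For reference, hit rate@k is the fraction of questions with at least one relevant result in the top k, and MRR is the mean of 1/rank of the first relevant result (counted as 0 when none appears within k). A minimal sketch of both metrics, assuming each question's results are ranked chunk or page ids and its gold answers are a set of ids; the function name and input shapes are illustrative and do not describe eval_rag.py or the format of eval/questions.sample.json.

```python
"""Hypothetical sketch: hit rate@k and MRR over a benchmark set."""
from __future__ import annotations

from collections.abc import Sequence


def hit_rate_and_mrr(
    results: Sequence[Sequence[str]],
    relevant: Sequence[set[str]],
    k: int = 5,
) -> tuple[float, float]:
    """results[i] is the ranked id list for question i; relevant[i] its gold ids."""
    if len(results) != len(relevant):
        raise ValueError("need one relevance set per question")
    if not results:
        return 0.0, 0.0
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in gold:
                hits += 1
                reciprocal_rank_sum += 1.0 / rank
                break  # only the first relevant hit counts toward MRR
    n = len(results)
    return hits / n, reciprocal_rank_sum / n
```

Running the same benchmark under several blend weights and comparing these two numbers is enough to tell whether a retrieval change helps or hurts.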