An AI-powered data exploration agent that answers natural language questions over structured databases using LangGraph and LLM technology.
Delivery Cadet is an intelligent SQL agent that converts natural language questions into SQL queries, executes them against a PostgreSQL database, and returns results in a conversational format through a ChatGPT-style interface.
Key capabilities include:
- Natural language to SQL conversion with automatic retry
- Interactive data visualisation (bar, line, pie, scatter, area charts)
- In-browser Python execution for advanced analytics
- Automated ETL pipeline with PII detection and masking
- Dataset-agnostic design for easy adaptation
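The automatic retry in the first capability can be sketched roughly as a loop that feeds execution errors back into the next generation attempt. This is a minimal illustration under assumed interfaces (`generate_sql` and `run_sql` are hypothetical placeholders, not this project's actual functions; the real agent does this inside a LangGraph workflow):

```python
# Minimal sketch of generate-then-retry. generate_sql/run_sql are
# hypothetical stand-ins for the LLM call and database execution.
def answer(question, generate_sql, run_sql, max_retries=3):
    error = None
    for _ in range(max_retries):
        sql = generate_sql(question, previous_error=error)  # LLM call
        try:
            return run_sql(sql)  # execute against the database
        except Exception as exc:
            error = str(exc)     # feed the error into the next attempt
    raise RuntimeError(f"Gave up after {max_retries} attempts: {error}")
```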
📚 For detailed features, tech stack, and system architecture, see the Architecture Guide.
Before running the application for the first time, you MUST complete the interactive data pipeline setup.
📖 See SETUP_GUIDE.md for step-by-step instructions with exact inputs needed for the provided dataset.
The guide includes:
- Primary & Foreign Key configuration
- SQL transformations for data integrity
- PII detection setup
Simply follow the guide when running `./start.sh` for the first time.
- Python 3.12 with Conda/Miniconda (Install Guide)
- Node.js 18+
- Docker & Docker Compose (for PostgreSQL database)
- pnpm (installed via npm)
- Cerebras API key (free tier at Cerebras Cloud)
- LangSmith API key (optional, for tracing - free tier at LangSmith)
⚠️ Development Environment: This project was developed and tested with Conda on macOS. Windows compatibility has been verified, but if you encounter any issues, please contact tedbae0504@gmail.com.
Windows Users: This project uses Bash scripts. Please use Git Bash (included with Git for Windows). You may need to run `chmod +x start.sh` before executing the start script.
```bash
git clone https://github.com/DachshundMango/delivery-cadet-challenge.git
cd delivery-cadet-challenge
```

Copy `.env.example` to `.env` and configure your credentials:
```bash
# Mac/Linux/Git Bash
cp .env.example .env

# Windows CMD (if not using Git Bash)
copy .env.example .env
```

Then edit `.env` with your actual values:
```bash
# Cerebras API Key (Required)
CEREBRAS_API_KEY=your_cerebras_api_key_here

# LLM Model Configuration
LLM_MODEL=llama-3.3-70b

# LangSmith Settings (Required for trace visualisation)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_api_key_here
LANGCHAIN_PROJECT=delivery-cadet-challenge

# Database Settings (Required)
DB_USER=myuser
DB_PASSWORD=mypassword
DB_NAME=delivery_db

# PgAdmin Settings (Optional)
PGADMIN_DEFAULT_EMAIL=admin@admin.com
PGADMIN_DEFAULT_PASSWORD=admin
```

Required API Keys:
- Cerebras API Key: Get free tier at https://cloud.cerebras.ai
- LangSmith API Key: Get free tier at https://smith.langchain.com
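Once `.env` is populated, the backend reads these settings from the environment at startup. As an illustration of the kind of fail-fast check involved (a hypothetical helper, not part of this codebase), missing keys can be caught early:

```python
import os

# Illustrative only: fail fast when a required setting is absent,
# rather than failing later with an opaque API error.
def require_env(name):
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required setting: {name} (see .env.example)")
    return value

# Example usage:
# api_key = require_env("CEREBRAS_API_KEY")
```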
Using Conda:
```bash
# Create conda environment with Python 3.12
conda create -n cadet python=3.12
conda activate cadet

# Install dependencies via pip
pip install -r requirements.txt
```

Note: The `requirements.txt` specifies core packages with pinned versions. Pip automatically installs all transitive dependencies with compatible versions.
```bash
# Install pnpm if not already installed
npm install -g pnpm

cd frontend
pnpm install
cd ..
```

Note: This project uses pnpm as specified in `package.json`. Using npm or yarn may cause dependency conflicts.
Copy the frontend environment example file:
```bash
# Mac/Linux/Git Bash
cp frontend/.env.example frontend/.env.local

# Windows CMD (if not using Git Bash)
copy frontend\.env.example frontend\.env.local
```

The default values are suitable for local development. You only need to change them if:
- Your LangGraph server runs on a different port
- You've customized the agent ID in `langgraph.json`
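For reference, a `frontend/.env.local` for this kind of setup might look like the following. The variable names below are an assumption based on the LangGraph Agent Chat UI template this frontend derives from; treat `frontend/.env.example` as the authoritative source:

```bash
# Hypothetical example - defer to frontend/.env.example for actual names
NEXT_PUBLIC_API_URL=http://localhost:2024
NEXT_PUBLIC_ASSISTANT_ID=agent
```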
```bash
# From root directory
docker-compose up -d
```

This will start:
- PostgreSQL on port 5432
- PgAdmin on port 8080 (http://localhost:8080)
⚠️ IMPORTANT FOR EVALUATORS: While this repository includes pre-generated configuration files in `src/config/` as a reference, please back up these files and run the full setup to experience the complete data pipeline:
Recommended first-time setup:
```bash
./start.sh --reset
```

This will:
- Drop all tables and clear config files
- Run the interactive 6-step pipeline (profiler → relationship discovery → load data → integrity check → transform → schema generation with PII detection)
- Load the complete dataset into your local database
- Generate fresh configuration files
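As one illustration of what the PII masking in step 6 of the pipeline can involve (a hypothetical regex-based sketch, not this project's actual detector, which handles more than e-mail addresses):

```python
import re

# Hypothetical sketch: mask e-mail addresses so raw PII never reaches
# the LLM. The project's real pipeline detects multiple PII types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_emails(text):
    return EMAIL_RE.sub("***@***", text)
```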
Alternative - Manual setup:
```bash
python src/setup.py
```

After setup completes, verify your database schema:
Check the generated schema documentation at `src/config/schema_info.md` to see:
- All table structures with column types
- Primary and Foreign Key relationships
- PII-masked columns (color-coded)
- Sample data for each table
For detailed pipeline documentation, see the Architecture Guide or SETUP_GUIDE.md for step-by-step instructions.
```bash
./start.sh
```

This automatically starts both the LangGraph server (port 2024) and the frontend (port 3000), then opens http://localhost:3000 in your browser. On first run, it will automatically set up the database.
Note: Windows users should run this command in Git Bash. If you encounter permission issues, run `chmod +x start.sh` first.
Reset and start fresh:
```bash
./start.sh --reset
```

This drops all tables, clears config files, and re-runs the setup pipeline.
If the start script fails, you can run components separately:
1. Data Setup (first time only):

```bash
python src/setup.py
```

2. CLI Mode (backend testing):

```bash
python src/cli.py
```

3. Full UI Mode:

Terminal 1 - Backend:

```bash
conda activate cadet
langgraph dev
```

Terminal 2 - Frontend:

```bash
conda activate cadet
cd frontend
pnpm dev
```

Then open http://localhost:3000 in your browser.
- "What are the top 3 most popular products by total quantity sold?"
- "How many customers are there in each continent?"
- "What payment methods are used and how often?"
- "Which country generates the highest total revenue?"
- "Who are the top 5 customers by total spending?"
- "Which franchises have received the most customer reviews?"
- "Show total revenue by supplier ingredient. Which ingredients are associated with the highest-selling franchises?"
- "Analyse daily sales trends over time."
- "Compare revenue by franchise size (S, M, L, XL, XXL) with average transaction values."
- "For each country, rank the products by total revenue and show only the top-selling product in each country."
- "Calculate the running cumulative revenue per day."
- "For each transaction, calculate how its total price compares to the average transaction value for that franchise."
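To give a feel for the kind of SQL these window-function questions produce, here is a self-contained demo of a running cumulative revenue query. It uses an in-memory SQLite table with hypothetical column names purely for illustration; the agent itself targets the PostgreSQL schema loaded by the pipeline:

```python
import sqlite3

# Tiny in-memory table standing in for the real PostgreSQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2026-01-01", 100.0), ("2026-01-02", 250.0), ("2026-01-03", 50.0)],
)

# Running cumulative revenue per day, as in the last example question.
query = """
SELECT day,
       SUM(revenue) OVER (ORDER BY day) AS cumulative_revenue
FROM sales
ORDER BY day
"""
for row in conn.execute(query):
    print(row)
```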
If you want to start completely fresh:
```bash
./start.sh --reset
```

This will drop all database tables, delete config files, and re-run the entire pipeline.
- Schema not found: Run `python src/setup.py` or `./start.sh --reset`
- Database connection error: Check if PostgreSQL is running with `docker-compose ps`
- Port conflict: Edit `langgraph.json` to use a different port, or for the frontend set `PORT=3001` before `pnpm dev`
- Permission denied (`./start.sh`): Run `chmod +x start.sh` first
- Module not found errors: Make sure you've installed all dependencies with `pip install -r requirements.txt` in the activated conda environment
- Windows issues: Ensure you're using Git Bash with Conda activated. Try running components separately (see Alternative: Manual Start above)
For detailed system documentation, see the docs/ folder:
- Architecture Guide - System design, LangGraph workflow, project structure, and component details
- Error Handling Guide - SQL validation, error types, retry mechanism, and debugging
- Setup Guide - Step-by-step interactive data pipeline setup
If you're setting up this project for the first time or encountering data-related issues:
📖 Refer to SETUP_GUIDE.md for the complete interactive data pipeline setup guide.
This guide provides exact inputs for the challenge dataset and explains each step in detail.
This project is developed as part of the Delivery Cadet Challenge 2026.
- Built with LangGraph by LangChain
- UI template based on LangGraph Agent Chat UI
- Powered by Cerebras for fast LLM inference