Natural language to SQL — ask your database anything in plain English
Database analysts shouldn't need to know SQL. They know their business — they should be able to ask their data questions the same way they'd ask a colleague.
This system lets anyone query a PostgreSQL database using plain English. Type a question, get results. No SQL knowledge required. It runs entirely on your local machine using Ollama and Google Gemma — no API keys, no data leaving your infrastructure.
User question
│
▼
Schema RAG Context Builder
├── Introspect live schema (tables, columns, types, FK relationships)
├── Load business rules from config
└── Build grounded prompt
│
▼
Ollama + Google Gemma (local)
└── Generate SQL from grounded context
│
▼
SQL Safety Validator
└── Blocks writes in read-only mode
│
▼
PostgreSQL Execution
│
▼
Results → Streamlit UI / API
The key insight is schema-grounded RAG. The LLM doesn't just get your question — it gets your entire database schema, foreign key relationships, sample data patterns, and business-specific rules (like "prices are stored in cents") injected into its context. This is what makes the SQL accurate and safe, rather than generic and wrong.
Natural Language to SQL Ask questions like "Which customers spent over $500 last month?" or "Show me the top 5 products by revenue this quarter" and get back a valid, executing SQL query and its results.
Schema-Aware RAG Pipeline Before every query, the system introspects the live database schema and builds a rich context block: table definitions, column types, constraints, foreign key relationships, and custom business rules you define in config. The LLM sees all of this before generating SQL.
Read-Only Safety by Default The database connection runs in read-only mode. Any INSERT, UPDATE, DELETE, DROP, or ALTER statement generated by the LLM is caught and blocked before reaching the database. Write access requires explicit opt-in.
Streamlit Dashboard A clean web interface with live schema viewer, example queries, query history, and CSV download for any result set.
Fully Local — No Data Leaves Your Machine Uses Ollama running locally with Google Gemma. No external API calls, no data sent to third-party services.
nl-to-sql-engine/
│
├── app.py ← Streamlit web interface
├── src/
│ ├── nl_to_sql.py ← Core NL→SQL engine (Ollama + Gemma)
│ ├── schema_rag.py ← Schema introspection + RAG context builder
│ └── database.py ← PostgreSQL connection + safe execution
│
├── schema/
│ └── ecommerce.sql ← Sample e-commerce schema + test data
│
├── config/
│ ├── config.yaml ← Business rules, model settings, query limits
│ └── .env.example ← Credentials template
│
└── tests/
└── test_nl_to_sql.py
- Python 3.9+
- PostgreSQL running locally or remotely
- Ollama installed (ollama.ai)
# Install Ollama (Mac)
brew install ollama
# Start Ollama
ollama serve
# Pull the Gemma model
ollama pull gemma:2bgit clone https://github.com/SSG-YERRAMSETTI/nl-to-sql-engine.git
cd nl-to-sql-engine
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt# Load the sample e-commerce schema
psql -U postgres -d your_db < schema/ecommerce.sqlcp config/.env.example .envEdit .env:
DATABASE_URL=postgresql://username:password@localhost:5432/your_dbCustomize config/config.yaml with your business rules:
business_rules:
- "Prices are stored in cents — divide by 100 for dollar amounts"
- "Order status: pending, confirmed, shipped, delivered"streamlit run app.pyOpen http://localhost:8501.
| You Ask | Generated SQL |
|---|---|
| Show top 5 products by revenue this month | SELECT p.name, SUM(oi.quantity * oi.unit_price) / 100.0 AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id JOIN orders o ON oi.order_id = o.id WHERE o.created_at >= DATE_TRUNC('month', NOW()) GROUP BY p.name ORDER BY revenue DESC LIMIT 5 |
| How many pending orders are older than 2 days? | SELECT COUNT(*) FROM orders WHERE status = 'pending' AND created_at < NOW() - INTERVAL '2 days' |
| Which customers haven't ordered in 30 days? | SELECT c.name, c.email FROM customers c LEFT JOIN orders o ON c.id = o.customer_id AND o.created_at > NOW() - INTERVAL '30 days' WHERE o.id IS NULL |
pytest tests/ -vSatya Sai Ganesh Yerramsetti MS Computer Science — University of North Texas