Skip to content

SSG-YERRAMSETTI/-AI-Powered-Database-Automation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI-Powered Database Automation

Natural language to SQL — ask your database anything in plain English

Python Ollama Gemma PostgreSQL Streamlit


Database analysts shouldn't need to know SQL. They know their business — they should be able to ask their data questions the same way they'd ask a colleague.

This system lets anyone query a PostgreSQL database using plain English. Type a question, get results. No SQL knowledge required. It runs entirely on your local machine using Ollama and Google Gemma — no API keys, no data leaving your infrastructure.


How It Works

User question
     │
     ▼
Schema RAG Context Builder
  ├── Introspect live schema (tables, columns, types, FK relationships)
  ├── Load business rules from config
  └── Build grounded prompt
     │
     ▼
Ollama + Google Gemma (local)
  └── Generate SQL from grounded context
     │
     ▼
SQL Safety Validator
  └── Blocks writes in read-only mode
     │
     ▼
PostgreSQL Execution
     │
     ▼
Results → Streamlit UI / API

The key insight is schema-grounded RAG. The LLM doesn't just get your question — it gets your entire database schema, foreign key relationships, sample data patterns, and business-specific rules (like "prices are stored in cents") injected into its context. This is what makes the SQL accurate and safe, rather than generic and wrong.


What It Does

Natural Language to SQL Ask questions like "Which customers spent over $500 last month?" or "Show me the top 5 products by revenue this quarter" and get back a valid, executing SQL query and its results.

Schema-Aware RAG Pipeline Before every query, the system introspects the live database schema and builds a rich context block: table definitions, column types, constraints, foreign key relationships, and custom business rules you define in config. The LLM sees all of this before generating SQL.

Read-Only Safety by Default The database connection runs in read-only mode. Any INSERT, UPDATE, DELETE, DROP, or ALTER statement generated by the LLM is caught and blocked before reaching the database. Write access requires explicit opt-in.

Streamlit Dashboard A clean web interface with live schema viewer, example queries, query history, and CSV download for any result set.

Fully Local — No Data Leaves Your Machine Uses Ollama running locally with Google Gemma. No external API calls, no data sent to third-party services.


Project Structure

nl-to-sql-engine/
│
├── app.py                    ← Streamlit web interface
├── src/
│   ├── nl_to_sql.py          ← Core NL→SQL engine (Ollama + Gemma)
│   ├── schema_rag.py         ← Schema introspection + RAG context builder
│   └── database.py           ← PostgreSQL connection + safe execution
│
├── schema/
│   └── ecommerce.sql         ← Sample e-commerce schema + test data
│
├── config/
│   ├── config.yaml           ← Business rules, model settings, query limits
│   └── .env.example          ← Credentials template
│
└── tests/
    └── test_nl_to_sql.py

Getting Started

Prerequisites

  • Python 3.9+
  • PostgreSQL running locally or remotely
  • Ollama installed (ollama.ai)

1. Install Ollama and pull Gemma

# Install Ollama (Mac)
brew install ollama

# Start Ollama
ollama serve

# Pull the Gemma model
ollama pull gemma:2b

2. Clone and install

git clone https://github.com/SSG-YERRAMSETTI/nl-to-sql-engine.git
cd nl-to-sql-engine

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

3. Set up the database

# Load the sample e-commerce schema
psql -U postgres -d your_db < schema/ecommerce.sql

4. Configure

cp config/.env.example .env

Edit .env:

DATABASE_URL=postgresql://username:password@localhost:5432/your_db

Customize config/config.yaml with your business rules:

business_rules:
  - "Prices are stored in cents — divide by 100 for dollar amounts"
  - "Order status: pending, confirmed, shipped, delivered"

5. Run

streamlit run app.py

Open http://localhost:8501.


Example Queries

You Ask Generated SQL
Show top 5 products by revenue this month SELECT p.name, SUM(oi.quantity * oi.unit_price) / 100.0 AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id JOIN orders o ON oi.order_id = o.id WHERE o.created_at >= DATE_TRUNC('month', NOW()) GROUP BY p.name ORDER BY revenue DESC LIMIT 5
How many pending orders are older than 2 days? SELECT COUNT(*) FROM orders WHERE status = 'pending' AND created_at < NOW() - INTERVAL '2 days'
Which customers haven't ordered in 30 days? SELECT c.name, c.email FROM customers c LEFT JOIN orders o ON c.id = o.customer_id AND o.created_at > NOW() - INTERVAL '30 days' WHERE o.id IS NULL

Running Tests

pytest tests/ -v

Author

Satya Sai Ganesh Yerramsetti MS Computer Science — University of North Texas

LinkedIn GitHub

⭐ if this saved you from writing SQL

About

AI-Powered Database Automation Natural language to SQL

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages