Commit 34d3503

Initial setup of the project
0 parents  commit 34d3503

28 files changed

Lines changed: 2120 additions & 0 deletions

.env.example

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
DEBUG=True
OPENAI_API_KEY=your-openai-api-key-here
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2

PGVECTOR_DB_NAME=llm_analytics
PGVECTOR_DB_USER=postgres
PGVECTOR_DB_PASSWORD=postgres
PGVECTOR_DB_HOST=localhost
PGVECTOR_DB_PORT=5432

DJANGO_SECRET_KEY=your-secret-key-here
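
The settings module is not shown in this part of the commit, so the following is only a sketch of how these variables might be read. It assumes the values are already in the process environment (for example after calling python-dotenv's `load_dotenv()`), and the helper names `pgvector_database_settings` and `debug_enabled` are hypothetical.

```python
import os


def pgvector_database_settings(env=None):
    """Build a Django-style DATABASES entry from the PGVECTOR_* variables.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    The fallback values mirror .env.example and are assumptions, not project code.
    """
    env = os.environ if env is None else env
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("PGVECTOR_DB_NAME", "llm_analytics"),
        "USER": env.get("PGVECTOR_DB_USER", "postgres"),
        "PASSWORD": env.get("PGVECTOR_DB_PASSWORD", ""),
        "HOST": env.get("PGVECTOR_DB_HOST", "localhost"),
        # Environment values are always strings, so convert the port explicitly.
        "PORT": int(env.get("PGVECTOR_DB_PORT", "5432")),
    }


def debug_enabled(env=None):
    """Interpret DEBUG as a boolean; anything other than 1/true/yes is False."""
    env = os.environ if env is None else env
    return env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"}
```

Parsing `DEBUG` explicitly matters because a non-empty string such as `"False"` is truthy in Python.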

SPEC.md

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
# LLM Natural Language SQL Analytics - Specification

## 1. Project Overview

**Project Name:** LLM SQL Analytics
**Project Type:** Django Web Application
**Core Functionality:** A system that allows users to query structured databases using natural language. The system converts natural language questions into SQL queries, executes them against connected databases, and returns results. Includes semantic search over database schema documentation using pgvector.

**Target Users:** Business analysts, data scientists, and non-technical users who need to query databases without writing SQL.

---

## 2. Technology Stack

- **Framework:** Django >=4.2, <5.0
- **Database Driver:** psycopg2-binary >=2.9.9
- **Vector Storage:** pgvector >=0.1.8
- **LLM Framework:** langchain >=0.1.0, langchain-core >=0.1.0
- **Text Processing:** langchain-text-splitters >=0.3.0
- **LLM Provider:** langchain-openai >=0.1.0, openai >=1.0.0
- **Embeddings:** sentence-transformers >=2.2.2
- **Environment:** python-dotenv >=1.0.0
- **PDF Parsing:** PyPDF2 >=3.0.0

---

## 3. UI/UX Specification

### Layout Structure

**Header**
- Application title: "LLM SQL Analytics"
- Navigation: Dashboard, Query Interface, Schema Docs, Settings

**Main Content Area**
- Dashboard: Overview of connected databases and recent queries
- Query Interface: Natural language input + SQL preview + results table
- Schema Docs: List of uploaded documents with semantic search
- Settings: Database connection configuration

**Footer**
- Version info and copyright

### Visual Design

**Color Palette**
- Primary: `#1E3A5F` (Deep Navy)
- Secondary: `#3D5A80` (Slate Blue)
- Accent: `#48CAE4` (Cyan)
- Background: `#F8F9FA` (Light Gray)
- Surface: `#FFFFFF` (White)
- Text Primary: `#212529` (Dark Gray)
- Text Secondary: `#6C757D` (Medium Gray)
- Success: `#28A745` (Green)
- Error: `#DC3545` (Red)
- Warning: `#FFC107` (Yellow)

**Typography**
- Headings: "Inter", sans-serif, 600 weight
- Body: "Inter", sans-serif, 400 weight
- Monospace (SQL): "JetBrains Mono", monospace

**Spacing**
- Base unit: 8px
- Container max-width: 1200px
- Card padding: 24px
- Section margin: 32px

### Components

**Query Input Card**
- Textarea for natural language input
- "Generate SQL" button (primary style)
- "Execute" button (accent style)
- SQL preview panel with syntax highlighting

**Results Table**
- Sortable columns
- Pagination for large results
- Export to CSV option

**Schema Doc Card**
- Document name
- Upload date
- Preview snippet
- Semantic search input

**Database Connection Card**
- Connection name
- Database type indicator
- Status indicator (connected/disconnected)
- Edit/Delete actions

---

## 4. Functionality Specification

### Core Features

#### 4.1 Database Connection Management
- Add PostgreSQL database connections with connection details
- Test database connectivity
- Store connection configurations securely
- Support multiple database connections

#### 4.2 Schema Introspection
- Automatically fetch table names and column information
- Store schema in PostgreSQL with pgvector for semantic search
- Support uploading schema documentation (PDF)

#### 4.3 Natural Language to SQL
- Accept natural language queries
- Use LLM to convert natural language to SQL
- Support SELECT, INSERT, UPDATE, DELETE operations
- Include validation and error handling
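
The validation step in 4.3 is not specified further. One possible shape, sketched here under that caveat, is a conservative pre-execution check that accepts only a single statement of an allowed type. The function name `validate_generated_sql` is hypothetical, and this is a keyword heuristic rather than a SQL parser.

```python
import re

# Statement types listed in section 4.3 of the spec.
ALLOWED_STATEMENTS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def validate_generated_sql(sql, allowed=ALLOWED_STATEMENTS):
    """Return the statement type if `sql` is one allowed statement, else raise ValueError."""
    # Drop "-- line" and "/* block */" comments before inspecting the text.
    # This is naive: it does not understand comment markers inside string literals.
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL).strip()
    # Tolerate a trailing semicolon, then reject any that remain.
    stripped = stripped.rstrip(";").strip()
    if not stripped:
        raise ValueError("empty statement")
    if ";" in stripped:
        # Conservative: this also rejects semicolons inside string literals.
        raise ValueError("multiple statements are not allowed")
    keyword = stripped.split(None, 1)[0].upper()
    if keyword not in allowed:
        raise ValueError(f"statement type {keyword!r} is not allowed")
    return keyword
```

A check like this deliberately errs on the side of rejection: queries that start with `WITH`, for example, would need an explicit entry in `allowed`. It does not replace database-level permissions on the account used to run generated SQL.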

#### 4.4 Semantic Schema Search
- Embed schema documentation using sentence-transformers
- Store embeddings in pgvector
- Search schema semantically to find relevant tables/columns
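
To illustrate what "search semantically" means in 4.4, here is a dependency-free sketch of ranking stored vectors by cosine similarity to a query vector. The toy three-dimensional vectors stand in for real sentence-transformers embeddings, and the names `cosine_similarity` and `top_k` are hypothetical. With pgvector the ranking would normally run inside PostgreSQL (its `<=>` operator computes cosine distance) rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, documents, k=3):
    """Rank (name, vector) pairs by similarity to query_vec, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in documents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data: in practice each vector would be the embedding of a schema chunk.
docs = [
    ("orders", [1.0, 0.0, 0.0]),
    ("customers", [0.0, 1.0, 0.0]),
    ("order_items", [0.9, 0.1, 0.0]),
]
best = top_k([1.0, 0.0, 0.0], docs, k=2)
```

Here `best` lists `orders` first and `order_items` second, because their vectors point in nearly the same direction as the query.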

#### 4.5 Query Execution
- Execute generated SQL against target database
- Return formatted results
- Handle errors gracefully with user-friendly messages

#### 4.6 Query History
- Store all queries with timestamps
- View past queries and their results
- Re-run previous queries

### User Interactions

1. **Add Database:** User enters connection details → System tests connection → Saves if successful
2. **Ask Question:** User types natural language → System generates SQL → User reviews → Executes → Views results
3. **Search Schema:** User enters search term → System returns semantically similar schema elements
4. **Upload Docs:** User uploads PDF → System extracts text → Embeds and stores in vector DB
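
Between "extracts text" and "embeds" in the upload flow, long documents are usually split into chunks so that each embedding covers a focused passage. The spec lists langchain-text-splitters for this; the sketch below is a plain-Python stand-in that shows the idea of fixed-size chunks with overlap, not that library's API. The name `chunk_text` and the default sizes are assumptions.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text,
        # otherwise the loop would emit a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and stored separately, so a search can return the relevant passage instead of the whole PDF.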

### Data Handling

- Query history stored in SQLite (local)
- Database connections stored encrypted
- Schema embeddings stored in PostgreSQL via pgvector

### Edge Cases

- Invalid SQL generated → Show error with suggestion to retry
- Database connection failed → Show connection error with troubleshooting tips
- Empty results → Show "No results found" message
- Large result sets → Paginate results
- Network timeout → Show timeout error with retry option
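
The empty-result and large-result cases above can be handled by one small helper. This is a sketch under assumptions: the name `paginate`, the returned keys, and the default page size of 50 are all hypothetical, and it slices rows already held in memory rather than using `LIMIT`/`OFFSET` in the query.

```python
import math


def paginate(rows, page=1, page_size=50):
    """Return one page of rows plus the metadata a results table would need."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total_rows = len(rows)
    # Always report at least one page so the UI has something to render.
    total_pages = max(1, math.ceil(total_rows / page_size))
    # Clamp out-of-range page numbers instead of raising.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return {
        "rows": rows[start:start + page_size],
        "page": page,
        "total_pages": total_pages,
        "total_rows": total_rows,
        "message": "No results found" if total_rows == 0 else "",
    }
```

Clamping the page number keeps a stale link such as page 99 of a 3-page result from producing an error; whether that or an explicit error is preferable is a design choice the spec leaves open.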

---

## 5. Database Schema

### Django Models

```
DatabaseConnection
- id: UUID (primary key)
- name: CharField (unique)
- host: CharField
- port: IntegerField
- database: CharField
- username: CharField
- password: CharField (encrypted)
- created_at: DateTimeField
- updated_at: DateTimeField
- is_active: BooleanField

SchemaDocument
- id: UUID (primary key)
- connection: ForeignKey(DatabaseConnection)
- name: CharField
- content: TextField
- embedding: VectorField (pgvector)
- uploaded_at: DateTimeField

QueryHistory
- id: UUID (primary key)
- connection: ForeignKey(DatabaseConnection)
- natural_language: TextField
- generated_sql: TextField
- executed_sql: TextField (nullable)
- result: JSONField (nullable)
- error: TextField (nullable)
- created_at: DateTimeField
```

---

## 6. API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/connections/ | List all database connections |
| POST | /api/connections/ | Add new database connection |
| GET | /api/connections/{id}/ | Get connection details |
| PUT | /api/connections/{id}/ | Update connection |
| DELETE | /api/connections/{id}/ | Delete connection |
| POST | /api/connections/{id}/test/ | Test connection |
| GET | /api/connections/{id}/schema/ | Get database schema |
| POST | /api/query/ | Execute natural language query |
| GET | /api/history/ | Get query history |
| POST | /api/docs/ | Upload schema document |
| GET | /api/docs/ | List schema documents |
| POST | /api/docs/search/ | Semantic search in documents |

---

## 7. Acceptance Criteria

1. ✓ User can add PostgreSQL database connections
2. ✓ User can test database connectivity
3. ✓ User can enter natural language queries
4. ✓ System generates valid SQL from natural language
5. ✓ User can execute queries and view results
6. ✓ User can upload PDF schema documentation
7. ✓ System performs semantic search over schema docs
8. ✓ Query history is maintained
9. ✓ Web interface is responsive and functional
10. ✓ Error handling provides helpful feedback

core/__init__.py

Whitespace-only changes.

core/admin.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
from django.contrib import admin

# Register your models here.

core/apps.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
from django.apps import AppConfig


class CoreConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'core'

core/migrations/0001_initial.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Generated by Django 4.2.29 on 2026-03-17 01:54

from django.db import migrations, models
import django.db.models.deletion
import uuid


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='DatabaseConnection',
            fields=[
                ('id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
                ('name', models.CharField(max_length=255, unique=True)),
                ('host', models.CharField(max_length=255)),
                ('port', models.IntegerField(default=5432)),
                ('database', models.CharField(max_length=255)),
                ('username', models.CharField(max_length=255)),
                ('password', models.CharField(max_length=255)),
                ('created_at', models.DateTimeField(auto_now_add=True)),
                ('updated_at', models.DateTimeField(auto_now=True)),
                ('is_active', models.BooleanField(default=True)),
            ],
            options={
                'db_table': 'database_connections',
                'ordering': ['-created_at'],
            },
        ),
        migrations.CreateModel(
            name='SchemaDocument',
            fields=[
                ('id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
                ('name', models.CharField(max_length=255)),
                ('content', models.TextField()),
                ('embeddings', models.TextField(blank=True, null=True)),
                ('uploaded_at', models.DateTimeField(auto_now_add=True)),
                ('connection', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, related_name='documents', to='core.databaseconnection')),
            ],
            options={
                'db_table': 'schema_documents',
                'ordering': ['-uploaded_at'],
            },
        ),
        migrations.CreateModel(
            name='QueryHistory',
            fields=[
                ('id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
                ('natural_language', models.TextField()),
                ('generated_sql', models.TextField()),
                ('executed_sql', models.TextField(blank=True, null=True)),
                ('result', models.JSONField(blank=True, null=True)),
                ('error', models.TextField(blank=True, null=True)),
                ('created_at', models.DateTimeField(auto_now_add=True)),
                ('connection', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, related_name='queries', to='core.databaseconnection')),
            ],
            options={
                'db_table': 'query_history',
                'ordering': ['-created_at'],
            },
        ),
    ]

core/migrations/__init__.py

Whitespace-only changes.

core/models.py

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
import uuid
from django.db import models


class DatabaseConnection(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255, unique=True)
    host = models.CharField(max_length=255)
    port = models.IntegerField(default=5432)
    database = models.CharField(max_length=255)
    username = models.CharField(max_length=255)
    password = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    is_active = models.BooleanField(default=True)

    class Meta:
        db_table = 'database_connections'
        ordering = ['-created_at']

    def __str__(self):
        return self.name

    def get_connection_string(self):
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"


class SchemaDocument(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    connection = models.ForeignKey(DatabaseConnection, on_delete=models.CASCADE, related_name='documents')
    name = models.CharField(max_length=255)
    content = models.TextField()
    embeddings = models.TextField(blank=True, null=True)
    uploaded_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = 'schema_documents'
        ordering = ['-uploaded_at']

    def __str__(self):
        return self.name


class QueryHistory(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    connection = models.ForeignKey(DatabaseConnection, on_delete=models.CASCADE, related_name='queries')
    natural_language = models.TextField()
    generated_sql = models.TextField()
    executed_sql = models.TextField(blank=True, null=True)
    result = models.JSONField(blank=True, null=True)
    error = models.TextField(blank=True, null=True)
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = 'query_history'
        ordering = ['-created_at']

    def __str__(self):
        return f"{self.natural_language[:50]}... - {self.created_at}"
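
One observation on `get_connection_string` in `core/models.py`: it interpolates the username and password into the URL as-is, so a password containing `@`, `:` or `/` would yield a URL that parses incorrectly. A possible hardening, sketched here as a standalone function with the hypothetical name `build_postgres_url` (it is not part of this commit), is to percent-encode those components first.

```python
from urllib.parse import quote


def build_postgres_url(username, password, host, port, database):
    """Build a postgresql:// URL with user, password and database percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/" and ":".
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    db = quote(database, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{db}"


url = build_postgres_url("postgres", "p@ss:w/rd", "localhost", 5432, "llm_analytics")
# url == "postgresql://postgres:p%40ss%3Aw%2Frd@localhost:5432/llm_analytics"
```

The host and port are left untouched on the assumption that they are validated when the connection is saved.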
