| 1 | +# LLM Natural Language SQL Analytics - Specification |
| 2 | + |
| 3 | +## 1. Project Overview |
| 4 | + |
| 5 | +**Project Name:** LLM SQL Analytics |
| 6 | +**Project Type:** Django Web Application |
| 7 | +**Core Functionality:** A system that lets users query structured databases in natural language. It converts natural language questions into SQL queries, executes them against connected databases, and returns the results. It also provides semantic search over database schema documentation using pgvector. |
| 8 | + |
| 9 | +**Target Users:** Business analysts, data scientists, and non-technical users who need to query databases without writing SQL. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## 2. Technology Stack |
| 14 | + |
| 15 | +- **Framework:** Django >=4.2, <5.0 |
| 16 | +- **Database Driver:** psycopg2-binary >=2.9.9 |
| 17 | +- **Vector Storage:** pgvector >=0.1.8 |
| 18 | +- **LLM Framework:** langchain >=0.1.0, langchain-core >=0.1.0 |
| 19 | +- **Text Processing:** langchain-text-splitters >=0.3.0 |
| 20 | +- **LLM Provider:** langchain-openai >=0.1.0, openai >=1.0.0 |
| 21 | +- **Embeddings:** sentence-transformers >=2.2.2 |
| 22 | +- **Environment:** python-dotenv >=1.0.0 |
| 23 | +- **PDF Parsing:** PyPDF2 >=3.0.0 |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## 3. UI/UX Specification |
| 28 | + |
| 29 | +### Layout Structure |
| 30 | + |
| 31 | +**Header** |
| 32 | +- Application title: "LLM SQL Analytics" |
| 33 | +- Navigation: Dashboard, Query Interface, Schema Docs, Settings |
| 34 | + |
| 35 | +**Main Content Area** |
| 36 | +- Dashboard: Overview of connected databases and recent queries |
| 37 | +- Query Interface: Natural language input + SQL preview + results table |
| 38 | +- Schema Docs: List of uploaded documents with semantic search |
| 39 | +- Settings: Database connection configuration |
| 40 | + |
| 41 | +**Footer** |
| 42 | +- Version info and copyright |
| 43 | + |
| 44 | +### Visual Design |
| 45 | + |
| 46 | +**Color Palette** |
| 47 | +- Primary: `#1E3A5F` (Deep Navy) |
| 48 | +- Secondary: `#3D5A80` (Slate Blue) |
| 49 | +- Accent: `#48CAE4` (Cyan) |
| 50 | +- Background: `#F8F9FA` (Light Gray) |
| 51 | +- Surface: `#FFFFFF` (White) |
| 52 | +- Text Primary: `#212529` (Dark Gray) |
| 53 | +- Text Secondary: `#6C757D` (Medium Gray) |
| 54 | +- Success: `#28A745` (Green) |
| 55 | +- Error: `#DC3545` (Red) |
| 56 | +- Warning: `#FFC107` (Yellow) |
| 57 | + |
| 58 | +**Typography** |
| 59 | +- Headings: "Inter", sans-serif, 600 weight |
| 60 | +- Body: "Inter", sans-serif, 400 weight |
| 61 | +- Monospace (SQL): "JetBrains Mono", monospace |
| 62 | + |
| 63 | +**Spacing** |
| 64 | +- Base unit: 8px |
| 65 | +- Container max-width: 1200px |
| 66 | +- Card padding: 24px |
| 67 | +- Section margin: 32px |
| 68 | + |
| 69 | +### Components |
| 70 | + |
| 71 | +**Query Input Card** |
| 72 | +- Textarea for natural language input |
| 73 | +- "Generate SQL" button (primary style) |
| 74 | +- "Execute" button (accent style) |
| 75 | +- SQL preview panel with syntax highlighting |
| 76 | + |
| 77 | +**Results Table** |
| 78 | +- Sortable columns |
| 79 | +- Pagination for large results |
| 80 | +- Export to CSV option |
| 81 | + |
| 82 | +**Schema Doc Card** |
| 83 | +- Document name |
| 84 | +- Upload date |
| 85 | +- Preview snippet |
| 86 | +- Semantic search input |
| 87 | + |
| 88 | +**Database Connection Card** |
| 89 | +- Connection name |
| 90 | +- Database type indicator |
| 91 | +- Status indicator (connected/disconnected) |
| 92 | +- Edit/Delete actions |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +## 4. Functionality Specification |
| 97 | + |
| 98 | +### Core Features |
| 99 | + |
| 100 | +#### 4.1 Database Connection Management |
| 101 | +- Add PostgreSQL database connections with connection details |
| 102 | +- Test database connectivity |
| 103 | +- Store connection configurations securely |
| 104 | +- Support multiple database connections |
| 105 | + |
| 106 | +#### 4.2 Schema Introspection |
| 107 | +- Automatically fetch table names and column information |
| 108 | +- Store schema in PostgreSQL with pgvector for semantic search |
| 109 | +- Support uploading schema documentation (PDF) |
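
The introspection step above could be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a DB-API cursor (such as one from psycopg2) and the `public` schema, and the names `COLUMNS_SQL`, `group_columns`, and `fetch_schema` are hypothetical.

```python
from collections import defaultdict

# Query against the standard information_schema.columns view.
# Restricting to one schema ("public" by default) is an assumption.
COLUMNS_SQL = """
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = %s
    ORDER BY table_name, ordinal_position
"""


def group_columns(rows):
    """Group (table, column, type) rows into {table: [(column, type), ...]}."""
    schema = defaultdict(list)
    for table_name, column_name, data_type in rows:
        schema[table_name].append((column_name, data_type))
    return dict(schema)


def fetch_schema(cursor, schema_name="public"):
    """Run the introspection query on a DB-API cursor and group the rows."""
    cursor.execute(COLUMNS_SQL, (schema_name,))
    return group_columns(cursor.fetchall())
```

Keeping `group_columns` separate from the database call means the grouping logic can be exercised without a live PostgreSQL connection.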
| 110 | + |
| 111 | +#### 4.3 Natural Language to SQL |
| 112 | +- Accept natural language queries |
| 113 | +- Use LLM to convert natural language to SQL |
| 114 | +- Support SELECT, INSERT, UPDATE, DELETE operations |
| 115 | +- Validate generated SQL before execution and report errors to the user |
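
One way to approach the validation step is a conservative pre-execution check that accepts only a single statement starting with one of the four supported operations. The sketch below is an assumption about how such a check might look; `classify_sql` and `ALLOWED_OPERATIONS` are hypothetical names, and the check is deliberately strict (it rejects `WITH` queries and any semicolon inside the statement, including one inside a string literal).

```python
import re

# The four operations listed in section 4.3.
ALLOWED_OPERATIONS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def classify_sql(sql):
    """Return the leading keyword of a single SQL statement or raise ValueError."""
    # Drop surrounding whitespace and any trailing semicolons.
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("Empty SQL statement")
    # Any remaining semicolon is treated as a second statement.
    if ";" in statement:
        raise ValueError("Multiple statements are not allowed")
    match = re.match(r"[A-Za-z]+", statement)
    operation = match.group(0).upper() if match else ""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"Unsupported operation: {operation or 'unknown'}")
    return operation
```

A keyword check like this is only a first filter; it does not replace database-level permissions on the target connection.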
| 116 | + |
| 117 | +#### 4.4 Semantic Schema Search |
| 118 | +- Embed schema documentation using sentence-transformers |
| 119 | +- Store embeddings in pgvector |
| 120 | +- Search schema semantically to find relevant tables/columns |
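
The ranking behind semantic search can be illustrated in plain Python. In the described system the comparison would more likely run inside PostgreSQL through pgvector's distance operators; the sketch below only shows the underlying idea with cosine similarity, and `cosine_similarity` and `top_matches` are hypothetical helper names. The tiny two-dimensional vectors stand in for real sentence-transformers embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, documents, k=3):
    """Rank (name, embedding) pairs by similarity to query_vec, best first."""
    scored = [(name, cosine_similarity(query_vec, emb)) for name, emb in documents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings for illustration only.
docs = [
    ("orders", [1.0, 0.0]),
    ("users", [0.0, 1.0]),
    ("payments", [0.9, 0.1]),
]
best = top_matches([1.0, 0.0], docs, k=2)
```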
| 121 | + |
| 122 | +#### 4.5 Query Execution |
| 123 | +- Execute generated SQL against target database |
| 124 | +- Return formatted results |
| 125 | +- Handle errors gracefully with user-friendly messages |
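
A possible shape for the execution step is a function that always returns the same structure, whether the query succeeds or fails. This is a sketch under assumptions: `run_query` is a hypothetical name, and the in-memory `sqlite3` connection is only a stand-in so the example runs without a PostgreSQL server. The same DB-API calls (`cursor()`, `execute()`, `description`, `fetchall()`, `rollback()`) are also available on psycopg2 connections.

```python
import sqlite3


def run_query(connection, sql):
    """Execute sql and return columns, rows (as dicts), and an error message."""
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        # description is None for statements that return no rows.
        columns = [col[0] for col in cursor.description] if cursor.description else []
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()] if columns else []
        return {"columns": columns, "rows": rows, "error": None}
    except Exception as exc:  # DB-API drivers raise their own Error subclasses
        connection.rollback()  # leave the connection usable after a failure
        return {"columns": [], "rows": [], "error": f"Query failed: {exc}"}


# Stand-in database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.commit()

ok = run_query(conn, "SELECT id, name FROM users")
failed = run_query(conn, "SELECT * FROM missing_table")
```

Returning the error as data rather than raising lets the view layer decide how to phrase the user-facing message.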
| 126 | + |
| 127 | +#### 4.6 Query History |
| 128 | +- Store all queries with timestamps |
| 129 | +- View past queries and their results |
| 130 | +- Re-run previous queries |
| 131 | + |
| 132 | +### User Interactions |
| 133 | + |
| 134 | +1. **Add Database:** User enters connection details → System tests connection → Saves if successful |
| 135 | +2. **Ask Question:** User types natural language → System generates SQL → User reviews → Executes → Views results |
| 136 | +3. **Search Schema:** User enters search term → System returns semantically similar schema elements |
| 137 | +4. **Upload Docs:** User uploads PDF → System extracts text → Embeds and stores in vector DB |
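
Between "extracts text" and "embeds" in the upload flow, the extracted text usually has to be split into pieces small enough to embed. The stack lists langchain-text-splitters for this; the sketch below is a simplified, dependency-free stand-in that shows the idea of fixed-size chunks with overlap. `chunk_text` and its default sizes are assumptions, not values from the specification.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap smaller than it")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval.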
| 138 | + |
| 139 | +### Data Handling |
| 140 | + |
| 141 | +- Query history stored in SQLite (local) |
| 142 | +- Database connection passwords stored encrypted |
| 143 | +- Schema embeddings stored in PostgreSQL via pgvector |
| 144 | + |
| 145 | +### Edge Cases |
| 146 | + |
| 147 | +- Invalid SQL generated → Show error with suggestion to retry |
| 148 | +- Database connection failed → Show connection error with troubleshooting tips |
| 149 | +- Empty results → Show "No results found" message |
| 150 | +- Large result sets → Paginate results |
| 151 | +- Network timeout → Show timeout error with retry option |
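
The "large result sets" case could be handled with a small pagination helper along these lines. It is a sketch that slices an in-memory list; `paginate` and the default page size of 50 are assumptions, and a production version would more likely push `LIMIT`/`OFFSET` into the SQL itself.

```python
def paginate(rows, page=1, page_size=50):
    """Return one page of rows plus the metadata a results table needs."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    total_rows = len(rows)
    # Ceiling division; an empty result still counts as one (empty) page.
    total_pages = max(1, -(-total_rows // page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "total_rows": total_rows,
        "rows": rows[start:start + page_size],
    }
```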
| 152 | + |
| 153 | +--- |
| 154 | + |
| 155 | +## 5. Database Schema |
| 156 | + |
| 157 | +### Django Models |
| 158 | + |
| 159 | +``` |
| 160 | +DatabaseConnection |
| 161 | +- id: UUID (primary key) |
| 162 | +- name: CharField (unique) |
| 163 | +- host: CharField |
| 164 | +- port: IntegerField |
| 165 | +- database: CharField |
| 166 | +- username: CharField |
| 167 | +- password: CharField (encrypted) |
| 168 | +- created_at: DateTimeField |
| 169 | +- updated_at: DateTimeField |
| 170 | +- is_active: BooleanField |
| 171 | + |
| 172 | +SchemaDocument |
| 173 | +- id: UUID (primary key) |
| 174 | +- connection: ForeignKey(DatabaseConnection) |
| 175 | +- name: CharField |
| 176 | +- content: TextField |
| 177 | +- embedding: VectorField (pgvector) |
| 178 | +- uploaded_at: DateTimeField |
| 179 | + |
| 180 | +QueryHistory |
| 181 | +- id: UUID (primary key) |
| 182 | +- connection: ForeignKey(DatabaseConnection) |
| 183 | +- natural_language: TextField |
| 184 | +- generated_sql: TextField |
| 185 | +- executed_sql: TextField (nullable) |
| 186 | +- result: JSONField (nullable) |
| 187 | +- error: TextField (nullable) |
| 188 | +- created_at: DateTimeField |
| 189 | +``` |
| 190 | + |
| 191 | +--- |
| 192 | + |
| 193 | +## 6. API Endpoints |
| 194 | + |
| 195 | +| Method | Endpoint | Description | |
| 196 | +|--------|----------|-------------| |
| 197 | +| GET | /api/connections/ | List all database connections | |
| 198 | +| POST | /api/connections/ | Add new database connection | |
| 199 | +| GET | /api/connections/{id}/ | Get connection details | |
| 200 | +| PUT | /api/connections/{id}/ | Update connection | |
| 201 | +| DELETE | /api/connections/{id}/ | Delete connection | |
| 202 | +| POST | /api/connections/{id}/test/ | Test connection | |
| 203 | +| GET | /api/connections/{id}/schema/ | Get database schema | |
| 204 | +| POST | /api/query/ | Execute natural language query | |
| 205 | +| GET | /api/history/ | Get query history | |
| 206 | +| POST | /api/docs/ | Upload schema document | |
| 207 | +| GET | /api/docs/ | List schema documents | |
| 208 | +| POST | /api/docs/search/ | Semantic search in documents | |
| 209 | + |
| 210 | +--- |
| 211 | + |
| 212 | +## 7. Acceptance Criteria |
| 213 | + |
| 214 | +1. ✓ User can add PostgreSQL database connections |
| 215 | +2. ✓ User can test database connectivity |
| 216 | +3. ✓ User can enter natural language queries |
| 217 | +4. ✓ System generates valid SQL from natural language |
| 218 | +5. ✓ User can execute queries and view results |
| 219 | +6. ✓ User can upload PDF schema documentation |
| 220 | +7. ✓ System performs semantic search over schema docs |
| 221 | +8. ✓ Query history is maintained |
| 222 | +9. ✓ Web interface is responsive and functional |
| 223 | +10. ✓ Error handling provides helpful feedback |