| 1 | +# PageIndex NotebookLM — AgentKit |
| 2 | + |
| 3 | +Upload any PDF and chat with it using **vectorless, tree-structured RAG** — powered **end-to-end by Lamatic AI flows**. |
| 4 | + |
| 5 | +> **No vector database. No external Python server. No custom backend code.** |
| 6 | +> Just 4 Lamatic flows that implement the full PageIndex pipeline, from PDF ingestion to tree-navigated question answering, entirely within Lamatic's orchestration layer, plus a Next.js frontend. |
| 7 | +
| 8 | +--- |
| 9 | + |
| 10 | +## What Makes This Different |
| 11 | + |
| 12 | +Most RAG implementations require a vector database, an embedding model, a retrieval server, and often a separate Python backend. **This kit eliminates all of that.** |
| 13 | + |
| 14 | +The entire PageIndex pipeline — TOC detection, tree construction, page indexing, summary generation, tree-navigated search, and LLM answering — is implemented as **4 Lamatic AI flows** with zero external servers or Python code. The Next.js frontend communicates exclusively with Lamatic's flow execution API via the official `lamatic` SDK. |
| 15 | + |
| 16 | +### Key Highlights |
| 17 | + |
| 18 | +- **100% Lamatic-powered backend** — all document processing, indexing, retrieval, and answering logic lives inside Lamatic flows |
| 19 | +- **No vector DB** — uses a hierarchical tree index (built from the document's table of contents) instead of vector embeddings |
| 20 | +- **No external server** — no FastAPI, no Railway, no Python — the Lamatic flows handle everything |
| 21 | +- **No chunking** — sections are identified by their structural position in the document, not arbitrary text splits |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Architecture |
| 26 | + |
| 27 | +```text |
| 28 | +┌────────────────────────────────────────────────────┐ |
| 29 | +│ Next.js Frontend │ |
| 30 | +│ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌─────────┐ │ |
| 31 | +│ │ Document │ │ Chat │ │ Tree │ │Document │ │ |
| 32 | +│ │ Upload │ │ Window │ │ Viewer │ │ List │ │ |
| 33 | +│ └────┬─────┘ └────┬─────┘ └───┬────┘ └────┬────┘ │ |
| 34 | +│ │ │ │ │ │ |
| 35 | +│ ┌────┴─────────────┴───────────┴────────────┴───┐ │ |
| 36 | +│ │ Server Actions (orchestrate.ts) │ │ |
| 37 | +│ └────────────────────┬──────────────────────────┘ │ |
| 38 | +│ │ │ |
| 39 | +│ Lamatic SDK (lamatic npm) │ |
| 40 | +└───────────────────────┼─────────────────────────────┘ |
| 41 | + │ |
| 42 | + ┌────────────┴────────────┐ |
| 43 | + │ Lamatic AI Platform │ |
| 44 | + │ │ |
| 45 | + │ Flow 1: Upload + Index │ |
| 46 | + │ Flow 2: Chat + Retrieve │ |
| 47 | + │ Flow 3: List Documents │ |
| 48 | + │ Flow 4: Tree / Delete │ |
| 49 | + │ │ |
| 50 | + │ ┌──────────────┐ │ |
| 51 | + │ │ Supabase │ │ |
| 52 | + │ │ (PostgreSQL) │ │ |
| 53 | + │ └──────────────┘ │ |
| 54 | + └──────────────────────────┘ |
| 55 | +``` |
| 56 | + |
| 57 | +--- |
| 58 | + |
| 59 | +## How It Works |
| 60 | + |
| 61 | +### Document Ingestion (Flow 1) |
| 62 | + |
| 63 | +When a PDF is uploaded, the Lamatic flow runs a multi-stage pipeline: |
| 64 | + |
| 65 | +1. **TOC Detection** — scans the first pages to locate the table of contents |
| 66 | +2. **TOC Extraction** — multi-pass extraction with completion verification |
| 67 | +3. **TOC → JSON** — structured flat list with hierarchy identifiers (`1`, `1.1`, `1.2.3`) |
| 68 | +4. **Physical Index Assignment** — verifies each section starts on the correct page |
| 69 | +5. **Tree Build** — nested tree structure with exact `start_index` + `end_index` per section |
| 70 | +6. **Summary Generation** — 1–2 sentence summary per node |
| 71 | +7. **Page Verification** — fuzzy-matches node titles against actual page text |
| 72 | +8. **Save** — stores the tree + metadata in Supabase |
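
The Tree Build step (stage 5) can be sketched as follows. This is a minimal illustration rather than the flow's actual code: the `FlatTocEntry` and `TreeNode` shapes and the `buildTree` helper are hypothetical names, and the sketch assumes every section starts at the top of its page, so a section ends on the page before the next section at the same or a shallower depth begins.

```typescript
// Hypothetical shapes for illustration; the kit's real types live in lib/types.ts and may differ.
interface FlatTocEntry {
  structure: string;   // hierarchy identifier such as "1", "1.1", "1.2.3"
  title: string;
  start_index: number; // 1-based physical page where the section starts
}

interface TreeNode extends FlatTocEntry {
  end_index: number;   // 1-based, inclusive last page of the section
  nodes: TreeNode[];   // child sections
}

function buildTree(entries: FlatTocEntry[], totalPages: number): TreeNode[] {
  const roots: TreeNode[] = [];
  const byId = new Map<string, TreeNode>();

  entries.forEach((entry, i) => {
    const depth = entry.structure.split(".").length;

    // Assumption: a section runs until the next entry at the same or a shallower depth.
    const next = entries
      .slice(i + 1)
      .find((e) => e.structure.split(".").length <= depth);
    const end_index = next
      ? Math.max(entry.start_index, next.start_index - 1)
      : totalPages;

    const node: TreeNode = { ...entry, end_index, nodes: [] };
    byId.set(entry.structure, node);

    // "1.2.3" -> parent "1.2"; top-level entries have no parent and become roots.
    const parentId = entry.structure.split(".").slice(0, -1).join(".");
    const parent = byId.get(parentId);
    (parent ? parent.nodes : roots).push(node);
  });

  return roots;
}
```

Because parents are looked up by their identifier prefix, the sketch relies on the flat list being in document order, which is what a table of contents normally provides.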
| 73 | + |
| 74 | +### Chat & Retrieval (Flow 2) |
| 75 | + |
| 76 | +At query time, the LLM navigates the tree like a table of contents: |
| 77 | +1. Receives the full tree structure with section titles and summaries |
| 78 | +2. Selects the most relevant leaf nodes based on the query |
| 79 | +3. Fetches verbatim page content using exact `start_index → end_index` ranges |
| 80 | +4. Generates an answer grounded in the retrieved content |
| 81 | + |
| 82 | +The frontend receives the answer, the retrieved nodes with page ranges, and the LLM's tree-navigation reasoning — all displayed in the UI. |
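
Steps 2 and 3 above can be sketched in a few lines. This is an illustrative sketch, not the flow's implementation: `findNodes` and `fetchPageText` are hypothetical helpers, the `structure` identifier field is an assumed name, and the sketch assumes page text is available as an array with one string per page and that `start_index` and `end_index` are 1-based and inclusive.

```typescript
// Hypothetical node shape for illustration; field names are assumptions.
interface TreeNode {
  structure: string;   // identifier the LLM returns when it selects a node
  title: string;
  start_index: number; // 1-based first page
  end_index: number;   // 1-based last page, inclusive
  nodes: TreeNode[];
}

// Walk the tree depth-first and collect the nodes the LLM selected.
function findNodes(tree: TreeNode[], selectedIds: Set<string>): TreeNode[] {
  const found: TreeNode[] = [];
  const visit = (nodes: TreeNode[]): void => {
    for (const node of nodes) {
      if (selectedIds.has(node.structure)) found.push(node);
      visit(node.nodes);
    }
  };
  visit(tree);
  return found;
}

// Return the verbatim text for a node's page range.
// pages[0] holds page 1, so the 1-based inclusive range maps to slice(start - 1, end).
function fetchPageText(pages: string[], node: TreeNode): string {
  return pages.slice(node.start_index - 1, node.end_index).join("\n");
}
```

Since retrieval is a plain range lookup, the text handed to the answering step is exactly what appears on those pages, with no chunk boundaries or similarity thresholds involved.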
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +## Stack |
| 87 | + |
| 88 | +| Layer | Technology | |
| 89 | +|---|---| |
| 90 | +| Orchestration & Backend | **Lamatic AI** (4 flows — no external server) | |
| 91 | +| Storage | **Supabase** (PostgreSQL) | |
| 92 | +| Frontend | **Next.js 15** (App Router, Server Actions) | |
| 93 | +| Styling | **CSS custom properties** (dark-mode design system) | |
| 94 | +| SDK | **`lamatic`** npm package | |
| 95 | + |
| 96 | +--- |
| 97 | + |
| 98 | +## Features |
| 99 | + |
| 100 | +- **PDF Upload** — drag-and-drop or paste a URL |
| 101 | +- **Tree-Structured RAG** — vectorless retrieval using hierarchical document index |
| 102 | +- **Multi-Turn Chat** — conversational history maintained across messages |
| 103 | +- **Chat Persistence** — conversations are saved to `localStorage` and survive page navigation |
| 104 | +- **Interactive Tree Viewer** — explore the full document structure; nodes are highlighted when they are retrieved |
| 105 | +- **Source Panel** — view retrieved sections with page ranges and LLM reasoning |
| 106 | +- **Document Management** — list all documents, view trees, delete documents |
| 107 | +- **Markdown Rendering** — AI responses rendered with headings, lists, bold, code |
| 108 | +- **Responsive Dark UI** — premium design system with animations and micro-interactions |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## Prerequisites |
| 113 | + |
| 114 | +- [Lamatic AI](https://lamatic.ai) account (free) |
| 115 | +- [Supabase](https://supabase.com) account (free tier) |
| 116 | +- Node.js 18+ |
| 117 | + |
| 118 | +> **That's it.** No Groq account, no Railway, no Python environment needed. |
| 119 | +
| 120 | +--- |
| 121 | + |
| 122 | +## Setup |
| 123 | + |
| 124 | +### 1. Set Up Supabase |
| 125 | + |
| 126 | +Run this SQL in the Supabase SQL Editor: |
| 127 | + |
| 128 | +```sql |
| 129 | +create table documents ( |
| 130 | + id uuid default gen_random_uuid() primary key, |
| 131 | + doc_id text unique not null, |
| 132 | + file_name text, |
| 133 | + file_url text, |
| 134 | + tree jsonb, |
| 135 | + raw_text text, |
| 136 | + tree_node_count integer default 0, |
| 137 | + status text default 'completed', |
| 138 | + created_at timestamptz default now() |
| 139 | +); |
| 140 | +alter table documents enable row level security; |
| 141 | +-- Only the Supabase service role (used server-side in Lamatic flows) can |
| 142 | +-- read and write documents. No direct client-side access is permitted. |
| 143 | +create policy "service_role_only" on documents |
| 144 | + for all |
| 145 | + using (auth.role() = 'service_role') |
| 146 | + with check (auth.role() = 'service_role'); |
| 147 | +``` |
| 148 | + |
| 149 | +### 2. Import Lamatic Flows |
| 150 | + |
| 151 | +Import all 4 flows from the `flows/` folder into Lamatic Studio: |
| 152 | + |
| 153 | +| Flow | Folder | Purpose | |
| 154 | +|---|---|---| |
| 155 | +| Upload | `flows/flow-1-upload-pdf-build-tree-save/` | PDF → 7-stage pipeline → tree index → Supabase | |
| 156 | +| Chat | `flows/chat-with-pdf/` | Tree search → page fetch → LLM answer | |
| 157 | +| List | `flows/flow-list-all-documents/` | List all documents from Supabase | |
| 158 | +| Tree | `flows/flow-4-get-tree-structure/` | Return full tree JSON or delete a document | |
| 159 | + |
| 160 | +Add these secrets in **Lamatic → Settings → Secrets**: |
| 161 | + |
| 162 | +| Secret | Value | |
| 163 | +|---|---| |
| 164 | +| `SUPABASE_URL` | `https://xxx.supabase.co` | |
| 165 | +| `SUPABASE_ANON_KEY` | From Supabase Settings → API | |
| 166 | +| `SUPABASE_SERVICE_ROLE_KEY` | From Supabase Settings → API — **server-side only, never expose client-side** | |
| 167 | + |
| 168 | +> **Important:** `SUPABASE_SERVICE_ROLE_KEY` bypasses RLS. Store it in Lamatic Secrets only: do not add it to `.env.local`, and never expose it through a `NEXT_PUBLIC_` variable or any other client-side code. |
| 169 | +
| 170 | +### 3. Install and Configure |
| 171 | + |
| 172 | +```bash |
| 173 | +cd kits/assistant/pageindex-notebooklm |
| 174 | +npm install |
| 175 | +cp .env.example .env.local |
| 176 | +``` |
| 177 | + |
| 178 | +Fill in `.env.local`: |
| 179 | + |
| 180 | +```env |
| 181 | +LAMATIC_API_KEY=... # Lamatic → Settings → API Keys |
| 182 | +LAMATIC_PROJECT_ID=... # Lamatic → Settings → Project ID |
| 183 | +LAMATIC_API_URL=... # Lamatic → Settings → API Docs → Endpoint |
| 184 | +
| 185 | +FLOW_ID_UPLOAD=... # Flow 1 → three-dot menu → Copy ID |
| 186 | +FLOW_ID_CHAT=... # Flow 2 → three-dot menu → Copy ID |
| 187 | +FLOW_ID_LIST=... # Flow 3 → three-dot menu → Copy ID |
| 188 | +FLOW_ID_TREE=... # Flow 4 → three-dot menu → Copy ID |
| 189 | +``` |
| 190 | + |
| 191 | +### 4. Run Locally |
| 192 | + |
| 193 | +```bash |
| 194 | +npm run dev |
| 195 | +# → http://localhost:3000 |
| 196 | +``` |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +## Project Structure |
| 201 | + |
| 202 | +```text |
| 203 | +pageindex-notebooklm/ (TypeScript · Next.js/React) |
| 204 | +├── actions/ |
| 205 | +│ └── orchestrate.ts # TypeScript — Server actions — all 4 flow calls via Lamatic SDK |
| 206 | +├── app/ |
| 207 | +│ ├── globals.css # CSS — Design system (custom properties, animations) |
| 208 | +│ ├── layout.tsx # TSX/React — Root layout with metadata |
| 209 | +│ └── page.tsx # TSX/React — Main page — document list + chat + tree viewer |
| 210 | +├── components/ |
| 211 | +│ ├── ChatWindow.tsx # TSX/React — Chat UI with markdown, sources, persistence |
| 212 | +│ ├── DocumentList.tsx # TSX/React — Document sidebar with search + delete |
| 213 | +│ ├── DocumentUpload.tsx # TSX/React — Drag-and-drop / URL upload |
| 214 | +│ └── TreeViewer.tsx # TSX/React — Interactive hierarchical tree viewer |
| 215 | +├── flows/ |
| 216 | +│ ├── flow-1-upload-pdf-build-tree-save/ |
| 217 | +│ ├── chat-with-pdf/ |
| 218 | +│ ├── flow-list-all-documents/ |
| 219 | +│ └── flow-4-get-tree-structure/ |
| 220 | +├── lib/ |
| 221 | +│ ├── lamatic-client.ts # TypeScript — Lamatic SDK initialization |
| 222 | +│ └── types.ts # TypeScript — Shared interfaces and types |
| 223 | +├── config.json # Kit metadata |
| 224 | +└── .env.example # Environment variable template |
| 225 | +``` |
| 226 | + |
| 227 | +--- |
| 228 | + |
| 229 | +## Deploying to Vercel |
| 230 | + |
| 231 | +```bash |
| 232 | +git checkout -b feat/pageindex-notebooklm |
| 233 | +git add kits/assistant/pageindex-notebooklm/ |
| 234 | +git commit -m "feat: PageIndex NotebookLM — end-to-end Lamatic-powered tree RAG" |
| 235 | +git push origin feat/pageindex-notebooklm |
| 236 | +``` |
| 237 | + |
| 238 | +Then in Vercel: |
| 239 | +1. Import your repo |
| 240 | +2. Set **Root Directory** → `kits/assistant/pageindex-notebooklm` |
| 241 | +3. Add all 7 env vars from `.env.local` |
| 242 | +4. Deploy |
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## Author |
| 247 | + |
| 248 | +**Saurabh Tiwari** — [st108113@gmail.com](mailto:st108113@gmail.com) |
| 249 | +GitHub: [@Skt329](https://github.com/Skt329) |