Commit 08d7b62

Alexandre Oliveira and claude committed
docs(turing): add Assets page and align Knowledge Base docs with actual flow
- New assets.md: dual-panel layout, file table (Name/Size/Type/Modified/AI/Actions), upload with drag-and-drop, create folder, download, delete, preview panel (images/PDF/video/audio/text), AI training column with tooltip, batch training sequence diagram (Tika extraction, 100K truncation, 1024-char chunks, progress polling), automatic indexing on upload/delete, embedding metadata table, MinIO configuration with Docker Compose snippet
- genai-llm.md: replace Knowledge Base section to reference assets.md and add accurate indexing pipeline details (Tika, 100K chars, 1024-char chunks)
- administration-guide.md: update Knowledge Base table row to point to Management → Assets with link
- sidebars-turing.ts: add Management category with assets page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 3b32529 commit 08d7b62

4 files changed

Lines changed: 244 additions & 8 deletions

docs-turing/administration-guide.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ A brief overview of each administration section:
 | **LLM Instances** | Administration → LLM Instances | Configure connections to Anthropic Claude, OpenAI, Azure OpenAI, Gemini, and Ollama |
 | **MCP Servers** | Administration → MCP Servers | Register external MCP servers (HTTP or stdio) to extend agent tool calling |
 | **AI Agents** | Administration → AI Agents | Compose agents from an LLM Instance + selected tools + MCP Servers |
-| **Knowledge Base** | Administration → Knowledge Base | Upload and organize files in MinIO; files are indexed as vector embeddings for RAG |
+| **Knowledge Base** | Management → Assets | Upload and organize files in MinIO; files are indexed as vector embeddings and queried by AI Agents. See [Assets](./assets.md) |
 
 ---

docs-turing/assets.md

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
---
sidebar_position: 1
title: Assets
description: Manage files and train the RAG knowledge base with Viglet Turing ES Assets.
---

# Assets

The **Assets** section (`/console/asset`) is a file manager with built-in RAG training capabilities. It is available in the **Management** section of the sidebar and is only visible when **MinIO is enabled**.

Assets serves as the Knowledge Base for AI Agents — every file uploaded here can be indexed as vector embeddings and queried by the LLM via tool calling. For the conceptual overview of how this fits into the GenAI architecture, see [Generative AI & LLM Configuration](./genai-llm.md).

:::info MinIO required
Assets and all RAG Knowledge Base features require MinIO to be configured. See [MinIO Configuration](#minio-configuration) at the bottom of this page.
:::

---

## Layout

The interface uses a **resizable dual-panel layout**:

- **Left panel** — file and folder listing with the action toolbar
- **Right panel** — inline preview of the selected file

A **breadcrumb** at the top of the left panel shows the current folder path and allows navigation to any parent level. A **Root** button returns to the top-level folder instantly.

---

## File Table

The file listing displays the following columns:

| Column | Description |
|---|---|
| **Name** | File or folder name |
| **Size** | File size in human-readable format |
| **Type** | MIME type or folder indicator |
| **Last Modified** | Date and time of the last modification |
| **AI** | Training status: a checkmark indicates the file has been indexed as embeddings; hovering over it shows the training timestamp |
| **Actions** | Per-row download and delete buttons |
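
To make the **Size** and **AI** columns concrete, here is a small TypeScript sketch of a table row and its display helpers. It is not taken from the Turing ES source: the `AssetRow` shape, the `formatSize` and `aiColumn` helpers, and the choice of 1,024-based units with one decimal place are assumptions made for illustration.

```typescript
// Hypothetical model of one row in the Assets file table (not the Turing ES source).
interface AssetRow {
  name: string;
  size: number;       // bytes; 0 for folders
  type: string;       // MIME type, or "folder"
  lastModified: Date;
  trainedAt?: Date;   // set once the file has been indexed (AI column)
}

const UNITS = ["B", "KB", "MB", "GB", "TB"];

// Formats a byte count with 1,024-based steps (an assumption; the UI may differ).
function formatSize(bytes: number): string {
  if (bytes < 1024) return `${bytes} B`;
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < UNITS.length - 1) {
    value /= 1024;
    unit++;
  }
  return `${value.toFixed(1)} ${UNITS[unit]}`;
}

// Content of the AI cell: a checkmark with a timestamp tooltip, or nothing.
function aiColumn(row: AssetRow): { icon: string; tooltip?: string } {
  return row.trainedAt
    ? { icon: "✓", tooltip: `Trained at ${row.trainedAt.toISOString()}` }
    : { icon: "" };
}
```

Under these assumptions, `formatSize(1536)` yields `"1.5 KB"`, and a row without `trainedAt` renders an empty AI cell.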

---

## File Management

### Upload Files

Files are uploaded to the **current folder** via drag-and-drop or a file picker. Multiple files can be selected in one operation. Uploads are sent to:

```
POST /api/asset
```

After upload, an **asynchronous event automatically triggers AI indexing of each uploaded file individually** — no manual training step is needed for new uploads.
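
A client-side upload can be sketched in TypeScript as follows. This is a minimal sketch, not the Turing ES client: the multipart field name `file`, the `path` query parameter used to select the target folder, and the example host are hypothetical, so check the actual API before relying on them. It assumes a runtime with global `FormData`, `Blob`, and `fetch` (modern browsers or Node 18+).

```typescript
// Minimal multi-file upload sketch for POST /api/asset.
// HYPOTHETICAL: the "file" field name and the "path" query parameter are assumptions.
interface UploadFile {
  name: string;
  content: Blob;
}

function buildUploadRequest(
  baseUrl: string,
  folder: string,
  files: UploadFile[],
): { url: string; body: FormData } {
  const url = new URL("/api/asset", baseUrl);
  url.searchParams.set("path", folder); // hypothetical parameter name
  const body = new FormData();
  for (const f of files) {
    body.append("file", f.content, f.name); // hypothetical field name; one part per file
  }
  return { url: url.toString(), body };
}

async function uploadAssets(baseUrl: string, folder: string, files: UploadFile[]): Promise<void> {
  const { url, body } = buildUploadRequest(baseUrl, folder, files);
  // fetch sets the multipart Content-Type (with boundary) automatically for FormData.
  const res = await fetch(url, { method: "POST", body });
  if (!res.ok) {
    throw new Error(`Upload failed: HTTP ${res.status}`);
  }
  // Indexing is triggered server-side by an asynchronous event; nothing else to call here.
}
```

Because indexing is event-driven on the server, the client only issues the upload request; the file's AI column shows a checkmark once indexing has completed.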

### Create Folder

A dialog prompts for a folder name. Folders can be nested to any depth and are navigated via the breadcrumb.

### Download

Each file has a dedicated download button; the downloaded file keeps its original filename.

### Delete

Files and folders can be deleted via an inline button. A **toast notification** confirms completion. When a file is deleted, its **embeddings are automatically removed from the vector store**.

---

## Preview Panel

Selecting a file opens an inline preview in the right panel without leaving the page. Supported formats:

| Category | Formats |
|---|---|
| **Images** | PNG, JPEG, GIF, WebP, SVG, BMP |
| **PDF** | PDF (rendered in an iframe) |
| **Video** | MP4, WebM, OGG (with player controls) |
| **Audio** | MP3, OGG, WAV, WebM (with player controls) |
| **Text** | TXT, CSV, HTML, CSS, JS, JSON, XML |
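
The format table can be read as a dispatch on the file's MIME type. The sketch below shows one way to express it; the `previewKind` function, the category names, and the exact MIME strings chosen for each row are illustrative assumptions, not the preview panel's actual code.

```typescript
// Sketch: pick a preview renderer from a MIME type, following the format table.
// The MIME lists below are assumptions chosen to match the table's rows.
type PreviewKind = "image" | "pdf" | "video" | "audio" | "text" | "none";

const IMAGE_TYPES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp", "image/svg+xml", "image/bmp"]);
const VIDEO_TYPES = new Set(["video/mp4", "video/webm", "video/ogg"]);
const AUDIO_TYPES = new Set(["audio/mpeg", "audio/ogg", "audio/wav", "audio/x-wav", "audio/webm"]);
const TEXT_TYPES = new Set([
  "text/plain", "text/csv", "text/html", "text/css",
  "text/javascript", "application/javascript",
  "application/json", "application/xml", "text/xml",
]);

function previewKind(contentType: string): PreviewKind {
  // Drop parameters such as "; charset=utf-8" and normalise case before the lookup.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (IMAGE_TYPES.has(mime)) return "image";
  if (mime === "application/pdf") return "pdf"; // rendered in an iframe
  if (VIDEO_TYPES.has(mime)) return "video";
  if (AUDIO_TYPES.has(mime)) return "audio";
  if (TEXT_TYPES.has(mime)) return "text";
  return "none"; // no inline preview; the file can still be downloaded
}
```

Note that OGG and WebM appear in both the Video and Audio rows; the MIME type's top-level part (`video/` or `audio/`) is what disambiguates them here.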

**Panel actions:**

- **Maximise** — opens a fullscreen view (press `Esc` to close)
- **Download** — downloads the file directly from the preview panel
- **Close** — collapses the preview panel

The panel footer displays the file size, content type, modification date, and file extension.

---

## AI Training (RAG)

The AI training features are only available when `ragEnabled=true` **and** an embedding model and embedding store are configured in **Administration → Global Settings → RAG Settings**.

### Training Status per File

The **AI column** in the file table shows the indexing state of each file:

- **Checkmark** — the file has been indexed; hover to see the training timestamp
- *(empty)* — the file has not been indexed yet

### Automatic Training on Upload

When a file is uploaded, Turing ES dispatches an **asynchronous event** that indexes the file individually without any user action required. Similarly, when a file is deleted, its embeddings are automatically purged from the vector store.
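
The upload/delete lifecycle can be modelled as two event types acting on the vector store. The sketch below is a simplified, synchronous stand-in: the `AssetEvent` union, `InMemoryVectorStore`, and `handleAssetEvent` are hypothetical names, chunks are stored as plain text instead of embeddings, and the text is assumed to be already extracted. It only illustrates that an upload adds a file's chunks and a delete removes every chunk tagged with that object name.

```typescript
// Hypothetical model of event-driven indexing; not the Turing ES implementation.
type AssetEvent =
  | { kind: "uploaded"; objectName: string; text: string }
  | { kind: "deleted"; objectName: string };

interface StoredChunk {
  objectName: string;
  text: string; // a real store would also hold the chunk's embedding vector
}

class InMemoryVectorStore {
  private chunks: StoredChunk[] = [];

  add(chunks: StoredChunk[]): void {
    this.chunks.push(...chunks);
  }

  // Removes all chunks that belong to one object and returns how many were removed.
  removeByObjectName(objectName: string): number {
    const before = this.chunks.length;
    this.chunks = this.chunks.filter((c) => c.objectName !== objectName);
    return before - this.chunks.length;
  }

  count(objectName?: string): number {
    return objectName === undefined
      ? this.chunks.length
      : this.chunks.filter((c) => c.objectName === objectName).length;
  }
}

function handleAssetEvent(store: InMemoryVectorStore, event: AssetEvent, chunkSize = 1024): void {
  switch (event.kind) {
    case "uploaded": {
      // Design choice of this sketch: a re-upload replaces the object's previous chunks.
      store.removeByObjectName(event.objectName);
      const chunks: StoredChunk[] = [];
      for (let i = 0; i < event.text.length; i += chunkSize) {
        chunks.push({ objectName: event.objectName, text: event.text.slice(i, i + chunkSize) });
      }
      store.add(chunks);
      break;
    }
    case "deleted":
      store.removeByObjectName(event.objectName);
      break;
  }
}
```

Keying every chunk by its object name is what makes the delete purge a single filter operation in this model.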

### Batch Training

To index all existing files at once — useful after enabling RAG on an existing installation, or after changing the embedding model — use the **"Train AI with Assets"** button.

```mermaid
sequenceDiagram
    participant Admin
    participant UI as Assets UI
    participant API as Turing ES
    participant MinIO
    participant Tika as Apache Tika
    participant VS as Vector Store

    Admin->>UI: Click "Train AI with Assets"
    UI->>API: Start batch training
    loop For each file in MinIO (recursive)
        API->>MinIO: Download file
        MinIO-->>API: File bytes
        API->>Tika: Extract text
        Tika-->>API: Plain text (truncated at 100,000 chars)
        API->>API: Split into 1,024-char chunks
        API->>VS: Create embeddings and store chunks
        API->>API: Write record to asset_training_record
    end
    API-->>UI: Training complete
```

**Batch training steps for each file:**

1. Download the file bytes from MinIO
2. Extract plain text via **Apache Tika**, which supports PDF, DOCX, XLSX, PPTX, HTML, TXT, and images (with OCR)
3. Truncate the text to **100,000 characters**
4. Split it into **chunks of 1,024 characters**
5. Generate embeddings and store them in the configured vector store
6. Write a record with a timestamp to `asset_training_record`
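
Steps 3 and 4 amount to a truncate-then-slice operation. The sketch below assumes fixed-size chunks with no overlap that may cut mid-word; the actual splitter in Turing ES may behave differently (for example, respecting word boundaries), and `prepareChunks` is a hypothetical name. Note that JavaScript's `length` counts UTF-16 code units, so characters outside the Basic Multilingual Plane count twice here.

```typescript
// Sketch of steps 3-4: truncate extracted text, then split into fixed-size chunks.
// Assumption: no overlap between chunks and no attempt to respect word boundaries.
const MAX_CHARS = 100_000;
const CHUNK_SIZE = 1_024;

function prepareChunks(text: string, maxChars = MAX_CHARS, chunkSize = CHUNK_SIZE): string[] {
  const truncated = text.length > maxChars ? text.slice(0, maxChars) : text;
  const chunks: string[] = [];
  for (let start = 0; start < truncated.length; start += chunkSize) {
    chunks.push(truncated.slice(start, start + chunkSize));
  }
  return chunks;
}
```

As a worked example, a 250,000-character document is cut to 100,000 characters and yields ⌈100,000 / 1,024⌉ = 98 chunks: 97 full chunks of 1,024 characters (99,328 characters) plus a final chunk of 672 characters. Anything past the first 100,000 characters is never embedded, so it cannot be found by search.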

**Progress monitoring** — while the batch is running, the UI polls every **3 seconds** and displays:

```
X / Y files processed, Z errors
```

**Training states:** `IDLE` → `RUNNING` → `COMPLETED` / `FAILED`
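
The polling behaviour can be sketched as a loop that stops on a terminal state. Only the 3-second interval, the four state names, and the message format come from this page; the `TrainingStatus` shape and the injected `fetchStatus` function (standing in for whatever status endpoint the UI calls) are hypothetical.

```typescript
// Sketch of UI-side progress polling for batch training.
// HYPOTHETICAL: the TrainingStatus shape and how status is fetched.
type TrainingState = "IDLE" | "RUNNING" | "COMPLETED" | "FAILED";

interface TrainingStatus {
  state: TrainingState;
  processed: number;
  total: number;
  errors: number;
}

const POLL_INTERVAL_MS = 3_000;

function progressMessage(s: TrainingStatus): string {
  return `${s.processed} / ${s.total} files processed, ${s.errors} errors`;
}

function isFinished(state: TrainingState): boolean {
  return state === "COMPLETED" || state === "FAILED";
}

// Polls until the batch reaches COMPLETED or FAILED, reporting progress after each poll.
async function pollTraining(
  fetchStatus: () => Promise<TrainingStatus>,
  onProgress: (message: string) => void,
): Promise<TrainingState> {
  for (;;) {
    const status = await fetchStatus();
    onProgress(progressMessage(status));
    if (isFinished(status.state)) return status.state;
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
  }
}
```

Treating both `COMPLETED` and `FAILED` as terminal ensures the loop ends even when a batch aborts; per-file failures during a successful run are reported through the error count instead.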

:::warning Re-training after embedding model change
If you change the Default Embedding Model in **Administration → Global Settings → RAG Settings**, all existing embeddings become invalid, because vectors produced by different embedding models are not comparable. Run **"Train AI with Assets"** again to re-index all files with the new model.
:::

### Embedding Metadata

Each chunk stored in the vector store carries the following metadata:

| Field | Value |
|---|---|
| `source` | `"minio-asset"` |
| `objectName` | Full object path in MinIO |
| `objectPath` | Folder path within the bucket |
| `fileName` | Original filename |
| `contentType` | MIME type of the source file |
| `size` | File size in bytes |

This metadata is used by AI Agents when returning search results, so the LLM can cite the source file and explain where the information came from.
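
The metadata table maps naturally onto a TypeScript type. In the sketch below, deriving `objectPath` and `fileName` by splitting `objectName` at its last `/`, using an empty string for root-level files, and the `citation` format are all assumptions for illustration; only the field names and the `"minio-asset"` value come from the table.

```typescript
// Sketch of per-chunk metadata; the derivation rules are assumptions.
interface AssetChunkMetadata {
  source: "minio-asset";
  objectName: string;  // full object path in MinIO, e.g. "manuals/2024/guide.pdf"
  objectPath: string;  // folder path within the bucket, e.g. "manuals/2024"
  fileName: string;    // original filename, e.g. "guide.pdf"
  contentType: string; // MIME type of the source file
  size: number;        // file size in bytes
}

function buildMetadata(objectName: string, contentType: string, size: number): AssetChunkMetadata {
  const slash = objectName.lastIndexOf("/");
  return {
    source: "minio-asset",
    objectName,
    objectPath: slash >= 0 ? objectName.slice(0, slash) : "",
    fileName: slash >= 0 ? objectName.slice(slash + 1) : objectName,
    contentType,
    size,
  };
}

// A short source reference an agent could attach to an answer (hypothetical format).
function citation(meta: AssetChunkMetadata): string {
  const folder = meta.objectPath === "" ? "(root)" : meta.objectPath;
  return `${meta.fileName} in ${folder} (${meta.contentType})`;
}
```

With these rules, `buildMetadata("manuals/2024/guide.pdf", "application/pdf", 2048)` produces `objectPath` `"manuals/2024"` and `fileName` `"guide.pdf"`, which `citation` renders as `guide.pdf in manuals/2024 (application/pdf)`.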

---

## How AI Agents Use the Knowledge Base

Once files are indexed, AI Agents can query the knowledge base through four built-in tools:

| Tool | Description |
|---|---|
| `search_knowledge_base` | Semantic similarity search across all indexed chunks |
| `knowledge_base_stats` | Returns the total number of files and chunks, and the storage size |
| `list_knowledge_base_files` | Lists all indexed files, with an optional keyword filter |
| `get_file_from_knowledge_base` | Retrieves the full indexed content of a specific file |

For details on configuring AI Agents and tool calling, see [Generative AI & LLM Configuration — AI Agents](./genai-llm.md#ai-agents).
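
Conceptually, `search_knowledge_base` ranks stored chunk embeddings by their similarity to the query embedding. The sketch below uses cosine similarity, which is an assumption because the actual scoring depends on the configured embedding store. Its defaults of 10 results and a 0.7 threshold mirror the retrieval values stated on the Generative AI & LLM Configuration page. `IndexedChunk` and `searchKnowledgeBase` are hypothetical names.

```typescript
// Sketch of semantic search: score by cosine similarity, apply a threshold, keep top K.
// Cosine similarity is an assumption; real scoring depends on the embedding store.
interface IndexedChunk {
  fileName: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  // Vectors from different embedding models usually cannot be compared at all.
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function searchKnowledgeBase(
  query: number[],
  chunks: IndexedChunk[],
  topK = 10,
  threshold = 0.7,
): { chunk: IndexedChunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .filter((hit) => hit.score >= threshold) // drop low-relevance chunks
    .sort((x, y) => y.score - x.score)       // most similar first
    .slice(0, topK);
}
```

Filtering before truncating to `topK` means a query may return fewer than 10 chunks, or none, when nothing in the knowledge base is similar enough.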

---

## MinIO Configuration

MinIO must be enabled and configured before Assets becomes available:

```properties
turing.minio.enabled=true
turing.minio.endpoint=http://minio:9000
turing.minio.accessKey=minioadmin
turing.minio.secretKey=minioadmin
turing.minio.bucket=turing-assets
```

The bucket (`turing-assets` by default) is **created automatically on startup** if it does not exist.

With Docker Compose, add the MinIO service alongside Turing ES and declare its named volume:

```yaml
services:
  # ... existing Turing ES service ...
  minio:
    image: minio/minio
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data

volumes:
  minio_data:
```

:::tip
The MinIO web console is available at `http://localhost:9001` when running locally via Docker Compose. Use it to inspect buckets and verify that files are being stored correctly.
:::

---

*Previous: [Administration Guide](./administration-guide.md) | Next: [Generative AI & LLM Configuration](./genai-llm.md)*

docs-turing/genai-llm.md

Lines changed: 12 additions & 7 deletions
@@ -171,15 +171,20 @@ sequenceDiagram
 
 Turing ES retrieves the **top 10** most similar document chunks by default, using a similarity **threshold of 0.7**. Documents with a similarity score below the threshold are excluded from the context, preventing low-relevance content from polluting the prompt.
 
-### Knowledge Base (MinIO)
+### Knowledge Base (Assets)
 
-The Knowledge Base is a collection of files stored in MinIO and indexed as vector embeddings. Administrators manage files through a folder-based UI in the Turing ES admin console — creating folders, uploading documents, and organizing content in a way similar to a file system.
+The Knowledge Base is built from files managed in the **Assets** section (`/console/asset`), a file manager backed by MinIO. Administrators can create folders, upload documents, and browse content via a dual-panel interface. Files are indexed as vector embeddings and can be queried semantically by AI Agents.
 
-When a file is uploaded, the indexing pipeline:
-1. Extracts text content (including OCR for images and PDFs)
-2. Splits the content into chunks
-3. Generates a vector embedding for each chunk using the configured embedding model
-4. Stores the chunks and embeddings in the active embedding store
+Full documentation — including the UI layout, file preview, batch training, automatic indexing on upload/delete, and MinIO configuration — is available on the dedicated [Assets](./assets.md) page.
+
+**Indexing pipeline (per file):**
+
+1. Download file from MinIO
+2. Extract plain text via **Apache Tika** (supports PDF, DOCX, XLSX, PPTX, HTML, TXT, images with OCR)
+3. Truncate to **100,000 characters**
+4. Split into **chunks of 1,024 characters**
+5. Generate embeddings using the configured embedding model
+6. Store chunks in the active embedding store with source metadata
 
 The Knowledge Base is queried by AI Agents using the **RAG / Knowledge Base** tool callings:

sidebars-turing.ts

Lines changed: 7 additions & 0 deletions
@@ -21,6 +21,13 @@ const sidebars: SidebarsConfig = {
     "installation-guide",
     "administration-guide",
     "developer-guide",
+    {
+      type: "category",
+      label: "Management",
+      items: [
+        "assets",
+      ],
+    },
     {
       type: "category",
       label: "Generative AI",
