Textagent
diff --git a/‎README.md‎
Lines changed: 4 additions & 1 deletion b/‎README.md‎
Lines changed: 4 additions & 1 deletion
@@ -37,7 +37,7 @@
 | **Desktop** | Native app via Neutralino.js with system tray and offline support |
 | **Code Execution** | 7 languages in-browser: Bash ([just-bash](https://justbash.dev/)), Math (Nerdamer), LaTeX (MathJax + Nerdamer evaluation), Python ([Pyodide](https://pyodide.org/)), HTML (sandboxed iframe, `html-autorun` for widgets/quizzes), JavaScript (sandboxed iframe), SQL ([sql.js](https://sql.js.org/) SQLite) · 25+ compiled languages via [Judge0 CE](https://ce.judge0.com): C, C++, Rust, Go, Java, TypeScript, Kotlin, Scala, Ruby, Swift, Haskell, Dart, C#, and more · **▶ Run All** notebook engine — one-click sequential execution with preflight dialog (block table with model/status), pre-execution model loading (AI + TTS auto-loaded before blocks run), progress bar, abort, per-block status badges, detailed console logging, and SQLite shared context store |
 | **Security** | Content Security Policy (CSP), SRI integrity hashes, XSS sanitization (DOMPurify), ReDoS protection, Firestore write-token ownership, API keys via HTTP headers, postMessage origin validation, 8-char password minimum, sandboxed code execution, Cloudflare Turnstile CAPTCHA on email endpoint, automated security scanner (`scripts/security-check.sh`) with pre-commit integration |
-| **AI Document Tags** | `{{@AI:}}` text generation (`@think: Yes` for deep reasoning), `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes via Granite Docling 258M, Florence-2 230M, or GLM-OCR 1.5B, 📷 live camera capture + 📎 image/PDF upload, PDF page rendering via pdf.js), `{{@TTS:}}` text-to-speech playback (Kokoro TTS per card, language selector, ▶ Play / ⬇ Save WAV), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear), `{{@Translate:}}` translation (target language selector, integrated TTS pronunciation, cloud model routing), `{{@Game:}}` game builder (AI-generated or pre-built, Canvas 2D/Three.js/P5.js, import/export HTML), `{{@Draw:}}` whiteboard (Excalidraw + Mermaid, AI diagram generation with per-card model selector + 🚀 Generate, robust JSON repair for local models, Insert/PNG/SVG export, 📚 Library Browser with 29 bundled packs in 6 categories), `{{@Tools:}}` web tools (Jina Reader scrape + Jina Search, multi-URL scraping, API key modal, Accept/Reject/Copy results), `{{@Research:}}` autonomous experiment loop (Pyodide-powered, AI-driven code optimization with live results table, keep/discard metric tracking, configurable @metric/@direction/@max_iterations), `{{Annotate:}}` canvas annotation overlay (freehand pen/highlighter/eraser/shapes, `@text:` mode renders text with **Pretext-style scanline reflow** — text flows live around strokes in real-time, 📖 Present button switches to live reading/drawing mode, ↩ Undo / 🗑 Clear / 📥 PNG export) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`, `@model`, `@engine`, `@lang`, `@prebuilt`); `@model:` field persists selected model per card with intelligent defaults (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector with document-portable model persistence, concurrent block operations |
+| **AI Document Tags** | `{{@AI:}}` text generation (`@think: Yes` for deep reasoning), `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes via Granite Docling 258M, Florence-2 230M, or GLM-OCR 1.5B, 📷 live camera capture + 📎 image/PDF upload, PDF page rendering via pdf.js), `{{@TTS:}}` text-to-speech playback (Kokoro TTS per card, language selector, ▶ Play / ⬇ Save WAV), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear), `{{@Translate:}}` translation (target language selector, integrated TTS pronunciation, cloud model routing), `{{@Game:}}` game builder (AI-generated or pre-built, Canvas 2D/Three.js/P5.js, import/export HTML), `{{@Draw:}}` whiteboard (Excalidraw + Mermaid, AI diagram generation with per-card model selector + 🚀 Generate, robust JSON repair for local models, Insert/PNG/SVG export, 📚 Library Browser with 29 bundled packs in 6 categories), `{{@Tools:}}` web tools (Jina Reader scrape + Jina Search, multi-URL scraping, API key modal, Accept/Reject/Copy results), `{{@Research:}}` autonomous experiment loop (Pyodide-powered, AI-driven code optimization with live results table, keep/discard metric tracking, configurable @metric/@direction/@max_iterations), `{{Annotate:}}` canvas annotation overlay (freehand pen/highlighter/eraser/shapes, `@text:` mode renders text with **Pretext-style scanline reflow** — text flows live around strokes in real-time, 📖 Present button switches to live reading/drawing mode, ↩ Undo / 🗑 Clear / 📥 PNG export), `{{@Vision:}}` omni-modal analysis (**Gemma 4 E2B/E4B** local model — image, audio, video frame extraction, text; 📷 camera capture + 📎 upload; video auto-extracted as 4 JPEG keyframes via Canvas; cyan-themed card; dynamic model switching — auto-swaps to Gemma 4 then restores prior model) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`, `@model`, `@engine`, `@lang`, `@prebuilt`); `@model:` field persists selected model per card with intelligent defaults (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`, Vision→`gemma4-e2b`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector with document-portable model persistence, concurrent block operations |
 | **🔌 API Calls** | `{{API:}}` REST API integration — GET/POST/PUT/DELETE methods, custom headers, JSON body, response stored in `$(api_varName)` variables; inline review panel; toolbar GET/POST buttons |
 | **🔗 Agent Flow** | `{{Agent:}}` multi-step pipeline — define Step 1/2/3, chain outputs, per-card model + search provider selector, live step status indicators (⏳/✅/❌), review combined output; `@cloud: yes/no` with ☁️ Cloud / 🖥️ Local badge; `@agenttype:` dropdown selector (openclaw/openfang) for external agents; Docker-based local execution via `agent-runner/server.js`; Agent Execution Settings UI (Codespaces/Local Docker/Custom endpoint); GitHub Codespaces cloud execution via ☁️ toggle; **📦 Agent Containers panel** — floating toolbar panel showing running Docker containers with live status, uptime, instant stop (`docker rm -f`), badge count, daemon readiness check, startup container recovery, and floating toggle accessible when header is fully hidden |
 | **🔍 Web Search** | Toggle web search for AI — 7 providers: DuckDuckGo (free), Brave Search, Serper.dev, Tavily (AI-optimized), Google CSE, Wikipedia, Wikidata; search results injected into LLM context; source citations in responses; per-agent-card search provider selector |
@@ -69,6 +69,8 @@ TextAgent includes a built-in AI assistant panel with **three local model sizes*
 | **Granite Docling (258M)** | Local (WebGPU/WASM) | 🔒 Private — document OCR · ~500 MB | 📄 Document |
 | **Florence-2 (230M)** | Local (WebGPU/WASM) | 🔒 Private — OCR + captioning · ~230 MB | 📷 Vision |
 | **GLM-OCR (1.5B)** | Local (WebGPU) | 🔒 Private — Advanced OCR · ~650 MB | 📷 Advanced OCR |
+| **Gemma 4 E2B** | Local (WebGPU/WASM) | 🔒 Private — omni-modal vision · ~2 GB | 👁️ Vision |
+| **Gemma 4 E4B** | Local (WebGPU/WASM) | 🔒 Private — omni-modal vision · ~4 GB | 👁️ Vision Pro |
 
 **AI Actions:** Summarize · Expand · Rephrase · Fix Grammar · Explain · Simplify · Auto-complete · Generate Markdown · Polish · Formalize · Elaborate · Shorten
 
@@ -544,6 +546,7 @@ TextAgent has undergone significant evolution since its inception. What started
 
 | Date | Commits | Feature / Update |
 |------|---------|-----------------:|
+| **2026-04-03** | — | 👁️ **Gemma 4 Vision Tag** — new `{{@Vision:}}` DocGen tag backed by Gemma 4 E2B/E4B running locally via WebGPU/WASM; `ai-worker-gemma4.js` Web Worker with `Gemma4Processor` instantiation (bypasses `AutoProcessor` which lacks `image_processor_type`) and system-prompt persona fix; primary `onnx-community/gemma-4-E2B-it-ONNX` with `textagent/gemma-4-E4B-it-ONNX` fallback; cyan-themed Vision card with 📷 camera capture + 📎 omni-modal upload (image/audio/video); **video frame extraction** — `extractVideoFrames()` seeks 4 evenly-spaced timestamps in a hidden `<video>` element, draws each to Canvas at max 1280px, stores as JPEG 0.85; audio stored as direct base64; upload handler detects Vision card type, sets `accept="image/*,audio/*,video/*"`; generation handler maps attachments to typed inputs and calls `switchToModel('gemma4-e2b')` before execution, restores prior model after; 👁️ Vision toolbar button in AI Tags dropdown; Fixed: Vision card double-rendering raw `@upload:` / `@prompt:` lines caused by broken `\\s*` regex (quadruple-escaped) — now correct `\s*`; removed duplicate static text row; `gemma4-e2b` / `gemma4-e4b` entries in `ai-models.js` with `isDocModel: true` + `supportsVision: true` |
 | **2026-04-02** | `55538f3` | 🔧 **DocGen Reject Block Fix** — fixed: rejecting a generated Translate/OCR/TTS/STT/Image/AI block restored a generic hardcoded "AI Generate" card, losing all type-specific UI (language dropdown, mode pills, camera button, step inputs, etc.); reject handler now calls `M.transformDocgenMarkdown(block.fullMatch)` to re-render the exact original typed card with all controls intact; `data-ai-index` patched on restored card and all children; review panel header label+icon now shows correct type for all blocks ("🌐 Translate — Review", "🔍 OCR Scan — Review", etc.) instead of always "✨ AI Generate — Review" |
 | **2026-04-02** | `f012a30`, `971de55` | ⚡ **Share Link Loading Overlay** — eliminates the flash of bare UI when opening shared links (`#space=`, `#s=`, `#id=`, `#d=`); full-screen branded loading splash (TextAgent logo + spinner) activates before any JS loads via inline hash detection in `<head>`; theme-aware (dark `#0d1117` / light `#f6f8fa`); fades out smoothly (0.35s) once content is ready; `hideShareLoader()` called at every terminal path (success, error, form-gate); 15-second safety timeout auto-dismisses on network failure |
 | **2026-04-02** | `6f495fe` | ✏️ **Annotate DocGen + Pretext Reflow Engine** — new `{{Annotate:}}` DocGen tag with canvas-based freehand annotation overlay; `@text:` source mode renders text on-canvas with real-time Pretext-style scanline reflow (~0.5–1.5ms/frame, O(width) per row); freehand strokes build an offscreen mask; per-row `getImageData` scans free x-intervals; words packed into intervals via `canvas.measureText()` (same arithmetic as Pretext `layoutNextLine()`); tools: pen, highlighter, eraser, line, arrow, rect, circle; color swatches + size slider; `📖 Present` button calls `M.setViewMode('preview')` — hides editor, annotation stays fully drawable while reading; `↩ Undo` / `🗑 Clear` / `📥 PNG` actions; Fixed: `data-text` stripped by DOMPurify — added `data-text`, `data-reflow` to `ADD_ATTR` + hidden `<span class="ann-reflow-text">` textContent as primary storage (survives sanitization without whitelisting); `annotate-docgen.js` (~710 lines) + `annotate-docgen.css` (~340 lines); interactive demo: `public/pretext-reflow-demo.html` — 4-tab demo (Float Image, Draw Exclusion, Both Together, API explainer) |