davidackerman/tourguide


Neuroglancer Tourguide

A 3D microscopy viewer with built-in structured-data browsing, plain-English queries powered by Claude / Gemini / OpenAI / local Ollama, and Python analysis (mesh-based or voxel-based, in-browser or on a cloud backend).

This repo contains two flavors of tourguide. Pick the one that fits your situation.


🌐 Web tourguide web-app/

Static web app — Neuroglancer embedded in the page, AI agent for natural-language queries and analysis, share links, optional cloud compute backend. Anyone can use it; nothing to install.

Best for: trying things out, exploratory analysis, day-to-day use, and sharing views or data with collaborators (or anyone else) via a URL.

What it does:

  • Loads zarr / n5 / Neuroglancer precomputed datasets directly from S3 / GCS / local folders
  • Natural-language queries against organelle CSVs ("show the largest mito", "plot volume distributions")
  • Agent-generated Python analysis (regionprops, cc3d, custom code) — in-browser via Pyodide or on the HF Space for bigger volumes
  • Share-link with NG state + computed tables embedded; persists across browser refreshes
  • One-click "Copy NG link" for sharing just the viewer state with non-tourguide users
  • Bring your own AI key (Gemini free tier works great), or run an in-browser model via WebLLM
  • Can be run fully on-prem (vite preview + local uvicorn for analysis + local Ollama for LLM) — no cloud required

🖥️ Sidecar tourguide server/

Python service that runs alongside a local Neuroglancer process, streams screenshots, narrates them with local TTS, and records narrated tour videos. Originally the only flavor; preserved for the workflows the web app doesn't (yet) cover.

Best for: making narrated tour movies, voice cloning with Chatterbox, fully on-prem GPU workflows, batch tour generation.

What it does (in addition to the web app's features):

  • Voice narration with Chatterbox cloning (GPU TTS)
  • Movie recording with synchronized narration + multiple transition modes
  • Local Ollama integration on a GPU box (the web app supports this too via the OpenAI-compatible backend; the sidecar adds Janelia-cluster-friendly conventions on top)

Setup and usage instructions for the sidecar follow below.


Features

  • Live Screenshot Streaming: Debounced 0.1-5 fps JPEG streaming
  • State Tracking: Position, zoom, orientation, layer visibility, and segment selection
  • WebSocket Updates: Real-time updates to browser panel
  • AI Narration: Context-aware descriptions using cloud (Gemini/Claude) or local (Ollama) AI
  • Natural Language Query: Ask questions about organelles in plain English
  • Agent-Driven Visualization: AI interprets queries to show/hide segments intelligently
  • AI-Powered Analysis Mode: Generate and execute Python code for data analysis via natural language
  • Voice Synthesis: Browser-based TTS or edge-tts with multiple voices
  • Movie Recording: Record navigation sessions with synchronized narration
  • Multiple Transition Modes: Direct cuts, crossfade, or smooth state interpolation
  • Responsive UI: Clean dark theme with status indicators and narration history
  • Explore Mode with Verbose Logging: Real-time progress tracking shows screenshot capture, AI narration generation, and audio synthesis status

Quick Start

Installation with pixi (recommended)

# Install dependencies with pixi
pixi install

# Start the server
pixi run start

# Or with custom settings
pixi run python server/main.py --ng-port 9999 --web-port 8090 --fps 2

Alternative: Installation with pip

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r server/requirements.txt

# Start the server
python server/main.py

Usage

Just open one URL: http://localhost:8090/

The web panel includes:

  • Embedded Neuroglancer viewer (left) with sample EM data pre-loaded
  • Explore Mode (default, right panel):
    • Screenshots tab: Live screenshots with AI narrations as you navigate
    • Verbose Log tab: Real-time progress tracking (📸 Screenshot captured → 📤 Sent to AI → ⏳ Waiting → ✅ Narration received → 🔊 Audio generated)
  • Query Mode: Natural language questions about organelles with AI-driven visualization
  • State tracking: Position, zoom, layers, selections
  • Recording controls: Capture and compile narrated tours with multiple transition modes

Navigate in the embedded viewer and watch the live stream update automatically!

Natural Language Queries

Ask questions about organelles in plain English:

Examples:

  • "show the largest mitochondrion"
  • "how many nuclei are there?"
  • "take me to the smallest peroxisome"
  • "show mitochondria larger than 1e11 nm³"
  • "also show nucleus 5" (adds to current selection)
  • "hide all mitochondria" (removes from view)

The AI agent:

  1. Converts your question to SQL
  2. Queries the organelle database
  3. Interprets the results based on query semantics
  4. Updates the visualization intelligently
  5. Provides a natural language answer with voice narration
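
Steps 1 and 2 above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's implementation: the table name `organelles`, its columns, and the `largest` helper are all hypothetical, and the SQL string is hard-coded where the real agent would use model output.

```python
import sqlite3

# Hypothetical schema: the real organelle database may use different names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organelles (object_id INTEGER, organelle_type TEXT, volume_nm3 REAL)"
)
conn.executemany(
    "INSERT INTO organelles VALUES (?, ?, ?)",
    [(1, "mito", 2.0e11), (2, "mito", 5.0e10), (3, "nucleus", 9.0e12)],
)

def largest(conn, organelle_type):
    """Return (object_id, volume_nm3) of the largest organelle of a type, or None."""
    # Stand-in for the SQL a model might produce for "show the largest mitochondrion".
    sql = (
        "SELECT object_id, volume_nm3 FROM organelles "
        "WHERE organelle_type = ? ORDER BY volume_nm3 DESC LIMIT 1"
    )
    return conn.execute(sql, (organelle_type,)).fetchone()

row = largest(conn, "mito")
print(f"Largest mito: id={row[0]}, volume={row[1]:.3g} nm^3")
```

Passing the organelle type as a bound parameter rather than formatting it into the SQL string keeps the sketch safe against malformed input.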

See AGENT_DRIVEN_VISUALIZATION.md for technical details.

Analysis Mode

Switch to Analysis Mode to generate and execute Python code for data analysis using natural language:

Examples:

  • "Plot the volume distribution of mitochondria"
  • "Show me a histogram of nucleus sizes"
  • "Create a scatter plot comparing mitochondria volume vs surface area"

The AI analysis agent:

  1. Converts your question to Python code
  2. Executes the code in a sandboxed container (Docker or Apptainer)
  3. Displays generated plots and statistics
  4. Tracks session metadata and timing information

Container Support:

  • Docker: Default for most systems
  • Apptainer: Automatic fallback for HPC/cluster environments
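
One plausible way to implement the Docker-first, Apptainer-fallback choice is to look for each executable on PATH. The sketch below is an assumption about the approach, not a copy of docker_sandbox.py or apptainer_sandbox.py; `pick_sandbox` is a hypothetical name.

```python
import shutil

def pick_sandbox(which=shutil.which):
    """Return 'docker', 'apptainer', or None depending on what is on PATH.

    `which` is injectable so the lookup can be tested without either runtime.
    """
    # Docker is tried first; Apptainer is the fallback for HPC/cluster nodes.
    for runtime in ("docker", "apptainer"):
        if which(runtime) is not None:
            return runtime
    return None

print(pick_sandbox())
```

A real detector may also need to check that the Docker daemon is reachable, since the binary can be installed on a node where the user has no permission to run containers.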

See ANALYSIS_MODE.md for technical details and API documentation.

Recording Tours

  1. Start Recording: Click "Start Recording" to begin capturing frames
  2. Navigate: Explore the dataset - narration triggers automatically on significant view changes
  3. Stop Recording: Click "Stop Recording" when done
  4. Create Movie: Choose transition style and click "Create Movie"
    • Direct Cuts: Instant transitions with 2-second silent pauses
    • Crossfade: Smooth dissolve transitions between views
    • State Interpolation: Neuroglancer renders smooth camera movements

Movies are saved to recordings/<session_id>/output/movie.mp4 with:

  • 960x540 resolution
  • Frame duration matches audio narration length
  • 2-second silent transitions between narrations
  • Synchronized audio track
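
The timing rules above (each frame held for the length of its narration, with a 2-second silent transition between narrations) can be sketched as a simple timeline builder. `build_timeline` and the tuple layout are hypothetical; the actual compilation in recording.py is FFmpeg-based and may be organized differently.

```python
TRANSITION_SECONDS = 2.0  # silent pause between narrations, per the list above

def build_timeline(audio_durations):
    """Return (kind, start, end) segments in seconds.

    Each narration segment lasts as long as its audio clip, and a silent
    transition is placed between consecutive narrations.
    """
    timeline, t = [], 0.0
    for i, duration in enumerate(audio_durations):
        if i > 0:
            timeline.append(("transition", t, t + TRANSITION_SECONDS))
            t += TRANSITION_SECONDS
        timeline.append(("narration", t, t + duration))
        t += duration
    return timeline

for segment in build_timeline([3.0, 4.5]):
    print(segment)
```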

See QUICKSTART.md for detailed usage guide.

Architecture

Stage 1: State Capture ✅

  • Neuroglancer viewer with state change callbacks
  • Summarizes position, zoom, orientation, layers, and selections
  • Filters out insignificant changes so only meaningful ones trigger updates

Stage 2: Screenshot Loop ✅

  • Background thread captures screenshots when viewer state is "dirty"
  • Converts PNG to JPEG for bandwidth efficiency
  • Debounced to max 2 fps (configurable)
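
The dirty-flag and debounce behavior described above can be sketched as follows. This is an illustration under assumptions, not the code in server/: `ScreenshotLoop`, `mark_dirty`, `tick`, and the `capture` callable are hypothetical names.

```python
import threading
import time

class ScreenshotLoop:
    """Capture at most max_fps frames per second, and only when the view is dirty."""

    def __init__(self, capture, max_fps=2.0):
        self.capture = capture            # hypothetical callable that grabs one frame
        self.min_interval = 1.0 / max_fps
        self.dirty = threading.Event()    # set by the viewer's state-change callback
        self.last_capture = float("-inf")

    def mark_dirty(self):
        self.dirty.set()

    def tick(self, now=None):
        """Run one loop iteration; return True if a frame was captured."""
        now = time.monotonic() if now is None else now
        if not self.dirty.is_set() or now - self.last_capture < self.min_interval:
            return False
        self.dirty.clear()
        self.last_capture = now
        self.capture()
        return True
```

A background thread would call `tick()` repeatedly with a short sleep between iterations. Because the dirty flag stays set while the rate limit is active, a change that arrives too soon after the last frame is still captured once the interval has elapsed.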

Stage 3: WebSocket Streaming ✅

  • FastAPI server with WebSocket endpoint
  • Sends {type: "frame", jpeg_b64: "...", state: {...}} messages
  • Browser displays live frames and state summary
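
A message in the shape listed above can be built and parsed with the standard library alone. The helper names are hypothetical and the contents of `state` are an assumption; only the three top-level keys come from the description above.

```python
import base64
import json

def make_frame_message(jpeg_bytes, state):
    """Serialize one frame in the {type, jpeg_b64, state} shape listed above."""
    return json.dumps({
        "type": "frame",
        "jpeg_b64": base64.b64encode(jpeg_bytes).decode("ascii"),
        "state": state,
    })

def read_frame_message(text):
    """Inverse of make_frame_message: return (jpeg_bytes, state)."""
    msg = json.loads(text)
    if msg.get("type") != "frame":
        raise ValueError(f"unexpected message type: {msg.get('type')!r}")
    return base64.b64decode(msg["jpeg_b64"]), msg["state"]

# Placeholder bytes stand in for real JPEG data; the state fields are illustrative.
text = make_frame_message(b"\xff\xd8\xff\xe0", {"position": [100, 200, 300], "zoom": 8})
print(text)
```

Base64 makes the payload roughly a third larger than the raw JPEG, which is the price of keeping the frame and its state in a single JSON text message.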

Stage 4: AI Narrator ✅

  • Triggers narration on meaningful state changes
  • Uses Gemini, Claude, or local Ollama to describe current view
  • Context-aware prompts for EM/neuroanatomy
  • Real-time WebSocket broadcasting to all clients
  • Configurable thresholds and intervals
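
As an illustration of what "meaningful state changes" with configurable thresholds and intervals might look like, the sketch below combines a minimum interval with movement and zoom thresholds. The function name, the state dictionary layout, and every default value are assumptions, not the project's settings.

```python
import math

def should_narrate(prev, curr, now, last_narration_time,
                   min_move=500.0, min_zoom_ratio=1.5, min_interval=5.0):
    """Decide whether the view changed enough to warrant a new narration.

    All thresholds are illustrative defaults, not the project's values.
    Zoom values are assumed to be positive.
    """
    if now - last_narration_time < min_interval:
        return False  # too soon after the previous narration
    moved = math.dist(prev["position"], curr["position"]) >= min_move
    ratio = curr["zoom"] / prev["zoom"]
    zoomed = max(ratio, 1.0 / ratio) >= min_zoom_ratio  # symmetric for zoom in/out
    return moved or zoomed

before = {"position": (0, 0, 0), "zoom": 1.0}
after = {"position": (1000, 0, 0), "zoom": 1.0}
print(should_narrate(before, after, now=10.0, last_narration_time=0.0))
```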

Stage 5: Voice & TTS ✅

  • Browser-based TTS or edge-tts with multiple voices
  • Automatic audio playback in browser
  • Audio synchronized with narration display
  • Saved to recordings for movie compilation

Stage 6: Movie Recording ✅

  • Record navigation sessions with frame capture
  • Three transition modes: cuts, crossfade, interpolation
  • Frame duration matches narration audio length
  • 2-second silent transitions between narrations
  • FFmpeg-based video compilation with audio sync
  • Neuroglancer video_tool integration for smooth camera movements
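
In interpolation mode the smooth camera movement is rendered by Neuroglancer's video_tool, so the project does not need code like the following. It is included only to illustrate the idea of interpolating between two recorded states, using linear interpolation of the camera position. `interpolate_positions` is a hypothetical helper, and orientation (which would need quaternion interpolation) is left out.

```python
def interpolate_positions(start, end, steps):
    """Return `steps` evenly spaced positions from start to end, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        tuple(a + (b - a) * i / (steps - 1) for a, b in zip(start, end))
        for i in range(steps)
    ]

for position in interpolate_positions((0, 0, 0), (10, 20, 30), 3):
    print(position)
```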

Stage 7: Natural Language Query System ✅

  • SQLite database for organelle metadata (volume, position, etc.)
  • AI-powered natural language to SQL conversion
  • Multi-query support with automatic splitting
  • Intent classification: navigation, visualization, or informational
  • Agent-driven visualization state updates
  • Semantic understanding: "show X" vs "also show X" vs "hide X"
  • Context-aware command generation using current viewer state
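
The difference between "show X", "also show X", and "hide X" maps naturally onto set operations over the visible segment IDs. The sketch below assumes that a plain "show" replaces the current selection; the action names and `apply_command` are hypothetical, not the agent's real command format.

```python
def apply_command(selected, action, ids):
    """Return the new set of visible segment IDs after one command.

    action is 'show' (replace), 'add' ("also show"), or 'hide' (remove).
    """
    ids = set(ids)
    if action == "show":
        return ids
    if action == "add":
        return selected | ids
    if action == "hide":
        return selected - ids
    raise ValueError(f"unknown action: {action!r}")

visible = apply_command(set(), "show", [12, 40])   # "show mitochondria 12 and 40"
visible = apply_command(visible, "add", [5])       # "also show nucleus 5"
visible = apply_command(visible, "hide", [12])     # "hide mitochondrion 12"
print(sorted(visible))
```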

Stage 8: Analysis Mode ✅

  • Natural language to Python code generation
  • Sandboxed code execution (Docker/Apptainer)
  • Interactive plot generation and visualization
  • Session metadata tracking with timing breakdown
  • Comprehensive results management with REST API
  • Automatic container detection for HPC environments

Project Structure

tourguide/
├── server/
│   ├── main.py             # Entry point
│   ├── ng.py               # Neuroglancer viewer + state tracking
│   ├── stream.py           # FastAPI WebSocket server + query/analysis endpoints
│   ├── narrator.py         # AI narration engine
│   ├── query_agent.py      # Natural language query agent
│   ├── analysis_agent.py   # Natural language to Python code agent
│   ├── docker_sandbox.py   # Docker container sandbox
│   ├── apptainer_sandbox.py # Apptainer container sandbox
│   ├── analysis_results.py # Analysis session metadata manager
│   ├── organelle_db.py     # SQLite database for organelle metadata
│   ├── recording.py        # Movie recording and compilation
│   └── requirements.txt    # Legacy pip requirements
├── web/
│   ├── index.html      # Web UI with recording and analysis controls
│   ├── app.js          # WebSocket client + recording + analysis logic
│   ├── style.css       # Styling with spinner animations
│   └── ng-screenshot-handler.js  # Neuroglancer screenshot capture
├── organelle_data/     # Organelle CSV files and database (gitignored)
├── analysis_results/   # Analysis session outputs (gitignored)
├── containers/         # Container images (gitignored)
├── recordings/         # Recorded sessions (auto-created)
├── pixi.toml           # Pixi environment config
├── AGENT_DRIVEN_VISUALIZATION.md  # Agent visualization docs
├── ANALYSIS_MODE.md    # Analysis mode documentation
└── README.md

Configuration

Command-line Arguments

--ng-host HOST        Neuroglancer bind address (default: 127.0.0.1)
--ng-port PORT        Neuroglancer port (default: 9999)
--web-host HOST       Web server bind address (default: 0.0.0.0)
--web-port PORT       Web server port (default: 8090)
--fps FPS             Maximum screenshot frame rate (default: 2)
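
For reference, the flags and defaults above could be declared with argparse roughly as follows. This is a sketch, not the parser in server/main.py: the help strings are copied from the table, while the description text and the choice of `float` for `--fps` are assumptions.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Neuroglancer Tourguide sidecar")
    parser.add_argument("--ng-host", default="127.0.0.1", help="Neuroglancer bind address")
    parser.add_argument("--ng-port", type=int, default=9999, help="Neuroglancer port")
    parser.add_argument("--web-host", default="0.0.0.0", help="Web server bind address")
    parser.add_argument("--web-port", type=int, default=8090, help="Web server port")
    # float is an assumption here, chosen so that rates below 1 fps can be expressed.
    parser.add_argument("--fps", type=float, default=2.0, help="Maximum screenshot frame rate")
    return parser

args = build_parser().parse_args(["--ng-port", "9999", "--web-port", "8090", "--fps", "2"])
print(args.ng_host, args.ng_port, args.web_host, args.web_port, args.fps)
```

Note that the documented default for --web-host is 0.0.0.0, which exposes the web panel on all network interfaces; pass --web-host 127.0.0.1 to restrict it to the local machine.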

Development Stages

  • Stage 0: Repository structure
  • Stage 1: Neuroglancer state capture
  • Stage 2: Screenshot loop
  • Stage 3: WebSocket streaming
  • Stage 4: AI narrator
  • Stage 5: Voice/TTS
  • Stage 6: Movie recording and compilation
  • Stage 7: Natural language query system with agent-driven visualization
  • Stage 8: Analysis mode with AI code generation and sandboxed execution
  • Stage 9: Quality upgrades (ROI crop, advanced UI controls)

Using AI Narration

Option 1: Cloud AI (Gemini - Recommended)

  1. Get a free API key from https://aistudio.google.com/app/apikey

  2. Create a .env file:

    cp .env.example .env

  3. Add your API key to .env:

    GOOGLE_API_KEY=your_api_key_here

  4. Start the server:

    pixi run start

Option 2: Local AI (Ollama + Kokoro TTS - No API Key!)

For completely local, private, and free AI narration with voice:

  1. Install Ollama from ollama.com

  2. Download the vision model:

    ollama pull llama3.2-vision

  3. Install TTS (optional):

    pixi run pip install kokoro soundfile sounddevice

  4. Enable local mode in .env:

    USE_LOCAL=true

  5. Start the server:

    pixi run start

See LOCAL_SETUP.md for detailed local setup instructions.

Option 3: Cloud AI (Claude/Anthropic)

Use ANTHROPIC_API_KEY in .env instead of GOOGLE_API_KEY.
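
The three options are selected through the variables USE_LOCAL, GOOGLE_API_KEY, and ANTHROPIC_API_KEY. The sketch below shows one way a server might choose a backend from them. The precedence order (local first, then Gemini, then Claude), the returned labels, and the function name `pick_backend` are all assumptions, so check the server code for the actual behavior.

```python
import os

def pick_backend(env=os.environ):
    """Choose a narration backend from .env-style variables.

    The precedence (local, then Gemini, then Claude) is an assumption.
    """
    if env.get("USE_LOCAL", "").strip().lower() == "true":
        return "ollama"
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    if env.get("ANTHROPIC_API_KEY"):
        return "claude"
    return None  # no backend configured

print(pick_backend({"GOOGLE_API_KEY": "your_api_key_here"}))
```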


Navigate in Neuroglancer and watch the AI narrate your exploration in real-time!

Running on GPU Cluster (LSF/H100)

To run on a GPU cluster node, use mode=shared when requesting GPUs:

bsub -P cellmap -n 12 -gpu "num=1:mode=shared" -q gpu_h100 -Is /bin/bash

Important: The mode=shared parameter is required. Without it, the GPU runs in exclusive mode, so PyTorch (Chatterbox) and Ollama cannot use it at the same time.

Once on the node, run the application normally:

pixi run start

See CLUSTER_TROUBLESHOOTING.md for detailed cluster setup and troubleshooting.

Requirements

  • Python 3.10+
  • FastAPI & Uvicorn
  • Pillow
  • Neuroglancer
  • FFmpeg (for movie compilation)
  • edge-tts (for voice synthesis, optional)

License

GNU General Public License v3.0 — see LICENSE for details.

Tourguide depends on zmesh, cc3d, fastmorph, edt, and kimimaro from the Seung Lab, which are GPL-3.0; the combined work is therefore GPL-3.0.
