Commit e2e6d01

Merge pull request #2 from VirtualFlyBrain/clare-testing
test connectivity queries and build term hierarchies
2 parents bc074cf + 4656125 commit e2e6d01

2 files changed

Lines changed: 249 additions & 68 deletions

LLM_GUIDANCE.md

Lines changed: 196 additions & 68 deletions
@@ -281,102 +281,223 @@ Available filter types are loaded dynamically from Solr at server startup, so th
281281
282282
---
283283
284-
## Connectivity Query Workflow
284+
## Connectivity Queries
285285
286-
**When to use:** User asks about synaptic connections between neuron types, upstream/downstream partners, or connectivity patterns.
286+
**When to use:** User asks about synaptic connections, upstream/downstream partners, connectivity patterns, or where a neuron connects.
287287
288-
**DO NOT USE for:**
289-
- Individual neuron-to-neuron connections (use `run_query` with `NeuronNeuronConnectivityQuery` instead)
290-
- Connections between muscles and neurons or sense organs and neurons
288+
**DO NOT USE for:** Connections between muscles and neurons, or sense organs and neurons.
291289
292-
### Choosing the Right Connectivity Tool
290+
There are **six** connectivity query types. Pick the right one using the decision rules below.
293291
294-
| | `query_connectivity` | `run_query` + `NeuronNeuronConnectivityQuery` |
295-
|---|---|---|
296-
| **Scope** | Neuron **class** to neuron **class** | Single **individual** neuron |
297-
| **Datasets** | Queries across **all** connectome datasets simultaneously | Single dataset (whichever the neuron belongs to) |
298-
| **Filtering** | Filter at **both** upstream AND downstream ends by class | Shows all partners of one neuron |
299-
| **Use case** | "What Tm1→T3 connections exist across all datasets?" | "What does neuron VFB_00104glj connect to?" |
300-
| **Data** | Live comparative connectomics (NOT pre-cached) | Pre-computed per-neuron results (fast) |
301-
| **Performance** | Can take minutes — use higher weight (≥50 for both-ends) or group_by_class | Fast (pre-computed) |
292+
### Step 1: Pick the Right Query
302293
303-
### Performance Notes
294+
**Rule 1 — User has an individual neuron ID (starts with `VFB_`):**
295+
- To see all synaptic partners of that neuron → use `run_query` with query_type `NeuronNeuronConnectivityQuery`
296+
- To see which brain regions that neuron connects to → use `run_query` with query_type `NeuronRegionConnectivityQuery`
297+
- To see presynaptic inputs with neurotransmitter types → use `run_query` with query_type `NeuronInputsTo`
298+
- If the user specifically asks for a class-level query (e.g. "what classes connect to neurons like this one?"), first call `get_term_info` on the VFB ID to find its neuron class (`FBbt_...` ID), then use that class ID with the queries in Rule 2.
304299
305-
- `query_connectivity` is **not pre-cached**: it runs live queries across all connectome datasets, so responses can take up to several minutes
306-
- **Both-ends queries** (upstream_type AND downstream_type both set) with low weight thresholds can timeout on large neuron classes — start with `weight ≥ 50`
307-
- **Single-end queries** (only upstream_type or downstream_type) work well with `weight ≥ 10`
308-
- Use `group_by_class=true` to get faster aggregated results instead of individual neuron-to-neuron rows
309-
- Warn the user that connectivity queries may take a while before executing
300+
**Rule 2 - User has a neuron class (starts with `FBbt_`) or a neuron type name (e.g. "Kenyon cell"):**
301+
- To see downstream partner classes → use `run_query` with query_type `DownstreamClassConnectivity` (fast, pre-indexed)
302+
- To see upstream partner classes → use `run_query` with query_type `UpstreamClassConnectivity` (fast, pre-indexed)
303+
- To see region connectivity or neurotransmitter inputs for a neuron class → use the instance batch workflow described below
304+
- To filter by **both** upstream AND downstream class at the same time, or to retrieve results that include data from multiple connectome datasets → use `query_connectivity` (slow, live query)
310305
311-
### Workflow
306+
**Instance batch workflow — running individual neuron queries at the class level:**
307+
308+
Some queries (`NeuronRegionConnectivityQuery`, `NeuronInputsTo`) only work on individual neurons, not classes. To use them for a whole neuron class:
309+
310+
1. Get instances of the class: `run_query(id="FBbt_00003686", query_type="ListAllAvailableImages")`
311+
2. Extract the VFB IDs from the results.
312+
3. If there are many instances, tell the user how many there are and ask whether to query all of them or a subset.
313+
4. Batch-query the instances: `run_query(id=["VFB_xxx", "VFB_yyy", ...], query_type="NeuronRegionConnectivityQuery")`
314+
315+
The `run_query` tool accepts an array of IDs and runs the query on all of them in parallel. Results are returned as a JSON object keyed by `"ID::query_type"`.
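
The batch result described above can be regrouped by neuron with a few lines of Python. This is a minimal sketch, not part of the server: it assumes only what the text states (a JSON object keyed by `"ID::query_type"`). The helper name `split_batch_results`, the placeholder IDs and the `{"rows": []}` payload shape are hypothetical.

```python
import json
from typing import Any, Dict


def split_batch_results(raw_json: str) -> Dict[str, Dict[str, Any]]:
    """Group a batch response by neuron ID, then by query type."""
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, payload in json.loads(raw_json).items():
        # Keys are assumed to look like "VFB_xxx::NeuronRegionConnectivityQuery".
        neuron_id, _, query_type = key.partition("::")
        grouped.setdefault(neuron_id, {})[query_type] = payload
    return grouped


# Made-up response for illustration only; real payloads will differ.
example = json.dumps({
    "VFB_aaa::NeuronRegionConnectivityQuery": {"rows": []},
    "VFB_bbb::NeuronRegionConnectivityQuery": {"rows": []},
})
grouped = split_batch_results(example)
```

Grouping by ID first makes it easy to report per-neuron results, or to spot instances that returned nothing.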
316+
317+
**Rule 3 — User asks about a brain region (e.g. "What connects to the lobula?"):**
318+
- First use `search_terms` with `filter_types: ["neuron", "class"]` to find neuron classes in that region.
319+
- Then apply Rule 2 for the neuron classes found.
320+
321+
**If unsure**, start with the `run_query` options listed in Rules 1–2. They are fast and cached. Only use `query_connectivity` when dual-end class filtering is specifically needed.
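
Rules 1 and 2 can be sketched as a small lookup. This is an illustrative helper only: the function name, the `goal` labels (`partners`, `regions`, `inputs`, `downstream`, `upstream`) and the returned tuple shape are assumptions introduced here. Only the ID prefixes and query type names come from the rules above.

```python
from typing import Optional, Tuple

# Query types that take an individual neuron (VFB_ ID), per Rule 1.
INDIVIDUAL_QUERIES = {
    "partners": "NeuronNeuronConnectivityQuery",
    "regions": "NeuronRegionConnectivityQuery",
    "inputs": "NeuronInputsTo",
}
# Pre-indexed query types that take a neuron class (FBbt_ ID), per Rule 2.
CLASS_QUERIES = {
    "downstream": "DownstreamClassConnectivity",
    "upstream": "UpstreamClassConnectivity",
}


def pick_connectivity_query(
    term_id: str, goal: str, both_ends: bool = False
) -> Tuple[str, Optional[str]]:
    """Return (tool or workflow, query_type) following Rules 1 and 2."""
    if term_id.startswith("VFB_") and goal in INDIVIDUAL_QUERIES:
        return ("run_query", INDIVIDUAL_QUERIES[goal])
    if term_id.startswith("FBbt_"):
        if both_ends:
            # Dual-end class filtering is the one case for the slow path.
            return ("query_connectivity", None)
        if goal in CLASS_QUERIES:
            return ("run_query", CLASS_QUERIES[goal])
        if goal in ("regions", "inputs"):
            # Individual-only queries are run over the class's instances.
            return ("instance batch workflow", INDIVIDUAL_QUERIES[goal])
    raise ValueError(f"No rule covers id={term_id!r} with goal={goal!r}")
```

Rule 3 (brain regions) is not modelled here because it first needs a `search_terms` call to obtain class IDs.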
322+
323+
### Summary Table
324+
325+
| Query | Input | What it returns | Speed |
326+
|-------|-------|----------------|-------|
327+
| `NeuronNeuronConnectivityQuery` | Individual neuron VFB ID | All partner neurons with input/output weights | Fast (cached) |
328+
| `NeuronRegionConnectivityQuery` | Individual neuron VFB ID | Brain regions with pre/postsynaptic terminal counts | Fast (cached) |
329+
| `NeuronInputsTo` | Individual neuron VFB ID | Presynaptic partners with neurotransmitter types and weights | Fast (cached) |
330+
| `DownstreamClassConnectivity` | Neuron class FBbt ID | Downstream partner classes with % connected, avg weight (includes data from all datasets) | Fast (pre-indexed) |
331+
| `UpstreamClassConnectivity` | Neuron class FBbt ID | Upstream partner classes with % connected, avg weight (includes data from all datasets) | Fast (pre-indexed) |
332+
| `query_connectivity` | Neuron class names or FBbt IDs | Connections between two neuron classes (includes data from all datasets) | Slow (1–5 min, live) |
333+
334+
### Step 2: Run the Query
335+
336+
#### For `run_query` connectivity queries (fast path)
337+
338+
1. Get the VFB ID or FBbt ID. If the user gave a name, use `search_terms` to find the ID first.
339+
2. Call `get_term_info` on the ID. Check that the relevant query_type appears in the `Queries` array. If it does not, either the entity does not support that query type, or there are no results for it.
340+
3. Call `run_query` with the ID and query_type.
341+
342+
**Example — individual neuron partners:**
343+
```
344+
run_query(id="VFB_00104glj", query_type="NeuronNeuronConnectivityQuery")
345+
```
346+
347+
**Example — class downstream partners:**
348+
```
349+
run_query(id="FBbt_00003686", query_type="DownstreamClassConnectivity")
350+
```
351+
352+
**Example — neuron region connectivity:**
353+
```
354+
run_query(id="VFB_00104glj", query_type="NeuronRegionConnectivityQuery")
355+
```
356+
357+
**Example — neuron inputs with neurotransmitter types:**
358+
```
359+
run_query(id="VFB_00104glj", query_type="NeuronInputsTo")
360+
```
312361
313-
1. **Parse input** — Extract parameters using this inference table:
362+
#### For `query_connectivity` (slow path — dual-end class-to-class)
314363
315-
| User says | Mode |
316-
|-----------|------|
317-
| "upstream of X", "inputs to X", "presynaptic to X" | set `downstream_type` = X |
318-
| "downstream of X", "outputs from X", "postsynaptic to X" | set `upstream_type` = X |
319-
| "between X and Y", "X to Y connections" | set both `upstream_type` = X, `downstream_type` = Y |
320-
| "all connections from X" | set `upstream_type` = X only |
321-
| "summarise by class", "aggregated" | set `group_by_class` = true |
364+
1. **Check if you really need `query_connectivity`.** If the user asks about only one direction (e.g. "what is downstream of X?" or "what are the inputs to X?"), use `DownstreamClassConnectivity` or `UpstreamClassConnectivity` via `run_query` instead — they are much faster. Only use `query_connectivity` when the user specifies **both** upstream and downstream types.
322365
323-
**Defaults:** `weight` = 5, `exclude_dbs` = ["hb", "fafb"] (unless user specifies otherwise)
366+
2. **Parse the user's request** using this table:
324367
325-
2. **Confirm parameters** — Unless user explicitly specified all parameters, show planned query and ask to confirm:
368+
| User says | Action |
369+
|-----------|--------|
370+
| "upstream of X", "inputs to X", "presynaptic to X" | Use `run_query` with `UpstreamClassConnectivity` on X. |
371+
| "downstream of X", "outputs from X", "postsynaptic to X" | Use `run_query` with `DownstreamClassConnectivity` on X. |
372+
| "between X and Y", "X to Y connections" | Use `query_connectivity` with `upstream_type` = X, `downstream_type` = Y |
373+
| "summarise by class", "aggregated" | Use `query_connectivity` with `group_by_class` = true (this option is specific to `query_connectivity`) |
374+
375+
**Defaults for `query_connectivity`:** `weight` = 5, `exclude_dbs` = ["hb", "fafb"]
376+
377+
3. **Validate neuron type names.** Use `search_terms` with `filter_types: ["neuron", "class"]` to check the name is correct. If ambiguous, show candidates and ask user to pick.
378+
379+
4. **Confirm parameters with the user before running.** This query is slow. Show:
326380
```
327381
I'll query connectivity with these parameters:
328382
- Upstream type: transmedullary neuron Tm1
329-
- Downstream type: (any)
383+
- Downstream type: T3 neuron
330384
- Min. weight: 5
331385
- Excluded DBs: hb, fafb
332386
- Group by class: No
333-
Shall I proceed, or would you like to change any of these?
387+
This query may take several minutes. Shall I proceed?
334388
```
335389
336-
3. **Validate neuron type names** — Use `search_terms` with `filter_types: ["neuron", "class"]` to validate/canonicalize labels. Skip if label is already clearly canonical (e.g., "GABAergic neuron"). If ambiguous or multiple candidates, show disambiguation list and ask user.
390+
5. **Execute** — Call `query_connectivity` with confirmed parameters.
391+
392+
**Performance rules for `query_connectivity`:**
393+
- Always start with the default `weight = 5`. There is no universal "good" weight — it varies by cell type.
394+
- Single-end queries (only upstream or only downstream set) are **slower** than both-ends queries because they return more results. If the user only cares about one direction, prefer `DownstreamClassConnectivity` or `UpstreamClassConnectivity` via `run_query` instead — they are pre-indexed and fast.
395+
- Use `group_by_class=true` for faster aggregated results.
396+
- Only use `query_connectivity` when you need both ends filtered by class.
397+
398+
### Step 3: Present Results
399+
400+
**Always start with a query summary:**
401+
```
402+
Query: NeuronNeuronConnectivityQuery for VFB_00104glj
403+
Results: 42 partner neurons (23 upstream, 19 downstream)
404+
```
405+
406+
Or for `query_connectivity`:
407+
```
408+
Query:
409+
- Upstream type: transmedullary neuron Tm1 (FBbt_00003789)
410+
- Downstream type: T3 neuron (FBbt_00047727)
411+
- Min. weight: 5
412+
- Excluded DBs: hb, fafb
413+
Results: 142 connections across 28 upstream neurons → 85 downstream neurons
414+
```
337415
338-
> **Tip:** If user asks about a brain region (e.g., "What connects to the lobula?"), first find neuron classes in that region using `search_terms`, then query connectivity for those specific classes.
416+
**Result formatting:**
417+
- **≤50 rows:** Show full table.
418+
- **>50 rows:** Show top 20 sorted by weight descending. Include summary stats (total connections, unique partners, weight range). Note that results are truncated.
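
The truncation rule above can be sketched as follows. This is a rough illustration that assumes each row is a dict with `weight` and `partner` keys. Those key names are hypothetical, and the real column names vary by query type.

```python
from typing import Any, Dict, List


def summarise_rows(
    rows: List[Dict[str, Any]], limit: int = 50, top_n: int = 20
) -> Dict[str, Any]:
    """Return all rows if there are at most `limit`, else the top `top_n` by weight."""
    total = len(rows)
    if total <= limit:
        return {"rows": rows, "truncated": False, "total": total}
    ranked = sorted(rows, key=lambda r: r["weight"], reverse=True)
    weights = [r["weight"] for r in rows]
    return {
        "rows": ranked[:top_n],
        "truncated": True,  # caller should tell the user results were cut
        "total": total,
        "unique_partners": len({r["partner"] for r in rows}),
        "weight_range": (min(weights), max(weights)),
    }
```

The summary statistics are computed over all rows, not just the top 20, so the user sees the true extent of the result set.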
339419
340-
> **Tip:** If you have a VFB neuron ID (e.g., `VFB_...`), run `get_term_info` on it and look for the `FBbt_...` class identifier; use that as `upstream_type`/`downstream_type`.
420+
**Column guide by query type:**
341421
342-
4. **Execute query** — Call `query_connectivity` with confirmed parameters.
422+
| Query type | Key columns |
423+
|------------|------------|
424+
| `NeuronNeuronConnectivityQuery` | partner label, outputs (synapses out), inputs (synapses in), tags |
425+
| `NeuronRegionConnectivityQuery` | region, presynaptic terminals, postsynaptic terminals |
426+
| `NeuronInputsTo` | presynaptic neuron name, neurotransmitter type, weight, neuron type |
427+
| `DownstreamClassConnectivity` | downstream class, total N, connected N, % connected, avg weight |
428+
| `UpstreamClassConnectivity` | upstream class, total N, connected N, % connected, avg weight |
429+
| `query_connectivity` (per-neuron) | upstream class, upstream neuron, weight, downstream neuron, downstream class, data source |
430+
| `query_connectivity` (grouped) | upstream class, downstream class, % connected, pairwise connections, avg weight |
343431
344-
5. **Handle results:**
432+
**Zero results from `query_connectivity` — try these relaxation steps in order:**
433+
1. Lower weight to 1.
434+
2. Set `exclude_dbs` to `[]` to include all datasets.
435+
3. Try `group_by_class=true`.
436+
4. Tell the user what was tried and let them decide.
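
A minimal sketch of the relaxation steps follows. It assumes the relaxations are applied cumulatively, which the steps above do not state explicitly. The `run` callable is a hypothetical stand-in for a `query_connectivity` call that returns a list of rows, and `fake_run` exists only to make the example self-contained.

```python
from typing import Any, Callable, Dict, List, Tuple

Rows = List[Dict[str, Any]]


def relax_until_results(
    run: Callable[..., Rows], params: Dict[str, Any]
) -> Tuple[Rows, List[Tuple[str, int]]]:
    """Apply each relaxation in order and stop at the first non-empty result."""
    steps = [
        ("lower weight to 1", {"weight": 1}),
        ("include all datasets", {"exclude_dbs": []}),
        ("group by class", {"group_by_class": True}),
    ]
    tried: List[Tuple[str, int]] = []
    current = dict(params)  # copy so the caller's parameters are untouched
    for label, change in steps:
        current.update(change)  # assumption: relaxations accumulate
        rows = run(**current)
        tried.append((label, len(rows)))
        if rows:
            return rows, tried
    return [], tried


def fake_run(**kwargs: Any) -> Rows:
    # Stand-in for the real tool: only returns data once all datasets are included.
    return [{"weight": 2}] if kwargs.get("exclude_dbs") == [] else []


rows, tried = relax_until_results(
    fake_run,
    {"upstream_type": "X", "downstream_type": "Y", "weight": 5, "exclude_dbs": ["hb", "fafb"]},
)
```

The `tried` list records what was attempted and how many rows each attempt returned, which is the information step 4 asks you to report to the user.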
345437
346-
**Per-neuron mode** (`group_by_class=false`):
347-
- Columns: upstream_class, upstream_neuron_id, upstream_neuron_name, weight, downstream_neuron_id, downstream_neuron_name, downstream_class, data_source, accession
348-
- **>50 rows:** Show total connection count, top 20 sorted by weight descending, summary stats (unique upstream neurons, unique downstream neurons, weight range)
349-
- **≤50 rows:** Show full table
438+
### Step 4: Follow-up Offers
350439
351-
**Class mode** (`group_by_class=true`):
352-
- Columns: upstream_class, downstream_class, total_upstream_count, connected_upstream_count, percent_connected, pairwise_connections, total_weight, average_weight
353-
- Present ranked by `pairwise_connections` descending
440+
- "To see full details on any neuron, I can look it up in VFB."
441+
- "To see which brain regions this neuron connects to, I can run a region connectivity query."
442+
- "To find what connects *back* to this type, I can swap upstream/downstream."
443+
- "To aggregate by neuron class, I can re-run with group_by_class=true."
444+
- "You can view any neuron at `https://v2.virtualflybrain.org/org.geppetto.frontend/geppetto?id={ID}`"
354445
355-
**Zero results — relaxation loop:**
356-
1. Lower weight to 1 → report count
357-
2. Remove exclude_dbs (include all datasets) → report count
358-
3. Try `group_by_class=true` → report count
359-
4. Show user what was tried and let them decide which relaxation to apply
446+
---
360447
361-
**Error:** Confirm neuron types with user, suggest using `search_terms` to find correct terms, retry.
448+
## Hierarchy Queries
362449
363-
6. **Output format** — Always include a resolved terms block:
364-
```
365-
Query:
366-
- Upstream type: transmedullary neuron Tm1 (FBbt_00003789)
367-
- Downstream type: (any)
368-
- Min. weight: 5
369-
- Excluded DBs: hb, fafb
370-
- Group by class: No
450+
**When to use:** User asks about the structure of a brain region, the types/subtypes of a cell class, or where something fits in the anatomical or cell type hierarchy.
371451
372-
Results: 142 connections across 28 upstream neurons → 85 downstream neurons
373-
```
452+
Use the `get_hierarchy` tool.
453+
454+
### Choosing the Parameters
455+
456+
| User asks | `relationship` | `direction` |
457+
|-----------|---------------|-------------|
458+
| "What are the parts of the mushroom body?" | `part_of` | `descendants` |
459+
| "What is the mushroom body part of?" | `part_of` | `ancestors` |
460+
| "Where does the mushroom body fit in the brain?" | `part_of` | `both` |
461+
| "What types of Kenyon cell are there?" | `subclass_of` | `descendants` |
462+
| "What class of neuron is the Kenyon cell?" | `subclass_of` | `ancestors` |
463+
| "Show me the Kenyon cell hierarchy" | `subclass_of` | `both` |
464+
465+
**Default:** Start with `max_depth=1` (direct parents/children only). If the user wants more detail, increase it. Use `max_depth=-1` with caution — broad terms can have thousands of descendants.
466+
467+
### Result Structure
468+
469+
- **Descendants** are returned as a **nested tree** for both relationship types (children contain their own children).
470+
- **Ancestors** are returned as a **nested chain** for both relationship types.
471+
- **`part_of` ancestors** are filtered to nervous system terms only (developmental lineage and generic structural terms are excluded).
472+
- **`subclass_of` ancestors** are filtered to FBbt cell types only, stopping at "cell" (cross-ontology and non-cell ancestors are excluded).
473+
474+
### Examples
475+
476+
**Brain region structure:**
477+
```
478+
get_hierarchy(id="FBbt_00005801", relationship="part_of", direction="both", max_depth=1)
479+
```
480+
481+
**Cell type hierarchy:**
482+
```
483+
get_hierarchy(id="FBbt_00003686", relationship="subclass_of", direction="both", max_depth=2)
484+
```
485+
486+
### Presenting Results
487+
488+
The response includes:
489+
- **`display`** — a pre-formatted text tree with large sibling groups shortened. Always present this directly to the user rather than reformatting the JSON.
490+
- **`display_full`** — the same text tree with no shortening. Use this if the user asks to see all terms.
491+
492+
After showing the text tree, offer the user an interactive HTML version they can open in their browser. Construct the URL using this pattern:
493+
494+
```
495+
https://v3-cached.virtualflybrain.org/get_hierarchy_html?id=<ID>&relationship=<RELATIONSHIP>&direction=<DIRECTION>&max_depth=<DEPTH>
496+
```
497+
498+
For example: `https://v3-cached.virtualflybrain.org/get_hierarchy_html?id=FBbt_00003686&relationship=subclass_of&direction=both&max_depth=2`
374499
375-
7. **Follow-up offers:**
376-
- "To get full details on any neuron, I can look it up in VFB using its ID"
377-
- "To find what connects *back* to [type], I can swap upstream/downstream and re-run"
378-
- "To aggregate these results by neuron class, I can re-run with group_by_class=true"
379-
- "You can view any neuron at `https://v2.virtualflybrain.org/org.geppetto.frontend/geppetto?id={short_form}`"
500+
The HTML page has a collapsible interactive tree with clickable links to VFB for every term.
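
The URL pattern above can be filled in programmatically. This sketch uses Python's standard `urllib.parse.urlencode`. The helper name `hierarchy_html_url` and its validation are illustrative additions, while the endpoint and parameter names are taken from the pattern shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://v3-cached.virtualflybrain.org/get_hierarchy_html"
RELATIONSHIPS = ("part_of", "subclass_of")
DIRECTIONS = ("descendants", "ancestors", "both")


def hierarchy_html_url(term_id: str, relationship: str, direction: str, max_depth: int) -> str:
    """Build the interactive hierarchy URL from the documented pattern."""
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"unknown relationship: {relationship!r}")
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    query = urlencode({
        "id": term_id,
        "relationship": relationship,
        "direction": direction,
        "max_depth": max_depth,
    })
    return f"{BASE_URL}?{query}"


url = hierarchy_html_url("FBbt_00003686", "subclass_of", "both", 2)
```

With these arguments the result matches the example URL given above.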
380501
381502
---
382503
@@ -406,7 +527,10 @@ Always provide appropriate links in results:
406527
The typical flow is: **resolve** (get IDs) → **query** (get data) → **present** (format for user):
407528
- `resolve_entity` → `find_stocks`
408529
- `resolve_combination` → `find_combo_publications`
409-
- `search_terms` (validate neuron class) → `query_connectivity`
530+
- `search_terms` (find neuron class) → `run_query` with `DownstreamClassConnectivity` or `UpstreamClassConnectivity`
531+
- `search_terms` (validate neuron class) → `query_connectivity` (dual-end class-to-class)
532+
- `get_term_info` (get VFB ID) → `run_query` with `NeuronNeuronConnectivityQuery`, `NeuronRegionConnectivityQuery`, or `NeuronInputsTo`
533+
- `search_terms` (find term) → `get_hierarchy` (explore structure or taxonomy)
410534
411535
## How to Interpret Image Data
412536
@@ -547,7 +671,11 @@ A term is a template brain if its `SuperTypes` array from `get_term_info` includ
547671
- Neuron morphology: Search for neuron type with `filter_types: ["neuron"]` → get_term_info → check for SimilarMorphology
548672
- Adult neurons with images: Search with `filter_types: ["neuron", "adult", "has_image"]`, `minimize_results: true`
549673
- Brain regions: Search for anatomical terms with `filter_types: ["anatomy"]` → explore hierarchical relationships
550-
- Connectivity: Search with `filter_types: ["has_neuron_connectivity"]` → run Connectivity queries
674+
- Connectivity (individual neuron): Search with `filter_types: ["has_neuron_connectivity"]` → `get_term_info` → `run_query` with `NeuronNeuronConnectivityQuery`
675+
- Connectivity (neuron class): Search with `filter_types: ["neuron", "class"]` → `run_query` with `DownstreamClassConnectivity` or `UpstreamClassConnectivity`
676+
- Connectivity (class-to-class): Search with `filter_types: ["neuron", "class"]` → `query_connectivity` (both upstream and downstream types)
677+
- Brain region structure: Search with `filter_types: ["anatomy"]` → `get_hierarchy` with `relationship: "part_of"`
678+
- Cell type hierarchy: Search with `filter_types: ["neuron", "class"]` → `get_hierarchy` with `relationship: "subclass_of"`
551679
- Datasets: Search with `filter_types: ["dataset"]` to find available datasets
552680
- Exact term lookup: Use `auto_fetch_term_info: true` for immediate detailed information on exact matches
553681
- Exclude noise: Always consider `exclude_types: ["deprecated"]` to remove obsolete entities
