Operational guide: run services via Docker Compose, seed data, run GDS pipelines, inspect Redis and Neo4j, and operate multi-agent hunters and NL-to-Cypher.
Documentation map: README (overview), USAGE (command reference), RUNBOOK (ops, this file), CONTRIBUTING, dashboard/README, scripts/README.
Neo4j and Redis are easiest to run via Docker. Install Docker Desktop for Mac from docker.com/products/docker-desktop, start Docker, and run from the repo root:

```bash
docker compose up -d neo4j redis
```

(On some systems the CLI is `docker-compose` with a hyphen; use whichever is installed.)
On macOS 12, or when only the Command Line Tools are installed, `brew install neo4j` can fail with "A full installation of Xcode.app is required" because Neo4j depends on OpenJDK 21.

- Recommended: use Docker (see above) so Neo4j and Redis run in containers without installing Java or Xcode on the host.
- Redis only via Homebrew: install Redis alone with `brew install redis`, then `brew services start redis`. The API will still need Neo4j; use either Docker for Neo4j or one of the options below.
- Neo4j without Docker: (1) install full Xcode from the App Store and retry `brew install neo4j`; (2) install Eclipse Temurin JDK 21 and Neo4j Community manually; or (3) use Neo4j Aura (free tier) and set `NEO4J_URI` (and auth) in `.env` to your Aura instance.
- Install — `pip install -e .` (or install from the repo); ensure Python 3.9+.
- Env — Copy `.env.example` to `.env` and set at least `NEO4J_PASSWORD`. Run `./scripts/first_run.sh` to do this and print next steps.
- Neo4j — Start Neo4j (e.g. `docker compose up -d neo4j redis` or a local install). Apply the Phase 3 schema once: `cypher-shell -u neo4j -p <pwd> < scripts/neo4j_schema_phase3.cypher`.
- Run hunters once — `shadowhunter hunters` (or `shadowhunter hunters --enhanced`). Optionally pass `--events-file events.json` or `--hunters PasteHunter,CryptoHunter`.
- Run search once (optional) — Tor on 127.0.0.1:9050, then `shadowhunter search -q "credential leak" -o report.md` (use `--no-llm` if no LLM API key).
- Start API + dashboard — `shadowhunter api` (port 8000); in `dashboard/`: `npm install && npm run dev` (port 3000). Use Quick Hunt and "Ask the graph" from the dashboard.
Health check: GET /api/health reports Neo4j, Redis, and hunter availability.
If you run the API with uvicorn (or `python main.py api`) on your machine and only need the graph/DB:

```bash
# From repo root — start only Neo4j and Redis
docker-compose up -d neo4j redis
```

Ensure `.env` has (the docker-compose default auth is neo4j / shadowhunter):

```
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=shadowhunter
REDIS_URL=redis://localhost:6379
```

Then start the API: `python3 -m uvicorn shadowhunter_api:app --host 0.0.0.0 --port 8000`. Dashboard status will show Operational once Neo4j is up.
```bash
# From repo root
docker compose up --build -d
# (or: docker-compose up --build -d)
```

Exposes:

- ShadowHunter API: http://localhost:8000 (health: `/api/health`, metrics: `/metrics`)
- Neo4j Browser: http://localhost:7474 (Bolt: `bolt://localhost:7687`)
- Redis: localhost:6379
- Tor proxy: localhost:9050 (SOCKS)

Default Neo4j auth: neo4j / shadowhunter (or set `NEO4J_AUTH` in docker-compose.yml).
Use this checklist to confirm everything works after setup.
From repo root, with the API on port 8000:

```bash
./scripts/quick_test.sh
```

Runs a health check, then an NL-to-Cypher "Ask the graph" sample. Exit 0 = pass.
| Step | What to do | Pass condition |
|---|---|---|
| 1. Docker | `docker compose ps` | `neo4j` and `redis` containers Up |
| 2. Health | `curl -s http://localhost:8000/api/health \| head -1` | `"status":"healthy"`, or `"status":"degraded"` with `neo4j.status: "ok"` |
| 3. API docs | Open http://localhost:8000/docs | Swagger UI loads |
| 4. Dashboard | Open http://localhost:3000 | Overview loads; System status shows Operational (or Degraded if Neo4j is down) |
| 5. Ask the graph | Dashboard → Ask the graph → type "Find all threat actors linked to wallet addresses" → Ask | Cypher appears; results (or an empty list) returned |
| 6. Quick Hunt | Dashboard → Quick Hunt → paste a line of text → Run | No crash; response or synthesis message |
| 7. Neo4j Browser | Open http://localhost:7474, log in neo4j / shadowhunter | Connected; can run `MATCH (n) RETURN n LIMIT 5` |
```bash
# Health (from repo root; API on 8000)
curl -s http://localhost:8000/api/health | python3 -m json.tool

# Ask the graph (NL → Cypher + results)
curl -s -X POST http://localhost:8000/api/nl-to-cypher \
  -H "Content-Type: application/json" \
  -d '{"query":"List all wallet addresses linked to threat actors"}' | python3 -m json.tool
```

If health shows `neo4j.status: "ok"` and `redis.status: "ok"`, the backend is good. If Ask the graph returns `"cypher": "MATCH ..."` and `"error": null`, NL-to-Cypher is working.
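Scripted checks can apply the same rule. A minimal sketch (the payload shape and field names are assumed from the checks above, not taken from the API source) that decides whether the backend is usable from a health response:

```python
import json

# Hypothetical /api/health payload (field names assumed from the checks above).
sample = '''{"status": "degraded",
             "neo4j": {"status": "ok"},
             "redis": {"status": "ok"},
             "hunters": {"status": "unavailable"}}'''

def backend_ok(payload: str) -> bool:
    """True when both Neo4j and Redis report ok, even if overall status is degraded."""
    health = json.loads(payload)
    return all(health.get(dep, {}).get("status") == "ok" for dep in ("neo4j", "redis"))

print(backend_ok(sample))  # True: graph/DB backend is usable despite degraded hunters
```

This mirrors the pass condition in the verification table: "degraded" is acceptable as long as the Neo4j and Redis dependencies themselves are ok.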
Labels and indexes from the hunter specs (Supply chain, AI threat, Insider, etc.) must exist in Neo4j so hunters and Ask the graph can use them.
From repo root, with Neo4j running (e.g. via Docker) and the password set to shadowhunter:

```bash
cypher-shell -u neo4j -p shadowhunter < scripts/neo4j_schema_phase3.cypher
```

- If `cypher-shell` is not installed, run the script from Neo4j Browser: open http://localhost:7474, log in neo4j / shadowhunter, paste sections from `scripts/neo4j_schema_phase3.cypher` and execute.
- Run once per database (or after a fresh Neo4j volume). Idempotent: safe to re-run (uses `IF NOT EXISTS`).
Without data, Overview and Ask the graph return empty or placeholder stats. Two ways to populate:
A. Run hunters (recommended)

```bash
# From repo root
shadowhunter hunters
# Or with the enhanced pipeline:
shadowhunter hunters --enhanced
```

Optionally limit hunters or pass events: `--hunters PasteHunter,CryptoHunter` or `--events-file events.json`.
B. Ingest sample data via Redis (see §3)
Push an IngestDoc into the ingest stream; the ingestion worker normalizes it to STIX, and hunters (e.g. PasteHunter, CryptoHunter) write nodes. See the example in §3 (Ingest a sample paste).
After seeding, refresh the dashboard: Overview key metrics and Ask the graph queries should return real nodes/results (or more than zero).
By default, Ask the graph uses rule-based translation for a fixed set of example questions. To allow any natural-language question, use an LLM.
In the repo root `.env`, set one of:

```
# OpenAI (or compatible API)
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

# Optional: custom base URL (e.g. Ollama)
OPENAI_BASE_URL=http://127.0.0.1:11434/v1
```

- Ollama (local): install Ollama, run e.g. `ollama run llama3`, then set `OPENAI_BASE_URL=http://127.0.0.1:11434/v1` and `OPENAI_API_KEY=ollama` (or any non-empty value). No cloud key required.
- Restart the API after changing `.env`. Ask the graph will then call the LLM to translate free-form questions to Cypher; if the LLM fails or returns "UNABLE_TO_QUERY", the rule-based fallback still runs for the built-in examples.
The Next.js dashboard must know the API base URL. In dashboard/ (not repo root):
```bash
cd dashboard
cp .env.example .env.local
# Edit .env.local and set:
# NEXT_PUBLIC_API_URL=http://localhost:8000
```

- `NEXT_PUBLIC_API_URL` — no trailing slash. Use `http://localhost:8000` when the API runs on the same machine; use `http://<host>:8000` if the API is elsewhere.
- Optional: `NEXT_PUBLIC_REPO_URL` — repo URL for footer links (README, RUNBOOK).
- Restart `npm run dev` after changing `.env.local` so the new URL is picked up.
Create .env in repo root or set before docker-compose up:
```
# .env.template
NEO4J_URI=bolt://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=shadowhunter
REDIS_URL=redis://redis:6379
TELEGRAM_API_ID=    # optional, for Telegram ingestor
TELEGRAM_API_HASH=  # optional
```

No secrets are required for the MVP: ingestion/graph/agents work with defaults (I2P/Session/Briar/IPFS stubs, Neo4j/Redis from compose).
From host (Python with shadowhunter installed):
```python
import asyncio

from shadowhunter.contracts import IngestDoc
from shadowhunter.ingestion.streams import RedisStreams, STREAM_INGEST_IN

async def seed():
    streams = RedisStreams("redis://localhost:6379")
    await streams.connect()
    doc = IngestDoc.from_raw(
        source="paste",
        origin="sample-paste",
        content="Donation: 4AdkPJoxn7JCvAqP31V1EBhNj4pes2u3uT5r2W2R8f2N2...",
        content_type="text/plain",
    )
    await streams.push_ingest_doc(doc, stream=STREAM_INGEST_IN)
    await streams.close()

asyncio.run(seed())
```

Or push a sample that triggers PasteHunter → CryptoHunter (Monero-like address in the content):

```python
doc = IngestDoc.from_raw(
    source="i2p",
    origin="sample-eepsite",
    content="Wallet: 4AdkPJoxn7JCvAqP31V1EBhNj4pes2u3uT5r2W2R8f2N2abc",
    content_type="text/plain",
)
# Push to ingest.in; the ingestion worker normalizes to STIX and pushes to ingest.normalized;
# PasteHunter consumes ingest.normalized, emits paste_detected → CryptoHunter writes a wallet node.
```

Use `shadowhunter.crawlers.ipfs_watcher.IPFSWatcher`: call `inject_mock_content(cid, b"content")`, then `await fetch(cid)` for a stub IngestDoc.
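For intuition on what "Monero-like" means here, the sketch below shows one plausible detection heuristic. This is illustrative only — CryptoHunter's actual rules live in the hunter itself; the pattern and the relaxed length bound are assumptions:

```python
import re

# Illustrative heuristic: Monero mainnet addresses start with '4' and use the
# base58 alphabet (no 0, O, I, l). The length check is relaxed here so that
# truncated samples, like the ones in this runbook, still match.
MONERO_LIKE = re.compile(r"\b4[1-9A-HJ-NP-Za-km-z]{25,94}\b")

text = "Wallet: 4AdkPJoxn7JCvAqP31V1EBhNj4pes2u3uT5r2W2R8f2N2abc"
match = MONERO_LIKE.search(text)
print(match.group(0) if match else "no wallet found")
```

Running this against the sample paste content above prints the candidate address, which is why the sample is enough to exercise the PasteHunter → CryptoHunter path.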
From host (with Neo4j and shadowhunter analytics available):
```bash
# CLI (if wired in main.py)
python main.py analytics --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password shadowhunter
```

Or via the graph service admin endpoint (if implemented):

```bash
curl -X POST http://localhost:8000/admin/gds/train -H "Content-Type: application/json" \
  -d '{"graph_name": "threatGraph", "pipeline_name": "lpPipe"}'
```

Steps (manual):

- Project graph: `CALL gds.graph.project('threatGraph', ['ThreatActor','Indicator','Infrastructure'], { ... })`
- FastRP: `CALL gds.fastRP.mutate('threatGraph', {embeddingDimension:128, mutateProperty:'embedding'})`
- Train link prediction: use `shadowhunter.analytics.link_prediction.LinkPredictor` (train/predict/write)
- Verify: in Neo4j Browser, `MATCH ()-[r:POTENTIAL_LINK]->() RETURN r LIMIT 10`
```bash
# From host (redis-cli)
redis-cli -h localhost -p 6379

# List streams
XLEN ingest.in
XLEN ingest.normalized
XLEN hunter.events
XLEN hunter.results

# Read the last N entries from hunter.events
XREVRANGE hunter.events + - COUNT 5
# Read the last N entries from hunter.results
XREVRANGE hunter.results + - COUNT 5
```

Consumer groups: ingestion workers use `ingestion` / `worker1` on `ingest.normalized`; the orchestrator uses `agents` / `orchestrator` on `hunter.events`.
- Open http://localhost:7474 and connect (bolt://localhost:7687, neo4j / shadowhunter).
- Example queries:

```cypher
MATCH (n) RETURN labels(n), count(*)
MATCH (w:WalletAddress) RETURN w.address, w.heuristics LIMIT 10
MATCH (a:ThreatActor)-[r]-(b) RETURN a, type(r), b LIMIT 20
MATCH ()-[r:POTENTIAL_LINK]->() RETURN r LIMIT 10
```
- Ingestion pushes the normalized IngestDoc to `ingest.normalized`.
- A bridge (or the ingestion worker) wraps the IngestDoc in `Event(type="stix_ingest", payload=doc)` and pushes it to `hunter.events`.
- The orchestrator consumes `hunter.events`; it routes `stix_ingest` to PasteHunter and `paste_detected` to CryptoHunter.
- PasteHunter emits `paste_detected` to `hunter.events` (so the orchestrator routes it to CryptoHunter).
- CryptoHunter writes wallet nodes to Neo4j and emits `crypto_scan_result` to `hunter.results`.
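The routing described above boils down to an event-type → hunters table. A minimal sketch (the real logic lives in `shadowhunter.agents.orchestrator`; this dict is a simplification of it):

```python
# Event types mapped to the hunters that consume them, per the flow above.
ROUTES = {
    "stix_ingest": ["PasteHunter"],
    "paste_detected": ["CryptoHunter"],
}

def route(event_type: str) -> list:
    """Return the hunters an event should be dispatched to (empty if unrouted)."""
    return ROUTES.get(event_type, [])

print(route("stix_ingest"))     # ['PasteHunter']
print(route("paste_detected"))  # ['CryptoHunter']
```

The cascade falls out of this table: PasteHunter's `paste_detected` output re-enters `hunter.events` and is routed onward to CryptoHunter.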
To run orchestrator locally (outside Docker):
```python
import asyncio

from shadowhunter.ingestion.streams import RedisStreams
from shadowhunter.agents.orchestrator import AgentOrchestrator
from shadowhunter.database.graph_store import GraphStore

async def main():
    redis = RedisStreams("redis://localhost:6379")
    await redis.connect()
    store = GraphStore("bolt://localhost:7687", "neo4j", "shadowhunter")
    store.connect()
    orch = AgentOrchestrator(redis_streams=redis, neo4j_client=store)
    orch.setup_default_agents()
    await orch.run()  # blocking consume

asyncio.run(main())
```

Run the full hunter pipeline (all 19 hunters, escape hatches, ThreatScoringEngine, synthesis report).
- Neo4j running and reachable (e.g. `bolt://localhost:7687`). Default auth neo4j / shadowhunter (or set via env / `config.database`).
- Optional: apply the Neo4j schema for hunter node types (LeakedAPIKey, ThreatScore, ExploitMarketListing, etc.):

```bash
# From repo root (Neo4j 5.x)
cypher-shell -u neo4j -p <password> < scripts/neo4j_schema_phase3.cypher
```

Or run `scripts/neo4j_schema_phase3.cypher` in Neo4j Browser.
```bash
# Default Neo4j (from config / env)
python main.py hunters

# Explicit Neo4j and events file
python main.py hunters --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password <password> --events-file events.json
```

- Without `--events-file`: a single sample event is used (wallet + email snippet) so you can see output immediately.
- With `--events-file`: expects a JSON array of events; each event has `source`, `raw_content`, `timestamp`, optional `metadata`, and optional `stix_objects`.
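A short sketch of building such an events file, matching the format described above (the field values are illustrative samples, not required content):

```python
import datetime
import json

# A hypothetical events.json: a JSON array of events, each with source,
# raw_content, timestamp, and optional metadata / stix_objects.
events = [
    {
        "source": "paste",
        "raw_content": "Wallet: 4AdkPJoxn7JCvAqP31V1EBhNj4pes2u3uT5r2W2R8f2N2abc",
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "metadata": {"origin": "sample-paste"},
    }
]

with open("events.json", "w") as fh:
    json.dump(events, fh, indent=2)

# Sanity check: the file parses back and each event carries the required keys.
loaded = json.load(open("events.json"))
assert all({"source", "raw_content", "timestamp"} <= set(e) for e in loaded)
```

Pass the resulting file with `--events-file events.json`; richer `raw_content` (IOCs, CVEs, keywords) triggers more hunters.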
Use --enhanced to run the EnhancedOrchestrator, which wires:
- IntelligentEventRouter — classifies events by priority tier (critical → research) and routes each event only to relevant hunters (content/source-based).
- ResourcePool — runs hunters with mode-aware concurrency (I/O-bound hunters share a semaphore; config: `io_concurrency`, default 100).
- CircuitBreakerManager — per-hunter resilience: after N failures a hunter is skipped until a timeout elapses (`circuit_breaker_threshold`, `circuit_breaker_timeout` in config).
- EnhancedEscapeHatchRegistry — condition-based escape hatches (e.g. high confidence, pattern match, critical finding) with a cascade depth limit (default 3).
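The per-hunter circuit breaker can be sketched as follows. This is a minimal illustration, not the CircuitBreakerManager implementation; the parameter names mirror the config keys above:

```python
import time

class CircuitBreaker:
    """Skip a hunter after repeated failures, retrying once the timeout elapses."""

    def __init__(self, threshold: int = 3, timeout: float = 30.0):
        self.threshold = threshold   # failures before the breaker opens
        self.timeout = timeout       # seconds the hunter stays skipped
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.timeout:
            # Half-open: the timeout elapsed, give the hunter another try.
            self.opened_at = None
            self.failures = 0
            return True
        return False

cb = CircuitBreaker(threshold=2, timeout=60)
cb.record_failure(); cb.record_failure()
print(cb.allow())  # False: two failures tripped the breaker; hunter is skipped
```

A failing hunter therefore degrades gracefully instead of stalling the whole pipeline on every event.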
```bash
python main.py hunters --enhanced
python main.py hunters --enhanced --events-file events.json
```

The log prefix for this path is ENHANCED_ORCHESTRATOR. Same outputs (console, Neo4j, logs) as the default orchestrator.
- Console: synthesis report summary and key findings (first 10).
- Neo4j: all hunter results with confidence ≥ 0.7 are written (MERGE by entity `id`). Inspect with queries in Neo4j Browser (e.g. `MATCH (t:ThreatScore) RETURN t ORDER BY t.threat_score DESC LIMIT 10`).
- Logs: use the project logger; filter by `ORCHESTRATOR`, `THREAT_SCORING_ENGINE`, or a hunter name for flow and debugging.
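The confidence gate can be sketched in a few lines (result field names are assumed for illustration; the actual write is a MERGE keyed on the entity id):

```python
# Hypothetical hunter results, shaped as dicts with an id and a confidence score.
results = [
    {"id": "wallet-1", "confidence": 0.92, "type": "WalletAddress"},
    {"id": "note-1", "confidence": 0.40, "type": "Observation"},
]

CONFIDENCE_THRESHOLD = 0.7  # only results at or above this are written to Neo4j

to_write = [r for r in results if r["confidence"] >= CONFIDENCE_THRESHOLD]
print([r["id"] for r in to_write])  # ['wallet-1']
```

Low-confidence findings stay out of the graph, so queries like the ThreatScore example above only surface results that cleared the threshold.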
| Issue | Check |
|---|---|
| Phase 3 not available | Ensure dependencies installed (pip install -r requirements.txt); check for ImportError on startup. |
| Neo4j connection failed | --neo4j-uri / credentials; Neo4j running and Bolt enabled. |
| No findings | Try --events-file with richer content (IOCs, CVEs, “insider”, “jailbreak”, etc.) to trigger multiple hunters. |
- Translate: `from shadowhunter.nl_to_cypher import translate_nl_to_cypher; translate_nl_to_cypher("Find threat actors by IP")`
- Validate: use `validate_cypher_readonly(cypher, neo4j_driver)` (EXPLAIN in read-only mode).
- Tests: `pytest tests/test_nl_to_cypher.py` — 5 NL prompts must translate to valid Cypher.
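Rule-based translation amounts to a lookup over known question patterns. A minimal sketch (the real translator is `shadowhunter.nl_to_cypher.translate_nl_to_cypher`; these patterns and queries are invented examples, not the project's actual rule set):

```python
# Hypothetical pattern → Cypher table; a None result signals that the LLM
# (or the "UNABLE_TO_QUERY" path) should take over for free-form questions.
RULES = {
    "threat actors linked to wallet addresses":
        "MATCH (a:ThreatActor)--(w:WalletAddress) RETURN a, w LIMIT 25",
    "threat actors by ip":
        "MATCH (a:ThreatActor)--(i:Indicator) WHERE i.type = 'ipv4' RETURN a, i LIMIT 25",
}

def translate(question: str):
    """Return Cypher for a known question, or None for the LLM fallback."""
    q = question.lower()
    for pattern, cypher in RULES.items():
        if pattern in q:
            return cypher
    return None

print(translate("Find threat actors by IP") is not None)  # True: matched a rule
```

This is why the built-in examples keep working even without an LLM key: they hit the rule table before any model is consulted.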
| Issue | Check |
|---|---|
| API not healthy | docker-compose logs shadowhunter-api; ensure Neo4j/Redis are healthy first |
| No wallet nodes | Ensure sample paste contains Monero-like address; check hunter.events / hunter.results for events |
| GDS fails | Neo4j memory; ensure GDS plugin loaded; use smaller projection or sampled subgraph |
| Redis connection refused | docker-compose ps; Redis must be on same network and healthy |
- All services start via `docker-compose up` with no secrets.
- The ingest pipeline accepts a sample IngestDoc and produces normalized STIX written to Neo4j.
- PasteHunter detects a wallet in a sample paste and triggers CryptoHunter; CryptoHunter writes the wallet node to Neo4j with heuristics.
- The GDS pipeline runs against a sample graph and writes at least one `:POTENTIAL_LINK` edge with a probability.
- NL-to-Cypher translates 5/5 test NL prompts to syntactically valid Cypher that passes EXPLAIN in Neo4j.
Before pushing or tagging a release:
- No secrets in the repo (`.env`, `config.json`, API keys in code). Use `.env.example` / env vars.
- `README.md`, `USAGE.md`, `RUNBOOK.md` reflect current features (hunters, `--enhanced`, Neo4j schema, dashboard).
- `scripts/neo4j_schema_phase3.cypher` runs cleanly against Neo4j 5.x.
- Core tests pass: `pytest tests/ -v`.
- Optional: run `shadowhunter hunters` (and `shadowhunter hunters --enhanced`) once with the default sample event to confirm the hunter path.