Skip to content

rcvandijk1/DBGraph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DBGraph

Turn any SQL Server or Oracle database into a navigable knowledge graph.

DBGraph introspects your database schema — tables, views, stored procedures, functions, triggers, foreign keys, and cross-object dependencies — and builds a persistent knowledge graph with community detection, god-node ranking, and an interactive HTML visualization. Ask natural-language questions about your schema, trace dependency chains, and find surprising cross-schema couplings you didn't know existed.

Inspired by graphify — the same graph pipeline, applied to databases instead of code.


What it produces

Output Description
dbgraph-out/graph.html Interactive D3.js force graph — search, filter by type, hover for details
dbgraph-out/GRAPH_REPORT.md God nodes, community map, surprising connections, suggested questions
dbgraph-out/graph.json Raw graph (GraphRAG-ready, importable into Neo4j / FalkorDB)

Quick start

pip install dbgraph[sqlserver]   # SQL Server
pip install dbgraph[oracle]      # Oracle
pip install dbgraph[all]         # both + best clustering (graspologic)

SQL Server

python -m dbgraph --sqlserver "Server=myhost;Database=mydb;UID=sa;PWD=secret"

Oracle

python -m dbgraph --oracle "user/pass@host:1521/ORCL"

Open dbgraph-out/graph.html in your browser.


As a Claude Code skill (/dbgraph)

DBGraph ships as a Claude Code skill. Once installed, type /dbgraph in any Claude Code chat to run the full pipeline interactively — with community labeling, natural-language queries, and guided exploration.

Install the skill:

# Windows
New-Item -ItemType Directory -Force "$env:USERPROFILE\.claude\skills\dbgraph"
Copy-Item "SKILL.md" "$env:USERPROFILE\.claude\skills\dbgraph\SKILL.md"

# macOS / Linux
mkdir -p ~/.claude/skills/dbgraph
cp SKILL.md ~/.claude/skills/dbgraph/SKILL.md

Then in Claude Code:

/dbgraph --sqlserver "Server=myhost;Database=mydb;UID=sa;PWD=secret"

The skill runs the full 7-step pipeline — introspect → extract → cluster → label → visualize — and guides you through exploring the result.


CLI reference

python -m dbgraph --sqlserver "..."      # SQL Server connection string
python -m dbgraph --oracle "..."         # Oracle DSN (user/pass@host:port/service)
python -m dbgraph --dsn ENV_VAR         # read connection from environment variable
python -m dbgraph --schema dbo          # limit to one schema
python -m dbgraph --schema "A,B,C"      # limit to multiple schemas
python -m dbgraph --include-columns     # add columns as graph nodes
python -m dbgraph --include-indexes     # add indexes as graph nodes
python -m dbgraph --no-procs            # skip stored procedures and functions
python -m dbgraph --mode deep           # LLM semantic enrichment of proc bodies
python -m dbgraph --cluster-only        # rerun community detection only
python -m dbgraph --no-viz              # skip HTML output
python -m dbgraph --output ./my-out     # custom output directory

python -m dbgraph query "What tables does sp_ProcessOrder touch?"
python -m dbgraph path "Orders" "InvoiceLines"
python -m dbgraph explain "sp_CalculateTax"

What gets extracted

Node types

Type Description
table Base table
view View
procedure Stored procedure
function Scalar, table-valued, or aggregate function
trigger DML/DDL trigger
schema Schema/owner container
column Column (--include-columns only)
index Index (--include-indexes only)

Edge relations

Relation Confidence Description
fk_references EXTRACTED Foreign key constraint
view_reads EXTRACTED View selects from table/view
proc_reads EXTRACTED Proc/function reads from table/view
proc_writes EXTRACTED Proc/function inserts/updates/deletes table
proc_calls EXTRACTED Proc/function calls another proc/function
trigger_on EXTRACTED Trigger fires on table
contains_column EXTRACTED Table/view owns a column
in_schema EXTRACTED Object belongs to schema
semantic_related INFERRED LLM-inferred business relationship (--mode deep)

EXTRACTED edges come from the database catalog (sys.sql_expression_dependencies / ALL_DEPENDENCIES) — definitive.
INFERRED edges are derived from regex parsing of procedure bodies — flagged as uncertain.


Community detection

DBGraph uses Leiden clustering (via graspologic) with Louvain fallback. Oversized communities are split automatically. The result is a set of named business-domain clusters like:

  • Order Processing — Orders, Bookings, Spots, related procs
  • Invoice & Financials — Invoices, InvoiceDetails, Contracts
  • Channel Configuration — Channels, ChannelGroups, SalesHouses
  • Creative Management — Copies, SpotUpload, ReplayTypes

God nodes

The most-connected objects in your schema — the tables or functions that everything else depends on. Renaming or dropping one of these without checking its 300+ dependents first is how production incidents happen.


Surprising connections

Cross-community or cross-schema edges the graph surfaces automatically. Real examples from a production media planning database:

  • aos.Creative_Create writes directly into dbo.Copies with no validation layer
  • dbo.Products is read by aos.GetProductTree across the schema boundary
  • A copy batch allocation proc references aos.MessageID (traced and confirmed as a regex false positive in this case — the audit trail tells you which edges to trust)

Installation

From source

git clone https://github.com/RobertvanDijk-Operative/DBGraph.git
cd DBGraph
pip install -e ".[sqlserver]"         # SQL Server support
pip install -e ".[oracle]"            # Oracle support
pip install -e ".[all]"               # everything

Dependencies

Package Required for
networkx Graph construction and traversal (always required)
pyodbc SQL Server ([sqlserver])
oracledb Oracle ([oracle])
graspologic Best-quality Leiden clustering ([cluster])

If graspologic is not installed, DBGraph falls back to NetworkX's built-in Louvain algorithm — results are good, just slightly less stable across runs.


Oracle notes

Oracle uses ALL_TABLES, ALL_VIEWS, ALL_PROCEDURES, ALL_SOURCE, ALL_CONSTRAINTS, ALL_DEPENDENCIES, and ALL_TRIGGERS. Procedure bodies are reconstructed by joining ALL_SOURCE rows in line order.

oracledb runs in thin mode by default — no Oracle Instant Client required. For thick mode (wallets, advanced auth), call oracledb.init_oracle_client() before connecting.


How DBGraph differs from graphify

graphify parses files (code, docs, PDFs, images) into a knowledge graph. DBGraph does the same but the input is a live database connection:

graphify DBGraph
Input File folder Database connection string
Extraction tree-sitter AST + LLM SQL catalog queries
Nodes Functions, classes, docs Tables, views, procs, triggers
Edges Imports, calls, references FK, proc_reads, proc_writes, view_reads
No-LLM mode No (LLM required for semantic) Yes — full graph with no LLM cost
LLM mode Semantic enrichment of code --mode deep: semantic enrichment of proc bodies

Honesty rules

  • FK edges are EXTRACTED from the catalog — definitive.
  • Dependency edges from sys.sql_expression_dependencies / ALL_DEPENDENCIES are EXTRACTED.
  • Edges inferred from regex parsing of proc bodies are INFERRED — treat as hints, not facts.
  • Encrypted or inaccessible proc bodies: any body-derived edges are omitted or marked INFERRED.
  • The report always shows edge confidence breakdown so you know what to trust.

License

MIT — see LICENSE.


Acknowledgements

Built on top of the graphify architecture by @safishamsi.
Community detection via graspologic (Leiden) and NetworkX (Louvain).
Visualization via D3.js.

About

Turn any SQL Server or Oracle database into a navigable knowledge graph — community detection, god nodes, cross-schema dependency tracing, interactive D3.js visualization. Claude Code skill included.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages