sysmledgraph – development plan

Plan version / iteration: Iteration 2 (GenerateMap).
Change note: Start iteration 2; promote GenerateMap from §6. Scope: implement GenerateMap (MCP tool to generate .md from graph, e.g. interconnection view). §6 backlog: DetectChanges, per-path resource, multi-path doc remain.


Source of truth: SysML v2 model in this project (deploy-sysmledgraph.sysml, requirements-sysmledgraph.sysml, behaviour-sysmledgraph.sysml). Requirements R1–R8 and part/action/state definitions in the model are authoritative; this plan is the implementation guide.

Design: docs/mcp/SYSMLEDGRAPH_MCP.md (parent SystemDesign repo).

Contents: 0 Workflow · 1 Tech stack · 2 Scope · 3 Phases · 4 Steps · 5 Verification · 6 After v1 · 7 References.

Related: sysmledgraph-CODEBASE_STRUCTURE.md · sysmledgraph-INTERCONNECTION_VIEW.md · sysmledgraph-MODEL_EXAMINATION_ERROR_CONDITIONS.md · sysmledgraph-GITNEXUS_ANALYSIS.md. Gaps: maintained in the codebase (implement side) at docs/GAPS.md.


0. Modelbase ↔ implement side workflow

Workflow (step-by-step, iteration): modelbase-development-WORKFLOW.md.

This repo is the modelbase for sysmledgraph: the SysML model, this plan, and related outputs live here (sysml-v2-models/projects/sysmledgraph/). The implement side (codebase) is the implementation repo: chouswei/codebase-sysmledgraph.

  • Modelbase (this repo) generates the plan (to do): this development plan, requirements R1–R8, deploy/behaviour models, CODEBASE_STRUCTURE, and checkable steps. The plan is the implementation guide for the current iteration.
  • Implement side (codebase) implements against the plan and model, then produces an implement report and delivers it to the modelbase. The report includes:
    • Alignment — whether the implementation matches the plan and model (e.g. docs/MODELBASE_ALIGNMENT.md in the codebase).
    • Gaps — what the model has but the code does not; what the code had but the model did not; pipeline alignment; limitations (e.g. docs/GAPS.md in the codebase).
    • Traceability — plan step → implementation (e.g. docs/IMPLEMENTATION_PLAN.md in the codebase).
  • Modelbase receives the implement report, reviews it, and decides whether the iteration is done. If done, the modelbase updates this plan for the next iteration (scope, §6 After v1, or close) and updates Plan version and change note (see PLAN_TEMPLATE). Gaps are maintained in the codebase.

Overview

sysmledgraph is a path-only indexer that builds a knowledge graph from .sysml files and exposes it via MCP (query, context, impact, rename, cypher, list, clean) and CLI (analyze, list, clean). It uses a Kuzu graph and MCP/CLI patterns similar to common code indexers but indexes SysML only; grouping is SysML-native (packages, containment).

Model alignment: The SysML model defines four parts (Indexer, GraphStore, McpServer, Cli), ports and connection items, eight actions (index/query/context/impact/rename/cypher/list/clean), a lifecycle state machine (idle, indexing, ready, cleaning; error states), and a pipeline state machine IndexPipelineStates (discovering → loadOrdering → parsing → mapping → writing). Implementation should align with deploy-sysmledgraph.sysml and behaviour-sysmledgraph.sysml.
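The pipeline state machine named above can be sketched as an ordered list of stages with a runner that stops at the first failing stage. This is a minimal illustration only: the state names come from the model, while `nextPipelineState`, `runPipeline`, and the per-stage callbacks are hypothetical names that do not appear in the plan.

```typescript
// Stage names as listed for IndexPipelineStates in the model.
type PipelineState = "discovering" | "loadOrdering" | "parsing" | "mapping" | "writing";

const PIPELINE_ORDER: readonly PipelineState[] = [
  "discovering",
  "loadOrdering",
  "parsing",
  "mapping",
  "writing",
];

// Returns the stage that follows `state`, or null after `writing`.
function nextPipelineState(state: PipelineState): PipelineState | null {
  const i = PIPELINE_ORDER.indexOf(state);
  return PIPELINE_ORDER[i + 1] ?? null;
}

type PipelineResult =
  | { ok: true }
  | { ok: false; failedAt: PipelineState; error: string };

// Runs one callback per stage, in order, and reports the stage that failed.
// ASSUMPTION: each stage is a synchronous callback; a real indexer would be async.
function runPipeline(stages: Record<PipelineState, () => void>): PipelineResult {
  let state: PipelineState | null = "discovering";
  while (state !== null) {
    const current: PipelineState = state;
    try {
      stages[current]();
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { ok: false, failedAt: current, error };
    }
    state = nextPipelineState(current);
  }
  return { ok: true };
}
```

Reporting the failing stage alongside the message is one way to give R8 error results enough context for the caller.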

Terms: Indexer — discovers .sysml under path(s), parses them, writes Document + Symbol nodes and edges to the graph. GraphStore — abstraction over Kuzu (open DB, add nodes/edges, get connection for Cypher). Symbol→graph mapping — LSP symbol kinds → graph node labels and edge types (PartDef, IN_PACKAGE, TYPES, etc.).


1. Tech stack and libraries

| Layer | Choice | What it does |
| --- | --- | --- |
| Runtime | Node.js 20+ | Same as sysml-v2-lsp; MCP/CLI patterns. |
| Language | TypeScript | Type safety; same ecosystem as sysml-v2-lsp. |
| Graph store | Kuzu (kuzudb.com) | Embedded graph DB; Cypher. Node bindings. Persistence: global registry (~/.sysmledgraph/registry.json) plus one DB per indexed root. |
| SysML parsing | sysml-v2-lsp (stdio in v1) | ANTLR4 parsing, document symbols, go-to-def, references, rename. v1: stdio client (spawn LSP, documentSymbol/definition/references), accepted because sysml-v2-lsp has no stable programmatic API. Library mode if an API becomes available later. |
| File discovery | fast-glob or Node fs + glob | Find .sysml (and optionally .kerml) under path(s). Respect config.yaml (e.g. model_files) for load order. |
| MCP server | @modelcontextprotocol/sdk (Node) | Tools and resources; server name sysmledgraph. |
| CLI | commander (Node) | Subcommands: analyze, list, clean; env or config for storage. |
| Load order | Config-driven when present | config.yaml model_files at path; else deterministic (e.g. breadth-first by path). |

Not in v1: No Tree-sitter (SysML only); no clustering, embeddings, or web UI. Same Kuzu + MCP pattern and list/clean lifecycle.

sysml-v2-lsp: daltskin/sysml-v2-lsp. v1: stdio client (spawn LSP, documentSymbol/definition/references)—accepted. Library mode when/if stable API is exported. Grammar: daltskin/sysml-v2-grammar; update in LSP with make update-grammar.


2. Scope and goals

| Item | Description |
| --- | --- |
| Objective | Path-only indexer of .sysml into a knowledge graph; MCP server (query, context, impact, rename, cypher); CLI (analyze, list, clean). |
| In scope | Index paths → Document + Symbol nodes and SysML edges (R1–R7); Kuzu storage; load order when config present (R7); list/clean; MCP tools and resources; error reporting and graph consistency on failure (R8). |
| Out of scope | Code-oriented features not needed for SysML (execution flows, language-specific parsers). |
| Success | One command indexes path(s); MCP answers query/context/impact/rename/cypher; list/clean work; schema exposed; errors reported; graph not left inconsistent on index/clean failure. |

3. Phases (high level)

Phase 2 depends on Phase 1; Phases 3 and 4 depend on Phase 2. Phase 3 (MCP) and Phase 4 (CLI) can proceed in parallel once the core index + graph API exists.

| Phase | Focus | Done when |
| --- | --- | --- |
| 1. Repo and pipeline | Repo, build, SysML parser, file discovery, load order. | Repo builds; parsing a folder of .sysml yields symbol list and relations (e.g. IN_PACKAGE, TYPES). |
| 2. Graph and index | Kuzu schema, GraphStore, Indexer, list/clean; R8 for index/clean. | indexDbGraph(paths) populates graph; list/clean work; on failure, report and leave graph unchanged; Cypher returns expected nodes/edges. |
| 3. MCP server | MCP server sysmledgraph; tools + resources; R8 tool errors. | From Cursor: index, query, context, impact, rename (dry_run), cypher, list_indexed, clean_index; schema resource; errors in tool result. |
| 4. CLI and docs | CLI (analyze/list/clean), README, schema doc; R8 CLI exit/stderr. | User can index from terminal; failures exit non-zero and write to stderr; docs match behaviour; MCP setup documented. |
| 5. GenerateMap (iteration 2) | MCP tool generate_map: read from graph (first indexed path), produce Markdown (e.g. interconnection view: documents, nodes by label, edges). Output to caller (tool result) or optional file. R8: empty graph / no index → error in result. | From Cursor: call generate_map; get markdown in result; optionally save to file. Errors returned in tool result. |

Test strategy: Phase 1: unit tests for discovery and parser output shape. Phase 2: GraphStore unit tests and Indexer integration tests; list/clean; error cases (invalid path, clean of a non-indexed path). Phases 3–4: smoke tests for MCP and CLI; error cases (empty graph, invalid path). No full E2E in v1.

Decisions (made)

  • Storage: Global storage root (default ~/.sysmledgraph, or env SYSMEDGRAPH_STORAGE_ROOT); registry.json with paths: string[]; one DB per indexed path at db/{sanitized}.kuzu. See sysmledgraph-GITNEXUS_ANALYSIS.md.
  • Parser: v1 uses stdio client (accepted); library mode if stable API available later.
  • Symbol→graph mapping: Document the mapping of LSP kinds to node labels and edge types, both in code and in the README/schema doc.
  • R8: MCP = structured error in tool result; CLI = non-zero exit and stderr.
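The storage decision above can be sketched as two pure path helpers. This is an illustration under assumptions: the plan does not define the sanitising rule, so `sanitizePath` below is a hypothetical choice, and `resolveStorageRoot` and `dbPathFor` are illustrative names. The environment variable is spelled as it appears in this plan.

```typescript
// Shape of registry.json as stated in the plan: a list of indexed paths.
interface Registry {
  paths: string[];
}

// Storage root: env override when set, else ~/.sysmledgraph.
// The home directory is passed in so the helper stays pure and testable.
function resolveStorageRoot(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  const override = env["SYSMEDGRAPH_STORAGE_ROOT"]; // spelled as in the plan
  return override !== undefined && override.length > 0
    ? override
    : `${homeDir}/.sysmledgraph`;
}

// ASSUMPTION: replace every character outside [A-Za-z0-9._-] with "_".
// The real rule lives in the codebase and may differ.
function sanitizePath(indexedPath: string): string {
  return indexedPath.replace(/[^A-Za-z0-9._-]/g, "_");
}

// One DB per indexed path at db/{sanitized}.kuzu under the storage root.
function dbPathFor(storageRoot: string, indexedPath: string): string {
  return `${storageRoot}/db/${sanitizePath(indexedPath)}.kuzu`;
}
```

With a rule this simple, two distinct paths can sanitise to the same name (for example `/a/b` and `/a_b`), so a real implementation would likely need a collision-free scheme such as appending a hash.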

Risks and mitigations

| Risk | Mitigation |
| --- | --- |
| sysml-v2-lsp no programmatic API | v1: stdio client (spawn LSP, documentSymbol/definition/references); accepted. |
| Load order wrong → broken refs | Use config.yaml model_files when present; else deterministic; document rule. |
| Schema drift | Define schema in one place (code); generate or copy for resource and README. |
| Index/clean fails mid-way | Report and do not commit partial state; leave previous graph unchanged (R8). |
| Kuzu DB lock | One process per DB. Per-process cache; document that CLI and MCP should not use same storage concurrently. Report lock errors (R8). See sysmledgraph-MODEL_EXAMINATION_ERROR_CONDITIONS.md §6. |
| Worker threads | v1: single-threaded (event loop); indexer sequential; one GraphStore connection per process. If worker threads added later, writes must stay serialized; see §7 (OOP and threads) in prior plan if needed. |

OOP: Indexer, GraphStore, McpServer, Cli map to modules/classes; ports to interfaces. No worker threads in v1; keeps Kuzu single-threaded.
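Even on a single event loop, asynchronous write operations can interleave if several are started at once. One way to keep writes serialized on the single GraphStore connection is a promise-chain queue, sketched below. `WriteQueue` is a hypothetical helper that does not appear in the plan, and it only orders work within one process; it does not address cross-process Kuzu locks.

```typescript
// Runs async tasks strictly one after another, in submission order.
class WriteQueue {
  // Tail of the chain; each new task waits for it to settle.
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failed write
    // does not block the writes queued after it.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

The caller still sees each task's own result or rejection through the returned promise, so write failures can be reported per R8.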


4. Checkable steps (implementation)

Phase 1 – Repo and pipeline

  1. Create implementation location (codebase repo). Layout per sysmledgraph-CODEBASE_STRUCTURE.md. Node/TypeScript: src/, bin/, mcp/, test/, docs/. Schema in code (e.g. src/graph/schema.ts) is an accepted layout variant (no top-level schema/ required). Dependencies: Kuzu, @modelcontextprotocol/sdk, commander, fast-glob. Pin Node 20+; document sysml-v2-lsp version.
  2. Build and test (e.g. tsc + vitest). Scripts: build, test, optional analyze stub.
  3. SysML parsing: sysml-v2-lsp via stdio client (v1 accepted). Parse files → symbol tables → map to graph node/edge types.
  4. File discovery: Given path(s), list .sysml (and optionally .kerml). If config.yaml at path, read model_files and model_dir for load order.
  5. Verify: Parse sample .sysml; output symbol list and relations. Document symbol→graph mapping (LSP kinds → node labels and edge types).
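The load-order rule in step 4 can be sketched as a pure function over already-discovered paths. It assumes relative, "/"-separated paths and that `model_files` has already been read from config.yaml; `orderFiles` is a hypothetical name, and placing unlisted files after the listed ones is an assumption the plan does not state.

```typescript
// Number of path segments, so shallower files sort first (breadth-first by path).
function depth(p: string): number {
  return p.split("/").filter((segment) => segment.length > 0).length;
}

// discovered: .sysml paths found under the indexed root.
// modelFiles: optional ordered list taken from config.yaml `model_files`.
function orderFiles(discovered: string[], modelFiles?: string[]): string[] {
  // Deterministic fallback: by depth, then lexicographically.
  const fallback = discovered
    .slice()
    .sort((a, b) => depth(a) - depth(b) || (a < b ? -1 : a > b ? 1 : 0));
  if (modelFiles === undefined || modelFiles.length === 0) return fallback;

  // Config-driven order: listed files first (deduplicated, only those that exist).
  const found = new Set(discovered);
  const listed = Array.from(new Set(modelFiles)).filter((f) => found.has(f));
  const listedSet = new Set(listed);
  // ASSUMPTION: files not named in model_files follow in fallback order.
  return listed.concat(fallback.filter((f) => !listedSet.has(f)));
}
```

For example, `orderFiles(["b/x.sysml", "a.sysml", "b/a.sysml"])` puts `a.sysml` first because it is the shallowest path, then orders the two files under `b/` by name.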

Phase 2 – Graph and index

  1. Graph schema in Kuzu: node labels (Document, Package, PartDef, PartUsage, …); edge types (IN_DOCUMENT, IN_PACKAGE, PARENT, TYPES, …) per design in SYSMLEDGRAPH_MCP.md and codebase docs. Create in code or migration.
  2. GraphStore: open/create Kuzu DB. API: addDocument(path, indexedAt?), addSymbol(label, props), addEdge(from, to, type), getConnection(). Storage: global root + registry + DB per path.
  3. Indexer: discovery → load order → parse → map → write. Multiple roots; define re-index behaviour (replace or merge; document). R8: On failure report and leave graph unchanged.
  4. list (registry); clean (delete DB, update registry). R8: On failure report; graph unchanged.
  5. Verify: Index a project path; list; run Cypher; confirm nodes/edges.
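The R8 rule for indexing (report the failure, leave the previous graph unchanged) can be illustrated with an in-memory stand-in for GraphStore. This sketch is not the Kuzu-backed implementation: `MemoryGraphStore`, `indexInto`, and the `SymbolProps` fields are hypothetical, `getConnection()` is omitted because it is Kuzu-specific, and re-index is assumed to mean replace (the plan leaves replace versus merge open).

```typescript
interface SymbolProps {
  id: string;
  name: string;
  path: string;
}

// In-memory stand-in exposing the add* methods listed in step 2.
class MemoryGraphStore {
  documents = new Map<string, { path: string; indexedAt: string }>();
  symbols = new Map<string, { label: string; props: SymbolProps }>();
  edges: { from: string; to: string; type: string }[] = [];

  addDocument(path: string, indexedAt: string = new Date().toISOString()): void {
    this.documents.set(path, { path, indexedAt });
  }

  addSymbol(label: string, props: SymbolProps): void {
    this.symbols.set(props.id, { label, props });
  }

  addEdge(from: string, to: string, type: string): void {
    if (!this.symbols.has(from) || !this.symbols.has(to)) {
      throw new Error(`edge ${type} references unknown symbol: ${from} -> ${to}`);
    }
    this.edges.push({ from, to, type });
  }
}

type IndexResult =
  | { ok: true; store: MemoryGraphStore }
  | { ok: false; error: string; store: MemoryGraphStore };

// Writes into a fresh staging store and hands it back only on success.
// On failure the caller keeps `current`, which was never touched.
function indexInto(
  current: MemoryGraphStore,
  write: (staging: MemoryGraphStore) => void,
): IndexResult {
  const staging = new MemoryGraphStore();
  try {
    write(staging);
    return { ok: true, store: staging };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error, store: current };
  }
}
```

With an on-disk database the same idea could be realised by building into a temporary DB and swapping it in, or by using a transaction if the store supports one; which of these fits Kuzu is left to the implementation.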

Phase 3 – MCP server

  1. MCP server (stdio); server name sysmledgraph.
  2. Tools: indexDbGraph (path/paths), query, context, impact, rename (dry_run), cypher, list_indexed, clean_index; generate_map (iteration 2). R8: structured error in tool result.
  3. Resources: sysmledgraph://context (stats, staleness), sysmledgraph://schema. Optional: per-path resource.
  4. Verify: From Cursor: index, query, context, impact, rename dry_run, schema resource; errors returned.
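The R8 requirement for tools (structured error in the tool result) can be sketched as a small adapter. MCP tool results carry a `content` array and an optional `isError` flag; the `ToolOutcome` type, the JSON payload layout, and the `toToolResult` and `safeTool` names are assumptions made for illustration, not the codebase's API.

```typescript
// Internal outcome of a tool handler (hypothetical shape).
type ToolOutcome<T> = { ok: true; value: T } | { ok: false; error: string };

// Subset of the MCP tool result shape used here.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Serialises an outcome into a tool result; failures set isError.
function toToolResult<T>(outcome: ToolOutcome<T>): ToolResult {
  if (outcome.ok) {
    const text = JSON.stringify({ ok: true, result: outcome.value });
    return { content: [{ type: "text", text }] };
  }
  const text = JSON.stringify({ ok: false, error: outcome.error });
  return { content: [{ type: "text", text }], isError: true };
}

// Wraps a handler so a thrown error becomes a structured result
// instead of surfacing as a protocol-level failure.
async function safeTool<T>(handler: () => Promise<T>): Promise<ToolResult> {
  try {
    return toToolResult({ ok: true, value: await handler() });
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return toToolResult({ ok: false, error });
  }
}
```

Each registered tool could route its handler through a wrapper like `safeTool`, so that every tool reports errors in the same shape.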

Phase 4 – CLI and docs

  1. CLI: analyze (index path(s)), list, clean (path optional). R8: non-zero exit, stderr.
  2. README: install, usage, env/config, storage, schema summary, MCP setup.
  3. Verify: CLI analyze/list/clean; docs match behaviour.
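The CLI side of R8 (non-zero exit, message on stderr) can be sketched independently of the argument parser. The plan only requires a non-zero exit; the specific code 1, the message prefix, and the `CliIo`, `CliOutcome`, and `finish` names are assumptions for illustration.

```typescript
// Output channels are injected so the function can be tested without a process.
interface CliIo {
  stdout: (line: string) => void;
  stderr: (line: string) => void;
}

// Result of running analyze, list, or clean (hypothetical shape).
type CliOutcome = { ok: true; lines: string[] } | { ok: false; error: string };

// Prints the outcome and returns the process exit code.
function finish(outcome: CliOutcome, io: CliIo): number {
  if (outcome.ok) {
    for (const line of outcome.lines) io.stdout(line);
    return 0;
  }
  io.stderr(`sysmledgraph: ${outcome.error}`);
  return 1; // ASSUMPTION: any non-zero value would satisfy R8
}
```

In a Node entry point the returned value could be assigned to `process.exitCode`, which lets pending output flush before the process ends.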

Phase 5 – GenerateMap (iteration 2)

  1. generate_map MCP tool: no required params; optional output_path (if server can write). Reads graph from first indexed path; produces Markdown: sections for Documents (path, id), Nodes by label (name, path, id), Edges (from, type, to). Returns { ok: true, markdown } or { ok: false, error }. R8: no indexed path or empty graph → error in result.
  2. Verify: Index a path; call generate_map; confirm markdown in result; optionally save to .md file.
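The rendering in step 1 can be sketched as a pure function from a graph snapshot to Markdown. The result shape `{ ok: true, markdown }` or `{ ok: false, error }` follows the plan; the `GraphSnapshot` type, the heading text, and the bullet layout are assumptions, and reading the snapshot from Kuzu is out of scope for this sketch.

```typescript
interface DocRow { id: string; path: string }
interface NodeRow { id: string; label: string; name: string; path: string }
interface EdgeRow { from: string; type: string; to: string }

// Hypothetical in-memory view of what the tool reads from the graph.
interface GraphSnapshot {
  documents: DocRow[];
  nodes: NodeRow[];
  edges: EdgeRow[];
}

type MapResult = { ok: true; markdown: string } | { ok: false; error: string };

// undefined stands for "no indexed path"; an empty snapshot is also an error (R8).
function generateMap(graph: GraphSnapshot | undefined): MapResult {
  if (graph === undefined) return { ok: false, error: "no indexed path" };
  if (graph.documents.length === 0 && graph.nodes.length === 0) {
    return { ok: false, error: "graph is empty" };
  }

  const lines: string[] = ["# Interconnection view", "", "## Documents", ""];
  for (const d of graph.documents) lines.push(`- ${d.path} (${d.id})`);

  // Group nodes by label, then emit labels in sorted order for stable output.
  const byLabel = new Map<string, NodeRow[]>();
  for (const n of graph.nodes) {
    const group = byLabel.get(n.label) ?? [];
    group.push(n);
    byLabel.set(n.label, group);
  }
  lines.push("", "## Nodes by label");
  for (const label of Array.from(byLabel.keys()).sort()) {
    lines.push("", `### ${label}`, "");
    for (const n of byLabel.get(label) ?? []) {
      lines.push(`- ${n.name} (${n.path}, ${n.id})`);
    }
  }

  lines.push("", "## Edges", "");
  for (const e of graph.edges) lines.push(`- ${e.from} -[${e.type}]-> ${e.to}`);
  return { ok: true, markdown: lines.join("\n") };
}
```

Keeping the renderer free of I/O means the optional `output_path` write can be handled separately by the tool handler, and the Markdown can be checked in unit tests without a database.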

5. Verification (before “done”)

  • Index a real project path (e.g. sysml-v2-models/projects/sysmledgraph).
  • MCP: query, context, impact; rename dry_run; schema resource; results match expected symbols/relations.
  • CLI: analyze, list, clean; docs match behaviour.
  • R8: Invalid path index, clean non-indexed path → error to caller (MCP result or CLI stderr/exit); graph unchanged. Read tools return clear error when graph empty or symbol missing.
  • GenerateMap (iteration 2): Call generate_map after indexing; markdown in result; no indexed path → error in result.

6. After v1 (backlog for next iteration)

  • DetectChanges (model action): Git diff → affected symbols. Not in codebase yet.
  • GenerateMap: In scope for iteration 2 (Phase 5, steps 1–2 in §4; steps 18–19 when the §4 steps are counted continuously).
  • Multi-path semantics (documented): One DB per indexed path. Tools that need a graph use the first indexed path; no merged view. Document and test when multiple roots are indexed.
  • Per-path MCP resource: Optional; deferred.
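The multi-path rule above (tools use the first indexed path, no merged view) and the clean behaviour for a non-indexed path can be sketched over the registry alone. `activeIndexedPath` and `removeIndexedPath` are hypothetical names, and the error messages are illustrative.

```typescript
// Shape of registry.json as stated in the plan.
interface Registry {
  paths: string[];
}

type ActivePath = { ok: true; path: string } | { ok: false; error: string };

// Tools that need a graph use the first indexed path.
function activeIndexedPath(registry: Registry): ActivePath {
  const first = registry.paths[0];
  return first === undefined
    ? { ok: false, error: "no indexed path" }
    : { ok: true, path: first };
}

type RegistryUpdate =
  | { ok: true; registry: Registry }
  | { ok: false; error: string };

// Returns a new registry without `path`; cleaning a non-indexed path is an error (R8).
function removeIndexedPath(registry: Registry, path: string): RegistryUpdate {
  if (registry.paths.indexOf(path) === -1) {
    return { ok: false, error: `path is not indexed: ${path}` };
  }
  return { ok: true, registry: { paths: registry.paths.filter((p) => p !== path) } };
}
```

Note that under this rule, cleaning the first path changes which graph the tools read from, which is one of the cases worth covering when multiple roots are tested.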

7. References

Context7 (MCP): /modelcontextprotocol/typescript-sdk (MCP server, tools, resources); /tj/commander.js (CLI). Kuzu: kuzudb.com/docs.