| 1 | +# Semantic Web Integration: What We Learned |
| 2 | + |
| 3 | +A team summary of GDS + OWL/SHACL/SPARQL integration via `gds-owl`. |
| 4 | + |
| 5 | +## The Short Version |
| 6 | + |
| 7 | +We can export **85% of a GDS specification** to Turtle/RDF files and |
| 8 | +import it back losslessly. The 15% we lose is Python callables (transition |
| 9 | +functions, constraint predicates, distance functions). That loss follows |
| 10 | +from Rice's theorem, so it is a hard limit, not a gap we can close. |
| 11 | + |
| 12 | +## What Gets Exported (R1 -- Fully Representable) |
| 13 | + |
| 14 | +Everything structural round-trips perfectly through Turtle: |
| 15 | + |
| 16 | +| GDS Concept | RDF Representation | Validated By | |
| 17 | +|---|---|---| |
| 18 | +| Block names, roles, interfaces | OWL classes + properties | SHACL shapes | |
| 19 | +| Port names and type tokens | Literals on Port nodes | SHACL datatype | |
| 20 | +| Wiring topology (who connects to whom) | Wire nodes with source/target | SHACL cardinality | |
| 21 | +| Entity/StateVariable declarations | Entity + StateVariable nodes | SHACL | |
| 22 | +| TypeDef (name, python_type, units) | TypeDef node + properties | SHACL | |
| 23 | +| Space fields | SpaceField blank nodes | SHACL | |
| 24 | +| Parameter schema (names, types, bounds) | ParameterDef nodes | SHACL | |
| 25 | +| Mechanism update targets (what writes where) | UpdateMapEntry nodes | SHACL | |
| 26 | +| Admissibility dependencies (what reads what) | AdmissibilityDep nodes | SHACL | |
| 27 | +| Transition read dependencies | TransitionReadEntry nodes | SHACL | |
| 28 | +| State metric variable declarations | MetricVariableEntry nodes | SHACL | |
| 29 | +| Canonical decomposition (h = f . g) | CanonicalGDS node | SHACL | |
| 30 | +| Verification findings | Finding nodes | SHACL | |
| 31 | + |
| 32 | +**13 SHACL shapes** enforce structural correctness on the RDF graph. |
| 33 | +**7 SPARQL query templates** enable cross-node analysis (blocks by role, |
| 34 | +dependency paths, entity update maps, parameter impact, verification summaries). |
| 35 | + |
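To make the "SHACL cardinality" entry above concrete, here is a minimal plain-Python sketch of a node-local check in the spirit of a cardinality shape. The triple layout and the names `Wire`, `source`, and `target` are illustrative assumptions, not the actual `gds-owl` vocabulary; in practice the Turtle file would be handed to a SHACL processor rather than checked by hand.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; names are illustrative only.
TRIPLES = [
    ("w1", "type", "Wire"),
    ("w1", "source", "Sensor.out"),
    ("w1", "target", "Controller.in"),
    ("w2", "type", "Wire"),
    ("w2", "source", "Controller.out"),  # w2 is missing its target
]


def cardinality_violations(triples, node_type, predicate, exactly=1):
    """Return nodes of node_type whose number of `predicate` values != exactly.

    Each focus node is judged in isolation, which is the defining trait of
    the node-local checks in the table above.
    """
    counts = defaultdict(int)
    focus_nodes = []
    for subject, pred, obj in triples:
        if pred == "type" and obj == node_type:
            focus_nodes.append(subject)
        elif pred == predicate:
            counts[subject] += 1
    return sorted(node for node in focus_nodes if counts[node] != exactly)
```

Calling `cardinality_violations(TRIPLES, "Wire", "target")` flags `w2`, while the same call with `"source"` reports nothing.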
| 36 | +## What Requires SPARQL (R2 -- Structurally Representable) |
| 37 | + |
| 38 | +Some properties cannot be checked by SHACL-core alone, which validates one |
| 39 | +focus node at a time, but can be checked by SPARQL queries over the full graph: |
| 40 | + |
| 41 | +| Property | SPARQL Feature | Why SHACL Can't | |
| 42 | +|---|---|---| |
| 43 | +| Acyclicity (G-006) | Transitive closure (`p+`) | No path traversal in SHACL-core | |
| 44 | +| Completeness (SC-001) | `FILTER NOT EXISTS` | No "for all X, exists Y" | |
| 45 | +| Determinism (SC-002) | `GROUP BY` + `HAVING` | No cross-node aggregation | |
| 46 | +| Dangling wirings (G-004) | `FILTER NOT EXISTS` | Name existence, not class membership | |
| 47 | + |
| 48 | +All of these checks are decidable: SPARQL evaluation over a finite graph always terminates. |
| 49 | + |
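The graph-wide nature of these checks can be sketched in plain Python. The block below is only an analogue of what the SPARQL features compute, not the queries shipped in `gds-owl`: the edge lists are hypothetical, and it assumes that determinism means at most one mechanism writes each state variable.

```python
from collections import defaultdict

# Hypothetical edge lists; block, mechanism, and variable names are illustrative.
DEPENDS_ON = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
UPDATES = [("mech1", "x"), ("mech2", "x"), ("mech3", "y")]  # (mechanism, variable)


def nodes_on_cycles(edges):
    """Nodes that can reach themselves, the analogue of a `p+` path from a node to itself."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    def reachable(start):
        # Iterative depth-first search over one or more edges from `start`.
        seen, stack = set(), list(successors.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(successors.get(node, ()))
        return seen

    return sorted(node for node in successors if node in reachable(node))


def multiply_updated(updates):
    """Variables with more than one writer, the analogue of GROUP BY + HAVING COUNT > 1."""
    writers = defaultdict(set)
    for mechanism, variable in updates:
        writers[variable].add(mechanism)
    return {var: sorted(mechs) for var, mechs in writers.items() if len(mechs) > 1}
```

Both functions need the whole edge list at once, which is why no per-node shape can stand in for them. Both also visit each node a bounded number of times, so they terminate on any finite input.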
| 50 | +## What Cannot Be Exported (R3 -- Not Representable) |
| 51 | + |
| 52 | +These are **fundamentally** non-exportable. This is not a tooling gap but a |
| 53 | +mathematical impossibility (Rice's theorem for callables, computational |
| 54 | +class separation for string processing): |
| 55 | + |
| 56 | +| GDS Concept | Why R3 | What Happens on Export | |
| 57 | +|---|---|---| |
| 58 | +| `TypeDef.constraint` (e.g. `lambda x: x >= 0`) | Arbitrary Python callable | Exported as boolean flag `hasConstraint`; imported as `None` | |
| 59 | +| `f_behav` (transition functions) | Arbitrary computation | Not stored in GDSSpec -- user responsibility | |
| 60 | +| `AdmissibleInputConstraint.constraint` | Arbitrary callable | Exported as boolean flag; imported as `None` | |
| 61 | +| `StateMetric.distance` | Arbitrary callable | Exported as boolean flag; imported as `None` | |
| 62 | +| Auto-wiring token computation | Multi-pass string processing | Results exported (WiringIR edges); process is not | |
| 63 | +| Construction validation | Python `@model_validator` logic | Structural result preserved; validation logic is not | |
| 64 | + |
| 65 | +**Key insight:** The *results* of R3 computation are always R1. Auto-wiring |
| 66 | +produces WiringIR edges (R1). Validation produces pass/fail (R1). Only the |
| 67 | +*process* is lost. |
| 68 | + |
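The "boolean flag on export, `None` on import" behavior in the table can be illustrated with a small sketch. `TypeDefSketch`, `export_typedef`, and `import_typedef` are hypothetical stand-ins invented for this example; the real `gds-owl` classes and functions may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TypeDefSketch:
    """Illustrative stand-in for a TypeDef; not the real GDS class."""

    name: str
    python_type: str
    constraint: Optional[Callable[[object], bool]] = None


def export_typedef(typedef):
    """Keep the structural fields; reduce the callable to a boolean flag."""
    return {
        "name": typedef.name,
        "python_type": typedef.python_type,
        "hasConstraint": typedef.constraint is not None,
    }


def import_typedef(record):
    """Rebuild the structural part; the callable itself cannot be recovered."""
    return TypeDefSketch(record["name"], record["python_type"], constraint=None)


original = TypeDefSketch("Population", "int", constraint=lambda x: x >= 0)
record = export_typedef(original)
restored = import_typedef(record)
```

After the round trip, `record["hasConstraint"]` still records that a constraint existed, but `restored.constraint` is `None`: the fact survives, the behavior does not.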
| 69 | +## The Boundary in One Sentence |
| 70 | + |
| 71 | +> **You can represent everything about a system except what its programs |
| 72 | +> actually do.** The canonical decomposition `h = f . g` makes this |
| 73 | +> boundary explicit: `g` (topology) and `f_struct` (update targets) are |
| 74 | +> fully representable; `f_behav` (how state actually changes) is not. |
| 75 | + |
| 76 | +## Practical Implications |
| 77 | + |
| 78 | +### What You Can Do With the Turtle Export |
| 79 | + |
| 80 | +1. **Share specs between tools** -- any RDF-aware tool (Protégé, GraphDB, |
| 81 | + Neo4j via neosemantics) can import a GDS spec |
| 82 | +2. **Validate specs without Python** -- SHACL processors (TopBraid, pySHACL) |
| 83 | + can check structural correctness |
| 84 | +3. **Query specs with SPARQL** -- find all mechanisms that update a given |
| 85 | + entity, trace dependency paths, check acyclicity |
| 86 | +4. **Version and diff specs** -- Turtle is text, diffs are meaningful |
| 87 | +5. **Cross-ecosystem interop** -- other OWL ontologies can reference GDS |
| 88 | + classes/properties |
| 89 | + |
| 90 | +### What You Cannot Do |
| 91 | + |
| 92 | +1. **Run simulations from Turtle** -- you need the Python callables back |
| 93 | +2. **Verify behavioral properties** -- "does this mechanism converge?" requires |
| 94 | + executing `f_behav` |
| 95 | +3. **Reproduce auto-wiring** -- the token overlap computation can't run in SPARQL |
| 96 | + |
| 97 | +### Round-Trip Fidelity |
| 98 | + |
| 99 | +Round trips are covered by property-based tests (Hypothesis): each test |
| 100 | +generates 100 random GDSSpecs, exports them to Turtle, parses the output, and |
| 101 | +reimports it. All structural fields survive. Known lossy fields: |
| 102 | + |
| 103 | +- `TypeDef.constraint` -> `None` |
| 104 | +- `TypeDef.python_type` -> falls back to `str` for non-builtin types |
| 105 | +- `AdmissibleInputConstraint.constraint` -> `None` |
| 106 | +- `StateMetric.distance` -> `None` |
| 107 | +- Port/wire ordering -> set-based (RDF is unordered) |
| 108 | +- Blank node identity -> content-based comparison, not node ID |
| 109 | + |
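The last two items in the list above affect how a round-trip test has to compare specs. A minimal sketch, assuming a hypothetical dict layout for a spec (the real tests operate on GDSSpec objects generated by Hypothesis), is to normalize both sides into sets and drop node identifiers before comparing:

```python
def normalize(spec):
    """Reduce a spec-like dict to an order-insensitive, ID-free form."""
    blocks = frozenset(
        (block["name"], frozenset(block["ports"])) for block in spec["blocks"]
    )
    # Wires are compared by content; the blank node identifier is ignored.
    wires = frozenset((wire["source"], wire["target"]) for wire in spec["wires"])
    return blocks, wires


def structurally_equal(a, b):
    """True when two specs agree on content, regardless of order or node IDs."""
    return normalize(a) == normalize(b)


exported = {
    "blocks": [{"name": "Controller", "ports": ["error", "setpoint"]}],
    "wires": [
        {"id": "_:b0", "source": "Sensor.out", "target": "Controller.error"},
        {"id": "_:b1", "source": "Controller.out", "target": "Plant.in"},
    ],
}
# Same content after a round trip: ports and wires reordered, fresh blank node IDs.
reimported = {
    "blocks": [{"name": "Controller", "ports": ["setpoint", "error"]}],
    "wires": [
        {"id": "_:n7", "source": "Controller.out", "target": "Plant.in"},
        {"id": "_:n9", "source": "Sensor.out", "target": "Controller.error"},
    ],
}
```

Here `structurally_equal(exported, reimported)` holds even though list order and blank node IDs differ, while dropping a wire from either side makes the comparison fail.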
| 110 | +## Numbers |
| 111 | + |
| 112 | +| Metric | Count | |
| 113 | +|---|---| |
| 114 | +| R1 concepts (fully representable) | 12 | |
| 115 | +| R2 concepts (SPARQL-needed) | 3 | |
| 116 | +| R3 concepts (not representable) | 6 | |
| 117 | +| SHACL shapes | 13 | |
| 118 | +| SPARQL templates | 7 | |
| 119 | +| Verification checks expressible in SHACL | 6 of 15 | |
| 120 | +| Verification checks expressible in SPARQL | 6 more | |
| 121 | +| Checks requiring Python | 2 of 15 | |
| 122 | +| Round-trip PBT tests | 26 | |
| 123 | +| Random specs tested | ~2,600 | |
| 124 | + |
| 125 | +## Paper Alignment |
| 126 | + |
| 127 | +The structural/behavioral split is a **framework design choice**, not a |
| 128 | +paper requirement. The GDS paper (Zargham & Shorish 2022) defines |
| 129 | +`U: X -> P(U)` as a single map; we split it into `U_struct` (dependency |
| 130 | +graph, R1) and `U_behav` (constraint predicate, R3) for ontological |
| 131 | +engineering. The same split applies to `StateMetric` and `TransitionSignature`. The |
| 132 | +canonical decomposition `h = f . g`, by contrast, is faithful to the paper. |
| 133 | + |
| 134 | +## Files |
| 135 | + |
| 136 | +- `packages/gds-owl/` -- the full export/import/SHACL/SPARQL implementation |
| 137 | +- `docs/research/formal-representability.md` -- the 800-line formal analysis |
| 138 | +- `docs/research/verification/r3-undecidability.md` -- proofs for the R3 boundary |
| 139 | +- `docs/research/verification/representability-proof.md` -- R1/R2 decidability + partition independence |