Commit 3adc661

docs: add restriction semantics for non-downstream nodes and OR/AND
- Unrestricted nodes are not affected by operations
- Multiple restrict() calls create separate restriction sets
- Delete combines sets with OR (any taint → delete)
- Export combines sets with AND (all criteria → include)
- Within a set, multiple FK paths combine with OR (structural)
- Added open questions on lenient vs strict AND and same-table restrictions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 5be0a01 commit 3adc661

1 file changed

Lines changed: 121 additions & 10 deletions

docs/design/restricted-diagram.md

@@ -57,9 +57,112 @@ rd.delete()
5757
rd.export('/path/to/backup/')
5858
```
5959

60+
### Unrestricted nodes are not affected
61+
62+
A restricted diagram distinguishes between three kinds of nodes:
63+
64+
1. **Directly restricted** — the user applied a restriction to this node
65+
2. **Indirectly restricted** — a restriction propagated to this node from an ancestor
66+
3. **Unrestricted** — no restriction reached this node
67+
68+
**Only restricted nodes (direct or indirect) participate in operations.** Unrestricted nodes are left untouched. This is critical for delete: if you restrict `Session & 'subject=1'`, only `Session` and its downstream dependents are affected. Tables in the diagram that are not downstream of `Session` (e.g., `Equipment`, `Lab`) are not deleted.
69+
70+
The restricted diagram's `topo_sort()` for operations should only yield nodes that carry a restriction. Unrestricted nodes are effectively invisible to the operation.
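
One way to picture this filtering is the minimal sketch below. It is not the DataJoint implementation: `restricted_topo_sort`, the `deps` mapping, and the table names are hypothetical, and the standard-library `graphlib` stands in for the diagram's own sort.

```python
from graphlib import TopologicalSorter


def restricted_topo_sort(dependencies, restricted):
    """Return restricted nodes in topological order (parents first).

    dependencies: dict mapping each node to the set of its parents (hypothetical shape).
    restricted:   set of node names that carry a restriction.
    """
    full_order = TopologicalSorter(dependencies).static_order()
    # Unrestricted nodes are dropped, so operations never see them.
    return [node for node in full_order if node in restricted]


# Toy graph: Recording depends on Session and Equipment; Equipment depends on Lab.
deps = {
    "Session": {"Subject"},
    "Equipment": {"Lab"},
    "Recording": {"Session", "Equipment"},
}
restricted = {"Session", "Recording"}  # Session and its downstream dependents

export_order = restricted_topo_sort(deps, restricted)  # parents first
delete_order = list(reversed(export_order))            # children first
```

`Equipment`, `Lab`, and `Subject` never appear in either order, which is the behavior the paragraph above asks for.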
71+
72+
### Multiple restrictions: OR vs AND
73+
74+
When multiple restrictions are applied to different tables in the diagram, downstream nodes may receive restrictions from multiple parents. How these combine depends on the operation.
75+
76+
**Example:** A diagram with two edges, `Session → Recording` and `Lab → Recording`, so `Recording` depends on both `Session` and `Lab`.
77+
78+
```python
79+
rd = dj.Diagram(schema)
80+
rd.restrict(Session & 'subject=1') # R1 propagates to Recording
81+
rd.restrict(Lab & 'lab="brody"') # R2 propagates to Recording
82+
```
83+
84+
Recording now has two propagated restrictions:
85+
- R1: rows referencing subject=1 sessions
86+
- R2: rows referencing brody lab
87+
88+
**For delete (OR / union):** A recording should be deleted if it is tainted by *any* restricted parent. Deleting subject 1 means all their recordings go, regardless of which lab. Deleting brody lab means all their recordings go, regardless of subject. The two restrictions combine with OR.
89+
90+
**For export/publish (AND / intersection):** A recording should be exported only if it satisfies *all* criteria. You want specifically brody lab's subject 1 recordings. The two restrictions combine with AND.
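
The difference between the two modes can be illustrated with plain Python sets. This is a toy sketch with made-up rows, not DataJoint code; the `recordings` data and the variable names are assumptions for illustration only.

```python
# Hypothetical Recording rows, each referencing a subject and a lab.
recordings = [
    {"recording": 1, "subject": 1, "lab": "brody"},
    {"recording": 2, "subject": 1, "lab": "other"},
    {"recording": 3, "subject": 2, "lab": "brody"},
    {"recording": 4, "subject": 2, "lab": "other"},
]

# R1: rows referencing subject=1 sessions; R2: rows referencing brody lab.
r1 = {r["recording"] for r in recordings if r["subject"] == 1}
r2 = {r["recording"] for r in recordings if r["lab"] == "brody"}

to_delete = r1 | r2  # OR / union: tainted by any restricted parent
to_export = r1 & r2  # AND / intersection: satisfies all criteria
```

Here delete touches recordings 1, 2, and 3, while export keeps only recording 1.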
91+
92+
**Implementation:** The diagram stores restrictions as separate **restriction sets**, one per `restrict()` call. Each set propagates independently. The combination logic is deferred to the operation:
93+
94+
```python
95+
class RestrictedDiagram:
96+
# Each restrict() call creates a new restriction set.
97+
# A restriction set is a dict mapping table_name → list[restriction]
98+
# (list = OR within a set, for multiple FK paths from different parents)
99+
_restriction_sets: list[dict[str, list]]
100+
101+
def restrict(self, table_expr):
102+
"""Add a new restriction set. Propagate downstream."""
103+
new_set = {}
104+
# ... propagate and populate new_set ...
105+
self._restriction_sets.append(new_set)
106+
return self
107+
108+
def _effective_restriction(self, table_name, mode="or"):
109+
"""
110+
Compute the effective restriction for a node.
111+
112+
mode="or": union across sets — row included if ANY set restricts it
113+
(for delete: tainted by any restricted parent)
114+
mode="and": intersection across sets — row included only if ALL sets restrict it
115+
(for export: must satisfy all criteria)
116+
"""
117+
sets_with_table = [s[table_name] for s in self._restriction_sets
118+
if table_name in s]
119+
if not sets_with_table:
120+
return None # unrestricted — not affected
121+
122+
if mode == "or":
123+
# Union: flatten all restriction sets into one OR-list
124+
combined = []
125+
for restriction_list in sets_with_table:
126+
combined.extend(restriction_list)
127+
return combined # list = OR in DataJoint
128+
129+
elif mode == "and":
130+
# Intersection: each set is applied as a separate AND condition
131+
# Start with the table, apply each set's restrictions sequentially
132+
# Within each set, restrictions are OR (multiple FK paths)
133+
# Across sets, restrictions are AND (multiple restrict() calls)
134+
return sets_with_table # caller applies: for s in sets: expr &= s
135+
136+
    def delete(self, **kwargs):
137+
"""Delete uses OR — any restricted parent taints the row."""
138+
for table_name in reversed(self._restricted_topo_sort()):
139+
restriction = self._effective_restriction(table_name, mode="or")
140+
...
141+
142+
    def export(self, **kwargs):
143+
"""Export uses AND — row must satisfy all criteria."""
144+
for table_name in self._restricted_topo_sort():
145+
restriction = self._effective_restriction(table_name, mode="and")
146+
...
147+
```
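
To make the two return shapes concrete, here is a small runnable sketch under simplifying assumptions: restrictions are plain Python predicates rather than DataJoint expressions, and `ToyRestrictedDiagram`, `add_set`, and `matches` are hypothetical names that do not exist in the library. AND is applied only across the sets that reach the table.

```python
class ToyRestrictedDiagram:
    """Toy stand-in: each restriction set maps table_name -> list of predicates."""

    def __init__(self):
        self._restriction_sets = []

    def add_set(self, restriction_set):
        # Plays the role of restrict() after propagation has filled the set.
        self._restriction_sets.append(restriction_set)
        return self

    def matches(self, table_name, row, mode="or"):
        per_set = [s[table_name] for s in self._restriction_sets if table_name in s]
        if not per_set:
            return False  # unrestricted: the operation does not touch this table
        # Within a set, predicates combine with OR (multiple FK paths).
        hits = [any(pred(row) for pred in preds) for preds in per_set]
        if mode == "or":
            return any(hits)  # delete: any set taints the row
        if mode == "and":
            return all(hits)  # export: every reaching set must match
        raise ValueError(f"unknown mode: {mode!r}")


diagram = (
    ToyRestrictedDiagram()
    .add_set({"Recording": [lambda r: r["subject"] == 1]})
    .add_set({"Recording": [lambda r: r["lab"] == "brody"]})
)
row = {"subject": 1, "lab": "other"}
deleted = diagram.matches("Recording", row, mode="or")
exported = diagram.matches("Recording", row, mode="and")
```

The row is deleted because one set matches it, but it is not exported because the second set does not.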
148+
149+
**Why this works:**
150+
151+
Within a single restriction set, multiple restrictions at the same node (from different FK paths) are always OR — a row that references a restricted parent through *any* FK is affected. This is structural and operation-independent.
152+
153+
*Across* restriction sets (separate `restrict()` calls on different tables), the combination depends on the operation. The diagram stores them separately and lets the operation choose.
154+
155+
**Edge case — node restricted in some sets but not others:**
156+
157+
For AND mode (export): if a node is downstream of restriction set R1 but not R2, what happens? The node has restrictions from R1 but none from R2. Two options:
158+
- **Strict AND**: node is excluded (no data exported) because it doesn't satisfy all criteria
159+
- **Lenient AND**: only apply AND across sets that actually reach this node
160+
161+
Lenient AND is more useful: `rd.restrict(Session & 'subject=1').restrict(Stimulus & 'type="visual"')` should export recordings that are from subject 1 AND use visual stimuli, but it should also export the `Session` rows for subject 1, even though the `Stimulus` restriction never reaches `Session` (restrictions propagate downstream only). The lenient interpretation applies AND only where multiple restriction sets converge.
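
The two options can be compared on a toy example. The sketch below is illustrative only: `tables_exported` and the placeholder restriction strings are hypothetical, and each restriction set is modeled as a dict keyed by table name, as in the implementation sketch above.

```python
def tables_exported(restriction_sets, strict):
    """Map each exported table to the indices of the restriction sets applied to it."""
    all_tables = set().union(*restriction_sets)
    exported = {}
    for table in sorted(all_tables):
        reaching = [i for i, s in enumerate(restriction_sets) if table in s]
        if strict and len(reaching) < len(restriction_sets):
            continue  # strict AND: a table not reached by every set is excluded
        exported[table] = reaching  # lenient AND: intersect only the reaching sets
    return exported


# Set 0 starts at Session, set 1 starts at Stimulus; both reach Recording.
r1 = {"Session": ["subject=1"], "Recording": ["via Session"]}
r2 = {"Stimulus": ['type="visual"'], "Recording": ["via Stimulus"]}

lenient = tables_exported([r1, r2], strict=False)
strict = tables_exported([r1, r2], strict=True)
```

Under lenient AND all three tables are exported and only `Recording` is intersected across both sets; under strict AND only `Recording` survives.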
162+
60163
### Restriction propagation
61164

62-
Each node in the `RestrictedDiagram` carries a list of restrictions (combined with OR for multiple FK paths from different parents).
165+
Each restriction set propagates independently through the graph. Within a set, each node carries a list of restrictions (OR-combined for multiple FK paths).
63166

64167
**Propagation rules for edge `Parent → Child` with `attr_map`:**
65168

@@ -70,7 +173,7 @@ Each node in the `RestrictedDiagram` carries a list of restrictions (combined wi
70173
Restrict child by `parent.proj(**{fk: pk for fk, pk in attr_map.items()})`.
71174

72175
3. **Multiple FK paths to the same child** (via alias nodes):
73-
Each path produces a separate restriction. These combine with OR — a child row must be deleted if it references restricted parent rows through *any* FK.
176+
Each path produces a separate restriction within the same set. These combine with OR — a child row is affected if it references restricted parent rows through *any* FK.
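
The renaming and the OR across FK paths can be sketched with plain dicts. This is a toy model, not the `cascade()` logic: `matches_path`, `is_affected`, and the `session_a`/`session_b` attributes are hypothetical, and `attr_map` is assumed to map child FK attribute to parent PK attribute, as in the `proj` expression above.

```python
def matches_path(child_row, restricted_parent_keys, attr_map):
    """True if the child row references a restricted parent row through one FK path."""
    # Rename the child's FK attributes to the parent's PK attributes.
    projected = {pk: child_row[fk] for fk, pk in attr_map.items()}
    return projected in restricted_parent_keys


def is_affected(child_row, paths):
    """OR across FK paths: any path that reaches a restricted parent row is enough."""
    return any(matches_path(child_row, keys, amap) for keys, amap in paths)


restricted_sessions = [{"session_id": 1}]
# The child references Session twice, under two different attribute names.
paths = [
    (restricted_sessions, {"session_a": "session_id"}),
    (restricted_sessions, {"session_b": "session_id"}),
]

rows = [
    {"session_a": 1, "session_b": 2},
    {"session_a": 3, "session_b": 1},
    {"session_a": 2, "session_b": 3},
]
affected = [is_affected(row, paths) for row in rows]
```

The first two rows are affected through different FK paths; the third references no restricted session.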
74177

75178
This reuses the existing restriction logic from the current `cascade()` function (lines 1082–1090 of `table.py`), but applies it upfront during graph traversal rather than reactively from error messages.
76179

@@ -198,29 +301,31 @@ def _propagate_restriction(self, parent_name, parent_restriction):
198301
### API
199302

200303
```python
201-
# From a table with restriction
304+
# From a table with restriction — single restriction set
202305
rd = dj.Diagram(Session & 'subject_id=1')
203306

204-
# Explicit restrict call
307+
# Explicit restrict call — adds a restriction set
205308
rd = dj.Diagram(schema).restrict(Session & 'subject_id=1')
206309

207-
# Operator syntax (proposed in #865)
310+
# Operator syntax (proposed in #865) — each & adds a restriction set
208311
rd = dj.Diagram(schema) & (Session & 'subject_id=1')
209312

210-
# Multiple restrictions
313+
# Multiple restrictions — two separate restriction sets
314+
# For delete (OR): delete recordings from subject 1 OR from brody lab
315+
# For export (AND): export recordings from subject 1 AND from brody lab
211316
rd = dj.Diagram(schema) & (Session & 'subject_id=1') & (Lab & 'lab="brody"')
212317

213318
# With part_integrity policy
214319
rd = dj.Diagram(schema) & (PartTable & 'key=1')
215320
rd.delete(part_integrity="cascade")
216321

217322
# Preview before executing
218-
rd.preview() # show affected tables and row counts
323+
rd.preview() # show affected tables and row counts per restriction set
219324
rd.draw() # visualize with restricted nodes highlighted
220325

221-
# Other operations
222-
rd.delete()
223-
rd.export(path) # future: #864, #560
326+
# Operations choose combination logic
327+
rd.delete() # OR across restriction sets (any taint → delete)
328+
rd.export(path) # AND across restriction sets (all criteria → export)
224329
```
225330

226331
## Advantages over current approach
@@ -290,3 +395,9 @@ rd.export(path) # future: #864, #560
290395
Eager: propagate all restrictions when `restrict()` is called (computes row counts immediately).
291396
Lazy: store parent restrictions and propagate during `delete()`/`export()` (defers queries).
292397
Eager is better for preview but may issue many queries upfront. Lazy is more efficient when the user just wants to delete without preview. Consider lazy propagation with eager option for preview.
398+
399+
5. **Lenient vs strict AND for export:**
400+
When using AND mode across restriction sets, a node may be downstream of some restriction sets but not others. Lenient AND (apply intersection only where sets converge) is more practical but harder to reason about. Strict AND (node must be restricted by all sets) is simpler but may be too aggressive. Need to validate with real export use cases.
401+
402+
6. **Restricting the same table in multiple `restrict()` calls:**
403+
If the user calls `rd.restrict(Session & 'subject=1')` then `rd.restrict(Session & 'subject=2')`, these become two restriction sets. For delete (OR): deletes subject 1 and subject 2. For export (AND): exports rows that match both subject 1 and subject 2, which is always the empty set. Should restricting the same table in multiple calls be treated specially, for example by accumulating the restrictions within a single set instead?
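
The same-table question can be made concrete with toy data. The sketch below uses plain Python sets and made-up `sessions` rows; it only illustrates the outcomes described above and does not propose an implementation.

```python
# Hypothetical Session rows.
sessions = [
    {"session": 10, "subject": 1},
    {"session": 11, "subject": 2},
    {"session": 12, "subject": 3},
]

set_a = {s["session"] for s in sessions if s["subject"] == 1}  # first restrict() call
set_b = {s["session"] for s in sessions if s["subject"] == 2}  # second restrict() call

delete_rows = set_a | set_b         # OR across sets: both subjects are deleted
export_separate = set_a & set_b     # AND across sets: nothing can match both
export_accumulated = set_a | set_b  # alternative: accumulate within one set (OR)
```

Treating the two calls as separate sets makes the export empty, while accumulating them within one set exports both subjects.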
