This document specifies semantic matching for binary operators in DataJoint 2.0. Semantic matching ensures that attributes are only matched when they share both the same name and the same lineage (origin), preventing accidental matches on unrelated attributes that happen to share names. This replaces the name-based matching rules from pre-2.0 versions.
- Prevent incorrect matches on attributes that share names but represent different entities
- Enable valid operations that were previously blocked due to overly restrictive rules
- Maintain backward compatibility for well-designed schemas
- Provide clear error messages when semantic conflicts are detected
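The matching rule can be illustrated with a minimal pure-Python sketch. It does not use DataJoint itself: headings are modeled as plain dicts mapping attribute name to a lineage string, and `matched_attributes` is a hypothetical helper name chosen for illustration. The sketch assumes every attribute carries a lineage string.

```python
def matched_attributes(left, right):
    """Return the namesake attributes that may be matched.

    `left` and `right` are hypothetical stand-ins for table headings:
    plain dicts mapping attribute name -> lineage string.
    """
    matched = []
    for name in left:
        if name not in right:
            continue  # not a namesake, so it never takes part in matching
        if left[name] != right[name]:
            # Same name, different origin: refuse instead of matching silently.
            raise ValueError(
                f"Cannot join on attribute `{name}`: different lineages "
                f"({left[name]} vs {right[name]})."
            )
        matched.append(name)
    return matched


session = {"session_id": "myschema.session.session_id"}
trial = {
    "session_id": "myschema.session.session_id",
    "trial_num": "myschema.trial.trial_num",
}
print(matched_attributes(session, trial))  # ['session_id']
```

Under name-based matching the two `id` attributes of unrelated tables would be joined; in this sketch they raise an error because their lineage strings differ.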
Semantic matching is enabled by default in DataJoint 2.0. For most well-designed schemas, no changes are required.
# Two tables with generic 'id' attribute
class Student(dj.Manual):
    definition = """
    id : int64
    ---
    name : varchar(100)
    """

class Course(dj.Manual):
    definition = """
    id : int64
    ---
    title : varchar(100)
    """
# This will raise an error because 'id' has different lineages
Student() * Course()  # DataJointError!

Option 1: Rename attributes using projection
Student() * Course().proj(course_id='id')  # OK

Option 2: Bypass semantic check (use with caution)
Student().join(Course(), semantic_check=False)  # OK, but be careful!

Option 3: Use descriptive names (best practice)
class Student(dj.Manual):
    definition = """
    student_id : int64
    ---
    name : varchar(100)
    """

If you have existing schemas created before DataJoint 2.0, rebuild their lineage tables:
import datajoint as dj
# Connect and get your schema
schema = dj.Schema('my_database')
# Rebuild lineage (do this once per schema)
schema.rebuild_lineage()
# Restart Python kernel to pick up changes

Important: If your schema references tables in other schemas, rebuild those upstream schemas first.
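When several schemas reference each other, the rebuild order can be derived with a topological sort. The sketch below is one possible approach, not part of DataJoint: `rebuild_order` and the `upstream` mapping (schema name to the set of schemas it references) are hypothetical, and you would need to supply the dependency map yourself.

```python
from graphlib import TopologicalSorter


def rebuild_order(upstream):
    """Order schema names so that every upstream schema comes first.

    `upstream` maps a schema name to the set of schema names it
    references through cross-schema foreign keys (hypothetical input).
    """
    return list(TopologicalSorter(upstream).static_order())


# Hypothetical dependency map: analysis -> acquisition -> subject
upstream = {
    "subject": set(),
    "acquisition": {"subject"},
    "analysis": {"acquisition"},
}
order = rebuild_order(upstream)
print(order)  # ['subject', 'acquisition', 'analysis']

# With a live connection, the rebuild itself would then be:
# for name in order:
#     dj.Schema(name).rebuild_lineage()
```

A circular dependency between schemas makes `static_order()` raise `graphlib.CycleError`, which is a useful signal that no valid rebuild order exists.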
Rebuild the ~lineage table for all tables in this schema.
schema.rebuild_lineage()
Description: Recomputes lineage for all attributes by querying FK relationships from the database's information_schema. Use this to restore lineage for schemas that predate the lineage system or after corruption.
Requirements:
- Schema must exist
- Upstream schemas (referenced via cross-schema FKs) must have their lineage rebuilt first
Side Effects:
- Creates the ~lineage table if it doesn't exist
- Deletes and repopulates all lineage entries for tables in the schema
Post-Action: Restart Python kernel and reimport to pick up new lineage information.
Property indicating whether the ~lineage table exists in this schema.
if schema.lineage_table_exists:
    print("Lineage tracking is enabled")
Returns: bool - True if ~lineage table exists, False otherwise.
Property returning all lineage entries for the schema.
schema.lineage
# {'myschema.session.session_id': 'myschema.session.session_id',
# 'myschema.trial.session_id': 'myschema.session.session_id',
# 'myschema.trial.trial_num': 'myschema.trial.trial_num'}
Returns: dict - Maps 'schema.table.attribute' to its lineage origin
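Because the returned value is an ordinary dict, it can be inspected with plain Python. The sketch below groups attributes by their origin to show which attributes are homologous. `group_by_origin` is a hypothetical helper, and the dict literal copies the example above rather than querying a real schema.

```python
from collections import defaultdict


def group_by_origin(lineage):
    """Group 'schema.table.attribute' keys by their lineage origin."""
    groups = defaultdict(list)
    for attribute, origin in lineage.items():
        groups[origin].append(attribute)
    # Sort each group so the result is stable regardless of dict order.
    return {origin: sorted(attrs) for origin, attrs in groups.items()}


# Literal copy of the example mapping; in practice this would be schema.lineage
lineage = {
    "myschema.session.session_id": "myschema.session.session_id",
    "myschema.trial.session_id": "myschema.session.session_id",
    "myschema.trial.trial_num": "myschema.trial.trial_num",
}
groups = group_by_origin(lineage)
for origin, attrs in groups.items():
    print(origin, "<-", attrs)
```

Attributes that land in the same group share a lineage and can therefore be matched in a join or restriction.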
Join two expressions with optional semantic checking.
result = A.join(B) # semantic_check=True (default)
result = A.join(B, semantic_check=False)  # bypass semantic check
Parameters:
- other: Another query expression to join with
- semantic_check (bool): If True (default), raise error on non-homologous namesakes. If False, perform natural join without lineage checking.
Raises: DataJointError if semantic_check=True and namesake attributes have different lineages.
Restrict expression with optional semantic checking.
result = A.restrict(B) # semantic_check=True (default)
result = A.restrict(B, semantic_check=False)  # bypass semantic check
Parameters:
- other: Restriction condition (expression, dict, string, etc.)
- semantic_check (bool): If True (default), raise error on non-homologous namesakes when restricting by another expression. If False, no lineage checking.
Raises: DataJointError if semantic_check=True and namesake attributes have different lineages.
A * B: Equivalent to A.join(B, semantic_check=True).
A & B: Equivalent to A.restrict(B, semantic_check=True).
A - B: Restriction with negation. Semantic checking applies.
To bypass semantic checking: A.restrict(dj.Not(B), semantic_check=False)
A + B: Union of expressions. Requires all namesake attributes to have matching lineage.
dj.U('a', 'b') & A # Restriction: promotes a, b to PK
dj.U('a', 'b').aggr(A, ...) # Aggregation: groups by a, b
dj.U() & A  # Distinct primary keys of A
dj.U('a', 'b') - A  # DataJointError: produces infinite set
dj.U('a', 'b') * A  # DataJointError: use & instead

For conceptual background on lineage, terminology, and matching rules, see Semantic Matching (Explanation).
Each schema has a hidden ~lineage table storing lineage information:
CREATE TABLE `schema_name`.`~lineage` (
    table_name VARCHAR(64) NOT NULL,
    attribute_name VARCHAR(64) NOT NULL,
    lineage VARCHAR(255) NOT NULL,
    PRIMARY KEY (table_name, attribute_name)
)

At table declaration:
- Delete any existing lineage entries for the table
- For FK attributes: copy lineage from parent (with warning if parent lineage missing)
- For native PK attributes: set lineage to schema.table.attribute
- Native secondary attributes: no entry (lineage = None)
At table drop:
- Delete all lineage entries for the table
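The declaration-time rules can be sketched in pure Python. This is an illustration under stated assumptions, not DataJoint's implementation: `lineage_entries` and its parameters are hypothetical, foreign keys are given as a dict from attribute name to the referenced 'schema.table.attribute', and `known_lineage` stands in for lineage already stored upstream. The fallback for a missing parent lineage (use the parent attribute as origin, with a warning) is included.

```python
import warnings


def lineage_entries(schema, table, primary_key, foreign_keys, known_lineage):
    """Compute the ~lineage rows for one table (hypothetical sketch).

    primary_key   : names of the table's primary key attributes
    foreign_keys  : {attribute name: 'schema.table.attribute' it references}
    known_lineage : {'schema.table.attribute': origin} stored upstream
    """
    entries = {}
    # FK attributes copy the lineage of the parent attribute.
    for attr, parent in foreign_keys.items():
        origin = known_lineage.get(parent)
        if origin is None:
            warnings.warn(f"Lineage for {parent} not found. Using it as origin.")
            origin = parent
        entries[attr] = origin
    # Native PK attributes originate in this table.
    for attr in primary_key:
        if attr not in foreign_keys:
            entries[attr] = f"{schema}.{table}.{attr}"
    # Native secondary attributes are never passed in, so they get no entry.
    return entries


upstream = {"myschema.session.session_id": "myschema.session.session_id"}
trial = lineage_entries(
    "myschema",
    "trial",
    primary_key=["session_id", "trial_num"],
    foreign_keys={"session_id": "myschema.session.session_id"},
    known_lineage=upstream,
)
print(trial)
```

For the `trial` table this yields `session_id` with the session table's lineage and `trial_num` with its own, matching the example mapping shown for `schema.lineage`.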
If ~lineage table doesn't exist:
- Warning issued during semantic check
- Semantic checking disabled (join proceeds as natural join)
If parent lineage missing during declaration:
- Warning issued
- Parent attribute used as origin
- Recommend rebuilding lineage after parent schema is fixed
The Heading class tracks whether lineage information is available:
heading.lineage_available  # True if ~lineage table exists for this schema

This property is:
- Set when heading is loaded from database
- Propagated through projections, joins, and other operations
- Used by assert_join_compatibility to decide whether to perform semantic checking
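The decision this flag drives can be summarized in a short sketch. It is not the body of `assert_join_compatibility`: `join_mode` is a hypothetical function, and treating lineage as available only when both operands report it is an assumption, since the text does not say how the flag combines across operands.

```python
import warnings


def join_mode(left_available, right_available, semantic_check=True):
    """Return which kind of join would be performed (hypothetical sketch)."""
    if not semantic_check:
        # Caller explicitly bypassed the check.
        return "natural"
    if not (left_available and right_available):
        # Assumption: a missing ~lineage table on either side disables the check.
        warnings.warn(
            "Semantic check disabled: ~lineage table not found. "
            "To enable semantic matching, rebuild lineage with: "
            "schema.rebuild_lineage()"
        )
        return "natural"
    return "semantic"


print(join_mode(True, True))                        # semantic
print(join_mode(True, True, semantic_check=False))  # natural
```

Keeping the fallback as a warning rather than an error means that schemas without a ~lineage table still work, with joins proceeding as natural joins.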
DataJointError: Cannot join on attribute `id`: different lineages
(university.student.id vs university.course.id).
Use .proj() to rename one of the attributes.
WARNING: Semantic check disabled: ~lineage table not found.
To enable semantic matching, rebuild lineage with: schema.rebuild_lineage()
WARNING: Lineage for `parent_db`.`parent_table`.`attr` not found
(parent schema's ~lineage table may be missing or incomplete).
Using it as origin. Once the parent schema's lineage is rebuilt,
run schema.rebuild_lineage() on this schema to correct the lineage.