This document specifies the table declaration mechanism in DataJoint Python. Table declarations define the schema structure using a domain-specific language (DSL) embedded in Python class definitions.
@schema
class TableName(dj.Manual):
definition = """
# table comment
primary_attr : int32
---
secondary_attr : float64
"""| Tier | Base Class | Table Prefix | Purpose |
|---|---|---|---|
| Manual | dj.Manual |
(none) | User-entered data |
| Lookup | dj.Lookup |
# |
Reference/enumeration data |
| Imported | dj.Imported |
_ |
Data from external sources |
| Computed | dj.Computed |
__ |
Derived from other tables |
| Part | dj.Part |
master__ |
Detail records of master table |
- Format: Strict CamelCase (e.g.,
MyTable,SessionAnalysis) - Pattern:
^[A-Z][A-Za-z0-9]*$ - Conversion: CamelCase to snake_case for SQL table name
- Examples:
SessionTrial->session_trialProcessedEMG->processed_emg
- Maximum length: 64 characters (MySQL limit)
- Final name: prefix + snake_case(class_name)
- Validation: Checked at declaration time
[table_comment]
primary_key_section
---
secondary_section
# Free-form description of the table purpose
- Must be first non-empty line if present
- Starts with
# - Cannot start with
#: - Stored in MySQL table COMMENT
---
- Three or more dashes
- Separates primary key attributes (above) from secondary attributes (below)
- Required if table has secondary attributes
Each non-empty, non-comment line is one of:
- Attribute definition
- Foreign key reference
- Index declaration
!!! version-added "New in 2.1"
Singleton tables were introduced in DataJoint 2.1.
A singleton table can hold at most one row. It is declared with no attributes in the primary key section:
@schema
class Config(dj.Lookup):
definition = """
# Global configuration
---
setting1 : varchar(100)
setting2 : int32
"""Behavior:
| Operation | Result |
|---|---|
| Insert | Works without specifying a key |
| Second insert | Raises DuplicateError |
fetch1() |
Returns the single row |
heading.primary_key |
Returns [] (empty) |
Use cases:
- Global configuration settings
- Pipeline parameters
- Summary statistics
- State tracking
Implementation:
Internally, singleton tables use a hidden _singleton attribute of type bool as the primary key. This attribute is:
- Automatically created and populated
- Excluded from
heading.attributes - Excluded from
fetch()results - Excluded from join matching
attribute_name [= default_value] : type [# comment]
| Component | Required | Description |
|---|---|---|
attribute_name |
Yes | Identifier for the column |
default_value |
No | Default value (before colon) |
type |
Yes | Data type specification |
comment |
No | Documentation (after #) |
- Pattern:
^[a-z_][a-z0-9_]*$ - Start: Lowercase letter or underscore
- Contains: Lowercase letters, digits, underscores
- Convention: snake_case
3.4 Hidden Attributes
Attributes with names starting with underscore (_) are hidden:
definition = """
session_id : int32
---
result : float64
_job_start_time : datetime(3) # hidden
_job_duration : float32 # hidden
"""Behavior:
| Context | Hidden Attributes |
|---|---|
heading.attributes |
Excluded |
heading._attributes |
Included |
| Default table display | Excluded |
to_dicts() / to_pandas() |
Excluded unless explicitly projected |
| Join matching (namesakes) | Excluded |
| Dict restrictions | Excluded (silently ignored) |
| String restrictions | Included (passed to SQL) |
Accessing hidden attributes:
# Visible attributes only (default)
results = MyTable.to_dicts()
# Explicitly include hidden attributes
results = MyTable.proj('result', '_job_start_time').to_dicts()
# Or with fetch1 for single row
row = (MyTable & key).fetch1('result', '_job_start_time')
# String restriction works with hidden attributes
MyTable & "_job_start_time > '2024-01-01'"
# Dict restriction IGNORES hidden attributes
MyTable & {'_job_start_time': some_date} # no effectUse cases:
- Job metadata (
_job_start_time,_job_duration,_job_version) - Internal tracking fields
- Attributes that should not participate in automatic joins
definition = """
# Experimental session with subject and timing info
session_id : int32 # auto-assigned
---
subject_name : varchar(100) # subject identifier
trial_number = 1 : int32 # default to 1
score = null : float32 # nullable
timestamp = CURRENT_TIMESTAMP : datetime # auto-timestamp
notes = '' : varchar(4000) # empty default
"""Scientist-friendly type names with guaranteed semantics:
| Type | SQL Mapping | Size | Description |
|---|---|---|---|
int8 |
tinyint |
1 byte | 8-bit signed integer |
int16 |
tinyint unsigned |
1 byte | 8-bit unsigned integer |
int16 |
smallint |
2 bytes | 16-bit signed integer |
int32 |
smallint unsigned |
2 bytes | 16-bit unsigned integer |
int32 |
int |
4 bytes | 32-bit signed integer |
int64 |
int unsigned |
4 bytes | 32-bit unsigned integer |
int64 |
bigint |
8 bytes | 64-bit signed integer |
int64 |
bigint unsigned |
8 bytes | 64-bit unsigned integer |
float32 |
float |
4 bytes | 32-bit IEEE 754 float |
float64 |
double |
8 bytes | 64-bit IEEE 754 float |
bool |
tinyint |
1 byte | Boolean (0 or 1) |
uuid |
binary(16) |
16 bytes | UUID stored as binary |
bytes |
longblob |
Variable | Binary data (up to 4GB) |
| Type | SQL Mapping | Description |
|---|---|---|
char(N) |
char(N) |
Fixed-length string |
varchar(N) |
varchar(N) |
Variable-length string (max N) |
enum('a','b',...) |
enum(...) |
Enumerated values |
| Type | SQL Mapping | Description |
|---|---|---|
date |
date |
Date (YYYY-MM-DD) |
datetime |
datetime |
Date and time |
datetime(N) |
datetime(N) |
With fractional seconds (0-6) |
| Type | SQL Mapping | Description |
|---|---|---|
json |
json |
JSON document |
decimal(P,S) |
decimal(P,S) |
Fixed-point decimal |
These SQL types are accepted but generate a warning recommending core types:
- Integer variants:
tinyint,smallint,mediumint,bigint,integer,serial - Float variants:
float,double,real(with size specifiers) - Text variants:
tinytext,mediumtext,longtext - Blob variants:
tinyblob,smallblob,mediumblob,longblob - Temporal:
time,timestamp,year - Numeric:
numeric(P,S)
Format: <codec_name> or <codec_name@store>
| Codec | In-table dtype | In-store dtype | Purpose |
|---|---|---|---|
<blob> |
bytes |
<hash> |
Serialized Python objects |
<hash> |
N/A (in-store only) | json |
Hash-addressed deduped storage |
<attach> |
bytes |
<hash> |
File attachments with filename |
<filepath> |
N/A (in-store only) | json |
Reference to managed file |
<object> |
N/A (in-store only) | json |
Object storage (Zarr, HDF5) |
In-store storage syntax:
<blob@>- default store<blob@store_name>- named store
Core types and codecs are stored in the SQL COMMENT field for reconstruction:
COMMENT ':float32:user comment here'
COMMENT ':<blob@store>:user comment'attribute_name = default_value : type
| Value | Meaning | SQL |
|---|---|---|
null |
Nullable attribute | DEFAULT NULL |
CURRENT_TIMESTAMP |
Server timestamp | DEFAULT CURRENT_TIMESTAMP |
"string" or 'string' |
String literal | DEFAULT "string" |
123 |
Numeric literal | DEFAULT 123 |
true/false |
Boolean | DEFAULT 1/DEFAULT 0 |
These values are used without quotes in SQL:
NULLCURRENT_TIMESTAMP
score = null : float32
- The special default
null(case-insensitive) makes the attribute nullable - Nullable attributes can be omitted from INSERT
- Primary key attributes CANNOT be nullable
Blob and JSON attributes can only have null as default:
# Valid
data = null : <blob>
# Invalid - raises DataJointError
data = '' : <blob>-> [options] ReferencedTable
| Option | Effect |
|---|---|
nullable |
All inherited attributes become nullable |
unique |
Creates UNIQUE INDEX on FK attributes |
Options are comma-separated in brackets:
-> [nullable, unique] ParentTable
Foreign keys automatically inherit all primary key attributes from the referenced table:
# Parent
class Subject(dj.Manual):
definition = """
subject_id : int32
---
name : varchar(100)
"""
# Child - inherits subject_id
class Session(dj.Manual):
definition = """
-> Subject
session_id : int32
---
session_date : date
"""| Position | Effect |
|---|---|
Before --- |
FK attributes become part of primary key |
After --- |
FK attributes are secondary |
-> [nullable] OptionalParent
- Only allowed after
---(secondary) - Primary key FKs cannot be nullable
- Creates optional relationship
-> [unique] ParentTable
- Creates UNIQUE INDEX on inherited attributes
- Enforces one-to-one relationship from child perspective
-> [nullable, unique] ParentTable
- Combines nullable and unique constraints
- Multiple rows can have NULL values (SQL standard: NULLs are not considered equal in UNIQUE constraints)
- At most one row per non-NULL parent reference
- Use case: optional one-to-one relationships where the child may not reference any parent
-> Parent.proj(alias='original_name')
- Reference same table multiple times with different attribute names
- Useful for self-referential or multi-reference patterns
All foreign keys use:
ON UPDATE CASCADE- Parent key changes propagateON DELETE RESTRICT- Cannot delete parent with children
Foreign key relationships are recorded in the ~lineage table:
{
'child_attr': ('parent_schema.parent_table', 'parent_attr')
}Used for semantic attribute matching in queries.
index(attr1, attr2, ...)
unique index(attr1, attr2, ...)
definition = """
# User contact information
user_id : int32
---
first_name : varchar(50)
last_name : varchar(50)
email : varchar(100)
index(last_name, first_name)
unique index(email)
"""Indexes can include SQL expressions:
index(last_name, (YEAR(birth_date)))
- Cannot be altered after table creation (via
table.alter()) - Must reference existing attributes
@schema
class Master(dj.Manual):
definition = """
master_id : int32
"""
class Detail(dj.Part):
definition = """
-> master
detail_id : int32
---
value : float32
"""- SQL name:
master_table__part_name - Example:
experiment__trial
Within Part definition, use:
-> master(lowercase keyword)-> MasterClassName(class name)
- Parts must reference their master
- Cannot delete Part records directly (use master)
- Cannot drop Part table directly (use master)
- Part inherits master's primary key
dj.Imported- Data from external sourcesdj.Computed- Derived from other DataJoint tables
All primary key attributes must come from foreign key references.
Valid:
class Analysis(dj.Computed):
definition = """
-> Session
-> Parameter
---
result : float64
"""Invalid (by default):
class Analysis(dj.Computed):
definition = """
-> Session
analysis_id : int32 # ERROR: non-FK primary key
---
result : float64
"""Override:
dj.config['jobs.allow_new_pk_fields_in_computed_tables'] = TrueWhen config['jobs.add_job_metadata'] = True, auto-populated tables receive:
| Column | Type | Description |
|---|---|---|
_job_start_time |
datetime(3) |
Job start timestamp |
_job_duration |
float64 |
Duration in seconds |
_job_version |
varchar(64) |
Code version |
| Check | Error |
|---|---|
| Unknown type | DataJointError: Unsupported attribute type |
| Invalid attribute name | DataJointError: Declaration error |
Comment starts with : |
DataJointError: comment must not start with colon |
| Non-null blob default | DataJointError: default value for blob can only be NULL |
| Check | Error |
|---|---|
| Table name > 64 chars | DataJointError: Table name exceeds max length |
| No primary key | DataJointError: Table must have a primary key |
| Nullable primary key attr | DataJointError: Primary key attributes cannot be nullable |
| Invalid CamelCase | DataJointError: Invalid table name |
| FK resolution failure | DataJointError: Foreign key reference could not be resolved |
The table.validate() method checks:
- Required fields present
- NULL constraints satisfied
- Primary key completeness
- Codec validation (if defined)
- UUID format
- JSON serializability
CREATE TABLE `schema`.`table_name` (
`attr1` TYPE1 NOT NULL COMMENT "...",
`attr2` TYPE2 DEFAULT NULL COMMENT "...",
PRIMARY KEY (`pk1`, `pk2`),
FOREIGN KEY (`fk_attr`) REFERENCES `parent` (`pk`)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (`idx_attr`),
UNIQUE INDEX (`uniq_attr`)
) ENGINE=InnoDB COMMENT="table comment"Core types and codecs are preserved in comments:
`value` float NOT NULL COMMENT ":float32:measurement value"
`data` longblob DEFAULT NULL COMMENT ":<blob>:serialized data"
`archive` json DEFAULT NULL COMMENT ":<blob@cold>:in-store data"| File | Purpose |
|---|---|
declare.py |
Definition parsing, SQL generation |
heading.py |
Attribute metadata, type reconstruction |
table.py |
Base Table class, declaration interface |
user_tables.py |
Tier classes (Manual, Computed, etc.) |
schemas.py |
Schema binding, table decoration |
codecs.py |
Codec registry and resolution |
lineage.py |
Attribute lineage tracking |
Potential improvements identified for the declaration system:
- Better error messages with suggestions and context
- Import-time validation via
__init_subclass__ - Parser alternatives (regex-based for simpler grammar)
- SQL dialect abstraction for multi-database support
- Extended constraints (CHECK, custom validation)
- Migration support for schema evolution
- Definition caching for performance
- IDE tooling support via structured intermediate representation