Learn DataJoint by building real pipelines.
These tutorials guide you through building data pipelines step by step. Each tutorial is a Jupyter notebook that you can run interactively. Start with the basics and progress to domain-specific and advanced topics.
Install DataJoint:

```bash
pip install datajoint
```

Configure database credentials in your project (see Configuration):

```bash
# Create datajoint.json for non-sensitive settings
echo '{"database": {"host": "localhost", "port": 3306}}' > datajoint.json

# Create secrets directory for credentials
mkdir -p .secrets
echo "root" > .secrets/database.user
echo "password" > .secrets/database.password
```

Define and populate a simple pipeline:
```python
import datajoint as dj

schema = dj.Schema('my_pipeline')


@schema
class Subject(dj.Manual):
    definition = """
    subject_id : int32
    ---
    name : varchar(100)
    date_of_birth : date
    """


@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int16
    ---
    session_date : date
    """


@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    result : float64
    """

    def make(self, key):
        # Compute the result for this session and insert it
        self.insert1({**key, 'result': 42.0})


# Insert data
Subject.insert1({'subject_id': 1, 'name': 'M001', 'date_of_birth': '2026-01-15'})
Session.insert1({'subject_id': 1, 'session_idx': 1, 'session_date': '2026-01-06'})

# Run computations
SessionAnalysis.populate()
```

Continue learning with the structured tutorials below.
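The bookkeeping behind `populate()` can be pictured as a key-set difference: it finds primary-key values present upstream (`Session`) but missing downstream (`SessionAnalysis`) and calls `make()` once per missing key. This plain-Python sketch illustrates the idea without a database; it is a conceptual stand-in, not DataJoint's actual implementation:

```python
# Upstream keys (Session) and keys already computed (SessionAnalysis)
session_keys = [
    {'subject_id': 1, 'session_idx': 1},
    {'subject_id': 1, 'session_idx': 2},
]
done_keys = [{'subject_id': 1, 'session_idx': 1}]


def missing_keys(upstream, downstream):
    """Return upstream keys with no matching downstream entry."""
    done = {tuple(sorted(k.items())) for k in downstream}
    return [k for k in upstream if tuple(sorted(k.items())) not in done]


def make(key):
    """Stand-in for SessionAnalysis.make: merge the key with its result."""
    return {**key, 'result': 42.0}


todo = missing_keys(session_keys, done_keys)
rows = [make(k) for k in todo]
```

Here only the second session is outstanding, so `make` runs once; re-running against the updated downstream keys would find nothing to do, which is why `populate()` is safe to call repeatedly.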
Choose your learning path based on your goals:
Goal: Understand core concepts and build your first pipeline
Path:
- First Pipeline — 30 min — Tables, queries, four core operations
- Schema Design — 45 min — Primary keys, relationships, table tiers
- Data Entry — 30 min — Inserting and managing data
- Queries — 45 min — Operators, restrictions, projections
- Try an example: University Database — Complete pipeline with realistic data
Next: Read Relational Workflow Model to understand the conceptual foundation.
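Before the Queries tutorial, the two workhorse operators can be previewed in plain Python. In DataJoint, restriction (`Subject & condition`) keeps matching rows and projection (`Subject.proj(...)`) keeps named attributes; the list-of-dicts functions below mimic those semantics for illustration only and are not the DataJoint API:

```python
rows = [
    {'subject_id': 1, 'name': 'M001', 'date_of_birth': '2026-01-15'},
    {'subject_id': 2, 'name': 'M002', 'date_of_birth': '2025-11-02'},
]


def restrict(rows, cond):
    """Mimic `Table & cond`: keep rows matching every condition attribute."""
    return [r for r in rows if all(r[k] == v for k, v in cond.items())]


def proj(rows, *attrs):
    """Mimic `Table.proj(*attrs)`: keep only the named attributes."""
    return [{k: r[k] for k in attrs} for r in rows]


matched = restrict(rows, {'subject_id': 1})
names = proj(rows, 'subject_id', 'name')
```

Both operators return new result sets without modifying the stored data, which mirrors how DataJoint query expressions compose lazily before fetching.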
Goal: Create automated, scalable data processing workflows
Prerequisites: Complete basics above or have equivalent experience
Path:
- Computation — Automated processing with Imported/Computed tables
- Object Storage — Handle large data (arrays, files, images)
- Distributed Computing — Multi-worker parallel execution
- Practice: Fractal Pipeline or Blob Detection
Next:
- Run Computations — populate() usage patterns
- Distributed Computing — Cluster deployment
- Handle Errors — Job management and recovery
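The core idea of distributed population is job reservation: each worker atomically claims a key before computing it, so two workers never duplicate work (in datajoint-python this is enabled with `populate(reserve_jobs=True)`, which records reservations in a jobs table in the database). This toy in-process sketch shows the claim-before-compute pattern with a lock-protected set; it is illustrative, not DataJoint's mechanism:

```python
import threading

reserved = set()
lock = threading.Lock()


def try_reserve(key):
    """Atomically claim a job key; return False if already claimed."""
    kt = tuple(sorted(key.items()))
    with lock:
        if kt in reserved:
            return False
        reserved.add(kt)
        return True


# Two workers race for the same key: only the first claim succeeds
first = try_reserve({'subject_id': 1, 'session_idx': 2})
second = try_reserve({'subject_id': 1, 'session_idx': 2})
```

Because the reservation lives in shared state (the database, in the real system), workers on different machines coordinate without talking to each other directly.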
Goal: Build scientific data pipelines for your field
Prerequisites: Complete basics, understand computation model
Production Software: DataJoint Elements
Standard pipelines for neurophysiology experiments, actively used in many labs worldwide. These are not tutorials—they are production-ready modular pipelines for calcium imaging, electrophysiology, array ephys, optogenetics, and more.
Learning tutorials (neuroscience):
- Calcium Imaging — Import movies, segment cells, extract traces
- Electrophysiology — Import recordings, spike detection, waveforms
- Allen CCF — Hierarchical brain atlas ontology
Complete demo pipeline:
- LC-MS Demo — Liquid chromatography-mass spectrometry pipeline showcasing DataJoint best practices with PostgreSQL: sample tracking, scan acquisition, mass spectral analysis, and parameterized peak detection
General patterns:
- Hotel Reservations — Booking systems with resource management
- Languages & Proficiency — Many-to-many relationships
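The many-to-many pattern in the Languages & Proficiency tutorial uses an association table whose primary key is the combination of two foreign keys. A minimal sketch of the pattern (table and attribute names here are illustrative, not the tutorial's actual schema):

```python
@schema
class Person(dj.Manual):
    definition = """
    person_id : int32
    ---
    full_name : varchar(100)
    """


@schema
class Language(dj.Manual):
    definition = """
    lang_code : char(2)
    ---
    lang_name : varchar(60)
    """


@schema
class Proficiency(dj.Manual):
    # Association table: one row per (person, language) pair
    definition = """
    -> Person
    -> Language
    ---
    level : varchar(20)
    """
```

Each `Proficiency` row links one person to one language, so a person may have many languages and a language many speakers, while the composite primary key prevents duplicate pairings.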
Goal: Customize DataJoint for specialized needs
Prerequisites: Proficient with basics and production pipelines
Path:
- Custom Codecs — Create domain-specific data types
- JSON Data Type — Semi-structured data patterns
- SQL Comparison — Understand DataJoint's query algebra
Next:
- Codec API — Complete codec specification
- Create Custom Codec — Step-by-step codec development
Core concepts for getting started with DataJoint:
- First Pipeline — Tables, queries, and the four core operations
- Schema Design — Primary keys, relationships, and table tiers
- Data Entry — Inserting and managing data
- Queries — Operators and fetching results
- Computation — Imported and Computed tables
- Object Storage — Blobs, attachments, and object stores
Complete pipelines demonstrating DataJoint patterns:
- University Database — Academic records with students, courses, and grades
- Hotel Reservations — Booking system with rooms, guests, and reservations
- Languages & Proficiency — Language skills tracking with many-to-many relationships

- Fractal Pipeline — Iterative computation and parameter sweeps
- Blob Detection — Image processing with automated computation
Real-world scientific pipelines:
- Calcium Imaging — Import TIFF movies, segment cells, extract fluorescence traces
- Electrophysiology — Import recordings, detect spikes, extract waveforms
- Electrophysiology with Object Storage — Neural data with `<npy@>` lazy loading
- Allen CCF — Brain atlas with hierarchical region ontology
Extending DataJoint for specialized use cases:
- SQL Comparison — DataJoint for SQL users
- JSON Data Type — Semi-structured data in tables
- Distributed Computing — Multi-process and cluster workflows
- Custom Codecs — Extending the type system
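As a preview of the JSON Data Type topic, a `json` attribute stores semi-structured values alongside relational columns. The sketch below assumes the `Session` table from the quickstart; the table name and fields are illustrative:

```python
@schema
class SessionMeta(dj.Manual):
    definition = """
    -> Session
    ---
    params : json   # semi-structured settings for this session
    """


SessionMeta.insert1({
    'subject_id': 1,
    'session_idx': 1,
    'params': {'rig': 'A', 'depth_um': 350},
})
```

This keeps loosely structured metadata queryable without committing every possible setting to its own column.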
```bash
# Clone the repository
git clone https://github.com/datajoint/datajoint-docs.git
cd datajoint-docs

# Start the tutorial environment
docker compose up -d

# Launch Jupyter
jupyter lab src/tutorials/
```

All tutorials use a local MySQL database that resets between sessions.