
Commit 24e9c20

docs: add DataJoint 2.2 Instance API and thread-safe mode documentation
New pages:
- explanation/whats-new-22.md: What's New in 2.2 overview
- how-to/use-instances.md: Task-oriented guide for dj.Instance
- reference/specs/thread-safe-mode.md: Thread-safe mode specification
- tutorials/advanced/instances.ipynb: Step-by-step Instance tutorial

Updated pages:
- reference/configuration.md: Instance API and thread-safe mode sections
- how-to/configure-database.md: Instance-based connections section
- tutorials/basics/01-first-pipeline.ipynb: Instance alternative note
- mkdocs.yaml: Bump version to 2.2, add nav entries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f99e102 commit 24e9c20

8 files changed

Lines changed: 957 additions & 28 deletions

File tree

mkdocs.yaml

Lines changed: 6 additions & 1 deletion
@@ -11,6 +11,7 @@ nav:
1111
- Overview:
1212
- Data Pipelines: explanation/data-pipelines.md
1313
- What's New in 2.0: explanation/whats-new-2.md
14+
- What's New in 2.2: explanation/whats-new-22.md
1415
- FAQ: explanation/faq.md
1516
- Data Model:
1617
- Relational Workflow Model: explanation/relational-workflow-model.md
@@ -48,12 +49,14 @@ nav:
4849
- JSON Data Type: tutorials/advanced/json-type.ipynb
4950
- Distributed Computing: tutorials/advanced/distributed.ipynb
5051
- Custom Codecs: tutorials/advanced/custom-codecs.ipynb
52+
- Instances: tutorials/advanced/instances.ipynb
5153
- How-To:
5254
- how-to/index.md
5355
- Setup:
5456
- Installation: how-to/installation.md
5557
- Manage Secrets: how-to/manage-secrets.md
5658
- Configure Database: how-to/configure-database.md
59+
- Use Isolated Instances: how-to/use-instances.md
5760
- Configure Object Storage: how-to/configure-storage.md
5861
- Command-Line Interface: how-to/use-cli.md
5962
- Schema Design:
@@ -114,6 +117,8 @@ nav:
114117
- AutoPopulate: reference/specs/autopopulate.md
115118
- Job Metadata: reference/specs/job-metadata.md
116119
- Object Store Configuration: reference/specs/object-store-configuration.md
120+
- Instance & Thread Safety:
121+
- Thread-Safe Mode: reference/specs/thread-safe-mode.md
117122
- Configuration: reference/configuration.md
118123
- Definition Syntax: reference/definition-syntax.md
119124
- Operators: reference/operators.md
@@ -222,7 +227,7 @@ markdown_extensions:
222227
generic: true
223228
extra:
224229
generator: false # Disable watermark
225-
datajoint_version: "2.1" # DataJoint Python version this documentation covers
230+
datajoint_version: "2.2" # DataJoint Python version this documentation covers
226231
social:
227232
- icon: main/company-logo
228233
link: https://www.datajoint.com

src/explanation/whats-new-22.md

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
1+
# What's New in DataJoint 2.2
2+
3+
DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
4+
5+
> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.
6+
7+
## Overview
8+
9+
DataJoint has traditionally used a global singleton pattern: one configuration (`dj.config`), one connection (`dj.conn()`), shared across all tables in a process. This works well for interactive sessions and single-user scripts, but breaks down when:
10+
11+
- A web server handles requests for different databases simultaneously
12+
- A notebook connects to production and staging databases side by side
13+
- Tests need isolated databases that don't interfere with each other
14+
- Parallel pipelines need independent connections
15+
16+
DataJoint 2.2 solves this with `dj.Instance`—an object that bundles its own configuration and connection, independent of global state.
17+
18+
## `dj.Instance` API
19+
20+
An Instance encapsulates a config and connection pair. Create one by providing database credentials directly:
21+
22+
```python
23+
import datajoint as dj
24+
25+
inst = dj.Instance(host="localhost", user="root", password="secret")
26+
```
27+
28+
Then use `inst.Schema()` instead of `dj.Schema()`:
29+
30+
```python
31+
schema = inst.Schema("my_database")
32+
33+
@schema
34+
class Experiment(dj.Manual):
35+
    definition = """
36+
    experiment_id : int32
37+
    ---
38+
    description : varchar(255)
39+
    """
40+
```
41+
42+
Tables defined this way use the Instance's connection—completely independent of `dj.config` and `dj.conn()`.
43+
44+
### Instance Parameters
45+
46+
| Parameter | Type | Default | Description |
47+
|-----------|------|---------|-------------|
48+
| `host` | str || Database hostname (required) |
49+
| `user` | str || Database username (required) |
50+
| `password` | str || Database password (required) |
51+
| `port` | int | from config | Database port (default: 3306 for MySQL, 5432 for PostgreSQL) |
52+
| `use_tls` | bool or dict | `None` | TLS configuration |
53+
| `**kwargs` ||| Config overrides (e.g., `safemode=False`) |
54+
55+
### Instance Methods
56+
57+
| Method | Description |
58+
|--------|-------------|
59+
| `inst.Schema(name)` | Create a Schema bound to this Instance's connection |
60+
| `inst.FreeTable(full_name)` | Create a FreeTable bound to this Instance's connection |
61+
| `inst.config` | Access this Instance's Config object |
62+
| `inst.connection` | Access this Instance's Connection object |
63+
64+
### Config Overrides
65+
66+
Pass any config setting as a keyword argument. Use double underscores for nested settings:
67+
68+
```python
69+
inst = dj.Instance(
70+
host="localhost", user="root", password="secret",
71+
safemode=False,
72+
database__reconnect=False,
73+
)
74+
```
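
One way to picture the double-underscore convention is as a path into a nested settings structure. The sketch below is illustrative only: `expand_overrides` is a hypothetical helper, not part of DataJoint, and the library's actual handling of these keyword arguments may differ.

```python
from typing import Any


def expand_overrides(**kwargs: Any) -> dict[str, Any]:
    """Expand keys such as ``database__reconnect`` into nested dictionaries.

    Hypothetical helper for illustration; not part of the DataJoint API.
    """
    nested: dict[str, Any] = {}
    for key, value in kwargs.items():
        # "database__reconnect" -> parents ["database"], leaf "reconnect"
        *parents, leaf = key.split("__")
        node = nested
        for part in parents:
            # Walk down one level, creating it if it does not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


overrides = expand_overrides(safemode=False, database__reconnect=False)
print(overrides)  # {'safemode': False, 'database': {'reconnect': False}}
```

Read this way, `database__reconnect=False` addresses the `reconnect` setting inside the `database` group, while `safemode=False` addresses a top-level setting.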
75+
76+
## Multiple Databases
77+
78+
Instances make it straightforward to work with multiple databases simultaneously:
79+
80+
```python
81+
production = dj.Instance(host="prod.example.com", user="analyst", password="...")
82+
staging = dj.Instance(host="staging.example.com", user="dev", password="...")
83+
84+
prod_schema = production.Schema("experiment_data")
85+
staging_schema = staging.Schema("experiment_data")
86+
87+
# Query both independently
88+
prod_data = ProdTable.to_dicts()
89+
staging_data = StagingTable.to_dicts()
90+
```
91+
92+
Each Instance maintains its own connection and configuration, so settings changed on one never affect the other.
93+
94+
## Thread-Safe Mode
95+
96+
For applications where global state is dangerous (web servers, multi-threaded workers), enable thread-safe mode:
97+
98+
```bash
99+
export DJ_THREAD_SAFE=true
100+
```
101+
102+
When thread-safe mode is enabled:
103+
104+
- `dj.config` raises `ThreadSafetyError` on any access
105+
- `dj.conn()` raises `ThreadSafetyError`
106+
- `dj.Schema()` without an explicit connection raises `ThreadSafetyError`
107+
- `dj.Instance()` (or a `dj.Schema()` given an explicit connection) still works, enforcing explicit connection management
108+
109+
This prevents accidental use of shared global state in concurrent environments.
110+
111+
### `ThreadSafetyError`
112+
113+
```python
114+
import os
115+
os.environ["DJ_THREAD_SAFE"] = "true"
116+
117+
import datajoint as dj
118+
119+
dj.config.database.host # raises ThreadSafetyError
120+
dj.conn() # raises ThreadSafetyError
121+
122+
# Instead, use Instance:
123+
inst = dj.Instance(host="localhost", user="root", password="secret")
124+
schema = inst.Schema("my_db") # works
125+
```
126+
127+
### Environment Variable
128+
129+
| Variable | Values | Default | Description |
130+
|----------|--------|---------|-------------|
131+
| `DJ_THREAD_SAFE` | `true`, `1`, `yes` / `false`, `0`, `no` | `false` | Enable thread-safe mode |
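
The accepted values in the table can be read as a simple truthy/falsy lookup. The following is a minimal sketch of such a check, not DataJoint's implementation: `thread_safe_enabled` is a hypothetical name, and both the case-insensitive matching and the error for unrecognized values are assumptions.

```python
import os
from collections.abc import Mapping

TRUTHY = {"true", "1", "yes"}
FALSY = {"false", "0", "no"}


def thread_safe_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Interpret DJ_THREAD_SAFE using the values listed in the table above.

    Hypothetical helper; case-insensitivity and the ValueError are assumptions.
    """
    raw = env.get("DJ_THREAD_SAFE", "false").strip().lower()
    if raw in TRUTHY:
        return True
    if raw in FALSY:
        return False
    raise ValueError(f"Unrecognized DJ_THREAD_SAFE value: {raw!r}")


# A plain dict stands in for the process environment in this demonstration.
print(thread_safe_enabled({"DJ_THREAD_SAFE": "yes"}))  # True
print(thread_safe_enabled({}))                         # False (variable unset)
```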
132+
133+
## Connection-Scoped Config
134+
135+
Each Instance carries its own `Config` object. Runtime configuration reads go through the Instance's config, not global state:
136+
137+
```python
138+
inst = dj.Instance(host="localhost", user="root", password="secret")
139+
140+
# Instance-scoped config
141+
inst.config.safemode = False
142+
inst.config.display.limit = 25
143+
144+
# Global config is unaffected
145+
print(dj.config.safemode) # still True (default)
146+
```
147+
148+
Tables created through an Instance's Schema read config from that Instance's connection, not from `dj.config`.
149+
150+
## When to Use Instances
151+
152+
| Scenario | Pattern |
153+
|----------|---------|
154+
| Interactive notebook, single database | `dj.config` + `dj.Schema()` (global pattern) |
155+
| Script connecting to one database | Either pattern works |
156+
| Web server (Flask, FastAPI, Django) | `dj.Instance()` per request/tenant |
157+
| Multi-database comparison | One `dj.Instance()` per database |
158+
| Parallel workers | `dj.Instance()` per worker + `DJ_THREAD_SAFE=true` |
159+
| Test suite | `dj.Instance()` per test for isolation |
160+
| Shared notebook server | `dj.Instance()` per user session |
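
For the "parallel workers" row, one possible way to give each thread its own Instance is a small per-thread cache built on `threading.local`. This is a sketch under stated assumptions: `PerThread` is a hypothetical helper, not a DataJoint class, and a plain `object()` stands in for `dj.Instance(...)` so the example runs without a database.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Lazily create one object per thread using the given factory.

    Hypothetical helper for illustration; not part of the DataJoint API.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> T:
        # Each thread sees its own attributes on a threading.local object.
        if not hasattr(self._local, "value"):
            self._local.value = self._factory()
        return self._local.value


# In real code the factory would build an Instance, for example:
#   instances = PerThread(lambda: dj.Instance(host=..., user=..., password=...))
instances = PerThread(object)
results: dict[str, object] = {}


def worker(name: str) -> None:
    first = instances.get()
    second = instances.get()
    assert first is second  # repeated calls in one thread reuse the object
    results[name] = first


threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["worker-0"] is results["worker-1"])  # False
```

The design choice here is lazy creation: a thread that never touches the database never opens a connection.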
161+
162+
## Comparison: Global vs Instance
163+
164+
### Global Pattern (unchanged)
165+
166+
```python
167+
import datajoint as dj
168+
169+
# Config set via environment, files, or programmatically
170+
dj.config["database.host"] = "localhost"
171+
172+
schema = dj.Schema("my_db")
173+
174+
@schema
175+
class MyTable(dj.Manual):
176+
    definition = """
177+
    id : int32
178+
    ---
179+
    value : float64
180+
    """
181+
```
182+
183+
### Instance Pattern (new in 2.2)
184+
185+
```python
186+
import datajoint as dj
187+
188+
inst = dj.Instance(host="localhost", user="root", password="secret")
189+
schema = inst.Schema("my_db")
190+
191+
@schema
192+
class MyTable(dj.Manual):
193+
    definition = """
194+
    id : int32
195+
    ---
196+
    value : float64
197+
    """
198+
```
199+
200+
Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.
201+
202+
## See Also
203+
204+
- [Use Isolated Instances](../how-to/use-instances.md): Task-oriented guide
204+
- [Working with Instances](../tutorials/advanced/instances/): Step-by-step tutorial
205+
- [Configuration Reference](../reference/configuration.md): Thread-safe mode settings
206+
- [Configure Database](../how-to/configure-database.md): Connection setup

src/how-to/configure-database.md

Lines changed: 29 additions & 0 deletions
@@ -242,3 +242,32 @@ conn = dj.conn()
242242
conn.close()
243243
```
244244

245+
## Instance-Based Connections
246+
247+
!!! version-added "New in 2.2"
248+
    `dj.Instance` provides isolated connections independent of global config.
249+
250+
For applications that need multiple connections or thread safety, use `dj.Instance` instead of global config:
251+
252+
```python
253+
import datajoint as dj
254+
255+
inst = dj.Instance(host="db.example.com", user="myuser", password="mypassword")
256+
schema = inst.Schema("my_schema")
257+
```
258+
259+
Each Instance has its own config and connection. This is useful for:
260+
261+
- **Web servers**: One Instance per request or tenant
262+
- **Testing**: Isolated databases per test
263+
- **Multi-database**: Connect to production and staging simultaneously
264+
- **Thread safety**: Set `DJ_THREAD_SAFE=true` to enforce Instance usage
265+
266+
```python
267+
# Multiple simultaneous connections
268+
prod = dj.Instance(host="prod.example.com", user="analyst", password="...")
269+
staging = dj.Instance(host="staging.example.com", user="dev", password="...")
270+
```
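
The passwords above are elided as `"..."`. One possible way to keep credentials out of source code is to collect the three required arguments from environment variables. This is a sketch only: `instance_kwargs_from_env` and the `PROD_DB_*` variable names are hypothetical, not DataJoint conventions.

```python
import os
from collections.abc import Mapping


def instance_kwargs_from_env(
    prefix: str, env: Mapping[str, str] = os.environ
) -> dict[str, str]:
    """Collect host, user, and password from <PREFIX>_HOST, _USER, _PASSWORD.

    Hypothetical helper; the variable naming scheme is an assumption.
    """
    kwargs: dict[str, str] = {}
    missing: list[str] = []
    for field in ("host", "user", "password"):
        name = f"{prefix}_{field.upper()}"
        if name in env:
            kwargs[field] = env[name]
        else:
            missing.append(name)
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return kwargs


# A plain dict stands in for the process environment in this demonstration.
demo_env = {
    "PROD_DB_HOST": "prod.example.com",
    "PROD_DB_USER": "analyst",
    "PROD_DB_PASSWORD": "not-a-real-password",
}
prod_kwargs = instance_kwargs_from_env("PROD_DB", demo_env)
print(sorted(prod_kwargs))  # ['host', 'password', 'user']

# With DataJoint installed and a reachable server, the result could be used as:
#   prod = dj.Instance(**prod_kwargs)
```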
271+
272+
See [Use Isolated Instances](use-instances.md) for a complete guide.
273+
