Commit c121731

Docs: Added section on merging files (#18)

* Added section on merging files
* docs: reword a few sections of the "merge rules"

---------

Co-authored-by: Dan Birman <danbirman@gmail.com>
1 parent 082e762 commit c121731

1 file changed

Lines changed: 49 additions & 1 deletion

docs/source/acquire_upload/upload.md

@@ -2,4 +2,52 @@

[todo]

## GatherMetadataJob

[todo]

### Merge rules

### When can multiple files be merged?

When data is acquired simultaneously with two or more distinct instruments (e.g., a behavior instrument and a physiology instrument), you can provide multiple `instrument.json` and/or `acquisition.json` metadata files. `GatherMetadataJob` merges these files during upload via the `aind-data-transfer-service`.

#### File Naming Convention

Each file must follow the naming pattern `<metadata_type>*.json`, where `*` is any string. We recommend naming the individual files by modality, for example:

- `instrument_behavior.json` and `instrument_ecephys.json`
- `acquisition_behavior.json` and `acquisition_ecephys.json`
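To see which files the `<metadata_type>*.json` pattern picks up in a folder, a glob with the same pattern is a quick check. `find_metadata_files` below is a hypothetical helper written for this page, not part of `GatherMetadataJob`; it assumes all metadata files sit directly in one directory.

```python
import tempfile
from pathlib import Path


def find_metadata_files(directory: Path, metadata_type: str) -> list[Path]:
    """Return files in `directory` matching `<metadata_type>*.json`, sorted by name.

    Hypothetical helper for illustration; not part of GatherMetadataJob.
    """
    return sorted(directory.glob(f"{metadata_type}*.json"))


# Example: a session folder holding metadata from two instruments.
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp)
    for name in (
        "instrument_behavior.json",
        "instrument_ecephys.json",
        "acquisition_behavior.json",
        "acquisition_ecephys.json",
        "subject.json",
    ):
        (session / name).write_text("{}")
    instrument_files = [p.name for p in find_metadata_files(session, "instrument")]
    acquisition_files = [p.name for p in find_metadata_files(session, "acquisition")]

print(instrument_files)  # ['instrument_behavior.json', 'instrument_ecephys.json']
```

Because `*` can also match an empty string, a plain `instrument.json` would match the same pattern.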
#### Constraints

1. **Unique fields must match**: Identifier fields that take a single value for the whole dataset (such as `subject_id` and `instrument_id`) **must have identical values** in all files being merged. If these fields conflict, the merge fails and your upload job is rejected.

2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **only if they share no devices**. Devices are identified by their `name` field. If the same device name appears in both instrument files, the files really describe a single instrument and should be defined as one, not as two separate instruments.

   **Exception for clock synchronization**: When synchronizing data acquisition across multiple instruments (e.g., recording behavior and physiology simultaneously), the instruments may share a single clock device. For AIND instruments, this must be a [HarpDevice](https://aind-data-schema.readthedocs.io/en/latest/components/devices.html#harpdevice) configured as a clock generator (`HarpDevice.is_clock_generator=True`).

3. **Enable validation**: We **strongly recommend** turning on the `raise_if_invalid` setting in the `GatherMetadataJob` job settings. It checks that the merge will succeed *before* upload, which makes problems much easier to find and fix than repairing a raw data asset whose metadata is already broken.

4. **Python merging**: You can test a merge locally in Python using the `+` operator:
```python
from pathlib import Path

from aind_data_schema.core.acquisition import Acquisition
from aind_data_schema.core.instrument import Instrument

# Merge instruments (file names follow the naming convention above)
instrument1 = Instrument.model_validate_json(Path("instrument_behavior.json").read_text())
instrument2 = Instrument.model_validate_json(Path("instrument_ecephys.json").read_text())
merged_instrument = instrument1 + instrument2

# Merge acquisitions
acquisition1 = Acquisition.model_validate_json(Path("acquisition_behavior.json").read_text())
acquisition2 = Acquisition.model_validate_json(Path("acquisition_ecephys.json").read_text())
merged_acquisition = acquisition1 + acquisition2
```
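Before building the schema objects, the first two constraints can also be sanity-checked on the raw JSON. The following is a hypothetical pre-check written for this page, not part of `GatherMetadataJob` or aind-data-schema. It assumes the identifier fields are top-level keys and that each instrument lists its devices under a `components` key, with clocks flagged by `is_clock_generator`. The authoritative check remains the `+` operator and the `raise_if_invalid` setting.

```python
from collections import Counter
from typing import Any

# Assumption: these identifier fields are top-level keys in each JSON file.
UNIQUE_FIELDS = ("subject_id", "instrument_id")


def check_unique_fields(records: list[dict[str, Any]]) -> list[str]:
    """Constraint 1: report identifier fields whose values differ between files."""
    problems = []
    for field in UNIQUE_FIELDS:
        values = {record[field] for record in records if field in record}
        if len(values) > 1:
            problems.append(f"{field} differs between files: {sorted(values)}")
    return problems


def check_shared_devices(instruments: list[dict[str, Any]]) -> list[str]:
    """Constraint 2: report devices that appear in more than one instrument.

    Assumption: each instrument lists its devices under a "components" key,
    each with a "name"; a clock is flagged with "is_clock_generator": True.
    """
    counts: Counter[str] = Counter()
    devices: dict[str, dict[str, Any]] = {}
    for instrument in instruments:
        components = instrument.get("components", [])
        counts.update({device["name"] for device in components})
        for device in components:
            devices[device["name"]] = device  # keeps the last definition seen
    shared = sorted(name for name, n in counts.items() if n > 1)
    clocks = [n for n in shared if devices[n].get("is_clock_generator", False)]
    problems = [
        f"device {n!r} is shared but is not a clock generator"
        for n in shared
        if n not in clocks
    ]
    if len(clocks) > 1:
        problems.append(f"more than one shared clock: {clocks}")
    return problems


# Hypothetical example values: two instruments sharing only a Harp clock.
behavior = {
    "instrument_id": "rig-1",
    "components": [{"name": "Harp clock", "is_clock_generator": True}, {"name": "Lick spout"}],
}
ecephys = {
    "instrument_id": "rig-1",
    "components": [{"name": "Harp clock", "is_clock_generator": True}, {"name": "Probe A"}],
}
problems = check_unique_fields([behavior, ecephys]) + check_shared_devices([behavior, ecephys])
print(problems)  # []
```

Returning a list of messages rather than raising on the first problem lets you see every conflict in one pass before fixing the files.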
#### Implementation details

The exact merge logic for each metadata type is defined in the `__add__` methods in the [aind-data-schema repository](https://github.com/AllenNeuralDynamics/aind-data-schema). See the following files:

- `src/aind_data_schema/core/instrument.py`
- `src/aind_data_schema/core/acquisition.py`
- `tests/test_composability_merge.py` (for test examples)
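The `+` operator works because each model defines `__add__`. As a toy illustration of that pattern only, the sketch below uses a hypothetical `ToyInstrument` class (not the real `Instrument` model or its merge logic) that refuses to merge mismatched identifiers and otherwise concatenates device lists. Because `a + b` returns an object of the same type, more than two files can be folded together with `functools.reduce`; whether the real models chain this way is an assumption worth confirming against `tests/test_composability_merge.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce
from operator import add


@dataclass(frozen=True)
class ToyInstrument:
    """Hypothetical stand-in for an instrument model; not aind-data-schema's Instrument."""

    instrument_id: str
    device_names: tuple[str, ...] = ()

    def __add__(self, other: ToyInstrument) -> ToyInstrument:
        # Identifier fields must agree, otherwise refuse to merge.
        if self.instrument_id != other.instrument_id:
            raise ValueError(
                f"instrument_id mismatch: {self.instrument_id!r} != {other.instrument_id!r}"
            )
        # Device lists are concatenated; the shared-device rule is not modeled here.
        return ToyInstrument(self.instrument_id, self.device_names + other.device_names)


# reduce applies + pairwise from left to right: (a + b) + c.
parts = [
    ToyInstrument("rig-1", ("Lick spout",)),
    ToyInstrument("rig-1", ("Probe A",)),
    ToyInstrument("rig-1", ("Fiber 0",)),
]
merged = reduce(add, parts)
print(merged.device_names)  # ('Lick spout', 'Probe A', 'Fiber 0')
```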
