|
2 | 2 |
|
3 | 3 | [todo] |
4 | 4 |
|
5 | | -## GatherMetadataJob |
| 5 | +## GatherMetadataJob |
| 6 | + |
| 7 | +[todo] |
| 8 | + |
| 9 | +### Merge rules |
| 10 | + |
| 11 | +### When can multiple files be merged? |
| 12 | + |
| 13 | +When data is acquired simultaneously using two or more distinct instruments (e.g., a behavior instrument and a physiology instrument), multiple `instrument.json` and/or `acquisition.json` metadata files can be provided. The GatherMetadataJob will merge these files during upload via the `aind-data-transfer-service`. |
| 14 | + |
| 15 | +#### File Naming Convention |
| 16 | + |
| 17 | +Each file must follow the naming pattern `<metadata_type>*.json` where `*` is any string. We recommend using modalities to organize the individual files: |
| 18 | +- `instrument_behavior.json` and `instrument_ecephys.json` |
| 19 | +- `acquisition_behavior.json` and `acquisition_ecephys.json` |
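
To make the `<metadata_type>*.json` pattern concrete, here is a minimal sketch that groups a list of file names by metadata type. The helper `group_metadata_files` and the constant `MERGEABLE_TYPES` are hypothetical names introduced for illustration; this is not how `GatherMetadataJob` itself discovers files, only an example of which names the pattern would match.

```python
from collections import defaultdict
from fnmatch import fnmatch

# Metadata types that this section says may be split across several files.
MERGEABLE_TYPES = ("instrument", "acquisition")


def group_metadata_files(filenames):
    """Group file names by metadata type using the `<metadata_type>*.json` pattern."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        for metadata_type in MERGEABLE_TYPES:
            if fnmatch(name, f"{metadata_type}*.json"):
                groups[metadata_type].append(name)
    return dict(groups)


example_groups = group_metadata_files(
    [
        "instrument_behavior.json",
        "instrument_ecephys.json",
        "acquisition_behavior.json",
        "acquisition_ecephys.json",
        "subject.json",  # not a mergeable type in this sketch, so it is ignored
    ]
)
```

Any type that ends up with more than one file in its group is a candidate for merging.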
| 20 | + |
| 21 | +#### Constraints |
| 22 | + |
| 23 | +1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id` and `instrument_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. |
| 24 | + |
| 25 | +2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **if and only if there are no shared devices** between them. Devices are identified by their `name` field. If the same device name appears in both instrument files, they should really be defined as a single instrument, not two separate ones. |
| 26 | + |
| 27 | + **Exception for clock synchronization**: When synchronizing data acquisition across multiple instruments (e.g., recording behavior and physiology simultaneously), a shared clock device is permitted. For AIND instruments this must be a [HarpDevice](https://aind-data-schema.readthedocs.io/en/latest/components/devices.html#harpdevice) configured as a clock generator (`HarpDevice.is_clock_generator=True`). |
| 28 | + |
| 29 | +3. **Enable validation**: It is **strongly recommended** to turn on the `raise_if_invalid` setting in the `GatherMetadataJob` job settings. This validates that the merge will succeed *before* upload, making it much easier to identify and fix problems compared to dealing with a raw data asset with broken metadata. |
| 30 | + |
| 31 | +4. **Python merging**: You can test merging locally in Python using the `+` operator: |
| 32 | + |
| 33 | +```python |
| 34 | +from aind_data_schema.core.instrument import Instrument |
| 35 | +from aind_data_schema.core.acquisition import Acquisition |
| 36 | + |
| 37 | +# Merge instruments (each instrument_json_* holds the text of one instrument file) |
| 38 | +instrument1 = Instrument.model_validate_json(instrument_json_1) |
| 39 | +instrument2 = Instrument.model_validate_json(instrument_json_2) |
| 40 | +merged_instrument = instrument1 + instrument2 |
| 41 | + |
| 42 | +# Merge acquisitions (each acquisition_json_* holds the text of one acquisition file) |
| 43 | +acquisition1 = Acquisition.model_validate_json(acquisition_json_1) |
| 44 | +acquisition2 = Acquisition.model_validate_json(acquisition_json_2) |
| 45 | +merged_acquisition = acquisition1 + acquisition2 |
| 46 | +``` |
| 47 | + |
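
Constraints 1 and 2 above can also be pre-checked on the raw JSON before constructing any schema objects. The sketch below works on plain dictionaries and rests on several assumptions that are not confirmed by this page: that each instrument file has a top-level `instrument_id`, that devices are listed under a top-level `components` key, and that a clock generator carries `is_clock_generator: true`. The function name `find_merge_conflicts` is hypothetical. It does not replace the `__add__` logic in `aind-data-schema`, which remains the authority on whether a merge succeeds.

```python
def find_merge_conflicts(instrument_a, instrument_b):
    """Return a list of human-readable problems; an empty list means no conflict was found."""
    problems = []

    # Constraint 1: identifier fields must be identical in both files.
    if instrument_a.get("instrument_id") != instrument_b.get("instrument_id"):
        problems.append("instrument_id differs between the two files")

    # Constraint 2: devices are identified by name (assumed to live under "components").
    devices_a = {d["name"]: d for d in instrument_a.get("components", [])}
    devices_b = {d["name"]: d for d in instrument_b.get("components", [])}
    shared = sorted(set(devices_a) & set(devices_b))

    # A shared device is tolerated only if both files mark it as a clock generator.
    shared_clocks = [
        name
        for name in shared
        if devices_a[name].get("is_clock_generator")
        and devices_b[name].get("is_clock_generator")
    ]
    shared_other = [name for name in shared if name not in shared_clocks]

    if shared_other:
        problems.append(f"shared non-clock devices: {shared_other}")
    if len(shared_clocks) > 1:
        problems.append(f"more than one shared clock: {shared_clocks}")
    return problems


# Illustrative data only; real instrument files contain many more fields.
behavior = {
    "instrument_id": "rig_01",
    "components": [
        {"name": "Harp clock", "is_clock_generator": True},
        {"name": "Lick spout"},
    ],
}
ecephys = {
    "instrument_id": "rig_01",
    "components": [
        {"name": "Harp clock", "is_clock_generator": True},
        {"name": "Probe A"},
    ],
}
conflicts = find_merge_conflicts(behavior, ecephys)
```

Here `conflicts` is empty because the only shared device is the clock generator. Adding `{"name": "Lick spout"}` to the second file would produce a shared-device problem.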
| 48 | +#### Implementation details |
| 49 | + |
| 50 | +The exact merge logic for each metadata type is defined in the `__add__` methods in the [aind-data-schema repository](https://github.com/AllenNeuralDynamics/aind-data-schema). See the following files: |
| 51 | +- `src/aind_data_schema/core/instrument.py` |
| 52 | +- `src/aind_data_schema/core/acquisition.py` |
| 53 | +- `tests/test_composability_merge.py` (for test examples) |