
Commit a4619be (1 parent: 393aabc)

Authored by dyfdbirman
Co-authored-by: Dan Birman <danbirman@gmail.com>

flattening IA (#42)

* flattening IA
* docs: fix some bad xrefs and adding missing extensions

26 files changed: 253 additions & 241 deletions

docs/source/acquire_upload.md

Lines changed: 0 additions & 23 deletions
This file was deleted.

docs/source/acquire_upload/on_rig.md renamed to docs/source/acquire_upload/acquire_data.md

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-# During acquisition
+# Acquire data
 
 During data acquisition you are responsible for running version-controlled acquisition software and ensuring your data files for each modality are organized according to standardized conventions.
@@ -8,7 +8,7 @@ Metadata generated during acquisition captures **what data** should appear in th
 
 ### Data organization conventions
 
-Raw data assets are required to be organized according to our [data organization conventions](../philosophy/data_organization.md).
+Raw data assets are required to be organized according to our [data organization conventions](../policies_practices/data_organization.md).
 
 #### Per-modality file standards
@@ -24,7 +24,7 @@ Rigs are responsible for generating the [acquisition.json](https://aind-data-sch
 
 If you can't generate your aind-data-schema formatted metadata on your rig, you can use what we call the "extractor/mapper" pattern. We refer to the code on the rig that extracts metadata from data files as the extractor. We prefer that you maintain this code in [aind-metadata-extractor](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/), but you can also maintain it yourself. The code that takes the extractor output and transforms it to aind-data-schema is called the mapper. Scientific Computing will help develop and maintain the mapper; you are responsible for your extractor. The key to the extractor/mapper pattern is the data contract that defines the extractor output. The data contract must be a pydantic model or JSON schema file and must live in the [aind_metadata_extractor.models](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/tree/main/src/aind_metadata_extractor/models) module.
 
-On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called fip you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload.md#GatherMetadataJob) will automatically run your mapper.
+On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called fip you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper.
 
 #### Multiple independent rigs
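The extractor/mapper data contract described in the hunk above can be sketched very roughly in plain Python. This is only an illustration: the real contract must be a pydantic model or JSON schema file in `aind_metadata_extractor.models`, and every field name below (`session_start_time`, `rig_id`, `channels`) is hypothetical.

```python
# Illustrative sketch of an extractor data contract. The real contract is a
# pydantic model or JSON schema in aind_metadata_extractor.models; the "fip"
# field names here are hypothetical, chosen only to show the pattern.
import json

# Hypothetical contract: keys the fip extractor output must provide
FIP_REQUIRED_KEYS = {"session_start_time", "rig_id", "channels"}

def is_valid_fip_output(payload: dict) -> bool:
    """Minimal check that rig-side extractor output satisfies the contract."""
    return FIP_REQUIRED_KEYS <= payload.keys()

# The rig writes this as fip.json, named to match the "fip" mapper that
# GatherMetadataJob will run downstream.
example_output = {
    "session_start_time": "2024-01-01T12:00:00",
    "rig_id": "example-fip-rig",
    "channels": ["green", "red"],
}
fip_json = json.dumps(example_output, indent=2)
assert is_valid_fip_output(example_output)
```

The point of the pattern is that the mapper can rely on this contract without knowing anything else about the rig-side code.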

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+orphan: true
+---
+
+# Acquire, upload & process
+
+## I want to...
+
+[Prepare my metadata](prepare_before_acquisition.md) before acquisition.
+
+[Calibrate and test](calibration.md) an instrument.
+
+[Acquire data](acquire_data.md) on an instrument.
+
+[Upload data](upload_data.md) to the Cloud.
+
+[Process data](process_data.md) using our per-modality standard pipelines.
+
+```{toctree}
+:hidden:
+
+prepare_before_acquisition
+acquire_data
+calibration
+upload_data
+process_data
+```

docs/source/acquire_upload/prepare_before_acquisition.md

Lines changed: 2 additions & 2 deletions
@@ -309,7 +309,7 @@ The `instrument_id` for AIND should be the SIPE ID for an instrument. If an inst
 
 #### Multiple instruments
 
-Multiple `instrument.json` files can be provided when multiple separate instruments are used simultaneously to acquire a data asset. The combined instrument metadata stored with the associated data asset will have an `instrument_id` that is the combined names of the individual instruments, joined with the `'_'` character. See [metadata merging rules](upload.md#metadata-merging-rules) for information about how metadata files are merged during data upload.
+Multiple `instrument.json` files can be provided when multiple separate instruments are used simultaneously to acquire a data asset. The combined instrument metadata stored with the associated data asset will have an `instrument_id` that is the combined names of the individual instruments, joined with the `'_'` character. See [metadata merging rules](upload_data.md#merge-rules) for information about how metadata files are merged during data upload.
 
 #### Upload options
 
@@ -337,7 +337,7 @@ Users have two options for providing instrument metadata files:
 
 The data transfer service will then pull the instrument metadata from the database during upload.
 
-Note that it is possible to combine these methods. For example, a user could pass the instrument JSON for the behavior instrument in the data directory (named something like `instrument_behavior.json`) and also specify a physiology rig by instrument ID in the `gather_preliminary_metadata` job type settings. The two instrument files would be merged by the data transfer service. See [metadata merging rules](upload.md#metadata-merging-rules).
+Note that it is possible to combine these methods. For example, a user could pass the instrument JSON for the behavior instrument in the data directory (named something like `instrument_behavior.json`) and also specify a physiology rig by instrument ID in the `gather_preliminary_metadata` job type settings. The two instrument files would be merged by the data transfer service. See [metadata merging rules](upload_data.md#merge-rules).
 
 Also note that we require all devices in the database to have a unique `instrument_id`.
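The `instrument_id` joining rule described in the first hunk above is simple enough to sketch. The join itself is performed by the data transfer service during merging; this snippet (with made-up instrument names) only illustrates the convention.

```python
# Sketch of the instrument_id merge rule: the combined asset's instrument_id
# is the individual instrument names joined with '_'. The actual merging is
# done by the data transfer service; instrument names here are hypothetical.
def combined_instrument_id(instrument_ids: list[str]) -> str:
    """Join individual instrument IDs with '_' as in the merged metadata."""
    return "_".join(instrument_ids)

print(combined_instrument_id(["behavior-rig-1", "physiology-rig-2"]))
# → behavior-rig-1_physiology-rig-2
```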

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Process data
+
+Scientific computing is currently re-organizing pipelines to be per-modality, rather than per-project.
+
+Pipeline development requirements are documented in [Pipeline development](../policies_practices/pipeline_development.md).
+
+## Per-modality physiology pipelines
+
+| Modality | Modalities | Pipeline repository |
+|---|---|---|
+| Barcoded anatomy resolved by sequencing | `barseq` | |
+| Brightfield microscopy | `brightfield` | |
+| Confocal microscopy | `confocal` | |
+| Extracellular electrophysiology | `ecephys` | [aind-ephys-pipeline](https://github.com/AllenNeuralDynamics/aind-ephys-pipeline) |
+| Electron microscopy | `EM` | |
+| Electromyography | `EMG` | |
+| Fiber photometry | `fib` | [aind-fiber-photometry-harp-pipeline](https://github.com/AllenNeuralDynamics/aind-fiber-photometry-harp-pipeline) |
+| Fluorescence micro-optical sectioning tomography | `fMOST` | |
+| Intracellular electrophysiology | `icephys` | |
+| Intrinsic signal imaging | `ISI` | [isi_segmentation](https://github.com/AllenNeuralDynamics/isi_segmentation) |
+| Multiplexed analysis of projections by sequencing | `MAPseq` | |
+| Multiplexed error-robust fluorescence in situ hybridization | `merfish` | |
+| Magnetic resonance imaging | `MRI` | |
+| Planar optical physiology | `pophys` | [aind-pophys-pipeline](https://github.com/AllenNeuralDynamics/aind-pophys-pipeline) |
+| Single cell RNA sequencing | `scRNAseq` | |
+| Random access projection microscopy | `slap2` | |
+| Selective plane illumination microscopy | `SPIM` | |
+| Serial two-photon tomography | `STPT` | |
+
+## Behavior pipelines
+
+| Behavior | Modalities | Pipeline repository |
+|---|---|---|
+| Patch foraging behavior | `behavior`, `behavior-videos` | [aind-vr-foraging-pipeline](https://github.com/AllenNeuralDynamics/aind-vr-foraging-pipeline) |
+| Camstim/Sync Behavior | `behavior`, `behavior-videos` | [aind-behavior-camstim-pipeline](https://github.com/AllenNeuralDynamics/aind-behavior-camstim-pipeline) |
+
+## Per-project pipelines
+
+| Project name | Modalities | Pipeline repository |
+|---|---|---|
+| Dynamic foraging | `behavior`, `behavior-videos`, `fib` | [aind-dynamic-foraging-pipeline](https://github.com/AllenNeuralDynamics/aind-dynamic-foraging-pipeline) |

docs/source/acquire_upload/processing.md

Lines changed: 0 additions & 84 deletions
This file was deleted.
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Upload
+# Upload data
 
 Uploading data is done by using the [aind-data-transfer-service](http://aind-data-transfer-service/) ([docs](https://aind-data-transfer-service.readthedocs.io/en/latest/index.html)), which handles running containerized tasks for data copying, compression, metadata gathering, and final upload to S3 and Code Ocean.
File renamed without changes.
File renamed without changes.
Lines changed: 5 additions & 31 deletions
@@ -1,22 +1,4 @@
-# Scientific Computing at AIND
-
-## Core principles
-
-All of the teams in Scientific Computing adhere to a set of core principles about our software.
-
-### Code Review
-
-At least one other software developer needs to approve a pull request in order for it to be merged. Please be courteous when providing feedback. The team lead can resolve any conflicts.
-
-### Style
-
-We use `black`, `flake8`, and `interrogate` to enforce [PEP 8](https://peps.python.org/pep-0008/) standards with [docstrings](https://peps.python.org/pep-0257/) in [NumPy](https://numpydoc.readthedocs.io/en/latest/format.html) format.
-
-### Versioning
-
-We use [semver](https://semver.org/) major.minor.patch versions; these are automatically incremented when you use the `aind-library-template`. Note that for major versions you need to put the exact string "BREAKING CHANGE" in the commit *comment* (not the title).
-
-You should set a patch version floor `>=1.0.0` and a major version ceiling `<2` for each internal dependency that you use. This is good practice for all dependencies.
+# Scientific Computing Teams
 
 ## Data Infrastructure
 
@@ -32,7 +14,7 @@ FastAPI service to run data compression and transfer jobs on the HPC
 
 **aind-metadata-service**
 
-REST service to retrieve metadata from AIND databases 
+REST service to retrieve metadata from AIND databases
 
 [link](http://aind-metadata-service/) | [readthedoc](http://aind-metadata-service/docs) | [repo](https://github.com/AllenNeuralDynamics/aind-metadata-service)
 
@@ -44,11 +26,10 @@ Library to interface with AIND databases
 
 **aind-data-asset-indexer**
 
-Index jobs for AIND metadata in AWS DocumentDB and S3 
+Index jobs for AIND metadata in AWS DocumentDB and S3
 
 [readthedoc](https://aind-data-asset-indexer.readthedocs.io/en/latest/) | [repo](https://github.com/AllenNeuralDynamics/aind-data-asset-indexer)
 
-
 ## Data & Outreach
 
 The Data & Outreach team maintains the data schema and associated downstream tools and is responsible for coordinating workshops and other outreach events.
@@ -61,7 +42,7 @@ Metadata schema for neuroscience
 
 **aind-metadata-mapper**
 
-Repository to help gather and map metadata from different sources 
+Repository to help gather and map metadata from different sources
 
 [readthedoc](https://aind-metadata-mapper.readthedocs.io/en/latest/) | [repo](https://github.com/AllenNeuralDynamics/aind-metadata-mapper)
 
@@ -75,15 +56,8 @@ Data book for the SWDB course
 
 ## Physiology & Behavior
 
-The Physiology & Behavior team maintains the pipelines that process each modality of data asset that we acquire in AIND. Details can be found on the [processing](../acquire_upload/processing.md)
+The Physiology & Behavior team maintains the pipelines that process each modality of data asset that we acquire in AIND. Details can be found in [Process data](../acquire_upload/process_data.md).
 
 ## Computer Vision
 
 [todo]
-
-## Resources for SWEs
-
-For research software engineers, [Good Research Code](https://goodresearch.dev/) is a good primer.
-
-[Data structure fundamentals](https://www.crackingthecodinginterview.com/)
