Commit 3e71945

arielleleon and dyf authored
31 add documentation for pipeline versioning (#40)
* chore: add first few sections
* feature: add final draft of version pipelines documentation
* chore: edit based on some AI feedback
* Revise versioning policies for data processing pipelines

  Updated versioning policies and guidelines for pipelines, emphasizing semantic versioning, GitHub integration, and the importance of maintaining accurate version information.
* chore: pull in changes from `dyf-patch-1`
* fix: add metadata example and state that CO publishes version as an int
* chore: describe a use case for versioning not being syncronized.
* chore: integrate more descriptive edits from reviewer feedback

---------

Co-authored-by: David Feng <dyf@users.noreply.github.com>
1 parent bec9223 commit 3e71945

1 file changed

Lines changed: 55 additions & 0 deletions

@@ -0,0 +1,55 @@
1+
# Versioning pipelines
2+
3+
Users need to understand how to interact with computed results produced by data processing pipelines. If there are changes in the structure or interpretation of results because of a change to a processing pipeline, it must be easy for users to understand the nature of these changes and detect these changes reliably in code.
4+
5+
## Policies
6+
7+
Core data processing pipelines MUST adopt [semantic versioning](https://semver.org/).
8+
- Major version changes indicate that the structure or interpretation of the data has changed.
9+
- Minor version changes indicate new, backwards compatible features were added to the pipeline.
10+
- Patch version changes indicate bug fixes.
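
To make the three change levels concrete, here is a minimal Python sketch that parses two `MAJOR.MINOR.PATCH` strings and reports which level changed. The function names `parse_semver` and `change_level` are hypothetical and are not part of any AIND package. The sketch ignores pre-release and build-metadata suffixes.

```python
import re
from typing import Tuple

# Plain MAJOR.MINOR.PATCH only; pre-release/build suffixes are out of scope here.
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split a version string into integer (major, minor, patch) parts."""
    match = SEMVER_RE.match(version.strip())
    if match is None:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def change_level(old: str, new: str) -> str:
    """Return the most significant component that differs between two versions."""
    old_v, new_v = parse_semver(old), parse_semver(new)
    if new_v[0] != old_v[0]:
        return "major"  # structure or interpretation of the data changed
    if new_v[1] != old_v[1]:
        return "minor"  # backwards compatible feature added
    if new_v[2] != old_v[2]:
        return "patch"  # bug fix
    return "none"
```

Downstream code could use a check like this to decide whether results from two pipeline versions can be read the same way: only a `"major"` result signals a change in structure or interpretation.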
11+
12+
The pipeline's name and semantic version MUST be stored in aind-data-schema [Processing](https://github.com/AllenNeuralDynamics/aind-data-schema/blob/dev/src/aind_data_schema/core/processing.py#L970) metadata at the top level of the results.
13+
14+
The pipeline's name and semantic version MUST be stored in the pipeline repository and easily accessible to pipeline code. We recommend a `.env` file containing `PIPELINE_VERSION`, `PIPELINE_NAME`, and `PIPELINE_URL` variables. These environment variables can be read with standard tools such as Python's `os` module and added to the `aind-data-schema` `Processing` core object for proper documentation. Specifically, the following fields of the `Processing` object should be populated with these environment variables:
15+
16+
`Processing.pipeline_version=os.getenv("PIPELINE_VERSION", "No version reported.")`
17+
`Processing.pipeline_url=os.getenv("PIPELINE_URL", "No pipeline URL reported.")`
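
The sketch below shows one way the `.env` recommendation could be wired together, under stated assumptions. The helper names `load_env_file` and `pipeline_identity` are hypothetical. A plain `dict` stands in for the `Processing` object, so no `aind-data-schema` constructor signature is assumed. The `pipeline_name` key and its fallback string are illustrative only. The `.env` parser handles simple `KEY=VALUE` lines and nothing more.

```python
import os
from pathlib import Path


def load_env_file(path: str) -> None:
    """Load simple KEY=VALUE lines into os.environ without overriding existing values."""
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))


def pipeline_identity() -> dict:
    """Collect the pipeline identity values, with explicit fallbacks when unset."""
    return {
        # Hypothetical key: the source only names pipeline_version and pipeline_url.
        "pipeline_name": os.getenv("PIPELINE_NAME", "No pipeline name reported."),
        "pipeline_version": os.getenv("PIPELINE_VERSION", "No version reported."),
        "pipeline_url": os.getenv("PIPELINE_URL", "No pipeline URL reported."),
    }
```

A capsule could call `load_env_file(".env")` once at startup and then pass the values from `pipeline_identity()` to whatever code builds the `Processing` metadata.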
18+
19+
The pipeline repository and the repositories of all individual capsules MUST be public on GitHub.
20+
21+
To deploy a new release of a pipeline:
22+
23+
- Pipelines and component capsules MUST update their semantic version appropriately.
24+
- Pipelines and component capsules MUST be synchronized with GitHub.
25+
- Pipelines and component capsules used in production MUST have a Code Ocean "internal release."
26+
- Pipelines MUST update their `CHANGELOG` indicating what has changed in the release.
27+
28+
This process ensures production pipelines are not subject to accidental changes and versioning is always communicated consistently to users downstream.
29+
30+
## Code Ocean versioning
31+
32+
When a capsule or pipeline is internally released in Code Ocean, Code Ocean creates an immutable copy of the pipeline and issues it a release version. This version, which is published as an `int` value, is unrelated to the semantic version of the pipeline, but it is a necessary parameter for those triggering pipelines via the API (e.g. the AIND data transfer service).
33+
34+
## Implementation
35+
36+
Developers can create a pipeline from this template: [`aind-pipeline-template`](https://github.com/AllenNeuralDynamics/aind-pipeline-template). Once created, the pipeline uses a [workflow](https://github.com/AllenNeuralDynamics/.github/blob/main/.github/docs/Release%20Tag%20and%20Publish%20Pipeline.md) that will, on every pull request into main, bump the version using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/). The version and GitHub repository of a pipeline created with this template are added to the pipeline's environment variables as `PIPELINE_VERSION`, `PIPELINE_NAME`, and `PIPELINE_URL` in the repository's `nextflow.config` file.
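
As a rough illustration of how Conventional Commits can drive a version bump, the Python sketch below maps a commit message to a bump level and applies it. It follows the common convention (`feat` bumps minor, `fix` bumps patch, a `!` marker or `BREAKING CHANGE:` footer bumps major). The linked workflow may apply different rules, and the names `bump_level` and `bump` are hypothetical.

```python
import re

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(r"^(?P<type>[a-zA-Z]+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump_level(commit_message: str) -> str:
    """Map a Conventional Commits message to 'major', 'minor', 'patch', or 'none'."""
    header = commit_message.splitlines()[0] if commit_message else ""
    match = HEADER_RE.match(header)
    if match is None:
        return "none"
    if match.group("breaking") or "BREAKING CHANGE:" in commit_message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return "none"  # e.g. chore, docs, refactor


def bump(version: str, level: str) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Under these assumed rules, applying `fix: correct mislabeled metadata in processing` to `0.1.0` gives `0.1.1`, and a later `feat: add two new QC plots` gives `0.2.0`.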
37+
38+
The developer is still responsible for ensuring that the `PIPELINE_VERSION`, `PIPELINE_NAME`, and `PIPELINE_URL` values, as well as the `CHANGELOG`, are correct and up to date in the repository.
39+
40+
Git versions can be out of sync with the Code Ocean version, so the table below shows how the two relate. The version numbers are illustrative only: they demonstrate that the Code Ocean pipeline version always increases as an integer, while the semantic version increases according to the level of the update.
41+
42+
| Code Ocean Version | GitHub Version | Git Commit |
43+
|--------------------|----------------|------------|
44+
| 18.0 | - | - |
45+
| 19.0 | 0.1.0 | feat: add release.yml file for semantic versioning |
46+
| 20.0 | 0.1.1 | fix: correct mislabeled metadata in processing |
47+
| 21.0 | 0.2.0 | feat: add two new QC plots |
48+
49+
Because some pipelines already have mature Code Ocean releases, there will be a mismatch between Code Ocean versions and the semantic versions reported in the `Processing` object. Assets processed before semantic versioning was adopted will only have a Code Ocean version in their metadata (e.g., `18.0`). Assets processed after adoption will have a semantic version (e.g., `0.1.0`).
50+
51+
When querying the metadata database for `Processing.pipeline_version`, users and developers must account for both version formats. For example, to find all assets processed with this pipeline before version `0.2.0`, the query would need to match:
52+
- Semantic versions `< 0.2.0` (i.e., `0.1.0`, `0.1.1`)
53+
- Code Ocean versions from before semantic versioning was adopted (i.e., `18.0`)
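
The matching rule above can be sketched in Python as a client-side filter over `pipeline_version` strings. It makes two assumptions. First, Code Ocean versions are recorded as an integer or a one-decimal string such as `18.0`, as in the illustrative table. Second, any Code Ocean version predates every semantic version of the same pipeline. The function names are hypothetical, and the sketch does not use any metadata database API.

```python
import re
from typing import List, Tuple

SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")  # e.g. "0.1.1"
CODE_OCEAN_RE = re.compile(r"^\d+(\.\d+)?$")      # e.g. "18" or "18.0" (assumed format)


def classify_version(value: str) -> str:
    """Label a recorded version as 'semver', 'code_ocean', or 'unknown'."""
    value = value.strip()
    if SEMVER_RE.match(value):
        return "semver"
    if CODE_OCEAN_RE.match(value):
        return "code_ocean"
    return "unknown"


def semver_key(value: str) -> Tuple[int, int, int]:
    """Convert a semantic version into a tuple that compares numerically."""
    major, minor, patch = SEMVER_RE.match(value.strip()).groups()
    return int(major), int(minor), int(patch)


def processed_before(value: str, cutoff: str) -> bool:
    """True for pre-adoption Code Ocean versions and for semantic versions below cutoff."""
    kind = classify_version(value)
    if kind == "code_ocean":
        return True  # assumption: recorded before semantic versioning was adopted
    if kind == "semver":
        return semver_key(value) < semver_key(cutoff)
    return False  # unknown formats (e.g. fallback strings) are excluded


def select_before(versions: List[str], cutoff: str) -> List[str]:
    """Keep only the versions that precede the semantic-version cutoff."""
    return [v for v in versions if processed_before(v, cutoff)]
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as treating `0.10.0` as earlier than `0.2.0`.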
54+
55+
For pipelines that have adopted semantic versioning, users and developers will always be able to find a pipeline's semantic version in the repository's `nextflow.config` file.
