Versioning pipelines

Users need to understand how to interact with computed results produced by data processing pipelines. If there are changes in the structure or interpretation of results because of a change to a processing pipeline, it must be easy for users to understand the nature of these changes and detect these changes reliably in code.

Policies

Core data processing pipelines MUST adopt semantic versioning. See the Major/minor/patch section below.

The pipeline's name and semantic version MUST be stored in aind-data-schema Processing metadata at the top level of the results.

The pipeline's name and semantic version MUST be stored in the pipeline repository and easily accessible to pipeline code. We recommend a .env file containing PIPELINE_VERSION, PIPELINE_NAME, and PIPELINE_URL variables. These environment variables can be read with standard tools such as Python's os module and added to the aind-data-schema Processing core object for proper documentation. Specifically, the following fields of the Processing object should be populated with these environment variables:

    Processing.pipeline_version = os.getenv("PIPELINE_VERSION", "No version reported.")
    Processing.pipeline_url = os.getenv("PIPELINE_URL", "No pipeline URL reported.")
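As a minimal sketch of the recommendation above, the snippet below parses .env-style text with the standard library and collects the three variables using the same kind of fallbacks. The names `parse_env_text`, `PipelineInfo`, and `pipeline_info_from_env` are hypothetical and used only for illustration. `PipelineInfo` is a plain stand-in, not the aind-data-schema Processing class, and the fallback string for the name is an assumption.

```python
import os
from dataclasses import dataclass


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass
class PipelineInfo:
    """Hypothetical stand-in for the fields recorded in Processing metadata."""
    name: str
    version: str
    url: str


def pipeline_info_from_env(env=None) -> PipelineInfo:
    """Collect pipeline variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return PipelineInfo(
        name=env.get("PIPELINE_NAME", "No pipeline name reported."),  # assumed fallback
        version=env.get("PIPELINE_VERSION", "No version reported."),
        url=env.get("PIPELINE_URL", "No pipeline URL reported."),
    )


# Example values only; a real pipeline would read its own .env file.
example_env = parse_env_text('''
# illustrative contents
PIPELINE_NAME="example-pipeline"
PIPELINE_VERSION=0.1.1
''')
info = pipeline_info_from_env(example_env)
print(info)
```

Because PIPELINE_URL is missing from the example text, `info.url` falls back to the placeholder string, which makes an unset variable visible in the metadata rather than silently empty.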

The pipeline repository and the repositories of all individual capsules MUST be public on GitHub.

To deploy a new release of a pipeline:

  • Pipelines and component capsules MUST update their semantic version appropriately.
  • Pipelines and component capsules MUST be synchronized with GitHub.
  • Pipelines and component capsules used in production MUST have a Code Ocean "internal release."
  • Pipelines MUST update their CHANGELOG indicating what has changed in the release.

This process ensures production pipelines are not subject to accidental changes and versioning is always communicated consistently to users downstream.

Code Ocean versioning

When a capsule or pipeline is internally released in Code Ocean, Code Ocean creates an immutable copy of the pipeline and issues it a release version. This version, which is published as an int value, is unrelated to the semantic version of the pipeline, but it is a necessary parameter for those triggering pipelines via the API (e.g. the AIND data transfer service).

Implementation

Developers can create a pipeline from this template: aind-pipeline-template. Once created, the pipeline uses a workflow that will, on every pull request into main, bump the version using Conventional Commits. The version and GitHub repository of the pipeline created with this template are added to the pipeline's environment variables as PIPELINE_VERSION, PIPELINE_NAME, and PIPELINE_URL in the repository's nextflow.config file.

The developer is still responsible for ensuring that the PIPELINE_VERSION, PIPELINE_NAME, and PIPELINE_URL values, as well as the CHANGELOG, are correct and up to date in the repository.

To address Git versions being out of sync with the Code Ocean version, the table below explains the relationship. Version numbers are only illustrative and meant to demonstrate that the Code Ocean pipeline version always increases as an integer, while the semantic version increases according to the update level.

| Code Ocean Version | GitHub Version | Git Commit |
|---|---|---|
| 18.0 | - | - |
| 19.0 | 0.1.0 | feat: add release.yml file for semantic versioning |
| 20.0 | 0.1.1 | fix: correct mislabeled metadata in processing |
| 21.0 | 0.2.0 | feat: add two new QC plots |

Because some pipelines already have mature Code Ocean releases, there will be a mismatch between Code Ocean versions and the semantic versions reported in the Processing object. Assets processed before semantic versioning was adopted will only have a Code Ocean version in their metadata (e.g., 18.0). Assets processed after adoption will have a semantic version (e.g., 0.1.0).

When querying the metadata database for Processing.pipeline_version, users and developers must account for both version formats. For example, to find all assets processed with this pipeline before version 0.2.0, the query would need to match:

  • Semantic versions < 0.2.0 (i.e., 0.1.0, 0.1.1)
  • Code Ocean versions from before semantic versioning was adopted (i.e., 18.0)
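The two-format matching described above can be sketched as a client-side filter over recorded version strings. This is an illustration, not a metadata database query. It assumes that semantic versions always have three numeric parts and that Code Ocean versions appear as an integer with an optional ".0" suffix, as in the illustrative table. The function names are hypothetical.

```python
import re

# Assumed formats, based on the illustrative values in this document.
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")
CODE_OCEAN_RE = re.compile(r"^(\d+)(?:\.0)?$")


def classify_version(value: str):
    """Return ("semver", (major, minor, patch)), ("code_ocean", n), or ("unknown", None)."""
    match = SEMVER_RE.match(value)
    if match:
        return "semver", tuple(int(part) for part in match.groups())
    match = CODE_OCEAN_RE.match(value)
    if match:
        return "code_ocean", int(match.group(1))
    return "unknown", None


def processed_before(value: str, semver_cutoff: tuple) -> bool:
    """True if the recorded version predates the given semantic version."""
    kind, parsed = classify_version(value)
    if kind == "semver":
        # Tuples compare element by element, so (0, 1, 1) < (0, 2, 0).
        return parsed < semver_cutoff
    # Assumption: a Code Ocean-only version was recorded before semantic
    # versioning was adopted, so it predates every semantic version.
    return kind == "code_ocean"


recorded = ["18.0", "0.1.0", "0.1.1", "0.2.0", "No version reported."]
matches = [v for v in recorded if processed_before(v, (0, 2, 0))]
print(matches)
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as "0.10.0" sorting before "0.2.0".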

For pipelines that have adopted semantic versioning, users and developers will always be able to find a pipeline's semantic version in the nextflow.config file.
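As a rough sketch, the pipeline variables could be read out of nextflow.config text with a regular expression. The layout shown in `example_config` (quoted values assigned inside an `env` block) is an assumption for illustration. The file generated by the template may differ, and `read_pipeline_vars` and the example URL are hypothetical.

```python
import re

# Matches NAME = "value" or NAME = 'value' for the three pipeline variables.
PIPELINE_VAR_RE = re.compile(
    r"""\b(PIPELINE_(?:VERSION|NAME|URL))\s*=\s*(['"])(.*?)\2"""
)


def read_pipeline_vars(config_text: str) -> dict:
    """Return the PIPELINE_* assignments found in the given config text."""
    return {m.group(1): m.group(3) for m in PIPELINE_VAR_RE.finditer(config_text)}


# Assumed layout; values are examples only.
example_config = '''
env {
    PIPELINE_NAME = "example-pipeline"
    PIPELINE_VERSION = "0.1.1"
    PIPELINE_URL = 'https://github.com/example-org/example-pipeline'
}
'''

found = read_pipeline_vars(example_config)
print(found["PIPELINE_VERSION"])
```

A regular expression is enough for simple quoted assignments like these, but it does not evaluate the configuration, so values built from expressions or included files would need Nextflow's own tooling.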

Major/minor/patch

Standard semantic versioning alone does not fully capture the needs of data processing pipelines. In particular, conventional commit types such as fix, refactor, and feat can each be either breaking or non-breaking, and breaking changes can differ in kind: some change output content (a downstream process may produce wrong results), while others change output structure or processing fundamentals (a downstream process fails entirely).

The table below maps conventional commit types to the appropriate version bump:

| Commit type | Description | Version bump |
|---|---|---|
| build | Changes to build system or external dependencies | minor |
| feat | A new feature added to output without changing existing output | minor |
| refactor | A code change that neither fixes a bug nor adds a feature | minor |
| perf | A code change that improves performance with output unchanged | minor |
| fix | A bug fix that resolves failures only | patch |
| ci | Changes to CI configuration files and scripts | patch |
| docs | Documentation-only changes | patch |
| style | Changes that do not affect the meaning of the code (white-space, formatting, etc.) | patch |
| test | Adding missing tests or correcting existing tests | patch |
| fix! | Breaking change to something in the code | minor |
| feat! | Breaking change; the ! also changes how the commit is highlighted in the changelog | minor |
| BREAKING (footer) | Fundamental changes to the processing approach or output structure such that results before and after are not directly comparable | major |
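The mapping in the table above can be sketched as a small lookup. This is an illustration rather than the release workflow the template actually runs. It assumes the breaking footer is written as "BREAKING CHANGE:" or "BREAKING-CHANGE:" as in the Conventional Commits specification, and the names `bump_for_commit` and `apply_bump` are hypothetical.

```python
import re

# Version bump per commit type, copied from the table above.
BUMP_BY_TYPE = {
    "build": "minor", "feat": "minor", "refactor": "minor", "perf": "minor",
    "fix": "patch", "ci": "patch", "docs": "patch", "style": "patch", "test": "patch",
}

# Header form: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: .+")


def bump_for_commit(message: str):
    """Return "major", "minor", "patch", or None for an unrecognized commit."""
    lines = message.strip().splitlines()
    if not lines:
        return None
    # Assumed footer spelling; a breaking footer maps to a major bump.
    if any(l.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for l in lines[1:]):
        return "major"
    match = HEADER_RE.match(lines[0])
    if not match:
        return None
    if match.group("bang") and match.group("type") in ("fix", "feat"):
        return "minor"
    # Other types with "!" are not covered by the table; this sketch
    # falls back to the base bump for the type.
    return BUMP_BY_TYPE.get(match.group("type"))


def apply_bump(version: str, bump) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # unrecognized commits leave the version unchanged


version = "0.1.0"
for commit in ["fix: correct mislabeled metadata in processing",
               "feat: add two new QC plots"]:
    version = apply_bump(version, bump_for_commit(commit))
print(version)
```

Replaying the two example commits from the illustrative release table moves the version from 0.1.0 to 0.1.1 and then to 0.2.0, matching that table.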