Container-ready R workflow for ingesting Xenium and Visium/Visium HD spatial transcriptomics outputs, applying the Giotto analysis toolkit, and exporting reproducible QC artefacts, embeddings, and clustering summaries. The project uses a Container-as-a-Function interface: one stable entrypoint, explicit inputs, explicit outputs, and a consistent /data plus /output mount contract.
[Figure: Visium HD (square_008um, --max_cells 6000)]
These figures are exported automatically beneath results/<run_id>/plots/ and results/<run_id>/qc/ for every pipeline run.
- Xenium ingest (`cell_feature_matrix.h5` + cells metadata) with automatic project ID detection
- Visium + Visium HD ingest (Spaceranger outs with filtered/raw feature matrix + spatial metadata)
- Automatically extracts Spaceranger `*spatial*.zip` bundles when spatial assets are archived
- AnnData (`.h5ad`) ingest with automatic discovery of expression matrices (X, layers, raw) and spatial coordinates (obs columns or `obsm['spatial']`)
- Optional QC filters (e.g., `--min_genes_per_cell`, `--min_total_expr_per_cell`, `--max_mito_pct` with configurable prefixes)
- Optional `--max_cells` downsampling to trim oversized datasets before Giotto normalization
- Stage-aware workflow control (`--stage validate|ingest|qc|analyze|export|all`) with checkpointed Giotto objects
- Giotto-based normalization, dimensionality reduction, and Leiden clustering
- Structured outputs: cluster tables, spatial and UMAP plots, QC metrics, per-filter QC summary, Giotto object, run metadata
- Container-as-a-Function execution contract (single entrypoint, explicit inputs/outputs)
- `renv.lock` for reproducible local or containerized restores

- Local-first execution is working via `scripts/run_all.R`
- A lean GHCR image is published at `ghcr.io/nidap-community/giotto-st-pipeline:latest`
- The published image excludes the optional `arrow` package and therefore expects non-parquet Visium metadata
- A GHCR publish helper is available at `container/publish.sh` using `sha-<git-sha>`, `lean`, and `latest` tags
- A manual GitHub Actions workflow at `.github/workflows/publish-ghcr.yml` can rebuild and republish the lean image outside the local network path
- If GH Actions cannot push to GHCR with `GITHUB_TOKEN`, configure a `GHCR_TOKEN` Actions secret with `write:packages`
- `results/` is ignored by git; generate outputs locally or inside the container and archive externally as needed
When running in Docker or Apptainer/Singularity, mount host paths as:
- `/data`: read-only inputs such as Xenium outputs, Visium Spaceranger directories, `.h5ad` files, and config files
- `/output`: writable results directory
This keeps the container callable like a pure function across Docker and Apptainer/Singularity runtimes.
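The mount contract can be sketched as a small shell helper that assembles a `docker run` invocation. This is a minimal illustration, not part of the repository: `giotto_run_cmd` is a hypothetical name, the helper only prints the command rather than executing it, and it assumes host paths without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the project): print a docker command
# that applies the /data (read-only) and /output (writable) mount contract.
IMAGE="ghcr.io/nidap-community/giotto-st-pipeline:latest"

giotto_run_cmd() {
  local data_dir="$1" output_dir="$2"
  shift 2
  # Host input directory is mounted read-only; results directory is writable.
  printf 'docker run --rm -v %s:/data:ro -v %s:/output %s' \
    "$data_dir" "$output_dir" "$IMAGE"
  # Remaining arguments are passed through to the pipeline entrypoint.
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"
  fi
  printf '\n'
}

# Print (do not execute) a Visium run using flags documented in this README.
giotto_run_cmd /path/to/visium-outs "$PWD/results" \
  --stage all --input_format visium --input_dir /data \
  --output_dir /output/sample123 --project_id sample123
```

Printing the command first makes it easy to confirm that only the two host paths change between runs while the container-side paths stay fixed.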
The canonical happy path is now the published GHCR image.
docker pull ghcr.io/nidap-community/giotto-st-pipeline:latest
mkdir -p "$PWD/results/xenium_r1"
docker run --rm \
-v /path/to/xenium:/data:ro \
-v "$PWD/results":/output \
ghcr.io/nidap-community/giotto-st-pipeline:latest \
--stage all \
--input_format xenium \
--input_dir /data/output-XETG00202__0024834_Right__SCAF04264_Right_R1__20240912__162834 \
--output_dir /output/xenium_r1 \
--project_id XETG00202_R1
# Visium / Visium HD (Spaceranger outs directory)
docker run --rm \
-v /path/to/visium-outs:/data:ro \
-v "$PWD/results":/output \
ghcr.io/nidap-community/giotto-st-pipeline:latest \
--stage all \
--input_format visium \
--input_dir /data \
--output_dir /output/sample123 \
--project_id sample123
# AnnData (.h5ad)
docker run --rm \
-v /path/to/h5ad:/data:ro \
-v "$PWD/results":/output \
ghcr.io/nidap-community/giotto-st-pipeline:latest \
--stage all \
--input_format h5ad \
--input_path /data/sample123.h5ad \
--output_dir /output/sample123_h5ad \
--project_id sample123_h5ad \
--python_path ~/.local/share/r-miniconda/envs/giotto_env/bin/python

Review results/xenium_r1/metadata/run_parameters.json for a structured summary of the run configuration. Adjust --cores, --python_path, or --seed as needed for your environment. For Apptainer/Singularity pull-and-run commands and local source execution, see QUICKSTART.md.
renv is available, but it is not auto-activated by .Rprofile.
module load R/4.4.3
Rscript -e 'install.packages("renv", repos = "https://cloud.r-project.org")'
Rscript -e 'renv::restore(prompt = FALSE)'

For interactive local work, activate the project library explicitly:

R
source('renv/activate.R')

For non-interactive runs, call Rscript scripts/run_all.R ... from the repo root after restoring renv.
| Flag | Description |
|---|---|
| `--stage` | Workflow stage: `all`, `validate`, `ingest`, `qc`, `analyze`, or `export`. Default is `all`. |
| `--input_format` | Choose `xenium`, `visium`, or `h5ad` explicitly (default `auto` infers from directory structure or file extension). |
| `--input_dir` | Standardized ST input directory (Xenium `cell_feature_matrix.h5` + cells CSV; Visium/Visium HD Spaceranger `outs/`). |
| `--input_path` | Direct path to a single-file input (currently `.h5ad`). Ignored for directory-based formats. |
| `--input_object` | Existing Giotto object RDS used as input for `qc`, `analyze`, or `export` stages. |
| `--max_cells` | Randomly subsample cells/spots before analysis (helpful for Visium HD on memory-constrained nodes). |
| `--min_genes_per_cell` | Drop cells whose detected genes fall below this integer threshold prior to normalization. |
| `--min_total_expr_per_cell` | Drop cells whose total expression counts fall below this integer threshold. |
| `--max_mito_pct` | Drop cells whose mitochondrial expression fraction (percentage) exceeds this value. |
| `--mito_gene_prefixes` | Comma-separated gene symbol prefixes treated as mitochondrial (default `MT-`; pass `none` to disable). |
| `--project_id` | Optional short identifier used to prefix plot/table artefacts; defaults to folder name. |
| `--python_path` | Optional Python binary for Giotto (e.g., `~/.local/share/r-miniconda/envs/giotto_env/bin/python`). |
| `--cores` | Number of CPU cores to dedicate to Giotto (default: 4). |
| `--seed` | Random seed applied before dimensionality reduction and clustering (default: 1). |
| `--dry_run` | Validate inputs (including `.h5ad` content) and exit before running the Giotto workflow. |
When ingesting Visium HD outputs, install the R arrow package via renv::install('arrow') (or provide a CSV tissue positions file) so parquet spatial metadata can be parsed. For very high-resolution Visium HD runs (hundreds of thousands of spots), use --max_cells (e.g., --max_cells 4000) or schedule the job on a compute node to avoid login-node OOM kills. Matrix-format ingest remains TODO.
The published lean GHCR image assumes non-parquet Visium metadata at runtime. For container usage, prefer CSV tissue-position files rather than parquet inputs unless and until a fuller image variant is released.
Visium runs generated by Spaceranger sometimes package spatial assets inside *spatial*.zip; the pipeline now unpacks these archives automatically into a temporary directory during ingest.
AnnData ingest relies on Python packages anndata (≥0.7) and scipy being available to the Giotto reticulate environment. Supply --python_path if Giotto should use a specific virtual environment.
Mitochondrial filtering treats prefixes case-insensitively; the default of MT- matches human gene symbols. Provide multiple prefixes separated by commas (e.g., MT-,nd), or set --mito_gene_prefixes none to skip mitochondrial checks entirely.
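The prefix rule described above can be illustrated with a short shell function. This only mirrors the documented behaviour (case-insensitive prefix match, comma-separated list, `none` to disable); it is not the pipeline's R implementation, and `is_mito_gene` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustrative only: reproduces the matching rule described in this README,
# not the pipeline's actual code. Returns 0 when the symbol matches a prefix.
is_mito_gene() {
  local symbol prefixes prefix
  # Uppercase both sides so the comparison is case-insensitive.
  symbol="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  prefixes="$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
  # "none" disables mitochondrial checks entirely.
  if [ "$prefixes" = "NONE" ]; then
    return 1
  fi
  # Split the prefix list on commas only.
  local IFS=','
  for prefix in $prefixes; do
    if [ -z "$prefix" ]; then
      continue
    fi
    case "$symbol" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

is_mito_gene "mt-Co1" "MT-,nd" && echo "mt-Co1: treated as mitochondrial"
is_mito_gene "GAPDH" "MT-,nd" || echo "GAPDH: not mitochondrial"
```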
results/<project_id>/
├── metadata/
│ ├── run_parameters.json
│ ├── session_info.txt
│ └── <project_id>_filter_summary.csv
├── objects/
│ ├── <project_id>_ingested_giotto.rds # stage=ingest
│ ├── <project_id>_qc_giotto.rds # stage=qc
│ ├── <project_id>_analyzed_giotto.rds # stage=analyze
│ └── <project_id>_giotto_object.rds # stage=export or stage=all
├── qc/
│ ├── <project_id>_qc_metrics.csv
│ ├── <project_id>_qc_summary.txt
│ ├── <project_id>_nr_genes_hist.png
│ ├── <project_id>_total_expr_hist.png
│ └── <project_id>_genes_vs_expr.png
├── plots/
│ ├── <project_id>_spatial.png
│ └── <project_id>_umap.png
└── tables/
└── clusters.csv
When executed inside the published container, the same directory layout is written under the mounted --output_dir.
For restartable workflow execution, run earlier stages once, then pass the saved object to later stages with --input_object.
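As a sketch of how a restart might be scripted, the helper below maps a stage to the checkpoint file listed under `objects/` in the layout above. `checkpoint_path` is a hypothetical name, and the flag combination in the printed resume command is an assumption rather than a verified invocation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the checkpoint a stage writes, following the
# objects/ naming shown in the output layout.
checkpoint_path() {
  local output_dir="$1" project_id="$2" stage="$3" suffix
  case "$stage" in
    ingest)     suffix="ingested_giotto" ;;
    qc)         suffix="qc_giotto" ;;
    analyze)    suffix="analyzed_giotto" ;;
    export|all) suffix="giotto_object" ;;
    *) echo "no checkpoint for stage: $stage" >&2; return 1 ;;
  esac
  printf '%s/objects/%s_%s.rds\n' "$output_dir" "$project_id" "$suffix"
}

# Example: print (do not execute) a command that resumes at the analyze
# stage from the qc checkpoint of a previous run.
qc_obj="$(checkpoint_path results/sample123 sample123 qc)"
echo Rscript scripts/run_all.R --stage analyze --input_object "$qc_obj" \
  --output_dir results/sample123 --project_id sample123
```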
- Restore the R environment locally (`renv::restore()`)
- Build the image on a workstation with Docker:

      ./container/build.sh giotto-st-pipeline:dev

  On Apple Silicon, pass the target platform explicitly when needed:

      ./container/build.sh giotto-st-pipeline:dev --platform linux/amd64

- Optionally export to a tarball and convert to `.sif` for Apptainer/Singularity:

      docker save giotto-st-pipeline:dev -o giotto-st-pipeline.tar
      singularity build giotto-st-pipeline.sif docker-archive://giotto-st-pipeline.tar

HPC environments without Docker should rely on the published GHCR image or pre-built .sif artefacts generated off-cluster.
For standardized Apptainer/Singularity binds, use container/run_apptainer.sh with DATA_DIR and OUTPUT_DIR environment variables.
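For orientation, the same /data and /output contract maps onto Apptainer bind flags roughly as follows. This is a hypothetical sketch and makes no claim about what container/run_apptainer.sh actually does; `apptainer_run_cmd` is an invented name, and the function only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; container/run_apptainer.sh may differ in its details.
# Reads DATA_DIR and OUTPUT_DIR from the environment and prints an
# apptainer command that binds them to /data (read-only) and /output.
apptainer_run_cmd() {
  local sif="$1"
  shift
  # Abort with a message if either variable is unset or empty.
  : "${DATA_DIR:?set DATA_DIR to the host input directory}"
  : "${OUTPUT_DIR:?set OUTPUT_DIR to the host results directory}"
  printf 'apptainer run --bind %s:/data:ro --bind %s:/output %s' \
    "$DATA_DIR" "$OUTPUT_DIR" "$sif"
  # Remaining arguments are passed through to the pipeline entrypoint.
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"
  fi
  printf '\n'
}

DATA_DIR=/path/to/visium-outs OUTPUT_DIR="$PWD/results" \
  apptainer_run_cmd giotto-st-pipeline.sif --stage all --input_format visium \
  --input_dir /data --output_dir /output/sample123
```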
If you are only consuming the published GHCR image, you should not need to manage enterprise CA certificates locally. Enterprise CA handling is only relevant when rebuilding the image from source behind a TLS-inspecting proxy.
For GHCR publication details, including tag conventions and Docker login requirements, see container/README.md.
- `renv.lock` pins CRAN, Bioconductor, and GitHub package revisions (Giotto and spatstat suites)
- `.Rprofile` does not auto-activate renv; activate it explicitly with `source("renv/activate.R")` for local interactive sessions
- System dependencies required for compiled R packages are documented in `container/Dockerfile`
Use renv::status() before committing dependency changes to ensure the lockfile stays current.
- Add matrix-format ingest helper
- Decide whether to publish a second fuller image variant with parquet-enabled Visium support
- Automate lightweight tests under `tests/`
- Document example configs under `configs/`




