Consolidate airlock storage architecture and implement metadata-based management#4853
Consolidate airlock storage architecture and implement metadata-based management#4853marrobi wants to merge 54 commits intomicrosoft:mainfrom
Conversation
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…copying Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…ted storage Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…e airlock storage Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…ng container names unchanged Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…ag support Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…a stages Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…age accounts Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…twork isolation Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…AC with PE filtering Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…e access per stage Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…for isolation Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…ptions Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…iltering
- Created global workspace storage account in core (stalairlockg{tre_id})
- Updated workspace Terraform to reference global storage instead of per-workspace accounts
- Each workspace maintains own PE for network isolation
- ABAC filters by workspace_id + PE + stage for access control
- Updated helper functions in API and airlock processor to use global storage
- Removed newly created documentation files
- Updated CHANGELOG with Option B cost savings (91% reduction, $7,943/month at 100 workspaces)
Cost impact at 100 workspaces:
- Before: $8,747/month (506 accounts)
- Option A (per-workspace): $1,762/month (101 accounts)
- Option B (global workspace): $772/month (2 accounts)
- Additional savings: $990/month = $11,880/year
This implements Option B as requested - we cannot maintain alternate solutions.
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
… compatibility - Updated docs/azure-tre-overview/airlock.md with new storage architecture - Documented 2-account consolidation with metadata-based stage management - Explained ABAC workspace_id filtering for global workspace storage - Updated stage transition explanations (metadata vs. copy) - Verified UI compatibility - no changes needed (URLs generated server-side) UI Analysis: - AirlockRequestFilesSection.tsx parses SAS URLs generically using regex - No hardcoded storage account names in UI - Blob URLs come from API dynamically with correct account names - URL structure unchanged, just different account names - Result: UI works without modification ✅ Documentation: - Removed 9 temporary docs created during development - Updated existing airlock.md with Option B architecture - Explained 96% storage account reduction and cost savings - Documented metadata-based vs. copy-based transitions Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
…nsitions Created comprehensive e2e tests to verify Option B functionality: 1. test_workspace_isolation_via_abac() - Verifies global workspace storage usage (stalairlockg) - Tests that workspaces use correct storage account - Validates SAS token generation - Confirms ABAC filtering setup 2. test_metadata_based_stage_transitions() - Tests that same-storage transitions use metadata updates - Verifies container name stays the same (no copy) - Measures transition speed (should be fast ~1s) - Confirms draft→submitted→in-review uses metadata only 3. test_global_storage_account_usage() - Validates import requests use core storage (stalairlock) - Validates export requests use global workspace storage (stalairlockg) - Ensures correct account assignment per request type Tests marked with @pytest.mark.optionb for easy filtering. Run with: pytest -m optionb All tests verify the key aspects of Option B: - Global workspace storage consolidation - Workspace isolation via ABAC - Metadata-based stage management - Correct storage account routing Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Replaced all mentions of "Option B" with neutral terminology: - "Global workspace storage" - "Consolidated storage" - "Metadata-based approach" This is now the standard implementation, not an alternative option. Changes: - CHANGELOG.md: Removed "(Option B - Global Workspace Storage)" reference - Renamed e2e_tests/test_airlock_option_b.py → test_airlock_consolidated.py - Updated test markers from @pytest.mark.optionb to @pytest.mark.airlock_consolidated - Updated all Python comments in api_app and airlock_processor modules - Updated all Terraform comments in core and workspace templates - Updated test descriptions and log messages All functionality remains the same - only terminology updated. Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
App Gateway integration was documented but NOT actually implemented. Removed to prevent Terraform errors and align with actual design. Changes: - Removed PE #2 (App Gateway → Core Storage) from storage_accounts.tf - Removed undefined variable reference (var.app_gw_subnet_id) - Updated comments to reflect actual implementation (SAS token access) - Corrected documentation in airlock.md Actual implementation uses SAS tokens for public access (import-external, export-approved) matching the original airlock design. This is fully functional without App Gateway. PE count updated: - Core storage: 1 PE (processor) + 1 PE (import-review workspace, added by workspace) - Global workspace: N PEs (1 per workspace, added by each workspace) Total for 100 workspaces: 102 PEs (was incorrectly documented as 103) Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Changes:
1. Removed storage_accounts_option_a.tf backup file
2. Removed "Option A - per workspace" comment from workspace locals.tf
3. Removed "option-b" tag from core storage_accounts.tf
4. Updated comment to clarify legacy per-workspace storage name
Architecture is now clear: 2 storage accounts total
- stalairlock{tre_id} in core (all 5 core stages)
- stalairlockg{tre_id} in core (all workspace stages, shared globally)
- Total reduction: 506 → 2 accounts (99.6%)
No Option A or Option B variants - this is the single implementation.
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
The sa_airlock_core storage account had network_rules with default_action=Deny, which blocks external clients (CI runners, browsers, research tools) from uploading to import-draft containers via the direct SAS URL. In the original architecture, sa_import_external had no network_rules (publicly accessible), secured only by user delegation SAS tokens. The consolidated core storage serves the same purpose and should have the same accessibility model. Security is maintained by: - ABAC conditions restrict API identity to import-external + export-approved stages - User delegation SAS tokens inherit ABAC restrictions of the signing identity - SAS tokens are only generated for publicly-accessible stages (is_publicly_accessible_stage) - Internal stages are protected by ABAC even with public network access
|
/test-extended bd14845 |
|
🤖 pr-bot 🤖 🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/21870857583 (with refid (in response to this comment from @marrobi) |
…ithub.com/marrobi/AzureTRE into copilot/redesign-airlock-storage-accounts
|
@copilot this needs amending so each workspace has an "airlock_version" property, of 1, or 2 for this version. this will allow backwards compatibility, leave current (v1) code in there etc. |
|
Here's the test plan, needs testing fully
|
…rlock-storage-accounts # Conflicts: # CHANGELOG.md # api_app/_version.py # core/version.txt # templates/workspaces/airlock-import-review/porter.yaml # templates/workspaces/base/porter.yaml
There was a problem hiding this comment.
Pull request overview
This PR redesigns Azure TRE’s Airlock storage model to reduce per-workspace infrastructure by consolidating to shared storage accounts and using container metadata to represent stages, while keeping a workspace-level airlock_version switch for gradual migration and backwards compatibility.
Changes:
- Introduces consolidated Airlock “v2” storage (core + global workspace accounts) with metadata-based stage transitions and ABAC conditions.
- Adds
airlock_version(workspace property) andenable_legacy_airlock(core toggle) to allow mixed v1/v2 operation during migration. - Updates API, airlock processor, e2e tests, and docs to support versioned behavior and the new architecture.
Reviewed changes
Copilot reviewed 61 out of 63 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| ui/app/src/components/shared/airlock/AirlockRequestFilesSection.tsx | Updates SAS URL validation and Azure CLI command generation |
| templates/workspaces/base/terraform/workspace.tf | Gates legacy airlock module; adds airlock_v2 module |
| templates/workspaces/base/terraform/variables.tf | Adds airlock_version; marks legacy malware vars unused |
| templates/workspaces/base/terraform/airlock_v2/variables.tf | New module inputs for v2 airlock workspace integration |
| templates/workspaces/base/terraform/airlock_v2/storage_accounts.tf | Adds PE + ABAC role assignment against global workspace storage |
| templates/workspaces/base/terraform/airlock_v2/providers.tf | New module provider constraints/aliases |
| templates/workspaces/base/terraform/airlock_v2/locals.tf | New locals for core RG and global storage naming |
| templates/workspaces/base/terraform/airlock_v2/eventgrid_topics.tf | Adds EventGrid subscription for global workspace blob events |
| templates/workspaces/base/terraform/airlock_v2/data.tf | Adds data sources for core resources (UAI, SB, DNS zone, etc.) |
| templates/workspaces/base/template_schema.json | Exposes airlock_version in workspace schema when airlock enabled |
| templates/workspaces/base/porter.yaml | Bumps bundle version; adds airlock_version parameter wiring |
| templates/workspaces/airlock-import-review/terraform/import_review_resources.terraform | Updates import review workspace to use consolidated core storage + ABAC identity |
| templates/workspaces/airlock-import-review/porter.yaml | Bumps bundle version |
| mkdocs.yml | Re-indents nav and adds legacy airlock page to docs nav |
| e2e_tests/test_airlock.py | Adjusts v1 E2E expectations when consolidated storage is used |
| e2e_tests/test_airlock_consolidated.py | Adds new E2E coverage for consolidated airlock scenarios |
| e2e_tests/resources/workspace.py | Formatting tweak for scope ID normalization |
| e2e_tests/pytest.ini | Adds airlock_consolidated marker |
| e2e_tests/conftest.py | Allows automatic auth setup if manual app reg not provided |
| docs/azure-tre-overview/airlock.md | Rewrites Airlock documentation for consolidated architecture |
| docs/azure-tre-overview/airlock-legacy.md | Adds legacy airlock architecture + migration guidance |
| core/version.txt | Core version bump |
| core/terraform/variables.tf | Adds enable_legacy_airlock variable |
| core/terraform/main.tf | Passes airlock FQDN into app gateway module; wires enable_legacy_airlock |
| core/terraform/appgateway/variables.tf | Adds airlock_core_storage_fqdn input |
| core/terraform/appgateway/locals.tf | Adds backend pool/http settings/probe names for airlock storage |
| core/terraform/appgateway/appgateway.tf | Adds backend, probe, and path rewrite for /airlock-storage/* |
| core/terraform/api-webapp.tf | Adds APP_GATEWAY_FQDN app setting |
| core/terraform/airlock/variables.tf | Adds enable_legacy_airlock to airlock module |
| core/terraform/airlock/storage_accounts.tf | Introduces consolidated storage accounts + ABAC; adds unified blob-created system topics/subscriptions |
| core/terraform/airlock/storage_accounts_v1.tf | Adds v1 storage accounts behind enable_legacy_airlock |
| core/terraform/airlock/outputs.tf | Exposes consolidated core storage blob host for app gateway |
| core/terraform/airlock/locals.tf | Adds consolidated storage locals; moves v1 locals under legacy toggle |
| core/terraform/airlock/identity.tf | Removes legacy role assignment loops (moved to v1 file) |
| core/terraform/airlock/eventgrid_topics.tf | Replaces per-stage subscriptions with unified blob-created subscription; conditional diagnostics include v1 |
| core/terraform/airlock/eventgrid_topics_v1.tf | Adds v1 EventGrid system topics/subscriptions behind legacy toggle |
| core/terraform/airlock/data.tf | Updates diagnostics categories data source to new system topic |
| config.sample.yaml | Documents enable_legacy_airlock |
| config_schema.json | Adds schema for enable_legacy_airlock |
| CHANGELOG.md | Adds airlock migration enhancement notes (needs cleanup per review) |
| api_app/tests_ma/test_services/test_airlock.py | Adds tests for public-stage logic and v2 link resolution |
| api_app/tests_ma/test_services/test_airlock_storage_helper.py | Adds unit tests for stage/account resolution helper |
| api_app/services/airlock.py | Adds airlock_version routing for account selection; adds public-stage helper |
| api_app/services/airlock_storage_helper.py | New helper for stage + storage account resolution |
| api_app/resources/constants.py | Adds consolidated storage names + stage constants |
| api_app/models/domain/events.py | Extends status-changed event payload with review workspace + airlock_version |
| api_app/models/domain/airlock_request.py | Adds airlock_version to domain model |
| api_app/event_grid/event_sender.py | Includes airlock_version and versioned workspace IDs in status-changed events |
| api_app/db/repositories/airlock_requests.py | Stamps airlock_version onto new Airlock requests |
| api_app/core/config.py | Adds APP_GATEWAY_FQDN setting |
| api_app/api/routes/api.py | Removes redundant global statements |
| api_app/api/routes/airlock.py | Passes workspace airlock_version when creating requests |
| api_app/_version.py | API version bump |
| airlock_processor/tests/test_status_change_queue_trigger.py | Adds tests for review_workspace_id and legacy/v2 routing helpers |
| airlock_processor/tests/shared_code/test_blob_operations_metadata.py | Adds unit tests for metadata-based blob/container operations |
| airlock_processor/tests/shared_code/test_airlock_storage_helper.py | Adds unit tests for storage/stage helper logic |
| airlock_processor/StatusChangedQueueTrigger/init.py | Adds v1/v2 branching and metadata-based stage transitions |
| airlock_processor/shared_code/constants.py | Adds consolidated storage + stage constants |
| airlock_processor/shared_code/blob_operations_metadata.py | Implements container metadata create/update helpers |
| airlock_processor/shared_code/airlock_storage_helper.py | Adds shared helper for stage/account mapping (v1/v2) |
| airlock_processor/BlobCreatedTrigger/init.py | Adds guard for unknown storage accounts in topic |
| airlock_processor/_version.py | Airlock processor version bump |
| .gitignore | Ignores *_old.tf |
templates/workspaces/base/terraform/airlock_v2/eventgrid_topics.tf
Outdated
Show resolved
Hide resolved
|
/test-extended 3b9dbd6 |
|
🤖 pr-bot 🤖 🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/23923706774 (with refid (in response to this comment from @marrobi) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 61 out of 63 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
e2e_tests/test_airlock.py:191
is_consolidated_storageis inferred by substring-matching the returned URL ("stalairlock" in container_url_without_sas). This will break if v2 links are later returned via App Gateway (e.g.,https://<appgw>/airlock-storage/...) or if naming changes. Prefer basing the branching on an explicit signal from the API response (e.g.,airlockRequest.airlock_version/ workspaceairlock_version) rather than parsing URLs.
| # Return standard blob storage URL format | ||
| return "https://{}.blob.{}/{}?{}" \ | ||
| .format(account_name, STORAGE_ENDPOINT, airlock_request.id, token) |
There was a problem hiding this comment.
The SAS URL returned here always uses the direct storage endpoint (https://{account}.blob...). If the intended design is to route public-stage access via App Gateway (/airlock-storage/*) (as described in the PR summary/docs and supported by the new App Gateway path rule), this should conditionally return an App Gateway URL for publicly accessible stages; otherwise users will bypass the gateway/WAF and the /airlock-storage routing is unused.
| locals { | ||
| core_resource_group_name = "rg-${var.tre_id}" | ||
| # STorage AirLock IMport InProgress | ||
| import_in_progress_storage_name = lower(replace("stalimip${var.tre_id}", "-", "")) | ||
| core_resource_group_name = "rg-${var.tre_id}" | ||
| airlock_core_storage_name = lower(replace("stalairlock${var.tre_id}", "-", "")) | ||
| } | ||
|
|
||
| module "terraform_azurerm_environment_configuration" { | ||
| source = "git::https://github.com/microsoft/terraform-azurerm-environment-configuration.git?ref=0.7.0" | ||
| arm_environment = var.arm_environment | ||
| } | ||
|
|
||
| data "azurerm_storage_account" "sa_import_inprogress" { | ||
| data "azurerm_storage_account" "sa_airlock_core" { | ||
| provider = azurerm.core | ||
| name = local.import_in_progress_storage_name | ||
| name = local.airlock_core_storage_name | ||
| resource_group_name = local.core_resource_group_name |
There was a problem hiding this comment.
This import-review workspace template now targets the consolidated core storage account (stalairlock{tre_id}), but legacy (airlock_version=1) import requests store the in-review data in the v1 in-progress account (stalimip{tre_id}). If you expect v1 workspaces to remain supported while enable_legacy_airlock=true, this change will prevent review VMs from accessing v1 import data. Consider keeping v1 support here (e.g., selecting stalimip{tre_id} when reviewing legacy requests) or documenting that review workspaces must be upgraded only when all workspaces are on airlock v2.
| # Import submit: copy to review workspace storage, or tre_id for legacy compatibility | ||
| dest_id = review_workspace_id if review_workspace_id else tre_id | ||
| return constants.STORAGE_ACCOUNT_NAME_IMPORT_INPROGRESS + dest_id |
There was a problem hiding this comment.
In legacy (v1) mode this changes the destination of the submit copy from stalimip{tre_id} to stalimip{review_workspace_id} when a review workspace is configured. Core Terraform still provisions only stalimip{tre_id} for v1, so stalimip{review_workspace_id} is unlikely to exist and import submissions would fail. The in-progress storage account name should remain stalimip{tre_id}; the review workspace accesses it via private endpoints/DNS, not by changing the account suffix.
| # Import submit: copy to review workspace storage, or tre_id for legacy compatibility | |
| dest_id = review_workspace_id if review_workspace_id else tre_id | |
| return constants.STORAGE_ACCOUNT_NAME_IMPORT_INPROGRESS + dest_id | |
| # Import submit: in legacy/v1 mode the in-progress storage account remains TRE-scoped. | |
| return constants.STORAGE_ACCOUNT_NAME_IMPORT_INPROGRESS + tre_id |
Resolves #4358
Consolidate airlock storage architecture and implement metadata-based management
Summary
Redesigns the airlock storage architecture from 56+ per-workspace storage accounts (5 core + ~5 per workspace) down to 2 shared storage accounts, using container metadata to track request stages instead of physically copying blobs between accounts.
A new per-workspace
airlock_versionproperty (1= legacy,2= consolidated) enables a gradual, backwards-compatible migration path.Problem
The existing airlock design creates 5–10 storage accounts per workspace, each with private endpoints, EventGrid topics, and role assignments. At scale (100 workspaces) this results in:
Changes
Core Terraform
sa_airlock_core(external-facing) andsa_airlock_workspace_global(internal)enable_legacy_airlocktoggle (default:true)/airlock-storage/*routes public SAS requests to the consolidated accountAPI (api_app)
AirlockRequestmodel gains anairlock_versionfield (default1)StatusChangedDataevent includesreview_workspace_idandairlock_versionAirlock Processor (airlock_processor)
StatusChangedQueueTriggerbased onairlock_versionblob_operations_metadata.pyfor in-place metadata updates (no blob copy)airlock_storage_helper.pyto resolve the correct storage account per version/stageDocumentation
Migration path
enable_legacy_airlock = true(default) — both v1 and v2 accounts exist side-by-sideairlock_version = 2on individual workspaces — new requests use consolidated storage; in-flight v1 requests complete normallyenable_legacy_airlock = falseto remove legacy accountsImpact
Version bumps
Testing
airlock_version = 1