This project includes a pre-configured CI/CD pipeline for GitHub Actions.
The CI/CD pipeline expects the bundle project to be at the root of the repository:
```
my_data_project/
├── databricks.yml        # ← Must be at repo root
├── .github/
│   └── workflows/
├── resources/
├── src/
├── tests/
└── ...
```
Prerequisites:
- Databricks CLI (installed automatically by the pipeline)
- Python 3.11+ (for running unit tests)
Before CI/CD can deploy resources, Unity Catalog catalogs must exist and service principals must have appropriate permissions.
This template uses a single service principal per environment that serves as both:
- **Deployer:** Authenticates the Databricks CLI to run `databricks bundle deploy` in CI/CD
- **Runtime:** Executes jobs and pipelines (configured via `run_as` in `databricks.yml`)
The service principal configured for CI/CD authentication should be the same one specified as `stage_service_principal` and `prod_service_principal` in `variables.yml`. The CI/CD pipeline authenticates as this SP to deploy bundles, and the `run_as` directive ensures deployed jobs execute under the same identity.
Advanced: It is technically possible to use different SPs for deployment and runtime (e.g., a deployment SP with workspace-level permissions and a separate runtime SP with specific data access). In that case, configure the CI/CD variables with the deployer SP credentials and `variables.yml` with the runtime SP's application ID. However, this template assumes a single SP per environment for simplicity.
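For illustration, here is a minimal sketch of how a production target might tie the runtime identity to that same SP, assuming `prod_service_principal` is defined in `variables.yml`; the target, host, and exact layout are illustrative, not taken verbatim from this template:

```yaml
# Sketch only: target name, host, and structure are illustrative.
targets:
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # hypothetical workspace URL
    run_as:
      # Deployed jobs and pipelines execute as this SP, the same identity
      # the CI/CD pipeline authenticates with to run `bundle deploy`.
      service_principal_name: ${var.prod_service_principal}
```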
Ensure the following pre-existing catalogs are accessible (created by your platform/infra team):
- `stage_analytics`
- `prod_analytics`
Note: The `user` target shares the `dev_analytics` catalog with per-user schema prefixes and doesn't require CI/CD access.
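For context, a per-user dev target along these lines is typical; this is a hedged sketch that assumes the prefix is derived from the deploying user's short name (variable names are illustrative, not taken from this template):

```yaml
# Sketch only: variable names are illustrative.
targets:
  user:
    mode: development
    variables:
      catalog: dev_analytics
      # Each developer gets schemas prefixed with their own short name,
      # e.g. "jane_doe_bronze", so no shared CI/CD credentials are needed.
      schema_prefix: ${workspace.current_user.short_name}
```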
Before granting Unity Catalog permissions, ensure the service principal is added to your Databricks workspace:
- Go to your workspace → Settings → Identity and access
- Click Service principals → Add service principal
- Add your relevant stage/prod SPs
The service principal only needs catalog-level permissions. Schema-level grants are handled automatically by the bundle deployment (defined in databricks.yml).
Don't have a service principal yet? See Creating OAuth M2M Credentials below for step-by-step instructions, then return here to grant catalog permissions.
Identify your Service Principal: In Databricks Account Console → User management → Service principals → your SP, copy the Client ID from the OAuth secrets section.
```sql
-- STAGING Environment Permissions
-- Replace <STAGING_SP_ID> with your staging service principal's application/client ID
GRANT USE CATALOG ON CATALOG stage_analytics TO `<STAGING_SP_ID>`;
GRANT CREATE SCHEMA ON CATALOG stage_analytics TO `<STAGING_SP_ID>`;

-- PRODUCTION Environment Permissions
-- Replace <PROD_SP_ID> with your production service principal's application/client ID
GRANT USE CATALOG ON CATALOG prod_analytics TO `<PROD_SP_ID>`;
GRANT CREATE SCHEMA ON CATALOG prod_analytics TO `<PROD_SP_ID>`;
```

Schema permissions are automatic: When the bundle deploys and creates schemas, the SP becomes the schema owner. Additional grants defined in `databricks.yml` (for groups like developers, qa_team, etc.) are applied automatically during deployment.
| Permission | Purpose |
|---|---|
| `USE CATALOG` | Access the catalog hierarchy |
| `CREATE SCHEMA` | Create bronze, silver, gold schemas during first deployment |
Note that you may want to grant other `CREATE <SECURABLE>` privileges as well, e.g. `CREATE VOLUME` if you create volumes via DABs.
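As an example of the kind of resource that needs such a grant, here is a hedged sketch of a Unity Catalog volume declared in `databricks.yml`; the resource key, catalog, and schema names are illustrative:

```yaml
# Sketch only: names are illustrative. Creating this volume requires
# CREATE VOLUME on the target schema unless the SP owns that schema.
resources:
  volumes:
    landing_volume:
      catalog_name: prod_analytics
      schema_name: bronze
      name: landing
      volume_type: MANAGED
```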
What you DON'T need to grant manually:
- Schema-level privileges (SP owns schemas it creates)
- Table-level privileges (inherited from schema ownership)
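To illustrate the schema-level grants that the bundle applies for you, here is a hedged sketch of a schema resource with grants in `databricks.yml`; the resource key, catalog reference, and group names are illustrative rather than taken verbatim from this template:

```yaml
# Sketch only: keys, catalog reference, and group names are illustrative.
resources:
  schemas:
    bronze:
      catalog_name: ${var.catalog}
      name: bronze
      grants:
        - principal: developers
          privileges: [USE_SCHEMA, SELECT]
        - principal: qa_team
          privileges: [USE_SCHEMA, SELECT]
```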
This project uses an environment-branch promotion model based on GitLab Flow. Feature branches merge into the default branch (staging), which promotes to the release branch (production). It is simpler than Gitflow (no develop branch) and more structured than GitHub Flow (explicit production gating via a long-lived release branch). This pattern is well-suited for data pipeline projects where stability matters more than rapid feature shipping.

Branch name mapping: The diagram above uses generic branch names. In your project, `main` is configured as the staging integration branch and `release` as the production release branch. Map any references to "main"/"release" in the diagram to your configured branch names.
| Branch | Environment | Purpose | CI/CD Trigger |
|---|---|---|---|
| `feature/*` | User | Development work | None (developers run `bundle validate` locally) |
| `main` | Staging | Integration & pre-prod | PR: validates bundle<br>Merge: deploys to staging |
| `release` | Production | Production releases | Merge: deploys to production |
- Create feature branch from `main`
- Develop and test locally using `databricks bundle validate -t user`
- Open Pull Request to `main`
  - CI pipeline runs unit tests
  - CI pipeline validates bundle for staging and production (see the sketch below)
- Merge to `main` after approval
  - CD pipeline deploys to staging environment
- Open Pull Request from `main` to `release`
  - Review changes for production readiness
  - No CI validation runs (already validated on `main`)
- Merge to `release`
  - CD pipeline deploys to production environment
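The CI validation step referenced above runs along these lines. This is a sketch only: the target names (`staging`, `prod`) are assumptions rather than taken verbatim from the workflow file.

```yaml
# Sketch only: assumes the Databricks CLI is already installed in the runner
# and authenticated via the CI/CD secrets described later in this guide.
- name: Validate bundle
  run: |
    databricks bundle validate -t staging
    databricks bundle validate -t prod
```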
For production issues, we recommend:
- **Fix Forward (Preferred):** Pause the broken job in the workspace, then push the fix through the normal `feature/*` → `main` → `release` flow. This is almost always possible for data pipelines because they tolerate short delays better than user-facing services. This ensures full validation.
- **Upstream-First (Emergency):** When you cannot wait (e.g., a corrupted table blocking downstream consumers):
  - Create a branch from `main`, fix the issue, and merge to `main` (ensuring it passes CI)
  - Cherry-pick the merge commit to `release` for immediate deployment
  - Why? This prevents regression bugs where a hotfix exists in production but is overwritten by the next staging deployment because it never made it to the main branch
Advanced: You can create a `hotfix/*` branch from `release`, merge it directly to `release`, then cherry-pick the fix back to `main`. However, this bypasses CI validation on the main path; treat it as an emergency escape hatch only, not a routine practice.
| Pipeline Stage | Trigger | Action |
|---|---|---|
| Bundle CI | Pull Request to `main` | Runs unit tests and validates bundle configuration |
| Staging CD | Merge to `main` | Deploys bundle to staging environment |
| Production CD | Merge to `release` | Deploys bundle to production environment |
- Go to your GitHub repository
- Navigate to Settings → Secrets and variables → Actions
- Click New repository secret for each secret listed below
| Secret Name | Value Source (Databricks) |
|---|---|
| `STAGING_DATABRICKS_HOST` | Your Databricks workspace URL (e.g., `https://xxx.cloud.databricks.com`) |
| `STAGING_DATABRICKS_CLIENT_ID` | Account Console → User management → Service principals → [your SP] → OAuth secrets → Client ID |
| `STAGING_DATABRICKS_CLIENT_SECRET` | Account Console → User management → Service principals → [your SP] → OAuth secrets → Secret |
| `PROD_DATABRICKS_HOST` | Same as above, for the production workspace |
| `PROD_DATABRICKS_CLIENT_ID` | Same as above, for the production SP |
| `PROD_DATABRICKS_CLIENT_SECRET` | Same as above, for the production SP |
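To show how these secrets are consumed, here is a hedged sketch of a staging deploy job; the job name, step layout, and `staging` target name are illustrative, and the actual workflow file may differ:

```yaml
# Sketch only: illustrates how the repository secrets feed the Databricks CLI,
# which reads DATABRICKS_HOST / CLIENT_ID / CLIENT_SECRET for OAuth M2M auth.
deploy_staging:
  runs-on: ubuntu-latest
  env:
    DATABRICKS_HOST: ${{ secrets.STAGING_DATABRICKS_HOST }}
    DATABRICKS_CLIENT_ID: ${{ secrets.STAGING_DATABRICKS_CLIENT_ID }}
    DATABRICKS_CLIENT_SECRET: ${{ secrets.STAGING_DATABRICKS_CLIENT_SECRET }}
  steps:
    - uses: actions/checkout@v4
    - uses: databricks/setup-cli@main   # installs the Databricks CLI
    - run: databricks bundle deploy -t staging
```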
- Go to Databricks Account Console (accounts.cloud.databricks.com)
- Navigate to User management → Service principals
- Create or select a service principal for each environment
- Generate OAuth secret:
- Go to OAuth secrets tab → Generate secret
  - Copy Client ID → `*_DATABRICKS_CLIENT_ID`
  - Copy Secret (shown only once!) → `*_DATABRICKS_CLIENT_SECRET`
- Add to workspace:
- Go to Workspaces → [your workspace] → Settings → Identity and access
- Add the service principal
- Grant Unity Catalog permissions (see above)
The workflow file is located at:
`.github/workflows/my_data_project_bundle_cicd.yml`
This workflow is triggered on:
- Pull requests to `main`: Runs unit tests and validates bundle configuration
- Push to `main`: Deploys to staging environment
- Push to `release`: Deploys to production environment
- Manual dispatch: Run workflow manually via GitHub Actions UI
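These triggers correspond to an `on:` block along the following lines (a sketch; consult the workflow file itself for the exact configuration):

```yaml
# Sketch only: branch names follow the mapping described above.
on:
  pull_request:
    branches: [main]           # CI: unit tests + bundle validation
  push:
    branches: [main, release]  # CD: deploy to staging / production
  workflow_dispatch:           # manual runs from the Actions UI
```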
If GitHub Actions is not already enabled for your repository:
- Go to your repository Settings → Actions → General
- Under "Actions permissions", select Allow all actions and reusable workflows
- Under "Workflow permissions", select Read and write permissions
- Click Save
For better code quality, set up branch protection rules:
- Go to Settings → Branches
- Click Add branch ruleset
- For branch name pattern: `main`
- Enable the following:
  - Require a pull request before merging
  - Require approvals: 1 or more
  - Require status checks to pass before merging
    - Search and add: `Validate and Test` (see note below)
  - Require conversation resolution before merging
- Click Create or Save changes
Note: The `Validate and Test` status check will only appear in the search after you've run the workflow at least once. Create a test PR first, then return here to add the status check requirement.
For the release branch:
- Repeat the above for `release`
- Consider stricter policies for production (e.g., 2+ required approvals)
The CI pipeline automatically runs unit tests before bundle validation. Tests are discovered in the tests/ directory.
```bash
# Install development dependencies
pip install -r requirements_dev.txt

# Run tests
pytest tests/ -v
```

The tests are organized as follows:

```
tests/
├── __init__.py
├── test_placeholder.py   # Example test (replace with your tests)
└── ...                   # Add your test files here
```
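In CI, these same commands typically run as a workflow step, roughly like this (a sketch; the actual step in the workflow may differ):

```yaml
# Sketch only: mirrors the local commands above.
- uses: actions/setup-python@v5
  with:
    python-version: "3.11"
- name: Run unit tests
  run: |
    pip install -r requirements_dev.txt
    pytest tests/ -v
```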
- Ensure the variable group is named exactly: `vg_my_data_project`
- Check that the variable group is linked to the pipeline (first run: click "Permit")
- Ensure `databricks.yml` is at the repository root
- This template requires the bundle project to be at the repo root, not in a subdirectory
- Verify that `DATABRICKS_HOST` is correct (include `https://`)
- Verify OAuth credentials: `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`
- Ensure the service principal has been added to the Databricks workspace
- Check that the SP has appropriate Unity Catalog permissions
- Create the required catalog: `stage_analytics` or `prod_analytics`
- Grant the service principal `USE CATALOG` permission
- Ensure the service principal has `CREATE SCHEMA` on the catalog
- After schemas are created, grant `ALL PRIVILEGES` on each schema
- Run `pytest tests/ -v` locally to see detailed output
- Check that all test dependencies are in `requirements_dev.txt`