Transitioning to Slurm Native Auth with resilient workbench keys distribution#5695
arpit974 wants to merge 6 commits into
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request initiates the transition from legacy MUNGE-based authentication to Slurm Native Authentication. It updates the infrastructure-as-code blueprints, frontend forms, and deployment logic to support the new standard. To ensure a smooth transition, the PR includes a detailed migration guide and deprecation warnings for existing clusters. It also updates internal testing pipelines to validate the new authentication model.
Code Review
This pull request introduces Slurm Native Authentication as a secure alternative to MUNGE. It also adds a migration guide and deprecation notices for MUNGE, whose removal is scheduled for 2026. The changes touch numerous blueprints, the frontend workbench, Terraform modules, and the Go-based validator logic, which now supports implicit variable binding. The feedback highlights critical issues in the workbench implementation, specifically the risks of compiling Slurm from source within a startup script and the use of hardcoded hostname patterns. It also suggests improvements to shell script error handling. Finally, it flags the new implicit variable binding in the Go validator as needing better documentation and closer adherence to the repository's rules on programmatic injection.
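One way to act on the shell error-handling feedback for the workbench key copy is sketched below in Python. It assumes the workbench emits its startup script as a string. The helper build_key_copy_script, its parameter names, and the assumption that the key sits in the export as slurm.key are all hypothetical. The export path /slurm/key_distribution, the /mnt/clusterkey mount point, and the 0400 slurm:slurm ownership are taken from this PR.

Two choices respond directly to the review:
- The controller host is passed in explicitly rather than matched against a hardcoded hostname pattern.
- set -euo pipefail stops at the first failing command, and an EXIT trap unmounts and removes the temporary directory even when a step fails.

```python
import posixpath
import shlex


def build_key_copy_script(
    controller_host: str,
    export_path: str = "/slurm/key_distribution",
    mount_point: str = "/mnt/clusterkey",
    key_dest: str = "/etc/slurm/slurm.key",
) -> str:
    """Return a bash snippet that copies slurm.key from the controller's NFS export.

    Hypothetical helper: the controller host is an explicit argument instead of
    being derived from a hardcoded hostname pattern.
    """
    # Quote every interpolated value so unusual hostnames or paths cannot break the script.
    remote = shlex.quote(f"{controller_host}:{export_path}")
    mnt = shlex.quote(mount_point)
    src = shlex.quote(posixpath.join(mount_point, "slurm.key"))  # assumed filename in the export
    dest = shlex.quote(key_dest)
    dest_dir = shlex.quote(posixpath.dirname(key_dest))
    lines = [
        # Abort on the first failing command, unset variable, or failed pipeline stage.
        "set -euo pipefail",
        # Cleanup tolerates an already-unmounted or already-removed directory.
        f"cleanup() {{ umount {mnt} 2>/dev/null || true; rmdir {mnt} 2>/dev/null || true; }}",
        # If any later step fails, the trap still unmounts and deletes the mount point.
        "trap cleanup EXIT",
        f"mkdir -p {mnt} {dest_dir}",
        f"mount -t nfs -o ro {remote} {mnt}",
        # Create the key with its final mode and owner in a single step.
        f"install -m 0400 -o slurm -g slurm {src} {dest}",
        # On success, unmount right away and drop the trap.
        "cleanup",
        "trap - EXIT",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "mycluster-controller" is a placeholder hostname for illustration.
    print(build_key_copy_script("mycluster-controller"), end="")
```

Using install -m 0400 -o slurm -g slurm avoids a window in which the key exists with default permissions. A cp, chown, chmod sequence would reach the same end state with that brief gap. Note that set -euo pipefail remains in effect for the rest of the shell session, so the snippet assumes it runs near the end of the startup script or in its own subshell.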
Objective
This PR transitions the SchedMD Slurm-GCP v6 scheduler controller and associated tooling from legacy MUNGE-based authentication (auth/munge) to the secure, modern Slurm Native Authentication (auth/slurm) standard. Support for MUNGE is being deprecated and is scheduled for complete removal on July 31, 2026. This PR implements a production-grade transition path that keeps full backward compatibility for legacy clusters and for the daily testing pipelines.
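The backward-compatibility rule this PR describes can be sketched in Python as follows. Legacy clusters keep MUNGE unless explicitly switched, and test pipelines accept either mode. This is a minimal illustration, not the PR's code:
- The attribute name use_slurm_native_auth and all three functions are hypothetical.
- The AuthType and CredType values are Slurm's own slurm.conf settings.
- node_auth_ready mirrors the playbook check test -f /etc/slurm/slurm.key || test -S /var/run/munge/munge.socket.2 used in this PR.

```python
from pathlib import Path
from types import SimpleNamespace


def slurm_auth_settings(cluster: object) -> dict[str, str]:
    """Return the slurm.conf authentication settings for a cluster record.

    `use_slurm_native_auth` is a hypothetical attribute name. Records created
    before the flag existed do not have it, so getattr() falls back to False
    and those clusters stay on MUNGE.
    """
    if getattr(cluster, "use_slurm_native_auth", False):
        # Native auth uses a shared key file (this PR's checks use /etc/slurm/slurm.key).
        return {"AuthType": "auth/slurm", "CredType": "cred/slurm"}
    # Legacy path: requires a running munged on every node.
    return {"AuthType": "auth/munge", "CredType": "cred/munge"}


def render_auth_lines(cluster: object) -> str:
    """Format the settings as slurm.conf key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in slurm_auth_settings(cluster).items())


def node_auth_ready(
    key_path: str = "/etc/slurm/slurm.key",
    munge_socket: str = "/var/run/munge/munge.socket.2",
) -> bool:
    """Python equivalent of `test -f KEY || test -S SOCKET`: either auth mode passes."""
    return Path(key_path).is_file() or Path(munge_socket).is_socket()


if __name__ == "__main__":
    legacy = SimpleNamespace(name="legacy-cluster")  # no flag, so it stays on MUNGE
    upgraded = SimpleNamespace(name="new-cluster", use_slurm_native_auth=True)
    print(render_auth_lines(legacy), end="")
    print(render_auth_lines(upgraded), end="")
    print("auth ready:", node_auth_ready())
```

Defaulting the missing flag to False, rather than raising, is what lets a dashboard upgrade leave existing cluster records untouched. The probe deliberately accepts either auth mode, so a single pipeline can cover both legacy and native-auth clusters.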
Technical Highlights & Key Gaps Resolved
1. Resilient Key Distribution for Workbenches (OFE Frontend)
… (workbenchinfo.py) to build stable Slurm 24.05 (RPC-compatible). It mounts SchedMD's export target /slurm/key_distribution to a temporary directory (/mnt/clusterkey). It then securely copies the slurm.key locally, sets strict 0400 permissions owned by slurm:slurm, and immediately unmounts and deletes the temporary mount point.
… False (getattr(..., False)). Upgrades to the dashboard will not force Native Auth on old clusters. This protects legacy users from startup crashes caused by the NFS mount.
2. Upgraded Go Validator Settings Resolution (Pruned Blueprints)
… (metadata_validator_helpers.go).
… vars: block.
3. Unified Dynamic CI/CD Pipeline Playbooks
… key_type: "munge" variables out of integration playbooks (slurm-integration-test.yml).
… test -f /etc/slurm/slurm.key || test -S /var/run/munge/munge.socket.2
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.