Secure Slurm Solution #6

Open
chinmaybaikar wants to merge 1 commit into crusoecloud:main from chinmaybaikar:private/chinmay/restrict-ssh

@chinmaybaikar

Collaborator
  • A new VPC and subnet are now created to host the Slurm solution
  • SSH access is allowed only to the login nodes
  • All VMs within the subnet can talk to each other without restrictions
  • NFS traffic is now allowed only from the subnet in which all the VMs are created
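
As a rough illustration of the first two points, a minimal Terraform sketch might look like the following. Only crusoe_vpc_subnet.slurm_vpc_subnet is named in this PR; the network resource name, the CIDR ranges, var.location, local.login_node_ip, and the firewall rule itself are assumptions, and the attribute names reflect my understanding of the Crusoe provider, so they should be checked against its documentation.

```hcl
# Sketch under stated assumptions, not the PR's actual main.tf.

variable "location" {
  type        = string
  description = "Crusoe location for the subnet (assumed variable)"
}

locals {
  # Hypothetical placeholder for a login node's private address.
  login_node_ip = "10.0.0.10"
}

# Dedicated VPC network for the Slurm cluster (name and CIDR are illustrative).
resource "crusoe_vpc_network" "slurm_vpc_network" {
  name = "slurm-vpc-network"
  cidr = "10.0.0.0/8"
}

# Subnet that all cluster VMs are created in.
resource "crusoe_vpc_subnet" "slurm_vpc_subnet" {
  name     = "slurm-vpc-subnet"
  cidr     = "10.0.0.0/16"
  location = var.location
  network  = crusoe_vpc_network.slurm_vpc_network.id
}

# Allow inbound SSH only to the login node; other nodes get no SSH rule.
resource "crusoe_vpc_firewall_rule" "ssh_to_login" {
  network           = crusoe_vpc_network.slurm_vpc_network.id
  name              = "slurm-allow-ssh-login"
  action            = "allow"
  direction         = "ingress"
  protocols         = "tcp"
  source            = "0.0.0.0/0"
  source_ports      = "1-65535"
  destination       = local.login_node_ip
  destination_ports = "22"
}
```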

@pacharya-pf9 pacharya-pf9 left a comment

The one problem I see here is that these changes don't seem to be backward compatible. Does deleting or managing a cluster created with the previous Terraform configuration still work after this change?
I don't have a full understanding of the changes here, so feel free to ignore my feedback.

@chinmaybaikar

Collaborator Author

Correct! This would need to be a new release/tag since it won't be backward compatible.

Comment thread main.tf Outdated

  provisioner "local-exec" {
-   command = "ansible-playbook -i ansible/inventory/inventory.yml ansible/slurm.yml -f 128"
+   command = "ansible-playbook --ssh-common-args=\"-o StrictHostKeyChecking=no -o ProxyCommand='ssh -W %h:%p -q ${local.bastion_host} -o UserKnownHostsFile=/dev/null'\" -i ansible/inventory/inventory.yml ansible/slurm.yml -f 128"

I would recommend setting StrictHostKeyChecking=accept-new instead. It keeps the run interaction-free with Trust-on-First-Use semantics, while still guarding against a MITM attack on later connections.

Without this, a MITM between where ansible is running and Crusoe Cloud could result in secrets being sent to an untrusted destination 🙀
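
For illustration, the provisioner with that option might look like the sketch below. The enclosing null_resource and its name are hypothetical, since the thread only shows the provisioner block; local.bastion_host comes from the diff above. Dropping UserKnownHostsFile=/dev/null on the bastion hop is also my assumption: with the known-hosts file pointed at /dev/null nothing is remembered, so every key looks new and accept-new would never reject one. Note that accept-new requires OpenSSH 7.6 or later.

```hcl
# Sketch only: Trust-on-First-Use host-key checking for both SSH hops.
resource "null_resource" "run_ansible" { # hypothetical resource name
  provisioner "local-exec" {
    # accept-new records unknown host keys automatically and refuses
    # connections to hosts whose recorded key has changed.
    command = "ansible-playbook --ssh-common-args=\"-o StrictHostKeyChecking=accept-new -o ProxyCommand='ssh -W %h:%p -q ${local.bastion_host} -o StrictHostKeyChecking=accept-new'\" -i ansible/inventory/inventory.yml ansible/slurm.yml -f 128"
  }
}
```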

Collaborator Author

Changed!

Comment thread main.tf Outdated
ansible_host = each.value.network_interfaces[0].private_ipv4.address
instance_type = each.value.type
location = each.value.location
cidr = crusoe_vpc_subnet.slurm_vpc_subnet.cidr

I found this name (cidr) a bit misleading when it is referenced in the Ansible code as hostvars['slurm-nfs-node-0'].cidr; it may imply that the value is the CIDR of the NFS node's own network address.

Maybe rename cidr to slurm_vpc_subnet_cidr?
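
A sketch of what the rename could look like on the Terraform side, assuming the per-node variables are built in a map like this. The local name nfs_node_vars and the resource name crusoe_compute_instance.slurm_nfs_node are hypothetical; the attribute expressions are taken from the diff above.

```hcl
locals {
  # Hypothetical map of per-node inventory variables for the NFS node(s).
  nfs_node_vars = {
    for name, vm in crusoe_compute_instance.slurm_nfs_node : name => {
      ansible_host  = vm.network_interfaces[0].private_ipv4.address
      instance_type = vm.type
      location      = vm.location
      # Renamed from `cidr` so it is clear this is the subnet's range,
      # not something derived from the node's own address.
      slurm_vpc_subnet_cidr = crusoe_vpc_subnet.slurm_vpc_subnet.cidr
    }
  }
}
```

The Ansible side would then read hostvars['slurm-nfs-node-0'].slurm_vpc_subnet_cidr.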

Collaborator Author

Changed! Appreciate the feedback

@chinmaybaikar force-pushed the private/chinmay/restrict-ssh branch from 8d855fb to 7af9e09 on January 16, 2025, 17:34
3 participants