Connecting VSCode to a Slurm dynamically-allocated worker node
The motivation is to use resources on a Slurm cluster that scale up and down dynamically, while keeping the convenience of interactive development.
The implementation idea is that the Slurm “job” consists of a netcat to port 22 on the worker, and the rest flows from that. Here is a .ssh config file:
Host headnode
    Hostname i-head
    User myuser
    ProxyCommand ssh-aws-ssm.sh %h %p
    IdentityFile ~/.ssh/id_ed25519
    ForwardAgent yes
    ControlMaster auto
    ControlPath /tmp/ssh-%r@%h:%p

Host workernode
    ForwardAgent yes
    User myuser
    ProxyCommand ssh headnode /opt/slurm/bin/srun --unbuffered nc -q 1 localhost 22
    IdentityFile ~/.ssh/id_ed25519
    StrictHostKeyChecking no
    ControlMaster auto
    ControlPath /tmp/ssh-%r@%h:%p
And an explanation:
- The head node connection goes via the AWS Session Manager helper script (as the example uses AWS ParallelCluster)
- Forward agent for easier access to workers
- ControlMaster on both head and worker node for easy connection multiplexing
- The connection to the worker node is a ProxyCommand which runs ssh to the headnode and then uses srun to run a netcat (nc) job on the worker
- The option “--unbuffered” is required for srun
- The option “-q 1” makes netcat quit one second after its input closes, which cleans up the job after the original ssh connection drops
- StrictHostKeyChecking no is needed because the worker host is dynamic, so its host key is expected to change over time
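
Before pointing an editor at this setup, it can help to check each hop by hand. The following is a minimal sketch, not part of the original setup: the function names are hypothetical, the host aliases and the user name are the ones from the config above, and it assumes squeue is installed next to srun in /opt/slurm/bin.

```shell
#!/bin/sh
# Hypothetical helpers for checking the ssh -> srun -> nc chain one hop at a time.
# Nothing runs until one of the functions is called.

# Print the options ssh resolves for a host alias, to confirm the config parses
# as intended (ssh -G prints the effective configuration and exits).
show_resolved() {
    ssh -G "$1" | grep -i -E '^(user|proxycommand|controlmaster|controlpath|stricthostkeychecking) '
}

# Run the worker ProxyCommand by hand with empty input. If the chain works,
# the first line received should be the sshd banner of the allocated worker.
probe_worker_sshd() {
    ssh headnode /opt/slurm/bin/srun --unbuffered nc -q 1 localhost 22 </dev/null | head -n 1
}

# List the user's Slurm jobs from the head node. While a worker connection is
# open, the nc job should appear here. Assumption: squeue lives in /opt/slurm/bin.
list_tunnel_jobs() {
    ssh headnode /opt/slurm/bin/squeue -u myuser
}

# Ask the ControlMaster socket for a host alias whether a shared connection is alive.
check_master() {
    ssh -O check "$1"
}
```

A typical order would be `show_resolved workernode`, then `probe_worker_sshd` (which may take a while if a node has to be started), then `list_tunnel_jobs` from a second terminal while a session is open.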
With this setup VSCode can connect directly to a worker node, which will automatically be spun up if needed and shut down when there are no more connections.
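
As a usage sketch, the session can also be opened and closed from a terminal. This assumes the `code` command-line launcher is on the PATH and the Remote - SSH extension is installed. The function names and the project path are hypothetical. Whether VSCode reuses the ControlMaster socket depends on its own settings, so `close_worker` only closes connections made through the shared socket.

```shell
#!/bin/sh
# Hypothetical wrappers around the workernode alias from the ssh config.

# Open a folder on the worker in VSCode through the Remote - SSH extension.
# $1 is a path on the worker, e.g. /home/myuser/project (illustrative only).
open_on_worker() {
    code --remote "ssh-remote+workernode" "$1"
}

# Ask the ControlMaster for workernode to exit. Once the proxied connection
# closes, nc should quit (-q 1) and the Slurm job should end with it.
close_worker() {
    ssh -O exit workernode
}
```

For example, `open_on_worker /home/myuser/project` starts the editor session, and `close_worker` afterwards releases the shared connection so the node can scale down.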