How to securely connect to AWS SageMaker using SSH through a Bastion Host

AWS does not natively support SSH-ing into SageMaker notebook instances, but nothing really prevents you from setting up SSH yourself.
The only problem is that these instances do not have a public IP address, so the only options are to either set up a reverse proxy or connect to the instance through a bastion host.
In our last blog post, we explained how to connect to an AWS SageMaker notebook instance over SSH using ngrok, a third-party reverse proxy.
That is one way to use SSH with AWS SageMaker. Today we are going to look at another way to connect your local computer to an AWS SageMaker notebook instance: a bastion host (an internal proxy solution).
If you haven’t read our last blog post on using SSH to connect to AWS SageMaker with the ngrok reverse proxy, I recommend reading it before this one.
Don’t want to set this up yourself?
JupyterHub installations can be complex to set up and even more complex to manage. If you want a quicker solution for your team, consider Saturn Cloud Hosted Organizations or Saturn Cloud Enterprise.
Introduction to Bastion Host
A bastion host, also called a “jump box” or “jump server”, is the only host in a network that can be reached from a public network. Because it is the only host exposed to public access, it is usually hardened with strict network security.
Amazon VPC lets you launch AWS resources in a virtual network that you define. The bastion host runs on an Amazon EC2 instance, typically in a public subnet of your VPC.
In the context of using SSH to connect to AWS SageMaker, a bastion host is a server instance (EC2 instance) that you have to SSH into before you are able to SSH into the AWS SageMaker notebook instance.
Creating a Bastion Host
In the following steps, we are going to set up a bastion host that will help us to create a connection to SageMaker.
Step 1: Create an EC2 instance in your AWS account. Its only purpose is to give access to other servers, so a small, low-cost instance type is enough.
Step 2: After creating the EC2 instance, create a security group for the bastion host that opens port 22 (SSH) and set its source to “My IP”. This restricts access to your own IP address; teammates who need access must have their IPs added as well.
Step 3: In your AWS account, change the security groups of your existing instances so that inbound SSH is only allowed from the bastion host’s IP address.
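If you prefer the command line, Step 2 can be sketched with the AWS CLI. This is a minimal sketch, assuming the CLI is already configured with credentials; the group name bastion-ssh and the VPC ID you pass in are placeholders, not values from this guide. It looks up your current public IP through checkip.amazonaws.com and opens port 22 to that single address (/32).

```shell
# Sketch of Step 2: a security group that allows SSH only from your current IP.
# Assumptions: AWS CLI configured; "bastion-ssh" is a placeholder group name.
create_bastion_sg() {
  vpc_id="$1"

  # checkip.amazonaws.com returns your public IPv4 address followed by a newline
  my_ip="$(curl -s https://checkip.amazonaws.com | tr -d '[:space:]')"

  # Create the group in the given VPC and capture its ID as plain text
  sg_id="$(aws ec2 create-security-group \
    --group-name bastion-ssh \
    --description "SSH access to the bastion host" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text)"

  # Open TCP port 22 to this single address only
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" --protocol tcp --port 22 \
    --cidr "${my_ip}/32" >/dev/null

  # Print the new group ID so it can be attached to the bastion instance
  echo "$sg_id"
}
```

For example, create_bastion_sg vpc-0abc1234 prints the new group ID, which you can then attach to the bastion instance. Restricting the rule to /32 is what makes “only you” literal: any teammate needs an extra rule with their own address.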
Step 4: Now, edit your local ~/.ssh/config file and add the following:
Host bastion
    Hostname 11.111.111.11
    User username
    ForwardAgent yes
Here, Hostname is the public IP address of the bastion host and User is the user name you log in to that server with (ec2-user on Amazon Linux AMIs).
ForwardAgent yes forwards your local SSH agent to the bastion host, so the key from the .pem file you use for your EC2 instances is available when you connect onward to the AWS SageMaker notebook instance. The .pem file itself is never copied to the bastion; forwarding only works for keys that have been added to your local agent with ssh-add.
After completing the steps above, you can SSH into your bastion server by typing ssh bastion from the command line.
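Agent forwarding only helps if the key is loaded into your local ssh-agent. Here is a minimal sketch of loading the key and checking that the bastion can see it. It assumes the key pair file lives at the placeholder path ~/.ssh/my-ec2-key.pem and that you source these functions into your interactive shell, so the agent variables set by ssh-agent persist in that shell.

```shell
# Load an EC2 key pair into the local agent so ForwardAgent can offer it.
# The default path is a placeholder: pass the .pem file you actually use.
load_ec2_key() {
  key_path="${1:-$HOME/.ssh/my-ec2-key.pem}"
  # ssh-add -l exits with status 2 when it cannot reach an agent; start one then
  ssh-add -l >/dev/null 2>&1
  if [ $? -eq 2 ]; then
    eval "$(ssh-agent -s)" >/dev/null
  fi
  ssh-add "$key_path"
}

# List the identities the bastion can see through the forwarded agent.
bastion_keys() {
  ssh bastion 'ssh-add -l'
}
```

If load_ec2_key succeeds and bastion_keys prints the key's fingerprint, the bastion will be able to use that key for the next hop.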
Now that we can SSH into our bastion, it is time to prepare it for connecting to the AWS SageMaker machine.
- In your AWS account, change the security group of your SageMaker machine to allow inbound TCP traffic on port 22 from the bastion host’s security group.
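- In your ~/.ssh/config file, add the following so that all connections to hostnames ending in .ec2.internal go through our bastion box. (.ec2.internal is the private DNS suffix EC2 uses in us-east-1; in other regions it is <region>.compute.internal, so adjust the pattern if needed.)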
Host *.ec2.internal
    User ec2-user
    UserKnownHostsFile /dev/null
    StrictHostKeyChecking no
    ProxyCommand ssh -W %h:%p ec2-user@bastion
- Next, we need to always know the internal address of the SageMaker machine so that we can connect to it. We will edit the lifecycle configuration so that it writes that address into a file we can open from the browser.
Edit the lifecycle configuration with the script below. If you are new to lifecycle configurations and how to set them up, check our previous blog post.
# name of your existing lifecycle configuration (placeholder: replace with your own)
CONFIGURATION_NAME="my-lifecycle-config"
echo "Downloading on-start.sh..."
# save the existing on-start script into on-start.sh (jq -r prints the raw base64 string)
aws sagemaker describe-notebook-instance-lifecycle-config --notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" | jq -r '.OnStart[0].Content' | base64 --decode > on-start.sh
echo "Adding bastion SSH setup to on-start.sh..."
# append the bastion SSH setup to on-start.sh
echo '' >> on-start.sh
echo '# write ssh instructions' >> on-start.sh
echo 'curl https://raw.githubusercontent.com/mariokostelac/sagemaker-setup/master/scripts/ssh/on-start-bastion.sh | bash' >> on-start.sh
echo "Uploading on-start.sh..."
# update the lifecycle configuration with the updated on-start.sh script;
# the base64 output is joined into a single line before upload
aws sagemaker update-notebook-instance-lifecycle-config \
--notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" \
--on-start Content="$(base64 < on-start.sh | tr -d '\n')"
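The security-group change from the first step in the list above can also be scripted. This sketch assumes the AWS CLI is configured and that the notebook instance was created in the same VPC as the bastion, so that describe-notebook-instance reports its security groups; the notebook name and bastion group ID are placeholders. Using --source-group instead of an IP range means any instance in the bastion’s group is allowed, even if the bastion’s private IP changes.

```shell
# Allow SSH into the SageMaker notebook only from the bastion's security group.
# Assumptions: AWS CLI configured; notebook created inside the bastion's VPC.
allow_ssh_from_bastion() {
  notebook_name="$1"
  bastion_sg="$2"

  # First security group attached to the notebook instance
  notebook_sg="$(aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$notebook_name" \
    --query 'SecurityGroups[0]' --output text)"

  # Reference the bastion's group as the source instead of a CIDR block
  aws ec2 authorize-security-group-ingress \
    --group-id "$notebook_sg" --protocol tcp --port 22 \
    --source-group "$bastion_sg"
}
```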
That is it. Now we are close to finishing our connection to the SageMaker machine.
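Lifecycle on-start scripts run each time the notebook instance starts, so the lines appended above only take effect after a restart. A minimal sketch using the AWS CLI’s stop, wait, and start commands; the notebook name you pass in is a placeholder.

```shell
# Restart a notebook instance so its (updated) on-start script runs.
restart_notebook() {
  name="$1"
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name"
  # block until the instance has fully stopped before starting it again
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name"
  aws sagemaker start-notebook-instance --notebook-instance-name "$name"
  # block until the instance is back in service
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name"
}
```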
Connecting to AWS SageMaker Machine using SSH Keys
Now that we have a way to route connections from our local machine to the SageMaker machine, we need to authorize our SSH key on it.
Copy your public key from your local ~/.ssh/id_rsa.pub into ~/SageMaker/ssh/authorized_keys on your SageMaker instance.
Once you’ve done that, run copy-ssh-keys in the SageMaker terminal so these keys are copied to the correct location.
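copy-ssh-keys is presumably a helper installed by the setup script that the lifecycle configuration downloads, and its exact behavior is not covered here. Conceptually, installing a public key means appending it to authorized_keys with permissions that sshd accepts. The function below is a hypothetical stand-in that illustrates this, not the actual helper: it appends each key from a file to a target .ssh directory (default ~/.ssh) only if an identical line is not already there.

```shell
# Hypothetical stand-in for copy-ssh-keys: append public keys to authorized_keys.
# $1: file with one public key per line; $2: target .ssh directory (default ~/.ssh).
install_pubkeys() {
  src="$1"
  ssh_dir="${2:-$HOME/.ssh}"
  auth="$ssh_dir/authorized_keys"

  # sshd's StrictModes check refuses keys in files writable by group or others
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"
  chmod 600 "$auth"

  # Add each non-empty key line unless an identical line already exists
  while IFS= read -r key || [ -n "$key" ]; do
    [ -z "$key" ] && continue
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
  done < "$src"
}
```

Running it twice with the same file leaves a single copy of each key, so it is safe to call from a start-up script.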
Now, from the SageMaker machine terminal, run the following command to find the instance’s private IP address:
/sbin/ifconfig eth2 | grep 'inet ' | cut -d: -f2
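The format of ifconfig output differs between net-tools versions, and newer images may not ship it at all. As an alternative, here is a sketch using ip from iproute2. It assumes, as the command above does, that the VPC-facing interface is eth2; run ip -4 -o addr show on your instance to check. It prints only the address, without the prefix length.

```shell
# Print the first IPv4 address on an interface (defaults to eth2).
private_ip() {
  iface="${1:-eth2}"
  # -o prints one line per address: "3: eth2    inet 10.0.0.20/24 brd ..."
  # field 4 is "address/prefix"; keep the part before the slash
  ip -4 -o addr show dev "$iface" | awk '{ split($4, a, "/"); print a[1]; exit }'
}
```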
Copy the private IP address from the output. In my case it is 10.0.0.20. Then return to your bastion host terminal and type
ssh ec2-user@10.0.0.20
and voilà! You are in your AWS SageMaker notebook instance.
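You can also skip the manual hop through the bastion terminal and connect from your local machine in one command. This sketch shows two options and assumes the Host bastion entry from Step 4. Option 1 uses ssh -J (ProxyJump, available in OpenSSH 7.3 and later) to tunnel through the bastion to the private IP. Option 2 builds an ip-a-b-c-d.ec2.internal name so the *.ec2.internal entry and its ProxyCommand apply; that name format is the us-east-1 one, and it also assumes the bastion’s VPC DNS can resolve it.

```shell
# Build an EC2-style private DNS name from an IPv4 address (us-east-1 format).
internal_hostname() {
  printf 'ip-%s.ec2.internal\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Option 1: jump through the bastion directly to the private IP
sagemaker_ssh_jump() {
  ssh -J bastion "ec2-user@$1"
}

# Option 2: rely on the "Host *.ec2.internal" ProxyCommand entry
sagemaker_ssh_internal() {
  ssh "$(internal_hostname "$1")"
}
```

For example, sagemaker_ssh_jump 10.0.0.20 connects in one step. With ProxyJump, authentication to the SageMaker instance happens from your local machine through the tunnel, so it does not depend on agent forwarding.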
Conclusion
We have learned how to connect from a local computer to an AWS SageMaker notebook instance through a bastion host (an internal proxy).
This method suits organizations with multiple teams because it gives more control over who can access the machine and involves no third-party services.