Setting up HTTPS and SSL for JupyterHub

How to secure JupyterHub for your data science team.

JupyterHub provides a shared computational environment for data science teams and other groups of users, allowing for customized collaboration that scales for big data. Importantly, it also allows for a single place to implement security protocols. In this post, we will go over some basic measures you can take to secure your JupyterHub deployments.

In our previous blog post on JupyterHub, we walked through the basic deployment steps for The Littlest JupyterHub (TLJH) and Zero-to-JupyterHub (ZTJH). Our recommendation for anyone looking to deploy JupyterHub as a data science platform in production was to use ZTJH. We’ll assume you’re using that for this blog post.

Once you have Zero-to-JupyterHub up and running, security is the top priority. You should feel confident that your data science platform is safe and that your users can access it easily. In this post, we aim to show not only how to secure your JupyterHub with HTTPS and SSL, but also why each of these steps is important. When we’re done, you will have the most common security measures in place to keep bad actors out.

Reminder: the helm upgrade command

As described in the previous post, Helm is the Kubernetes package manager we use to install and update JupyterHub on our Kubernetes cluster, which in our case is deployed on AWS EKS.

Whenever we update config.yaml, we will run the helm upgrade command given below to apply the changes. We will refer back to it throughout the blog post:

helm upgrade --cleanup-on-fail \
  <your-release-name> jupyterhub/jupyterhub \
  --namespace <your-namespace> \
  --version=<JH-helm-chart-version> \
  --values config.yaml

NOTE: In our previous post, we recommended that you save your values, those in brackets <...>, as comments in your config.yaml.

Values included in the helm upgrade command:

  • <your-release-name> - given that the same “chart” (package) can be installed multiple times on the same Kubernetes cluster, this release name is simply a way of distinguishing between those different installations.
    • In our case, we used ztjh-release.
  • <your-namespace> - this is the Kubernetes namespace that JupyterHub will be created in. If that namespace doesn’t exist, it will create it for you.
    • In our case, we went with ztjh.
  • <JH-helm-chart-version> - each version of JupyterHub is associated with a Helm chart version. Reference this document for more details.
    • In our case, because we are deploying JupyterHub version 1.5, we use Helm chart version 1.2.0.
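With the example values from this post filled in, the command could be wrapped in a small shell function so that it is easy to rerun after every change to config.yaml. Treat this as a sketch rather than a required setup: the function name upgrade_hub and the variable names are our own invention, and it assumes the jupyterhub Helm repository was already added during the initial deployment.

```shell
# Example values from this post; replace them with your own.
RELEASE="ztjh-release"    # <your-release-name>
NAMESPACE="ztjh"          # <your-namespace>
CHART_VERSION="1.2.0"     # <JH-helm-chart-version>

# upgrade_hub is a hypothetical helper name, not part of Helm or JupyterHub.
# It runs the same helm upgrade command shown above with the values filled in.
upgrade_hub() {
  helm upgrade --cleanup-on-fail \
    "$RELEASE" jupyterhub/jupyterhub \
    --namespace "$NAMESPACE" \
    --version="$CHART_VERSION" \
    --values config.yaml
}
```

After defining the function in your shell, run upgrade_hub from the directory that contains config.yaml.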

Security and HTTPS

From our first blog post, our ZTJH deployment is up and running, but in its most basic form. To log in as a user, we have to navigate to the EXTERNAL-IP, a long and confusing URL string that AWS provided. Let’s use a friendlier domain name instead.
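If you no longer have the EXTERNAL-IP at hand, it can be looked up again from the proxy-public service that the JupyterHub Helm chart creates. The snippet below is a minimal sketch that assumes kubectl is already configured for your cluster and uses our example namespace. The helper name show_proxy_service is hypothetical.

```shell
NAMESPACE="ztjh"   # example namespace from this post; replace with your own

# show_proxy_service is a hypothetical helper name.
# The EXTERNAL-IP column of its output holds the address that AWS assigned.
show_proxy_service() {
  kubectl --namespace "$NAMESPACE" get service proxy-public
}
```

Run show_proxy_service and copy the value in the EXTERNAL-IP column. You will need it when creating the DNS record.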

We will first get a new domain name that is short and easy to remember. Then we will set up automatic HTTPS by creating a Let’s Encrypt certificate, which auto-renews every few months. This will keep our friendly domain name secure behind HTTPS.

HTTP stands for Hypertext Transfer Protocol. It is the standard protocol used to transfer data on the web. HTTPS is simply the encrypted, or secured (hence the “S”), extension of HTTP. Using HTTPS prevents third parties from reading the traffic on the connection. The secure connection is established using Transport Layer Security, or TLS, which is the successor to SSL.

Register your domain name

The JupyterHub documentation for this step is quite sparse because there are so many different domain providers. To give you a sense of how to do this with your provider, we will walk through each step of the process with hover.com as an example. First, buy the domain name you would like to use. In our case, we chose “demohub.tech”, which at the time of this writing was on sale for five bucks.

1. Create a CNAME record for your domain

With a newly purchased domain, create a “CNAME” record that points to the EXTERNAL-IP. A “CNAME”, or Canonical Name, is a DNS record that points to another domain name, in our case the one provided by AWS, whereas an A-record points to an IP address. How you do this depends on which domain provider you’re using.

For our hover.com example, we will first navigate to the “DNS” tab, then select “ADD A RECORD”.

Screenshot of Hover DNS tab with Add a Record button highlighted

For the DNS record, use these options:

  • “TYPE”, select “CNAME”
  • “HOSTNAME”, choose a hostname. In our case, we selected “my.demohub.tech”
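    • If you’d like to use the domain name without any prefix, enter “@”. Be aware that some DNS providers do not allow a CNAME record on the bare domain, so check your provider’s documentation first.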
  • “TARGET”, paste the EXTERNAL-IP URL from AWS.

Screenshot of Hover Create DNS Record menu

2. Wait for the DNS to propagate

DNS records take time to update across servers, so be patient while that happens over the next few minutes (or hours in some cases). If you are interested in learning more about how DNS works, have a read through this amusing comic.

You will know that the DNS changes have propagated successfully when you can access your JupyterHub from your new domain.

NOTE: It’s CRITICAL that you wait for these changes to propagate before proceeding, because Let’s Encrypt must be able to reach your JupyterHub at the new domain before it will issue a certificate.
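Rather than refreshing the browser, you could poll DNS from a terminal. The following is a rough sketch that assumes the dig utility is installed and that you created a CNAME record for a prefixed hostname, as in our example. The helper name wait_for_dns and the 60-second retry interval are arbitrary choices of ours.

```shell
DOMAIN="my.demohub.tech"   # example hostname from this post; replace with your own
RETRY_SECONDS=60           # arbitrary polling interval

# wait_for_dns is a hypothetical helper name.
# It asks DNS for the CNAME record of $DOMAIN until a target is returned.
wait_for_dns() {
  target="$(dig +short "$DOMAIN" CNAME)"
  while [ -z "$target" ]; do
    echo "No CNAME for $DOMAIN yet; retrying in $RETRY_SECONDS seconds..."
    sleep "$RETRY_SECONDS"
    target="$(dig +short "$DOMAIN" CNAME)"
  done
  echo "$DOMAIN now points to: $target"
}
```

Once wait_for_dns returns, compare the printed target with your EXTERNAL-IP. Your own resolver may answer before others do, so loading the hub in a browser is still a useful final check.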

Add Let’s Encrypt certificate

Now that we can access our JupyterHub from an easy domain name, we’ll add a TLS certificate to secure the connection. As the JupyterHub docs outline, we will let Let’s Encrypt issue and renew the certificate for us.

1. Update the config

Update the config.yaml that you used for your initial deployment by adding the following:

proxy:
  https:
    enabled: true
    hosts:
      - <your-domain-name>
    letsencrypt:
      contactEmail: <your-email-address>

In our example, our domain name is my.demohub.tech.

2. Run helm upgrade

Run the helm upgrade command from the beginning of this post so that the updated config.yaml is applied.

Wait a few minutes and then navigate to your domain. You should see that your JupyterHub is further secured by TLS, represented by the little lock symbol next to your domain name in the browser. You may also notice that the address no longer starts with http, but instead with https.

Screenshot of domain name showing lock HTTPS symbol
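The lock symbol can also be confirmed from the command line by inspecting the certificate the server presents. This is a minimal sketch that assumes the openssl command-line tool is installed. The helper name show_certificate is hypothetical.

```shell
DOMAIN="my.demohub.tech"   # example hostname from this post; replace with your own

# show_certificate is a hypothetical helper name.
# It opens a TLS connection to port 443 and prints the issuer and the
# validity dates of the certificate that the server presents.
show_certificate() {
  openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -dates
}
```

If everything worked, we would expect the issuer line printed by show_certificate to mention Let’s Encrypt, and the notAfter date to be a few months in the future.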

Conclusion

We covered a few of the most important and common security topics that should be considered for any JupyterHub deployment. For more on additional security topics not covered here, feel free to review the security section of the Zero-to-JupyterHub docs.

Ultimately, we hope this blog post helped you understand the steps needed to provide a base level of security, and some of the reasons each piece helps keep your JupyterHub safe. It is important to consider security early in your deployment so that you can establish the necessary protocols before your users log in. By properly securing your data science platform, you can prevent vulnerabilities that bad actors can exploit.
