Setting up JupyterHub with Single Sign-on (SSO) on AWS

How to securely set up JupyterHub with SSO for your data science team

Single sign-on (SSO) is a method of authenticating to multiple services with a single set of user credentials. SSO adds a layer of security for your data science team, code, and data by reducing the attack surface to just one set of credentials per user. It is considered a standard enterprise feature for any software used in modern corporate environments. Below, we discuss how to set up JupyterHub and how to configure SSO to meet your team’s security needs.

In our previous blog post on JupyterHub, we walked through the basic deployment steps for The Littlest JupyterHub (TLJH) and Zero-to-JupyterHub (ZTJH). Our recommendation for anyone looking to deploy JupyterHub as a data science platform in production was to use ZTJH. We’ll assume you’re using that for this blog post.

Once you have Zero-to-JupyterHub up and running, security is the top priority. You should feel confident that your data science platform is safe and that your users can access it easily. In this post, we strive to show not only how to secure your JupyterHub, but also why each of these steps is important. When we’re done, you will have the most common security measures in place to keep the bad actors out.

Reminder: the helm upgrade command

As described in the previous post, Helm is the Kubernetes package manager used to install and update JupyterHub on our Kubernetes cluster, which in our case is deployed on AWS EKS.

When we update config.yaml, we will run the helm upgrade command, given below. We will refer back to it throughout the blog post:

helm upgrade --cleanup-on-fail \
<your-release-name> jupyterhub/jupyterhub \
--namespace <your-namespace> \
--version=<JH-helm-chart-version> \
--values config.yaml

NOTE: In our previous post, we recommended that you save your values, those in brackets <...>, as comments in your config.yaml.

  • <your-release-name> - given that the same “chart” (package) can be installed multiple times on the same Kubernetes cluster, this release name is simply a way of distinguishing between those different installations.
    • In our case, we used ztjh-release.
  • <your-namespace> - this is the Kubernetes namespace that JupyterHub will be created in. If that namespace doesn’t exist, it will create it for you.
    • In our case, we went with ztjh.
  • <JH-helm-chart-version> - each version of JupyterHub is associated with a Helm chart version. Reference this document for more details.
    • In our case, because we are deploying JupyterHub version 1.5, we use Helm chart version 1.2.0.
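If you would rather script the upgrade than retype it, the command can be assembled from the values saved above. The following is a minimal Python sketch, not part of the original setup: the helper names helm_upgrade_command and run_helm_upgrade are hypothetical, and the arguments are simply the values used in this post (ztjh-release, ztjh, and chart version 1.2.0).

```python
import shlex
import subprocess


def helm_upgrade_command(release, namespace, chart_version, values_file="config.yaml"):
    """Build the helm upgrade command from this post as an argument list."""
    return [
        "helm", "upgrade", "--cleanup-on-fail",
        release, "jupyterhub/jupyterhub",
        "--namespace", namespace,
        f"--version={chart_version}",
        "--values", values_file,
    ]


def run_helm_upgrade(release, namespace, chart_version, values_file="config.yaml"):
    """Run the upgrade. Assumes helm is installed and pointed at your cluster."""
    cmd = helm_upgrade_command(release, namespace, chart_version, values_file)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if helm fails


# Values used in this post; replace them with your own.
command = helm_upgrade_command("ztjh-release", "ztjh", "1.2.0")
print(shlex.join(command))
```

Building the command as a list, rather than a single string, avoids shell quoting issues when it is passed to subprocess.run. Calling run_helm_upgrade with the same arguments would apply the changes in config.yaml.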

Managing users using OAuth 2.0

To add users to your JupyterHub, you currently need to add them to the config.yaml and have them set a password upon first login. Although this is better than nothing, we can go a step further and configure JupyterHub to use an OAuth 2.0 provider (from here on referred to simply as OAuth).

One of the most obvious benefits of using OAuth is that you get single sign-on (SSO). Your users no longer need to remember an additional username and password to log in. With an OAuth provider like GitHub or Google, users only need the credentials for accounts they already use regularly. Making it easier for your users to log in securely is a security benefit in itself. Multi-factor authentication (MFA) can also be enabled for these providers if desired.

Besides easy logins, there are many technical reasons why OAuth is the industry-standard protocol for authenticating users. At a high level, they include the use of tokens, which limit the scope of user information that is shared, and the fact that the authentication server is required to use TLS to keep the data encrypted.
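To make the idea of scoped tokens more concrete, here is a rough Python sketch of the first step of an OAuth login against GitHub: building the URL the user is redirected to. JupyterHub’s GitHub authenticator does this for you, so you never need to write it yourself. The helper name build_authorize_url, the client ID, and the hub.example.com domain are placeholders for illustration, and the HTTPS check on the callback URL is this sketch’s own precaution rather than something taken from the JupyterHub configuration.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# GitHub's authorization endpoint for OAuth apps (served over HTTPS).
GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, callback_url, scope="read:user"):
    """Return (url, state) for the redirect that starts the login."""
    if urlparse(callback_url).scheme != "https":
        raise ValueError("this sketch only accepts https callback URLs")
    # Random value echoed back by the provider, used to detect forged callbacks.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": callback_url,
        "scope": scope,  # limits what the resulting token may access
        "state": state,
    })
    return f"{GITHUB_AUTHORIZE_URL}?{query}", state


url, state = build_authorize_url(
    "example-client-id",  # placeholder, not a real client ID
    "https://hub.example.com/hub/oauth_callback",  # placeholder domain
)
print(url)
```

The scope parameter is what keeps the shared information narrow: a token issued for read:user can read profile data but cannot, for example, push to the user’s repositories.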

The ZTJH docs detail how to configure and set up OAuth for a variety of providers, including GitHub, Google, Azure Active Directory, and Auth0. The steps needed to set up an OAuth application differ slightly for each provider, but the overall procedure is similar. We will walk through the steps on GitHub to give a detailed example.

GitHub OAuth setup

Before getting started, you will need a GitHub account if you don’t have one already. It’s free, and the process of setting up the OAuth application is fairly straightforward.

  1. Create the OAuth application in GitHub.

Once logged in, click your profile picture at the top right of the screen and select Settings.


Then click Developer Settings at the bottom left. Select OAuth Apps, and then click New OAuth App.


On this screen, we need to change the following values:

  • “Application name” - give your OAuth application a memorable name.
    • We went with ztjh-oauth.
  • “Homepage URL” - enter your domain name.
  • “Authorization callback URL” - it’s important this is configured correctly for the authorization process to work. Enter https://<your-domain-name>/hub/oauth_callback.

Click “Register application” when you’re done.


Now just copy the “Client ID”, and create and copy a “Client secret”. You will use these in the next step.


  2. Update the Helm configuration file.

With the GitHub OAuth application created, and the Client ID and Client secret in hand, update your config.yaml accordingly.

  hub:
    config:
      GitHubOAuthenticator:
        client_id: <your-client-id>
        client_secret: <your-client-secret>
        oauth_callback_url: https://<your-domain-name>/hub/oauth_callback
      JupyterHub:
        authenticator_class: github

You don’t need to delete anything (such as the Authenticator key). Simply ensure that the fields shown above are populated.
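Before running the upgrade, it can help to confirm that no placeholder was left behind. The following is a small Python sketch of such a check. It works on a plain dictionary that mirrors the structure of config.yaml shown above; loading the actual file would require a YAML parser, which is left out here. The function name find_config_problems and the example values are hypothetical.

```python
REQUIRED_FIELDS = ("client_id", "client_secret", "oauth_callback_url")


def find_config_problems(config):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    hub_config = config.get("hub", {}).get("config", {})
    github = hub_config.get("GitHubOAuthenticator", {})
    for field in REQUIRED_FIELDS:
        value = str(github.get(field) or "")
        if not value:
            problems.append(f"GitHubOAuthenticator.{field} is missing")
        elif "<" in value and ">" in value:
            # Catches values such as <your-client-id> that were never replaced.
            problems.append(f"GitHubOAuthenticator.{field} still contains a placeholder")
    if hub_config.get("JupyterHub", {}).get("authenticator_class") != "github":
        problems.append("JupyterHub.authenticator_class should be 'github'")
    return problems


# Example with made-up values; the client secret was deliberately left unfilled.
example_config = {
    "hub": {
        "config": {
            "GitHubOAuthenticator": {
                "client_id": "example-client-id",
                "client_secret": "<your-client-secret>",
                "oauth_callback_url": "https://hub.example.com/hub/oauth_callback",
            },
            "JupyterHub": {"authenticator_class": "github"},
        }
    }
}
for problem in find_config_problems(example_config):
    print(problem)
```

This only checks that the fields are present and filled in; it cannot tell whether the client ID and secret are actually valid for your GitHub OAuth application.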

  3. Run the helm upgrade command for the changes to take effect.

This may take a minute or two, but once the changes are in, you can navigate to your JupyterHub URL to find a “Sign in with GitHub” button.


Upon your first login, you will be redirected to GitHub and asked to log in.

If you encounter an issue like “400: Bad Request” or similar, try accessing your JupyterHub in a private browser session. It’s also worth double-checking that the oauth_callback_url in config.yaml matches what you have configured in the GitHub OAuth application.
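Comparing two long URLs by eye is error-prone, so here is a small Python sketch that lists where the two callback URLs differ. The function name callback_url_differences and the example domain are hypothetical. The sketch only reports differences worth looking at; it does not reproduce GitHub’s own matching rules.

```python
from urllib.parse import urlsplit


def callback_url_differences(configured, registered):
    """List differences between the URL in config.yaml and the one in GitHub."""
    a = urlsplit(configured.strip())
    b = urlsplit(registered.strip())
    differences = []
    if a.scheme.lower() != b.scheme.lower():
        differences.append(f"scheme: {a.scheme} vs {b.scheme}")
    if a.netloc.lower() != b.netloc.lower():
        differences.append(f"host: {a.netloc} vs {b.netloc}")
    if a.path != b.path:  # paths are compared exactly, including case
        differences.append(f"path: {a.path} vs {b.path}")
    return differences


# Example with a placeholder domain: config.yaml uses http by mistake.
diffs = callback_url_differences(
    "http://hub.example.com/hub/oauth_callback",
    "https://hub.example.com/hub/oauth_callback",
)
print(diffs)
```

An empty list means the scheme, host, and path agree, in which case the problem most likely lies elsewhere, such as stale browser cookies.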

Conclusion

SSO is a critical building block for setting up a secure JupyterHub installation. This article walks through setting up SSO via GitHub authentication; however, the same approach can be used for Okta, Google Auth, Azure AD, and many other identity providers.

Other important topics include setting up SSL, and loading images from private container repositories. These are covered in our article on JupyterHub and security.

