Load Data from Kaggle

Load Kaggle datasets into Saturn Cloud

Overview

In addition to its competitions, Kaggle hosts a large collection of curated and community-submitted datasets. The datasets span numerous domains, sizes, and file types. This tutorial covers the basics of loading data from Kaggle directly into Saturn Cloud.

Before starting this, you should create a Jupyter server resource. See our quickstart if you don’t know how to do this yet.

Process

Create Kaggle Credentials

The first step for accessing data from Kaggle is to create an API token.

Access the account page of your Kaggle account by signing in and clicking on your username and picture in the top right. Click on the Account tab:

Kaggle account menu with arrow pointing to the Account tab

Then scroll down to the API section, and click Create New API Token:

Kaggle account page with arrow pointing to Create New API Token

This will download a file named “kaggle.json”. This file contains your username and API key. Anyone who has it can act as your Kaggle account, so save it in a safe place!

Open the “kaggle.json” file in your favorite text editor and you will see your Kaggle username and key.
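If you would rather read the two values programmatically than copy them by hand, a minimal sketch follows. It assumes the file holds a JSON object with a "username" field and a "key" field. The sample text and the helper name read_kaggle_credentials are made up for illustration; they are not part of Kaggle or Saturn Cloud.

```python
import json

# Placeholder text in the shape of a kaggle.json file; the values are made up.
SAMPLE = '{"username": "your-kaggle-username", "key": "0123456789abcdef"}'


def read_kaggle_credentials(text):
    """Return (username, key) parsed from the text of a kaggle.json file."""
    data = json.loads(text)
    missing = [name for name in ("username", "key") if name not in data]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return data["username"], data["key"]


username, key = read_kaggle_credentials(SAMPLE)
# Mask the key so the secret does not end up in terminal or notebook output.
print(username, "*" * len(key))
```

To use it on the real file, pass the file's text instead of SAMPLE, for example by reading it with pathlib.Path("kaggle.json").read_text().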

Add Kaggle Credentials to Saturn Cloud

Sign in to your Saturn Cloud account and select Secrets from the menu on the left.

Saturn Cloud left menu with rectangle around Secrets tab

This is where you will add your Kaggle API key information. This is a secure storage location, and it will not be available to the public or other users without your consent.

At the top right corner of this page, you will find the New button. Click it, and you will be taken to the secret creation form.

Screenshot of Saturn Cloud Create Credentials form

You will be adding two credential items: your Kaggle username and your API key. Complete the form once for each item.

Credential        Name
Kaggle Username   kaggle-username
Kaggle API Key    kaggle-key

Copy the values from your “kaggle.json” file into the Value section of the secret creation form. You must use the names exactly as listed above for Kaggle to connect correctly.

With this complete, your Kaggle credentials will be accessible by Saturn Cloud resources!

Setting Up Your Resource

First, you will need to attach your Kaggle credentials to your resource. On the resource’s page, navigate to the Secrets tab, and then click on the Attach Secret Environment Variable button.

Screenshot of Saturn Cloud Attach Secret Environment Variable button

Next, add the kaggle-username and kaggle-key secrets one at a time by selecting them from the dropdown list. The Environment Variable Name field will be filled in with the correct name by default. Once your two secrets are attached, you’re ready to connect to Kaggle from your resource!
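Once the resource has restarted, you can confirm that the secrets are visible from a notebook or terminal. The Kaggle client looks for its credentials in the KAGGLE_USERNAME and KAGGLE_KEY environment variables. This sketch assumes those are the names generated for the two secrets, so check the Secrets tab if yours differ. The helper missing_kaggle_vars is a hypothetical name used only for this example.

```python
import os

# Environment variables the Kaggle client reads its credentials from.
# Assumption: these match the names generated when the secrets were attached.
REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_kaggle_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_kaggle_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("Kaggle credentials found in the environment.")
```

The check reports only whether the variables exist and never prints their values, so it is safe to leave in a shared notebook.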

The kaggle package is not installed by default in Saturn images, so you will need to install it onto your resource. This is already done in this example recipe, but if you are using a custom resource you will need to run pip install kaggle. Check out our page on installing packages to see the various methods for achieving this!
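To check whether the package is already present on a resource, something like the following can help. It is only a sketch: is_installed is a hypothetical helper, and it looks the package up without importing it, so it works even before your credentials are attached.

```python
import importlib.util
import sys


def is_installed(package):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


if is_installed("kaggle"):
    print("kaggle is available")
else:
    # Using the current interpreter's pip avoids installing into another environment.
    print(f"kaggle not found; run: {sys.executable} -m pip install kaggle")
```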

Download a Dataset

Now that you have set up the credentials for Kaggle and installed kaggle, downloading Kaggle data is really straightforward!

In Kaggle, find the dataset you want to download.

On the dataset page, click on the three dots to the right and select Copy API Command.

Kaggle dataset page with arrow pointing to Copy API command

Now, in Saturn Cloud, create a new terminal (or open an existing one) then paste the API command. For example:

kaggle datasets download -d deepcontractor/swarm-behaviour-classification

That’s it! Your dataset will download to your current working directory, and you will be able to use it in your analysis!
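The download typically arrives as a zip archive. Below is a minimal sketch for unpacking it with the Python standard library. The archive name is an assumption based on the dataset slug in the command above, and the data folder and extract_archive helper are hypothetical, so adjust them to match what you see in your directory.

```python
import zipfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract a zip archive into dest and return the sorted member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())


# Assumed file name, derived from the dataset slug used in the example command.
archive = Path("swarm-behaviour-classification.zip")
if archive.exists():
    print(extract_archive(archive, "data"))
else:
    print(f"{archive} not found; run the download command first.")
```

If you prefer to stay in the terminal, the kaggle datasets download command also accepts an --unzip flag that extracts the files as part of the download.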

Download a Competition Dataset

Downloading a competition dataset is similarly straightforward, but it is a slightly different process.

In Kaggle, find the competition you want to download the dataset for.

Click on Data in the top menu and then copy the command displayed.

Kaggle competition dataset page with arrows pointing to the Data tab and the API command

Now, in Saturn Cloud, create a new terminal (or open an existing one) then paste the API command. For example:

kaggle competitions download -c titanic

That’s it! The competition data will download to your current working directory, and you will be able to use it in your analysis!
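Before unpacking anything, it can be handy to peek at what was downloaded. This sketch previews the first rows of each CSV file inside the archive without extracting it. It assumes the download is a zip file named titanic.zip containing UTF-8 CSV files. Both the file name and the preview_csv helper are illustrative assumptions, not something the Kaggle tooling provides.

```python
import csv
import io
import zipfile
from pathlib import Path


def preview_csv(archive, member, rows=5):
    """Return the header and up to `rows` data rows of a CSV stored in a zip archive."""
    with zipfile.ZipFile(archive) as zf, zf.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        reader = csv.reader(text)
        # zip stops after rows + 1 lines, so large files are not read in full.
        return [row for _, row in zip(range(rows + 1), reader)]


# Assumed archive name, based on the competition name in the example command.
archive = Path("titanic.zip")
if archive.exists():
    with zipfile.ZipFile(archive) as zf:
        csv_members = [name for name in zf.namelist() if name.endswith(".csv")]
    for member in csv_members:
        print(member, preview_csv(archive, member, rows=2))
else:
    print(f"{archive} not found; run the download command first.")
```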