Load Data from Kaggle

Load Kaggle datasets into Saturn Cloud

Overview

In addition to its competitions, Kaggle hosts a large collection of curated and community-submitted datasets spanning many domains, sizes, and file types. This tutorial covers the basics of loading data from Kaggle directly into Saturn Cloud, quickly and easily.

Before starting, create an RStudio server resource. See our quickstart if you don’t know how to do this yet.

Process

Create Kaggle Credentials

The first step in accessing Kaggle data is to create an API token.

Sign in to Kaggle, click your username and picture in the top right to open your account page, and then click the Account tab:

(Screenshot: Kaggle account menu with an arrow pointing to the Account tab)

Then scroll down to the API section and click Create New API Token:

(Screenshot: Kaggle account page with an arrow pointing to Create New API Token)

This downloads a file named “kaggle.json” that contains your username and API key. Keep it somewhere safe, because the key gives API access to your Kaggle account.

Open “kaggle.json” in a text editor to see your Kaggle username and key.
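The file is a small JSON object with two fields, username and key. If you want to confirm the file is intact before copying values out of it, here is a minimal Python sketch. The function name load_kaggle_credentials is our own (it is not part of the Kaggle tooling), and the sketch assumes the file holds exactly those two fields.

```python
import json
from pathlib import Path
from typing import Union


def load_kaggle_credentials(path: Union[str, Path]) -> tuple[str, str]:
    """Read a kaggle.json file and return its (username, key) pair.

    Hypothetical helper: assumes the file looks like
    {"username": "...", "key": "..."}.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail loudly if either expected field is absent or empty.
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return data["username"], data["key"]
```

For example, load_kaggle_credentials("kaggle.json") returns the username and key you will paste into Saturn Cloud. Avoid printing the key in shared notebooks or logs.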

Add Kaggle Credentials to Saturn Cloud

Sign in to your Saturn Cloud account and select Credentials from the menu on the left.

(Screenshot: Saturn Cloud left menu with an arrow pointing to the Credentials tab)

This is where you will add your Kaggle API credentials. Credentials are stored securely and are not available to the public or to other users without your consent.

Click the New button in the top right corner of this page to open the credential creation form.

(Screenshot: the Saturn Cloud credential creation form)

You will add two credentials: your Kaggle username and your API key. Fill out the form once for each.

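| Credential      | Type                 | Name            | Variable Name   |
|-----------------|----------------------|-----------------|-----------------|
| Kaggle Username | Environment Variable | kaggle-username | KAGGLE_USERNAME |
| Kaggle API Key  | Environment Variable | kaggle-api-key  | KAGGLE_KEY      |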

Copy the matching values from your “kaggle.json” file into the Value field of the form: the username goes into KAGGLE_USERNAME and the key into KAGGLE_KEY. The credential names are only suggestions, so change them as needed for your workflow. The Variable Names, however, must be exactly as shown, because those are the environment variables the Kaggle client reads.

Once this is done, your Kaggle credentials will be available to your Saturn Cloud resources. Restart any running servers (Jupyter or RStudio) and Dask clusters so the credentials are loaded into them.
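After restarting, you can confirm that the credentials reached your resource by checking for the two environment variables. A minimal Python sketch follows. missing_kaggle_vars is a hypothetical helper name, and it only checks that the variables are set and non-empty, not that the key is valid.

```python
import os
from collections.abc import Mapping

# Variable names from the credentials table above.
REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_kaggle_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required Kaggle variables that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Running missing_kaggle_vars() in a Python session on the resource should return an empty list once both credentials are in place. Passing a plain dictionary instead of os.environ makes the check easy to try out.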

Setting Up Your Resource

The kaggle package is not installed by default in Saturn Cloud images, so you need to install it on your resource. This example recipe already includes it; on a custom resource, run pip install kaggle. See our page on installing packages for the other ways to do this.
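To check the installation, a sketch like the following looks for the package without importing it. This matters because, in the versions of the kaggle package we are familiar with, importing it tries to authenticate immediately and raises an error if no credentials are found. The helper names are illustrative. kaggle_cli_path simply asks whether a kaggle executable is on your PATH, which is what the terminal commands in this tutorial rely on.

```python
import importlib.util
import shutil
from typing import Optional


def is_installed(package: str) -> bool:
    """Report whether a top-level package can be imported, without importing it."""
    # find_spec locates the package but does not run its import-time code.
    return importlib.util.find_spec(package) is not None


def kaggle_cli_path() -> Optional[str]:
    """Return the full path of the kaggle command, or None if it is not on PATH."""
    return shutil.which("kaggle")
```

After pip install kaggle succeeds, is_installed("kaggle") should return True and kaggle_cli_path() should return a path.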

Download a Dataset

With your credentials set up and the kaggle package installed, downloading Kaggle data is straightforward.

In Kaggle, find the dataset you want to download.

On the dataset page, click the three-dot menu on the right and select Copy API Command.

(Screenshot: Kaggle dataset page with an arrow pointing to Copy API command)

Then, in Saturn Cloud, open a terminal and paste the API command. For example:

kaggle datasets download -d deepcontractor/swarm-behaviour-classification

That’s it! The dataset downloads to your current working directory as a zip archive, ready to use in your analysis.
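If you want to script the step after the download, here is a minimal Python sketch that recovers the owner/slug reference from the copied command and unzips the archive. It assumes the CLI saved the archive as <slug>.zip in the given directory (for example, swarm-behaviour-classification.zip) and that the command uses the -d or --dataset flag, as the copied command does. Both function names are hypothetical.

```python
import shlex
import zipfile
from pathlib import Path
from typing import Union


def dataset_ref_from_command(command: str) -> str:
    """Pull the owner/slug reference out of a copied 'kaggle datasets download' command."""
    args = shlex.split(command)
    # Look at every argument except the last, since a flag needs a value after it.
    for i, arg in enumerate(args[:-1]):
        if arg in ("-d", "--dataset"):
            return args[i + 1]
    raise ValueError("no -d/--dataset argument found")


def extract_dataset(ref: str, directory: Union[str, Path] = ".") -> list[str]:
    """Unzip <slug>.zip (assumed name) into directory/<slug>/ and list its members."""
    slug = ref.split("/")[-1]
    directory = Path(directory)
    with zipfile.ZipFile(directory / f"{slug}.zip") as archive:
        archive.extractall(directory / slug)
        return sorted(archive.namelist())
```

For example, extract_dataset(dataset_ref_from_command(cmd)) unpacks the files into a swarm-behaviour-classification folder in the current directory and returns their names. Alternatively, passing --unzip to kaggle datasets download has the CLI extract the archive for you.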

Download a Competition Dataset

Downloading a competition dataset is just as simple, though the steps differ slightly.

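In Kaggle, find the competition whose data you want to download.

Click the Data tab in the top menu, then copy the command shown on that page. You must accept the competition’s rules on Kaggle before the API will let you download its data.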

(Screenshot: Kaggle competition page with arrows pointing to the Data tab and the API command)

Then, in Saturn Cloud, open a terminal and paste the API command. For example:

kaggle competitions download -c titanic

That’s it! The competition data downloads to your current working directory, ready to use in your analysis.
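Competition data typically arrives as a zip archive too. The sketch below reads one CSV straight out of the archive using only the standard library, so it needs no extra packages. It assumes the download produced titanic.zip in the current directory and that the archive contains train.csv, as the Titanic competition data does. read_csv_from_zip is a hypothetical helper.

```python
import csv
import io
import zipfile
from pathlib import Path
from typing import Union


def read_csv_from_zip(zip_path: Union[str, Path], member: str) -> list[dict[str, str]]:
    """Return the rows of one CSV file inside a zip archive as dictionaries."""
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open yields bytes; wrap it so csv.DictReader receives text.
        with archive.open(member) as raw, io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            return list(csv.DictReader(text))
```

For example, rows = read_csv_from_zip("titanic.zip", "train.csv") gives a list of dictionaries keyed by the CSV header. Every value is a string, so convert types as your analysis requires.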