Easily Connect to Dask from Outside of Saturn Cloud

While Saturn Cloud provides client resources to connect to Dask clusters, you can also directly connect from external locations.

Sometimes you’re running code and hit a function so slow that you wish you could run it on a faster machine (or a set of machines in parallel!). Maybe you’re trying to train 1,000 neural networks to generate memes at the same time. Or maybe you have to download 5,000 individual CSV files and aggregate each one before combining them. Dask is a great tool for Python users in this situation: it lets you pass Python code to a cluster of workers to execute in parallel. But setting up a Dask cluster takes effort, and depending on how you set it up, you may have to move your code from the environment it was running in to another one (such as one in the cloud). With Saturn Cloud, you can rely on the power of Dask from whatever computing environment your code calls home.

[Figure: Dask system diagram]

Dask works by having the client, which runs the primary code, call a scheduler, which passes tasks to workers to execute. Those tasks can be any Python code, from PyTorch model training to the Dask libraries that mimic pandas in a distributed way. Even better, the client can be anywhere. In Saturn Cloud we set up a Jupyter Server so you can keep your notebooks and code entirely in the cloud, but you could just as easily have that client be:

[Figure: Saturn Cloud external locations]

To use Saturn Cloud from another location, there are only a few steps:

Install client libraries

You need to set up the client’s Python environment so that it can communicate with Dask on Saturn Cloud. This means the client environment needs to match the Dask cluster’s. It’s especially important that you install the same versions of dask, distributed, and dask-saturn.

pip install dask==2.30.0 distributed==2.30.1 dask-saturn==0.2.2
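
Because a version mismatch between client and cluster is easy to miss, it can help to check the installed packages before connecting. The following is a minimal sketch, not part of dask-saturn: the helper name find_mismatches and the REQUIRED table are made up for illustration, and the pins are simply copied from the pip command above, so adjust them to whatever your cluster actually runs.

```python
from importlib import metadata

# Pins copied from the pip command above (an assumption: change them to
# match the versions installed on your Dask cluster).
REQUIRED = {"dask": "2.30.0", "distributed": "2.30.1", "dask-saturn": "0.2.2"}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package that is off-pin."""
    mismatches = {}
    for package, wanted in required.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per problem; print nothing when everything matches.
    for package, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

Running this in the client environment before creating the connection gives an early, readable warning instead of a harder-to-diagnose error later on.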

Then, in Python, create an external Saturn Cloud connection on the client. You need the user_token and project_id from Saturn Cloud so that the platform knows who you are and where the code should execute (see the docs on how to get these). The user token is passed to ExternalConnection as the saturn_token argument.

from dask_saturn import ExternalConnection, SaturnCluster
from dask.distributed import Client

conn = ExternalConnection(
    project_id="[project_id]",
    base_url='https://app.community.saturnenterprise.io',
    saturn_token="[user_token]"
)
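
Since the token identifies you to the platform, you may prefer not to paste it directly into a script or notebook. One possible approach, sketched below, is to read the values from environment variables and build the keyword arguments from them. The variable names SATURN_PROJECT_ID, SATURN_TOKEN, and SATURN_BASE_URL and the helper connection_kwargs are assumptions chosen for this example, not names defined by Saturn Cloud.

```python
import os

# Default taken from the connection example above.
DEFAULT_BASE_URL = "https://app.community.saturnenterprise.io"


def connection_kwargs(env=os.environ):
    """Build keyword arguments for ExternalConnection from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    required = ("SATURN_PROJECT_ID", "SATURN_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "project_id": env["SATURN_PROJECT_ID"],
        "base_url": env.get("SATURN_BASE_URL", DEFAULT_BASE_URL),
        "saturn_token": env["SATURN_TOKEN"],
    }


# Usage, with dask-saturn installed:
#   conn = ExternalConnection(**connection_kwargs())
```

The keys of the returned dictionary match the three arguments used in the connection example, so the result can be unpacked straight into ExternalConnection.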

Lastly, create the Dask cluster from the client and wait for it to be online.

cluster = SaturnCluster(
    external_connection=conn,
    n_workers=4,
    worker_size='8xlarge',
    scheduler_size='2xlarge',
    nthreads=32,
    worker_is_spot=False,
)

client = Client(cluster)
client.wait_for_workers(4)
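
Once the workers are online, the client object is how you hand work to them. As a rough sketch of what that looks like, the helper below fans a function out over a list of inputs using Client.map and collects the results with Client.gather. The names run_in_parallel and square are made up for this example; the function expects the client created above to be passed in.

```python
def square(x):
    """Stand-in for a slow function you would rather run on the cluster."""
    return x * x


def run_in_parallel(client, func, inputs):
    """Run func on every input via the cluster; return results in input order."""
    # client.map creates one task per input and returns a list of futures
    # that the scheduler distributes across the workers.
    futures = client.map(func, inputs)
    # client.gather blocks until every future has finished, then pulls the
    # results back to the client.
    return client.gather(futures)


# Usage, with the client created above:
#   results = run_in_parallel(client, square, range(5))
#   client.close()  # disconnect the client when you are done
```

Any function passed this way runs on the workers, so the packages it uses need to be installed on the cluster as well as on the client.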

And that’s it! You now have a working Dask cluster within Saturn Cloud that you can call from anywhere! You can monitor the cluster performance and schedule jobs and deployments from the Saturn Cloud app. Check out our getting started documentation for more guides, and consider whether our Saturn Hosted Free, Saturn Hosted Pro, or Enterprise plan is best for you!

