Logging

Log output of tasks on Dask workers in Saturn Cloud

Logging in Dask

When writing code, a natural way to keep track of how it runs is through logging. In Python, logging is typically done with the built-in logging module, like this:

import logging

logging.warning("This is a warning")
logging.info("This is non-essential info")

Unfortunately, if you try to use this style of logging from within a Dask Delayed function, you won’t see any output at all: not in the console if you’re running a Python script, nor below a cell in a Jupyter Notebook. The same is true of print calls, which also aren’t captured when they run within a Dask Delayed function. So an alternate approach is needed for logging within Dask.
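For instance, once tasks execute on remote workers (as they will with the cluster we start below), output like the following never makes it back to your session. Here is a minimal sketch of the problem:

import logging

import dask


@dask.delayed
def silent_task(x):
    # on a remote worker, neither of these reaches your console or notebook
    logging.info(f"Processing {x}")
    print(f"Processing {x}")
    return x * 2


# returns 42, but the log message and print output stay on the worker
silent_task(21).compute()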

Instead, to do logging we’ll use the distributed.worker Python module and import logger from it. This gives us a logging mechanism that does work in Dask. Here is an example of it in action.

First, start the Dask cluster associated with your Saturn Cloud resource.

from dask_saturn import SaturnCluster
from dask.distributed import Client

# connect to the Dask cluster attached to this Saturn Cloud resource
client = Client(SaturnCluster())

After running the above command, it’s recommended that you check on the Saturn Cloud resource page that the Dask cluster is fully online before continuing. Alternatively, you can use the command client.wait_for_workers(3) to halt the notebook execution until all three of the workers are ready.
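For example, assuming the cluster is configured with three workers:

# block until all three workers have connected to the scheduler
client.wait_for_workers(3)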

Next is an example of a Dask command that logs its result in a way that can be saved. Notice the logger.info call using the special logger from distributed.worker:

import dask
from distributed.worker import logger


@dask.delayed
def lazy_exponent(args):
    x, y = args
    result = x**y
    # the logging call to keep tabs on the computation
    logger.info(f"Computed exponent {x}^{y} = {result}")
    return result


inputs = [[1, 2], [3, 4], [5, 6], [9, 10], [11, 12]]
outputs = [lazy_exponent(i) for i in inputs]
# submit the delayed tasks to the cluster without blocking
futures = client.compute(outputs, sync=False)

# block on each future and collect the computed values
results = [x.result() for x in futures]
results
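The futures resolve as usual, so results comes back as [1, 81, 15625, 3486784401, 3138428376721].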

The logs generated using distributed.worker still won’t show up in the console output or in a Jupyter Notebook. Instead, they’ll be within the Saturn Cloud resource logs. First, click the “logs” link of the resource you’re working in:

From there, expand each of the Dask workers. The logs from each worker are stored individually, but you can select Aggregated Logs to view them all at once:

[Screenshot: the list of Dask workers in the resource logs]

Those will show the logs created by the Dask worker. Notice that there is a lot of information there, including how the worker was started by Dask. Near the bottom you should see the logs we wanted, in this case the ones generated by lazy_exponent:

[Screenshot: worker logs showing the messages from lazy_exponent]

There we can see that the logs include the info-level messages we wrote within the function. That concludes the example of how to generate logs from within Dask. This can be a great tool for understanding how code is running, debugging code, and better propagating warnings and errors.
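If you’d rather not leave the notebook, the Dask client can also fetch recent worker logs directly. Here is a minimal sketch using Client.get_worker_logs from dask.distributed (the exact structure of the entries may vary by version):

# pull recent log entries from all workers; returns a dict mapping
# each worker address to a list of (level, message) entries
worker_logs = client.get_worker_logs()

for worker, entries in worker_logs.items():
    print(worker)
    for level, message in entries:
        # filter down to the messages our task logged
        if "Computed exponent" in message:
            print(f"  {level}: {message}")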
