Dealing with Long Running Jupyter Notebooks
We’ve had a number of customers struggle with long-running Jupyter notebooks, ones that take several hours or more to execute. Often they come to us because at some point these notebooks lose connectivity between the server and the browser, as is common with cloud services. Normally cloud services reconnect gracefully and there are no issues, but if Jupyter loses its connection it stops saving output. Jupyter notebooks store all their state in the browser, so if there is a connectivity issue between the server running the code and the browser viewing it, the state of the notebook is lost.
If a customer’s long-running code has an error in it and the connection ever cuts out, the user cannot see the code’s output or the error messages it produced. Trying to debug these models without output is an exercise in futility. This isn’t an issue when using Jupyter locally because a computer’s connection to itself is infinitely stable, but it’s an issue when working in the cloud.
Jupyter notebooks store all their state in the browser and thus require constant network connectivity. This is a well-known design issue with many implications. While network issues won’t cause the code in a notebook to stop executing, they do affect how the output gets saved to your notebook. The flow of a Jupyter notebook is:
- the server pushes output to your browser.
- your browser adds it to the notebook object (and renders it to the screen).
- your browser saves the notebook back to the server.
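The three steps above can be sketched as a toy simulation. This is a deliberately simplified model of the flow as described here, not Jupyter’s actual implementation; the function `run_notebook` and its `connected_at` flags are hypothetical names made up for illustration.

```python
def run_notebook(outputs, connected_at):
    """Simulate which outputs end up saved in the notebook file.

    outputs: strings the kernel emits, in order.
    connected_at: booleans, True if the browser is connected
                  when the matching output is emitted.
    """
    browser_notebook = []  # notebook object held in the browser
    saved_notebook = []    # copy written back to the server

    for text, connected in zip(outputs, connected_at):
        if not connected:
            # The kernel keeps running, but nothing reaches the browser.
            continue
        browser_notebook.append(text)            # steps 1 and 2
        saved_notebook = list(browser_notebook)  # step 3
    return saved_notebook


# The connection drops after the first output, so the later
# output (including the traceback) never makes it into the file.
saved = run_notebook(
    ["epoch 1 done", "epoch 2 done", "Traceback: ..."],
    [True, False, False],
)
print(saved)
```

In this model the code still runs to completion on the server; only the record of what it printed is missing from the saved notebook.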
When the network cuts out, this flow breaks and no output is saved. The long-term solution is for Jupyter itself to be modified to handle intermittent connections, which is a pretty active area of discussion. There is currently no timeline for this to be added to open source Jupyter.
However, there is a short-term strategy.
We can adjust Jupyter with just a pinch of code so that it saves the output directly to a file on the server. That way, even if network connectivity cuts out, the server still has the output stored. It’s not perfect (in an ideal world this output would still show up in the notebook itself), but it’s an improvement to have it stored somewhere instead of lost. Put this code at the top of your long-running notebook:
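    import sys
    import logging

    # Open the log file with a small buffer so output reaches disk quickly
    so = open("data.log", 'w', 10)
    # Mirror everything written to stdout and stderr into the file
    sys.stdout.echo = so
    sys.stderr.echo = so

    # Send the kernel's log messages to the same file
    get_ipython().log.handlers[0].stream = so
    get_ipython().log.setLevel(logging.INFO)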
Execute that at the top of your notebook. TADA! Now when you’re running the notebook, all output will be mirrored in the data.log flat file.
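Once the connection comes back, or from a separate notebook or terminal on the same server, the mirrored output can be read straight from disk. Here is a minimal sketch of one way to do that; the helper name `tail_log` is hypothetical, and it assumes the log lives at data.log in the current working directory.

```python
from collections import deque
from pathlib import Path


def tail_log(path="data.log", n=20):
    """Return the last n lines of the log file as a list of strings."""
    log_path = Path(path)
    if not log_path.exists():
        return []
    with log_path.open("r", errors="replace") as fh:
        # A deque with maxlen keeps only the final n lines in memory,
        # which matters when a ten-hour run produces a large log.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


for line in tail_log("data.log", n=20):
    print(line)
```

Because the file is written through a buffer, the very latest lines may lag slightly behind what the kernel has actually printed.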
How it works: In the Jupyter notebook, the normal stdout and stderr file objects are replaced with ipykernel.iostream.OutStream objects (that’s how they get displayed in the browser). This object has an echo attribute, which defaults to None, that can propagate output. So the first set of lines sticks a Python file object in place of the echo, and all your normal stdout and stderr is now also being copied to disk. Exceptions are handled by the Python logging system. In the default configuration, that’s not outputting to stdout or stderr, so the second set of lines patches it to do so, and sets the log level.
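To make the echo idea concrete, here is a small standalone sketch. `EchoingStream` is a hypothetical toy class written for this illustration, not the ipykernel implementation; it only mimics the behavior described above, where a stream forwards each write to an optional second target.

```python
import io


class EchoingStream:
    """Toy stand-in for a stream with an optional echo target."""

    def __init__(self, primary):
        self.primary = primary
        self.echo = None  # no mirroring until something is assigned

    def write(self, text):
        self.primary.write(text)
        if self.echo is not None:
            # Copy the same text to the echo target as well
            self.echo.write(text)
        return len(text)

    def flush(self):
        self.primary.flush()
        if self.echo is not None:
            self.echo.flush()


browser = io.StringIO()  # stands in for output sent to the browser
disk = io.StringIO()     # stands in for the data.log file

stream = EchoingStream(browser)
stream.write("before patch\n")  # only reaches the "browser"
stream.echo = disk              # the equivalent of sys.stdout.echo = so
stream.write("after patch\n")   # reaches both targets
```

Only text written after the echo is assigned ends up in the second target, which is why the workaround belongs at the top of the notebook.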
With this workaround, the worst pain of having long-running Jupyter notebooks is gone. That said, at Saturn we generally recommend making use of better hardware (GPUs) or parallelization (Dask) to avoid having to wait 10 hours for your notebook to run. If your problem isn’t parallelizable, though, this is a reasonable workaround. And if you don’t know how to parallelize it but wish you did, you should talk to us! We’re really good at it!