How to Install TensorFlow on the GPU with Docker

Introduction

Deep learning has advanced at a spectacular pace, leading to significant innovations and new research and training methods.

One example is TensorFlow, a popular deep learning library used to build models for a wide range of tasks. It is widely regarded as one of the most capable libraries for deep learning and neural networks.

Although TensorFlow performs well on a CPU with smaller, simpler datasets, its real power comes from running on a Graphics Processing Unit (GPU).

Pairing a GPU with TensorFlow can greatly improve performance. However, installing TensorFlow in a GPU environment is often difficult because of the CUDA errors that can arise.

In this blog, we’ll explore how to install TensorFlow with GPU support using Docker.


Why Docker

Modern state-of-the-art models are known for being extremely large and over-parameterized; in fact, they often have many more parameters than there are data points in the dataset. Because they demand enormous amounts of compute to train, these models depend on multiprocessing and distribution modules such as torch.distributed or tf.distribute.

Now let’s say you manage to write the parallel code successfully; you still need to ensure that all of your accelerators are “visible” and that your CUDA version matches what your primary library supports (dependency hell ☠️).

What is the solution to this?

Docker makes this process much easier by offering prebuilt images with a working CUDA setup for each version. To simplify things further, you can build upon these existing images and add your own libraries and frameworks.

Installing Docker for GPU

We want to run a TensorFlow container image that takes advantage of the GPUs in our system. To do this, we need a GPU-enabled Docker setup. Docker containers are designed to be platform and hardware agnostic, which becomes a problem with specialized hardware such as NVIDIA GPUs, since these require kernel modules and user-level libraries.

Because of this, Docker does not natively support NVIDIA GPUs within containers.

We will use nvidia-docker to keep our Docker image portable while still leveraging the NVIDIA GPUs in our system. The NVIDIA driver itself must already be installed on the host.

nvidia-docker is essentially a wrapper around the docker command that transparently provisions a container with the components needed to execute code on the GPU.

First, we install Docker itself and enable its service by running the following commands:

curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker

To add the package repository for the GPU-compatible Docker packages, we will execute the following command:

distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Now, we will update Ubuntu’s package lists so that the newly added repository becomes available.

# Update Ubuntu's package lists
sudo apt-get update

Now we can install the NVIDIA GPU-compatible Docker package using the following command:

sudo apt-get install -y nvidia-docker2

We’ll need to restart Docker to ensure the installation changes take effect.

sudo systemctl restart docker
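
Before moving on, it can help to confirm that containers can actually see the GPU. The following is a minimal sketch rather than part of the original setup. It assumes Python 3 is available on the host, that the installation above succeeded, and that the host has a working NVIDIA driver. It reuses the nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 image chosen later in this tutorial, and the helper names are illustrative.

```python
import shutil
import subprocess

# Image reused from later in this tutorial (an assumption for this check;
# any CUDA-enabled image should behave the same way).
SMOKE_TEST_IMAGE = "nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04"


def build_smoke_test_command(image=SMOKE_TEST_IMAGE):
    """Return the docker command that runs nvidia-smi in a throwaway container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def run_smoke_test(image=SMOKE_TEST_IMAGE):
    """Run nvidia-smi in a container; return True if it exited successfully."""
    if shutil.which("docker") is None:
        print("docker not found on PATH; skipping GPU smoke test")
        return False
    result = subprocess.run(
        build_smoke_test_command(image), capture_output=True, text=True
    )
    # Show whatever the container printed, or the error if it failed.
    print(result.stdout or result.stderr)
    return result.returncode == 0
```

Calling run_smoke_test() should print the usual nvidia-smi table if GPU access works. The first run pulls the image, which is large, so it may take a while.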

Setting Up TensorFlow With GPU Support

TensorFlow provides several images depending on your use case, such as latest, nightly, devel, and devel-gpu.

Most of the time, however, a project needs additional libraries or packages that are not included in the standard TensorFlow image.

For this reason, building a custom TensorFlow image is useful, since you can extend it with the other libraries you work with.

We can build a custom TensorFlow image with Docker through the following steps:

Step 1: Creating a Dockerfile

To begin, we first need to create a Dockerfile, which defines how our custom image will be built.

  1. Choosing the base image

    • Most issues come from version mismatches, so the TensorFlow, CUDA, and cuDNN versions must be compatible. See this site for the appropriate TF, cuDNN, and CUDA versions.

    • Most people choose a TensorFlow Docker image as the base. However, as this site shows, official TensorFlow only supports CUDA 11.2 (or 11.0 or 10.1), which makes it impossible to start from CUDA 11.3.

    • To solve that, you can choose a base image that already has CUDA 11.3 and cuDNN 8 installed, and then install TensorFlow on top of it.

    • We will use nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 as our base image.
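
The compatibility check described in this list can be sketched as a small lookup. This is only an illustration: the table below is a hand-transcribed subset of TensorFlow's tested GPU build configurations and should be confirmed against the official table, the function names are hypothetical, and the cuDNN version passed in the example call is a placeholder to replace with the version reported inside your container.

```python
# Illustrative subset of TensorFlow's tested GPU build configurations.
# Assumption: values were transcribed by hand; confirm them against the
# official table before relying on them.
TESTED_CONFIGS = {
    "2.9": {"cuda": "11.2", "cudnn": "8.1"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
}


def minor_version(version):
    """Reduce a version such as '2.9.1' to its 'major.minor' part ('2.9')."""
    return ".".join(version.split(".")[:2])


def check_compatibility(tf_version, cuda_version, cudnn_version):
    """Compare installed CUDA/cuDNN versions with the tested configuration.

    Returns a list of human-readable mismatch messages (empty if none).
    """
    expected = TESTED_CONFIGS.get(minor_version(tf_version))
    if expected is None:
        return [f"No tested configuration recorded for TensorFlow {tf_version}"]
    problems = []
    if minor_version(cuda_version) != expected["cuda"]:
        problems.append(
            f"CUDA {cuda_version} differs from tested {expected['cuda']}"
        )
    if minor_version(cudnn_version) != expected["cudnn"]:
        problems.append(
            f"cuDNN {cudnn_version} differs from tested {expected['cudnn']}"
        )
    return problems


# The combination used in this tutorial: TensorFlow 2.9.1 on a CUDA 11.3 image.
# The cuDNN version here is a placeholder, not a measured value.
for message in check_compatibility("2.9.1", "11.3.1", "8.2.0"):
    print(message)
```

A reported difference is a prompt to test the setup, not proof that it will fail; this tutorial deliberately proceeds with a CUDA 11.3 base image.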

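Using your preferred text editor, create a new file named Dockerfile in a new directory and add the following content. The Dockerfile copies a requirements.txt file into the image, so create that file in the same directory as well and list any Python packages you need in it (it can be empty).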


FROM nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

# Install additional packages
RUN apt-get -y update && \
         apt-get -y upgrade && \
         apt-get install -y python3-pip python3-dev

RUN apt-get install -y git

# Install any python packages you need
COPY requirements.txt requirements.txt

RUN python3 -m pip install --upgrade pip && \
   python3 -m pip install -r requirements.txt


COPY . .
# Aliases so that python and pip resolve to python3 and pip3
RUN echo 'alias python="python3" ' >> ~/.bashrc
RUN echo 'alias pip="pip3" ' >> ~/.bashrc

CMD tail -f  /dev/null

Step 2: Building and running the Docker image

From the same directory as our Dockerfile, we run the following command to build the image.

# Create the image "tensorflow_image" from the file "Dockerfile"
docker build -t tensorflow_image . -f Dockerfile

After building the image, we create a container from it and run it in the background with the following command. The -d flag detaches the container, and the image's default command (tail -f /dev/null) keeps it running so that we can enter it afterwards.

# Create and run a container from the above image in the background
docker run -d --name tensorflow_container --gpus all -w="/working" tensorflow_image

Then, execute the following command to enter the container:

# Enter the "tensorflow_container"
docker exec -it tensorflow_container bash

While within the container, we can check the following:

# Check if the NVIDIA Driver is recognized  
nvidia-smi
# Check the version of CUDA  
nvcc --version
# Check the version of cuDNN  
cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
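
The cuDNN check above prints the raw header lines. As a hedged alternative, the sketch below reads the same header and assembles a single version string. It assumes the header lives at /usr/include/cudnn_version.h, as in the command above, and that it defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL; the function name is illustrative.

```python
import re
from pathlib import Path

# Same header path as in the grep command above (an assumption about the image).
HEADER = Path("/usr/include/cudnn_version.h")


def parse_cudnn_version(header_text):
    """Extract 'major.minor.patch' from the text of cudnn_version.h.

    Returns None if any of the three version macros is missing.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if HEADER.exists():
    print("cuDNN version:", parse_cudnn_version(HEADER.read_text()))
else:
    print(f"{HEADER} not found; is cuDNN installed in this image?")
```

Run inside the container with python3, this gives one value that is easy to compare against the version TensorFlow expects.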

Now we can install TensorFlow as described in step 5 of the official tutorial. TensorFlow requires a recent version of pip, so we first upgrade pip and then use it to install TensorFlow.

pip install --upgrade pip  
pip install tensorflow==2.9.1

And YES! We have installed TensorFlow.

Now let us check if it works.

Testing our installation

To check that TensorFlow with GPU support has been installed properly, we will follow step 6 of the official tutorial.

# Verify the CPU setup
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
# A tensor should be returned, something like
# tf.Tensor(-686.383, shape=(), dtype=float32)

# Verify the GPU setup
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# A list of GPU devices should be returned, something like
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
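
Listing devices shows that TensorFlow can see the GPU, but not that work actually runs on it. The sketch below goes one step further and is offered as an assumption-laden example rather than part of the official tutorial: it places a small matrix multiplication on the first GPU and reports where the result lives. The function name is hypothetical, and the script falls back to a message when TensorFlow or a GPU is missing.

```python
import importlib.util


def describe_gpu_setup():
    """Return a short status string about TensorFlow's view of the GPUs."""
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow is not installed in this environment"
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return f"TensorFlow {tf.__version__} found no GPU; running on CPU"
    # Run a small matrix multiplication explicitly on the first GPU.
    with tf.device("/GPU:0"):
        a = tf.random.normal([1000, 1000])
        b = tf.random.normal([1000, 1000])
        product = tf.matmul(a, b)
    return (
        f"TensorFlow {tf.__version__} sees {len(gpus)} GPU(s); "
        f"matmul ran on {product.device}"
    )


print(describe_gpu_setup())
```

If the reported device name ends in GPU:0, the computation was placed on the GPU.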

Conclusion

In this article, we have seen how to set up TensorFlow with Docker so that you can train deep learning models on your GPUs and make distributed training easier.
