
Set up your Data Science stack with GPU-enabled TensorFlow the easy way [Updated]

Last updated on 2021-02-16

For us Data Science practitioners, sooner rather than later we cross paths with TensorFlow. Given the computationally intensive nature of any deep learning task, the performance gap between leveraging GPU power and relying on the CPU alone is enormous. This DATAmadness post shows the results of an experiment with a powerful but not out-of-this-world desktop setup: using the GPU (an RTX 2080 Ti) reduced the time required to train a model by 85%. Just think of what this represents in a complex model!

Considering the aforementioned performance, it is simply a no-brainer to use your GPU power if possible. As with everything in life, this comes with some drawbacks, mainly:

  • Compatibility and Cost. Right now TensorFlow only works in a straightforward way with NVIDIA’s GPUs. RTX models with the Turing architecture are the ones to shoot for. If you are serious about this, your entry-level card should be, at the very minimum, an RTX 2070, which as of this writing sells for $519 USD on Amazon (I recommend EVGA above other brands).
  • No love for macOS. Even with Thunderbolt and an eGPU enclosure, this won’t work on macOS. You need to run Linux natively, or WSL2 on Windows (Build 20150 or higher).
  • Portability. I really prefer working on a desktop, but sometimes you have to work on the go, and I doubt you’re willing to carry a 3 to 4.5 kg (7 to 9 pound) eGPU around on a daily basis. My practical solution? Ultrabook (for prototyping) → SSH connection → workstation at home (doing the heavy lifting).
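That ultrabook → SSH → workstation workflow needs nothing exotic: a LocalForward entry in ~/.ssh/config makes a Jupyter server running on the workstation appear on the ultrabook’s own localhost:8888. The hostname, address, and user below are placeholders, adjust them to your network:

```
# ~/.ssh/config on the ultrabook (placeholder values)
Host workstation
    HostName 192.168.1.50
    User you
    # Forward the ultrabook's port 8888 to Jupyter on the workstation
    LocalForward 8888 localhost:8888
```

With that in place, a plain `ssh workstation` opens the tunnel, and http://localhost:8888 in the ultrabook’s browser reaches the remote Jupyter server.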

Now that the inconveniences have been discussed, if you’re still up for it, let’s get started on how to set this up.


Step 1: Make sure you’re using the latest proprietary NVIDIA drivers

  • If you’re in the penguin world, you know what the word proprietary means. The nouveau drivers won’t do the trick here. YMMV, but after struggling for a while trying to get everything to work properly on Fedora, I gave up and moved to Pop!_OS. There’s a version with the NVIDIA drivers pre-installed; that’s the one you may want to grab (update after installing, of course). I’m not missing Fedora at all, by the way! Edit Feb 2021: There are valid reasons to miss Fedora! I had to come back to it because of a BIOS update, so I’m adding the necessary steps for it and distinguishing between Ubuntu / Pop!_OS and Fedora where necessary.
  • In the Fedora case, there’s an excellent guide on how to achieve this, written by JR and constantly updated. Once you get through it, come back here to continue the process.


Step 2: Get ready to ship with Docker

Even if you’re completely unfamiliar with Docker, as I was not so long ago, fear not: this is easier than you think. If that’s your case, you may want to check out one of the plethora of videos on YouTube that kindly introduce Docker, like this one.

To install Docker from the official Docker repositories, do the following:


# For Pop!_OS & Ubuntu
$ sudo apt update
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# if your version is 20.04 
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
# if your version is 20.10
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu groovy stable"
$ sudo apt update
$ sudo apt install docker-ce

# For Fedora
$ sudo dnf update
## remove old versions
$ sudo dnf remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-selinux \
                  docker-engine-selinux
$ sudo dnf -y install dnf-plugins-core
$ sudo dnf config-manager \
    --add-repo \
    https://download.docker.com/linux/fedora/docker-ce.repo
$ sudo dnf install docker-ce docker-ce-cli

The Docker daemon is bound to a Unix socket owned by the root user. To be able to use Docker without sudo, run the following commands:

$ sudo usermod -aG docker $USER
$ su - ${USER}
$ id -nG

Verify that you can use docker without sudo:

$ docker pull hello-world:latest
$ docker run hello-world 
Hello from Docker!

[UPDATE October 2020]:

It seems that there are some issues with Docker 19.03 and later that prevent containers from accessing GPUs properly. In that case, an error similar to “Could not select device driver … with capabilities: [[gpu]]” is thrown. To fix this, the nvidia-container-toolkit must be installed. The first step is to check your distribution identifier here. In my case I’m using Pop!_OS 20.04, so “ubuntu20.04” is the right distribution id. For Fedora 33 we’ll use “rhel8.3”.
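As a quick sanity check before running the commands below, you can print what your own system reports via /etc/os-release. Note that this prints the raw id, which does not always match the distribution id you need: Ubuntu 20.04 prints “ubuntu20.04” and can be used as-is, but Pop!_OS prints “pop20.04” and Fedora 33 prints “fedora33”, which you map manually to “ubuntu20.04” and “rhel8.3” as described above.

```shell
# Print the raw distribution id reported by the OS
# (ubuntu20.04 on Ubuntu; pop20.04 on Pop!_OS; fedora33 on Fedora 33)
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
echo "$distribution"
```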

# For Pop!_OS and Ubuntu
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
# Substitute with your DISTRIBUTION_ID_VALUE
$ curl -s -L https://nvidia.github.io/nvidia-docker/[YOUR_DISTRIBUTION_ID]/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt update
$ sudo apt install nvidia-container-toolkit
$ sudo systemctl restart docker

# For Fedora
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --import -
$ curl -s -L https://nvidia.github.io/nvidia-docker/rhel8.3/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo dnf update
$ sudo dnf install nvidia-container-toolkit
$ sudo systemctl restart docker

Fedora only:

Because Fedora uses cgroups v2 instead of the old cgroups, we have to tweak the nvidia-container-runtime configuration:

sudo nano /etc/nvidia-container-runtime/config.toml

Once in the file, uncomment the no-cgroups parameter by removing the # at the beginning of the line and set it to true, that is:

no-cgroups = true

Exit the file, save the changes and reboot.
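If you’d rather skip the interactive nano session, the same change can be made with a one-line sed. Here the idea is demonstrated on a throwaway copy in /tmp; to apply it for real, point the sed at /etc/nvidia-container-runtime/config.toml and run it with sudo.

```shell
# Demonstrate the edit on a scratch copy of the relevant line
printf '#no-cgroups = false\n' > /tmp/config.toml
# Uncomment the parameter and flip it to true in one pass
sed -i 's/^#no-cgroups = false/no-cgroups = true/' /tmp/config.toml
cat /tmp/config.toml
```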


Step 3: Pull the TensorFlow Docker image and customize it with your toys

Next up we will pull the latest official Docker image with TensorFlow and GPU support enabled:

$ docker pull tensorflow/tensorflow:latest-gpu-jupyter

By now we could start a container from the image we just pulled, but most likely it doesn’t have all the libraries you need for your day-to-day work (e.g. pandas, seaborn, etc.). When a running container is stopped, it “dies” along with everything on it; when you start it again, what you’re really doing is creating a totally new container from the image. Therefore, running a container and installing the libraries in it isn’t a smart move, because the next time you run a container from that image those libraries won’t be there.

The way to tackle this problem is to build our own customized image, with the libraries we want, taking the one we pulled as a base, as follows:

$ cd /tmp
$ mkdir tf2-gpu-custom && cd tf2-gpu-custom
$ nano Dockerfile

Inside the nano text editor, we introduce the following lines: the first indicates the base image, the second upgrades pip, and the third installs all of our goodies. Adjust that last one to your requirements.

FROM tensorflow/tensorflow:latest-gpu-jupyter
RUN pip install --upgrade pip
RUN pip install pandas matplotlib seaborn plotly scikit-learn statsmodels sqlalchemy pg8000 psycopg2-binary beautifulsoup4 html5lib lxml xlrd filemagic ipywidgets prettyprint autokeras keras-tuner

We exit nano and save the file by pressing Ctrl + X, then confirm with Y followed by Enter. Now everything is ready to build our customized TensorFlow image, which we do by issuing the following command:

$ docker build -t tensorflow2-gpu-custom .

Yes, the “.” at the end of the command does matter. Also, make sure there’s nothing besides the Dockerfile in the folder you’re in. The -t parameter indicates the tag you want to use for your image. You can use whatever you want, but I recommend something easy to remember, as you will need it often. The build will take a while depending on the libraries you decided to install, but at the end, if everything went right, you should get output similar to this:

Successfully built 62201d310835
Successfully tagged tensorflow2-gpu-custom:latest
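One caveat worth knowing: latest-gpu-jupyter is a moving tag, so rebuilding the image months from now may silently pull a different TensorFlow. If reproducibility matters to you, pin the base image tag and the library versions in the Dockerfile. The version numbers below are purely illustrative, not recommendations:

```dockerfile
# Pinned variant of the Dockerfile (versions are illustrative only)
FROM tensorflow/tensorflow:2.4.1-gpu-jupyter
RUN pip install pandas==1.2.2 seaborn==0.11.1 scikit-learn==0.24.1
```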

Step 4: Let’s take it for a spin!

Now I’ll show you how to start a container from your custom Docker image; in my particular case:

$ docker run -p 8888:8888 -v ${HOME}/docker/tf2-gpu-custom:/mnt --gpus all -it --rm tensorflow2-gpu-custom

More generally:

$ docker run -p 8888:8888 -v /path/to/host/folder:/path/to/dockercontainer/folder --gpus all -it --rm your_custom_docker_image_tag

What was that? Let’s analyze it:

  • The -p option publishes the specified container port on the host. Since I’ll be using the Jupyter server, I want to be able to access it.
  • The -v flag mounts a volume. Volumes are the preferred mechanism for a container to access and save persistent data. I’d suggest creating a specific folder on the host for this purpose; in my case, to keep it simple, I mount the volume at /mnt.
  • The -it option keeps STDIN open even if not attached and allocates a pseudo-TTY.
  • As you can imagine, the --gpus all part is what sweetens the deal and the reason we started with all this fuss.
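Since you’ll be typing that docker run incantation often, it may be worth wrapping it in a small shell function in your ~/.bashrc. The function name tf2lab and the mount path are my own assumptions here, adjust them to taste:

```shell
# Hypothetical wrapper around the long docker run command; any extra
# arguments are passed straight through to the container.
tf2lab() {
  docker run -p 8888:8888 \
    -v "${HOME}/docker/tf2-gpu-custom:/mnt" \
    --gpus all -it --rm tensorflow2-gpu-custom "$@"
}
```

After reloading your shell, a plain `tf2lab` starts the Jupyter container exactly as the full command above does.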

After issuing the above command, if everything worked properly, you should see the output from the just-started Jupyter server in your terminal.

As usual, we can access the server’s GUI in our web browser by entering the address shown in the terminal, http://127.0.0.1:8888/?token=<token_value>. To verify everything is running smoothly, we can conduct a little trial in a new Jupyter notebook along with some monitoring of the GPU; for the latter I recommend the amazing nvtop command-line utility.

In the first cell of our notebook we enter the following code:

import tensorflow as tf
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

When we execute the code, we should get an output similar to tf.Tensor(540.166, shape=(), dtype=float32), and more importantly, you should see a dramatic change in your GPU memory usage.

The last lines of the terminal showing the STDOUT from the jupyter server should look similar to this:

2020-09-10 21:04:25.550487: I tensorflow/core/common_runtime/gpu/gpu_device.cc] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6267 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:08:00.0, compute capability: 7.5)

If you got the expected output from the Jupyter notebook, the sudden increase in GPU memory usage, and the right STDOUT: congratulations, you’re using TensorFlow with your mighty GPU power!


Step 5: Command line access and stopping the container

If you want to interact with the Docker container through the command line, first you need to identify it from a new terminal window:

$ docker ps

Then you just copy the CONTAINER ID from the output you got and enter the command:

$ docker exec -it <your CONTAINER ID> bash

You will be greeted with a shell prompt inside the container.

Finally, when you’re done using the container, to stop it from outside it, just use:

$ docker stop <your_CONTAINER_ID>

Final thoughts

For those absolutely unfamiliar with Docker, there’s a chance this route may seem a little daunting. But believe me when I tell you that, as of today, this is the easiest path to take if you want to make use of all that GPU power, which makes a world of difference in the time you spend training your Deep Learning models. Yes, you will struggle at first, but the sooner you start, the more time savings you will get.

Published in Guides
