Install TensorFlow and PyTorch with CUDA, cuDNN, and GPU Support in 3 Easy Steps

Set up a cutting-edge environment for deep learning with TensorFlow 2.10, PyTorch, Docker, and GPU support.
Copyright (©) 2023 Gretel.

Getting Started

Latest update: 3/6/2023 - Added support for PyTorch, updated the TensorFlow version, and moved to a more recent Ubuntu release

Setting up a deep learning environment with GPU support can be a major pain. In this post, we'll walk through setting up the latest versions of Ubuntu, PyTorch, TensorFlow, and Docker with GPU support to make getting started easier than ever. Prefer video? Check out the live walk-through of this post on Gretel’s YouTube channel.

Hardware

Tested with NVIDIA Tesla T4 and RTX 3090 GPUs on GCP, AWS, and Azure. Any CUDA-compatible NVIDIA GPU should work.

Software

  • Ubuntu 22.04 LTS
  • Python 3.9
  • Anaconda package manager

Step 1 — Install NVIDIA CUDA Drivers

These are the baseline drivers your operating system needs to drive the GPU. NVIDIA recommends installing them via Ubuntu’s package manager, but you can also install drivers from .run files.

sudo apt-get install nvidia-driver-510-server

Now that you have installed the drivers, reboot your system.

sudo reboot

Log back in and validate that the drivers installed correctly by running NVIDIA’s command line utility.

nvidia-smi
Figure 1: Validate that NVIDIA drivers are correctly installed

Step 2 — Set up TensorFlow and PyTorch with GPU support

Install the Anaconda package manager. Navigate to Anaconda | Anaconda Distribution and download the x86 installer for Linux, or use the command below.

wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
sh Anaconda3-2022.10-Linux-x86_64.sh

Sign out and sign back in via SSH or close and re-open your terminal window. Now we’ll set up virtual environments for TensorFlow and PyTorch.

PyTorch

We’ll start with PyTorch because it’s way less complicated ;-). We’ll follow PyTorch’s official instructions.

conda create --name=pytorch python=3.9
conda activate pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia

# Verify install
python3 -c "import torch; print(torch.cuda.is_available())"
Figure 2: Verify that PyTorch can access CUDA
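Beyond checking that CUDA is visible, it can be worth running an actual tensor operation. Here's a minimal sketch that places a small matrix multiplication on the GPU when one is available, falling back to CPU otherwise:

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Multiply a small random matrix by its transpose on that device
x = torch.randn(3, 3, device=device)
y = x @ x.T

print(device)
print(y.shape)  # torch.Size([3, 3])
```

If the first line prints `cuda`, PyTorch is both seeing your GPU and able to compute on it.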

TensorFlow

Now for TensorFlow support. We’ll follow the official instructions and create a dedicated conda environment for TensorFlow so that its library requirements don't conflict with anything else.

conda create --name=tf python=3.9
conda activate tf
conda install -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

Sign out and sign back in via SSH or close and re-open your terminal window. Reactivate your conda session. 

conda activate tf
python3 -m pip install tensorflow==2.10

# Verify install:
python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'; import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
Figure 3: Verify that TensorFlow has GPU support
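To go one step further than counting devices, you can run a small computation explicitly placed on the GPU. This is a minimal sketch that quietly falls back to CPU when no GPU is found:

```python
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"  # silence TensorFlow's info/warning logs

import tensorflow as tf

# Use the first GPU if TensorFlow sees one, otherwise the CPU
gpus = tf.config.list_physical_devices("GPU")
device = "/GPU:0" if gpus else "/CPU:0"

# Run a small matrix multiplication on the chosen device
with tf.device(device):
    a = tf.random.normal((2, 3))
    b = tf.random.normal((3, 2))
    c = tf.matmul(a, b)

print(device)
print(c.shape)  # (2, 2)
```

Seeing `/GPU:0` printed confirms that TensorFlow is actually executing ops on the GPU, not just detecting it.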

You’ve done it!

Step 3 — Set up Docker with GPU support (optional)

Now that we have the NVIDIA drivers installed, we’ll install Docker, which will let you run GPU-accelerated containers in your environment. Both NVIDIA and TensorFlow state that containers are the easiest way to run GPU-accelerated machine learning applications.

curl https://get.docker.com | sh 
sudo systemctl --now enable docker

# Enable docker to run without root permissions
sudo groupadd docker
sudo usermod -aG docker $USER

Sign out and sign back in via SSH or close and re-open your terminal window. Now confirm Docker is running.

docker run hello-world

Now we’ll enable Docker containers to run with GPU acceleration by following NVIDIA’s guide, or just run the commands below.

Set up the package repository

distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the package index, install the NVIDIA Container Toolkit, and restart Docker

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Validate that you can run Docker containers with GPU acceleration

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Figure 4: Validate that Docker can run a container with GPU acceleration

Woot! You’re good to go running GPU-accelerated ML containers on your workstation. 

Let us know if you have any questions in our Discord Community!