Tutorial: Deep learning with TensorFlow, Nvidia and Apache Mesos (DC/OS), Part 1

Try out TensorFlow with GPU acceleration.

DC/OS 1.9 introduced GPU-based scheduling. With GPU-based scheduling, organizations can share resources of clusters for traditional and machine learning workloads, as well as dynamically allocate GPU resources inside those clusters and free them when needed. By using some of the popular libraries for machine learning (such as TensorFlow and Nvidia Docker), data scientists can test locally on their laptops and deploy to production on DC/OS without any change to their applications and models. Read on to learn more about this technology and how you can take advantage of it within Mesosphere DC/OS.

This is the first in a multipart tutorial to demonstrate the value of GPUs, and how running them on DC/OS makes your applications not just fast, but also efficient, reliable, and scalable.

  • Tutorial #1: Run a TensorFlow Docker image on your laptop and run a machine learning model with and without GPUs.

  • Tutorial #2: Run a TensorFlow Docker image on a DC/OS cluster with and without GPUs. See GPU isolation and Jupyter in action.

  • Tutorial #3: Deploy a dynamic, distributed TensorFlow on DC/OS from the Universe. See how TensorFlow on DC/OS dynamically consumes and releases resources on the cluster when done. Run multiple TensorFlows on the same cluster with different resource requirements.

TensorFlow Tutorial #1

In part one of this tutorial, we’ll use TensorFlow to run a convolutional neural network example on your local machine, then use nvidia-docker to accelerate the TensorFlow job using GPUs. TensorFlow is a machine intelligence library whose architecture is designed to leverage GPUs for speed and efficiency.

Watch a video version of this tutorial here.


Prerequisite: Docker installed on your local machine.

Let’s get going!

First, we’ll get TensorFlow running without GPUs and time the result.

  1. On your local machine, launch TensorFlow without GPUs using Docker.

    docker run -it tensorflow/tensorflow bash
  2. We need a machine learning model to test with TensorFlow. You can find some good examples in this GitHub repo. We’ll install git using apt-get, then clone the examples repo.

    apt-get update; apt-get install -y git
    git clone https://github.com/aymericdamien/TensorFlow-Examples
  3. Now that we have the convolutional network example, let’s run it and time the execution.

    cd TensorFlow-Examples/examples/3_NeuralNetworks
    time python convolutional_network.py

That took my machine about 3 minutes.
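If you would rather capture the timing from a script than read it off the shell's `time` output, a minimal sketch along these lines should work. It assumes a Python 3 interpreter, and `time_command` is a hypothetical helper name rather than anything shipped with TensorFlow or the examples repo.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a subprocess and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Inside the container you
    # would pass ["python", "convolutional_network.py"] instead.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, wall-clock time {seconds:.2f} s")
```

This measures wall-clock time only, which is the figure that matters for the CPU versus GPU comparison later on.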

TensorFlow example with GPUs

Now, let’s see how GPUs compare.

Note: This part of the tutorial will not work on Mac OS X because nvidia-docker does not support it.
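A quick preflight check can tell you whether a machine is ready for this section. The sketch below is only an illustration: `preflight` is a hypothetical helper, and it assumes that the `docker` and `nvidia-docker` commands are expected on your PATH once the installation steps in this section are done.

```python
import platform
import shutil


def preflight(system=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    system = system or platform.system()
    problems = []
    # platform.system() reports "Darwin" on Mac OS X.
    if system == "Darwin":
        problems.append("nvidia-docker does not support Mac OS X")
    # Assumption: both tools are invoked by these names from the shell.
    for tool in ("docker", "nvidia-docker"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

Run it after installing nvidia-docker; if it prints nothing, the basic requirements are in place.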

  1. Open a new terminal tab so you can compare the CPU-only results with the results of the next steps, which use nvidia-docker.

  2. Make sure you have the latest CUDA drivers installed on your system. Find them at https://developer.nvidia.com/cuda-downloads. As of this writing, version 8.0 is the latest.

  3. Download and install nvidia-docker: https://github.com/NVIDIA/nvidia-docker.

  4. Verify your nvidia-docker installation.

    nvidia-docker run --rm nvidia/cuda nvidia-smi
    nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
  5. Launch TensorFlow with GPUs using nvidia-docker.

    nvidia-docker run -it tensorflow/tensorflow:latest-gpu bash
  6. Install git and download TensorFlow-Examples, as you did in the previous section.

    apt-get update; apt-get install -y git
    git clone https://github.com/aymericdamien/TensorFlow-Examples
  7. Run the same convolutional network example as before.

    cd TensorFlow-Examples/examples/3_NeuralNetworks
    time python convolutional_network.py

That probably took about 30 seconds, roughly six times faster than the 3 minutes or so the CPU-only run needed. The benefits of using GPUs are clear.
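To put a number on the comparison for your own hardware, divide the CPU time by the GPU time. The sketch below uses the approximate timings quoted in this tutorial (about 3 minutes on CPU, about 30 seconds on GPU); `speedup` is a hypothetical helper, and your own measurements will differ.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # Approximate figures from this tutorial: ~3 minutes vs ~30 seconds.
    print(speedup(3 * 60, 30))  # 6.0
```

Substitute the `real` values reported by `time` in each terminal tab to get the ratio for your machine.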

What’s the effect of multiple GPUs?

  1. While we’re here, let’s run the multi-GPU example to see the performance difference if we use more than one GPU for a task.

    cd ../5_MultiGPU
    time python multigpu_basics.py

Even faster!
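The multi-GPU comparison only means something if the machine actually exposes more than one GPU. One way to check is to count the entries printed by `nvidia-smi -L`, which lists one GPU per line. The sketch below assumes each of those lines starts with `GPU `; the helper names and the sample listing are illustrative, not real output from any particular machine.

```python
import subprocess


def count_gpus(listing):
    """Count GPU entries in the text printed by `nvidia-smi -L`."""
    # Assumption: each device appears on its own line beginning with "GPU ".
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpus():
    """Run `nvidia-smi -L` and return the number of GPUs it lists."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return count_gpus(result.stdout)


if __name__ == "__main__":
    # Made-up listing for illustration; call visible_gpus() on a GPU machine.
    sample = "GPU 0: Example GPU (UUID: GPU-aaaa)\nGPU 1: Example GPU (UUID: GPU-bbbb)\n"
    print(count_gpus(sample))
```

If the count is 1, expect the multi-GPU example to show little or no improvement over the single-GPU run.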