In this chapter we are going to learn about TensorFlow, Google's library for machine learning. Put simply, it is a library for numerical computation that uses graphs: the nodes of the graph are operations, and the edges are tensors. As a reminder, tensors are multidimensional arrays that flow through the TensorFlow graph.
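To make the idea of a tensor concrete, here is a minimal sketch in plain Python (no TensorFlow required) that reports the rank and shape of nested lists. The helper name `tensor_shape` is made up for this illustration, and it assumes the nested lists are rectangular.

```python
def tensor_shape(value):
    """Return the shape of a rectangular nested list as a list of ints."""
    shape = []
    # Walk down the first element of each nesting level.
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return shape


scalar = 5.0                          # rank 0
vector = [1.0, 2.0, 3.0]              # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0]]     # rank 2
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # rank 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    dims = tensor_shape(t)
    print(name, "rank:", len(dims), "shape:", dims)
```

The rank is simply the number of dimensions, so a scalar has an empty shape and a matrix has two entries in its shape.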

Once the computational graph has been built, you create a session that executes it on one or more CPUs or GPUs, on a single machine or distributed across several. Here are the main components of TensorFlow:

Variables: Retain their values between calls to session.run; used for weights and biases

Nodes: The operations

Tensors: Signals that pass from/to nodes

Placeholders: Used to send data between your program and the TensorFlow graph

Session: The place where the graph is executed

The TensorFlow implementation translates the graph definition into executable operations distributed across available compute resources, such as the CPU or one of your computer's GPU cards. In general you do not have to specify CPUs or GPUs explicitly. TensorFlow uses your first GPU, if you have one, for as many operations as possible.

Your job as the "client" is to build this graph symbolically in code (C/C++ or Python) and then ask TensorFlow to execute it. As you may imagine, the TensorFlow code behind those "execution nodes" is high-performance C/C++ and CUDA code, which is also difficult to understand.

For example, it is common to create a graph to represent and train a neural network in the construction phase, and then repeatedly execute a set of training ops in the graph in the execution phase.
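The split between a construction phase and an execution phase can be sketched with a toy dataflow graph in plain Python. This is only an illustration of the idea, not how TensorFlow is implemented; the class names `Node`, `Constant`, `Placeholder`, `Mul` and `ToySession` are invented for this sketch.

```python
class Node:
    """A graph node: an operation plus the nodes feeding it."""

    def __init__(self, inputs=()):
        self.inputs = list(inputs)

    def compute(self, *values):
        raise NotImplementedError


class Constant(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def compute(self):
        return self.value


class Placeholder(Node):
    """Has no value of its own; it must be fed at run time."""


class Mul(Node):
    def __init__(self, a, b):
        super().__init__([a, b])

    def compute(self, x, y):
        return x * y


class ToySession:
    def run(self, node, feed_dict=None):
        feed_dict = feed_dict or {}
        # Fed values take priority over whatever the node would compute.
        if node in feed_dict:
            return feed_dict[node]
        if isinstance(node, Placeholder):
            raise ValueError("placeholder was not fed a value")
        # Evaluate the inputs first, then the operation itself.
        args = [self.run(n, feed_dict) for n in node.inputs]
        return node.compute(*args)


# Construction phase: nothing is computed here, we only wire nodes together.
a = Placeholder()
b = Placeholder()
two = Constant(2.0)
y = Mul(Mul(a, b), two)

# Execution phase: values flow through the graph when the session runs it.
session = ToySession()
result = session.run(y, feed_dict={a: 3.0, b: 4.0})
print(result)  # 24.0
```

The same graph object can be run many times with different fed values, which is what makes repeated execution of training ops possible.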

If you already have a machine with Python 3.5 (Anaconda) and the NVIDIA CUDA drivers (7.5) installed, installing TensorFlow is simple:

    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl
    sudo pip3 install --ignore-installed --upgrade $TF_BINARY_URL

If you still need to install the CUDA drivers, refer to their installation instructions first.

As a hello world, let's build a graph that just multiplies two numbers. Notice the following sections of the code:

Import tensorflow library

Build the graph

Create a session

Run the session

Also notice that if we only pass constant values to our model, it is not very useful in real life.

TensorFlow allows you to exchange data with your graph through "placeholders". Placeholders are assigned their values when we ask the session to run. Think of them as a way to send data to your graph each time you call "session.run".

    # Import tensorflow
    import tensorflow as tf

    # Build graph
    a = tf.placeholder('float')
    b = tf.placeholder('float')

    # Graph
    y = tf.mul(a, b)

    # Create session passing the graph
    session = tf.Session()
    # Put the values 3,4 on the placeholders a,b
    print(session.run(y, feed_dict={a: 3, b: 4}))

Now we are going to see how to build a linear regression model in TensorFlow.

    # Import libraries (Numpy, Tensorflow, matplotlib)
    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt
    # Notebook only: render plots inline
    get_ipython().magic(u'matplotlib inline')

    # Create 100 points following the function y=0.1 * x + 0.3 plus normally distributed noise
    num_points = 100
    vectors_set = []
    for i in range(num_points):
        x1 = np.random.normal(0.0, 0.55)
        y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
        vectors_set.append([x1, y1])

    x_data = [v[0] for v in vectors_set]
    y_data = [v[1] for v in vectors_set]

    # Plot data
    plt.plot(x_data, y_data, 'r*', label='Original data')
    plt.legend()
    plt.show()

Now we are going to implement a graph for the function $y=W*x_{data}+b$ together with a loss function $loss = mean[(y-y_{data})^2]$. The loss function returns a scalar: the mean of the squared distances between the model predictions and our data.

    # Create our linear regression model
    # Variables reside internally inside the graph memory
    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    y = W * x_data + b

    # Define a loss function that takes into account the distance between
    # the prediction and our dataset
    loss = tf.reduce_mean(tf.square(y - y_data))

    # Create an optimizer for our loss function (With gradient descent)
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train = optimizer.minimize(loss)

With the graph built, our job is to create a session and initialize all the graph variables, which in this case are our model parameters. We then run the training op in the session several times to train the model.

    # Run session
    # Initialize all graph variables
    init = tf.initialize_all_variables()
    # Create a session and initialize the graph variables (Will actually run now...)
    session = tf.Session()
    session.run(init)

    # Train on 8 steps
    for step in range(8):
        # Optimize one step
        session.run(train)
        # Get access to graph variables (just read) with session.run(varName)
        print("Step=%d, loss=%f, [W=%f b=%f]" % (step, session.run(loss), session.run(W), session.run(b)))

    # Just plot the set of weights and bias with less loss (last)
    plt.plot(x_data, y_data, 'ro', label='Original data')
    plt.plot(x_data, session.run(W) * x_data + session.run(b), label='Fitted line')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()

    # Close the Session when we're done.
    session.close()
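To see roughly what one training step amounts to for this model, here is a sketch of gradient descent written by hand in plain Python. It is not TensorFlow's implementation; it assumes the same model and loss as above, uses the derivatives $\partial loss/\partial W = mean[2(y-y_{data})x_{data}]$ and $\partial loss/\partial b = mean[2(y-y_{data})]$, and the seed and the number of steps are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # assumption: fixed seed so the run is repeatable

# Same kind of synthetic data as above: y = 0.1 * x + 0.3 plus noise.
num_points = 100
x_data = [random.gauss(0.0, 0.55) for _ in range(num_points)]
y_data = [0.1 * x + 0.3 + random.gauss(0.0, 0.03) for x in x_data]

W = random.uniform(-1.0, 1.0)
b = 0.0
learning_rate = 0.5


def mean(values):
    return sum(values) / len(values)


def loss_and_grads(W, b):
    errors = [W * x + b - y for x, y in zip(x_data, y_data)]
    loss = mean([e * e for e in errors])
    # Derivatives of mean((W*x + b - y)^2) with respect to W and b.
    grad_W = mean([2.0 * e * x for e, x in zip(errors, x_data)])
    grad_b = mean([2.0 * e for e in errors])
    return loss, grad_W, grad_b


for step in range(50):
    loss, grad_W, grad_b = loss_and_grads(W, b)
    # One gradient descent step: move against the gradient.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b
    if step % 10 == 0:
        print("Step=%d, loss=%f, [W=%f b=%f]" % (step, loss, W, b))

final_loss, _, _ = loss_and_grads(W, b)
```

With enough steps W and b should approach the values 0.1 and 0.3 used to generate the data. In TensorFlow you never write these derivatives yourself, because the optimizer derives them from the graph.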

It is almost entirely up to you to load data into TensorFlow, which means you need to parse the data yourself. For image classification, one option is to keep text files that list every image filename followed by its class. For example:

trainingFile.txt

image1.png 0
image2.png 0
image3.png 1
image4.png 1
image5.png 2
image6.png 2

A common API to load the data would be something like this.

    train_data, train_label = getDataFromFile('trainingFile.txt')
    val_data, val_label = getDataFromFile('validationFile.txt')

    ## Give to your graph through placeholders...
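The function getDataFromFile is not part of TensorFlow, so you have to write it yourself. One possible sketch is shown here, under the assumption that each line holds a filename and an integer class separated by whitespace. The helper `parse_label_lines` is a name invented for this example, and this version only returns filenames and labels; decoding the images is left out.

```python
import io


def parse_label_lines(lines):
    """Split 'filename label' lines into two parallel lists."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the label is the final token.
        filename, label = line.rsplit(None, 1)
        filenames.append(filename)
        labels.append(int(label))
    return filenames, labels


def getDataFromFile(path):
    """Read a label file from disk; images are NOT decoded here."""
    with open(path) as f:
        return parse_label_lines(f)


# Usage on an in-memory copy of trainingFile.txt.
sample = io.StringIO(
    "image1.png 0\nimage2.png 0\nimage3.png 1\n"
    "image4.png 1\nimage5.png 2\nimage6.png 2\n"
)
train_data, train_label = parse_label_lines(sample)
print(train_data)
print(train_label)
```

Keeping the parsing separate from the file access makes the function easy to try out without touching the disk.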

TensorFlow offers a tool to help visualize what is happening in your graph. This tool is called TensorBoard. It is basically a web page where you can debug your graph by inspecting its variables, node connections and so on.

To use TensorBoard you need to annotate your graph with the values that you want to inspect, for example the loss value. Then you need to merge all the summaries into a single op using the function tf.merge_all_summaries().

Optionally you can also use the function "tf.name_scope" to group nodes on the graph.

After all variables are annotated and you configure your summary, you can go to the console and call:

tensorboard --logdir=/home/leo/test

Considering the previous example here are the changes needed to add information to tensorboard.

1) First, during the building phase, we annotate the graph with the information we are interested in inspecting. Then we call merge_all_summaries(). In our case we annotated loss (scalar) and W, b (histogram).

    # Create our linear regression model
    # Variables reside internally inside the graph memory

    # tf.name_scope organizes things on the tensorboard graph view
    with tf.name_scope("LinearReg") as scope:
        W = tf.Variable(tf.random_uniform([1], -1.0, 1.0), name="Weights")
        b = tf.Variable(tf.zeros([1]), name="Bias")
        y = W * x_data + b

    # Define a loss function that takes into account the distance between
    # the prediction and our dataset
    with tf.name_scope("LossFunc") as scope:
        loss = tf.reduce_mean(tf.square(y - y_data))

    # Create an optimizer for our loss function
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train = optimizer.minimize(loss)

    #### Tensorboard stuff
    # Annotate loss, weights and bias (Needed for tensorboard)
    loss_summary = tf.scalar_summary("loss", loss)
    w_h = tf.histogram_summary("W", W)
    b_h = tf.histogram_summary("b", b)

    # Merge all the summaries
    merged_op = tf.merge_all_summaries()

2) During our session creation we need to add a call to "tf.train.SummaryWriter" to create a writer. You need to pass a directory where tensorflow will save the summaries.

    # Initialize all graph variables
    init = tf.initialize_all_variables()

    # Create a session and initialize the graph variables (Will actually run now...)
    session = tf.Session()
    session.run(init)

    # Writer for tensorboard (Directory)
    writer_tensorboard = tf.train.SummaryWriter('/home/leo/test', session.graph_def)

3) Then, when we execute our graph, for example during training, we can ask TensorFlow to generate a summary. Generating a summary on every step will hurt performance, so you could instead do it only at the end of each epoch.

    for step in range(1000):
        # Optimize one step
        session.run(train)

        # Add summary (Every time could be too much....)
        result_summary = session.run(merged_op)
        writer_tensorboard.add_summary(result_summary, step)

Here we can see our linear regression model as a computing graph.

Below we can see how the loss evolved on each iteration.

Sometimes IPython holds on to old versions of your graph, which creates problems when using TensorBoard. If you run into problems, one option is to restart the kernel.

Tensorflow also allows you to use GPUs to execute graphs or particular sections of your graph.

A common machine learning system has one multi-core CPU and one or more GPUs. TensorFlow represents them as follows:

"/cpu:0": Multicore CPU

"/gpu:0": First GPU

"/gpu:1": Second GPU

Unfortunately tensorflow does not have an official function to list the devices available on your system, but there is an unofficial way.

    from tensorflow.python.client import device_lib

    def get_devices_available():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos]

    print(get_devices_available())

['/cpu:0', '/gpu:0', '/gpu:1']

Use the "with tf.device('/gpu:0')" statement in Python to pin all the nodes created inside that block to a particular GPU.

    import tensorflow as tf

    # Creates a graph.
    with tf.device('/gpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)

    # Creates a session with log_device_placement set to True, this will dump
    # on the log how tensorflow is mapping the operations on devices
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
    sess.close()

    [[ 22.  28.]
     [ 49.  64.]]
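The result does not depend on which device runs the op. As a quick check, the same product can be computed in plain Python. This is a sketch with a hand-written `matmul` helper (a name chosen for this example), assuming the six values fill each matrix row by row as the shapes above indicate.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shapes do not match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for row in a]


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # shape [2, 3]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]               # shape [3, 2]

c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```

For instance, the top-left entry is 1*1 + 2*3 + 3*5 = 22, matching the TensorFlow output.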

Now we will explain how training is done on a multiple GPU system.

Basically, the steps for multiple gpu training are:

Separate your training data in batches as usual

Create a copy of the model in each gpu

Distribute different batches for each gpu

Each gpu will forward its batch and calculate its gradients

Each gpu will send the gradients to the cpu

The cpu will average the gradients and do a gradient descent step. The model parameters are updated with the gradients averaged across all model replicas.

The cpu will distribute the new model to all gpus

The process loops again until all training is done
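The steps above can be sketched as a small simulation in plain Python, with no GPUs involved. Each "replica" is just a function call standing in for one gpu, and the model is the linear regression $y=W*x+b$ from earlier in the chapter. The replica count, batch size, seed and number of epochs are assumptions made for this illustration.

```python
import random

random.seed(1)  # assumption: fixed seed for a repeatable run

# Toy dataset for the model y = W * x + b.
data = []
for _ in range(128):
    x = random.gauss(0.0, 0.55)
    data.append((x, 0.1 * x + 0.3 + random.gauss(0.0, 0.03)))

NUM_REPLICAS = 2     # stands in for two gpus
BATCH_SIZE = 16      # per-replica batch size
LEARNING_RATE = 0.5


def replica_gradients(params, batch):
    """What each 'gpu' does: forward pass plus gradients on its own batch."""
    W, b = params
    n = len(batch)
    grad_W = sum(2.0 * (W * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (W * x + b - y) for x, y in batch) / n
    return grad_W, grad_b


params = (random.uniform(-1.0, 1.0), 0.0)  # master copy kept on the 'cpu'

for epoch in range(20):
    random.shuffle(data)
    step_size = NUM_REPLICAS * BATCH_SIZE
    for start in range(0, len(data), step_size):
        # Every replica gets the same parameters and a different batch.
        batches = [data[start + i * BATCH_SIZE:start + (i + 1) * BATCH_SIZE]
                   for i in range(NUM_REPLICAS)]
        # Each replica computes its gradients and sends them back.
        grads = [replica_gradients(params, batch) for batch in batches]
        # The 'cpu' averages the gradients and takes one descent step.
        avg_W = sum(g[0] for g in grads) / NUM_REPLICAS
        avg_b = sum(g[1] for g in grads) / NUM_REPLICAS
        # The updated parameters are what the replicas see on the next step.
        params = (params[0] - LEARNING_RATE * avg_W,
                  params[1] - LEARNING_RATE * avg_b)

W, b = params
print("W=%f b=%f" % (W, b))
```

Because every replica sees a batch of the same size, averaging the per-replica gradients gives the same update as one gradient computed over all the batches combined, which is why this scheme behaves like training with a larger batch.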