TensorFlow
Last updated
In this chapter we're going to learn about TensorFlow, Google's library for machine learning. In simple terms, it's a library for numerical computation that uses graphs: the nodes of the graph are operations, while its edges are tensors. As a reminder, tensors are multidimensional arrays, and they flow through the TensorFlow graph.
Once this computational graph is built, you create a session in which it can be executed on one or more CPUs or GPUs, distributed or not. Here are the main components of TensorFlow:
Variables: Retain their values between session runs; used for weights and biases
Nodes: The operations
Tensors: Signals that pass from/to nodes
Placeholders: Used to send data between your program and the TensorFlow graph
Session: The place where the graph is executed
The TensorFlow implementation translates the graph definition into executable operations distributed across available compute resources, such as the CPU or one of your computer's GPU cards. In general you do not have to specify CPUs or GPUs explicitly. TensorFlow uses your first GPU, if you have one, for as many operations as possible.
Your job as the "client" is to build this graph symbolically in code (C/C++ or Python) and ask TensorFlow to execute it. As you may imagine, the TensorFlow code behind those "execution nodes" is high-performance C/C++ and CUDA code (which is also difficult to understand).
For example, it is common to create a graph to represent and train a neural network in the construction phase, and then repeatedly execute a set of training ops in the graph in the execution phase.
If you already have a machine with Python (Anaconda 3.5) and the NVIDIA CUDA drivers (7.5) installed, installing TensorFlow is simple.
If you still need to install the CUDA drivers, refer here for instructions.
As a hello world, let's build a graph that just multiplies two numbers. Notice the following sections of the code:
Import tensorflow library
Build the graph
Create a session
Run the session
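The four sections listed above can be sketched as follows. This is a minimal sketch, not the chapter's original listing: it assumes TensorFlow 1.14 or newer and uses the `tensorflow.compat.v1` module, because the graph and session API described here is only reachable through that module on TensorFlow 2.x. The names `graph`, `a`, `b` and `product` are illustrative.

```python
# 1) Import the TensorFlow library (assumption: TensorFlow >= 1.14,
#    accessed through the compat.v1 module that keeps the graph/session API)
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # required on TensorFlow 2.x to get graph mode

# 2) Build the graph: two constant nodes feeding one multiply node
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(3.0, name="a")
    b = tf.constant(4.0, name="b")
    product = tf.multiply(a, b, name="product")

# 3) Create a session bound to that graph
with tf.Session(graph=graph) as session:
    # 4) Run the session, asking for the value of the "product" node
    result = session.run(product)

print(result)
```

Nothing is computed while the graph is being built; the multiplication only happens inside `session.run`.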
Also notice that in this example we're passing constant values to our model, so it's not very useful in real life.
TensorFlow lets you exchange data with your graph through "placeholders". Their values are assigned when we ask the session to run. Think of placeholders as a way to send data to your graph when you call "session.run".
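As an illustration of placeholders, here is a hedged sketch of the same multiplication with the inputs supplied at run time through `feed_dict`. It assumes TensorFlow 1.14 or newer via `tensorflow.compat.v1`; the placeholder names `x` and `y` and the fed values are made up for the example.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow >= 1.14

tf.disable_eager_execution()  # required on TensorFlow 2.x to get graph mode

graph = tf.Graph()
with graph.as_default():
    # Placeholders have a type and a shape but no value until the session runs
    x = tf.placeholder(tf.float32, shape=[None], name="x")
    y = tf.placeholder(tf.float32, shape=[None], name="y")
    product = tf.multiply(x, y, name="product")

with tf.Session(graph=graph) as session:
    # feed_dict maps each placeholder to the data sent from our program
    result = session.run(product, feed_dict={x: [1.0, 2.0, 3.0],
                                             y: [4.0, 5.0, 6.0]})

print(result.tolist())
```

The same graph can be run again with different data simply by passing a different `feed_dict`.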
Now we're going to see how to create a linear regression system in TensorFlow.
With the graph built, our job is to create a session and initialize all our graph variables, which in this case are our model parameters. Then we need to run the session repeatedly to train our model.
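The build, initialize and train sequence could look like the sketch below. It is not the chapter's original code: it assumes TensorFlow 1.14 or newer via `tensorflow.compat.v1`, a squared-error loss, plain gradient descent with a learning rate of 0.1, and synthetic data generated from y = 2x + 1. All of those choices are illustrative assumptions.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TensorFlow >= 1.14

tf.disable_eager_execution()  # required on TensorFlow 2.x to get graph mode

# Hypothetical training data following y = 2x + 1
x_train = np.linspace(-1.0, 1.0, 50).astype(np.float32)
y_train = 2.0 * x_train + 1.0

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None], name="x")
    y = tf.placeholder(tf.float32, shape=[None], name="y")
    # Model parameters live in variables so they persist between runs
    W = tf.Variable(0.0, name="W")
    b = tf.Variable(0.0, name="b")
    y_pred = W * x + b
    # Assumed loss: mean squared distance between data and prediction
    loss = tf.reduce_mean(tf.square(y - y_pred))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    init_op = tf.global_variables_initializer()

with tf.Session(graph=graph) as session:
    session.run(init_op)           # initialize W and b
    for step in range(200):        # call the session repeatedly to train
        session.run(train_op, feed_dict={x: x_train, y: y_train})
    w_value, b_value = session.run([W, b])

print(w_value, b_value)
```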
Loading data into TensorFlow is almost entirely up to you, which means you need to parse the data yourself. For example, one option for image classification is to have text files listing all the image filenames, each followed by its class. For example:
trainingFile.txt
A common API to load the data would be something like this.
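A minimal sketch of such a loader in plain Python is shown here. The file layout (one "filename class" pair per line, separated by whitespace), the function name `read_labeled_image_list` and the sample filenames are all assumptions made for illustration, not the chapter's original format.

```python
import io


def read_labeled_image_list(lines):
    """Parse '<image filename> <class id>' lines into two parallel lists."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the class id is the last field
        filename, label = line.rsplit(None, 1)
        filenames.append(filename)
        labels.append(int(label))
    return filenames, labels


# Hypothetical file contents; a real file would be opened with open(path)
training_file = io.StringIO(
    "images/cat_001.jpg 0\n"
    "images/dog_001.jpg 1\n"
    "\n"
    "images/cat_002.jpg 0\n"
)
filenames, labels = read_labeled_image_list(training_file)
print(filenames, labels)
```

The two lists can then be fed to the graph in batches, for example through placeholders.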
TensorFlow offers a tool to help visualize what is happening in your graph. This tool is called TensorBoard; it is basically a web page where you can debug your graph by inspecting its variables, node connections, etc.
In order to use TensorBoard you need to annotate your graph with the variables that you want to inspect, e.g. the loss value. Then you need to merge all the summaries using the function tf.merge_all_summaries() (renamed tf.summary.merge_all() since TensorFlow 1.0).
Optionally, you can also use the function "tf.name_scope" to group nodes in the graph.
After all variables are annotated and you configure your summary, you can go to the console and call:
Considering the previous example, here are the changes needed to add information to TensorBoard.
1) First, during the building phase, we annotate the information on the graph that we are interested in inspecting. Then we call merge_all_summaries(). In our case we annotated loss (scalar) and W, b (histogram).
2) During session creation we need to add a call to "tf.train.SummaryWriter" to create a writer (renamed "tf.summary.FileWriter" since TensorFlow 1.0). You need to pass a directory where TensorFlow will save the summaries.
3) Then, when we execute our graph, for example during training, we can ask TensorFlow to generate a summary. Of course, doing this on every step will impact performance. To manage this, you could generate it only at the end of every epoch.
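The three changes above might be sketched as follows. This is a hedged example rather than the original listing: it assumes TensorFlow 1.14 or newer via `tensorflow.compat.v1`, where the summary functions are named `tf.summary.scalar`, `tf.summary.histogram`, `tf.summary.merge_all` and `tf.summary.FileWriter`. The log directory is a temporary folder and the training data is synthetic, both chosen only for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TensorFlow >= 1.14

tf.disable_eager_execution()  # required on TensorFlow 2.x to get graph mode

x_train = np.linspace(-1.0, 1.0, 50).astype(np.float32)  # hypothetical data
y_train = 2.0 * x_train + 1.0
log_dir = tempfile.mkdtemp()  # hypothetical directory for the summaries

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None], name="x")
    y = tf.placeholder(tf.float32, shape=[None], name="y")
    with tf.name_scope("model"):   # groups these nodes in the graph view
        W = tf.Variable(0.0, name="W")
        b = tf.Variable(0.0, name="b")
        y_pred = W * x + b
    with tf.name_scope("loss"):
        loss = tf.reduce_mean(tf.square(y - y_pred))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    # 1) Annotate what we want to inspect, then merge all summaries
    tf.summary.scalar("loss", loss)
    tf.summary.histogram("W", W)
    tf.summary.histogram("b", b)
    merged = tf.summary.merge_all()
    init_op = tf.global_variables_initializer()

feed = {x: x_train, y: y_train}
with tf.Session(graph=graph) as session:
    # 2) Create a writer that saves summaries (and the graph) to log_dir
    writer = tf.summary.FileWriter(log_dir, session.graph)
    session.run(init_op)
    for epoch in range(20):
        session.run(train_op, feed_dict=feed)
        # 3) Generate a summary once per epoch rather than on every step
        summary = session.run(merged, feed_dict=feed)
        writer.add_summary(summary, epoch)
    writer.close()

event_files = os.listdir(log_dir)
print(event_files)
# TensorBoard can then be pointed at log_dir from the console
# with its --logdir option.
```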
Below we can see how the loss evolved on each iteration.
Sometimes IPython holds on to old versions of your graph, which creates problems when using TensorBoard; if this happens, one option is to restart the kernel.
Tensorflow also allows you to use GPUs to execute graphs or particular sections of your graph.
On a common machine learning system you would have one multi-core CPU with one or more GPUs. TensorFlow represents them as follows:
"/cpu:0": Multicore CPU
"/gpu:0": First GPU
"/gpu:1": Second GPU
Unfortunately tensorflow does not have an official function to list the devices available on your system, but there is an unofficial way.
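One commonly used unofficial approach relies on the internal `device_lib` module, sketched below. Because `tensorflow.python.client.device_lib` is not part of the public API, it may change between versions; the helper name `get_available_devices` is illustrative.

```python
# Internal module, not part of the public TensorFlow API (assumption: it is
# present in the installed version)
from tensorflow.python.client import device_lib


def get_available_devices():
    """Return the names of all devices TensorFlow can see on this machine."""
    return [device.name for device in device_lib.list_local_devices()]


devices = get_available_devices()
print(devices)
```

The exact name format depends on the TensorFlow version, so the printed names may differ from the short forms listed above.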
Use the "with tf.device('/gpu:0')" statement in Python to pin all nodes in that block of the graph to a particular GPU.
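A small sketch of device pinning follows, assuming TensorFlow 1.14 or newer via `tensorflow.compat.v1`. The `allow_soft_placement=True` option is an addition not mentioned in the text: it lets TensorFlow fall back to the CPU when the requested GPU does not exist, so the example also runs on machines without a GPU.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow >= 1.14

tf.disable_eager_execution()  # required on TensorFlow 2.x to get graph mode

graph = tf.Graph()
with graph.as_default():
    # Nodes created in this block are pinned to the CPU
    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
        b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name="b")
    # Nodes created in this block are pinned to the first GPU
    with tf.device("/gpu:0"):
        product = tf.matmul(a, b, name="product")

# Soft placement falls back to an available device if "/gpu:0" is missing
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(graph=graph, config=config) as session:
    result = session.run(product)

print(result.tolist())
```

Setting `log_device_placement=True` in the same `ConfigProto` makes TensorFlow log which device each node actually ran on.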
Now we will explain how training is done on a multi-GPU system.
Basically, the steps for multi-GPU training are:
Separate your training data in batches as usual
Create a copy of the model in each gpu
Distribute different batches for each gpu
Each gpu will forward the batch and calculate its gradients
Each gpu will send the gradients to the cpu
The cpu will average each gradient, and do a gradient descent. The model parameters are updated with the gradients averaged across all model replicas.
The cpu will distribute the new model to all gpus
The process loops again until all training is done
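The steps above can be simulated on a single machine with NumPy to show the arithmetic of gradient averaging. This is only a sketch of the idea, not actual multi-GPU code: each "replica" is a plain function call, and the linear model, the squared-error loss, the data and the hyperparameters are all hypothetical.

```python
import numpy as np


def replica_gradients(w, b, x_batch, y_batch):
    """Gradients of the mean squared error of y = w*x + b on one batch."""
    error = (w * x_batch + b) - y_batch
    return 2.0 * np.mean(error * x_batch), 2.0 * np.mean(error)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=256)
y = 3.0 * x - 0.5            # hypothetical target: w = 3.0, b = -0.5

num_replicas = 2             # stands in for the number of GPUs
batch_size = 32
learning_rate = 0.1
w, b = 0.0, 0.0              # the "master" copy of the model kept by the CPU

for epoch in range(100):
    # Separate the training data in batches as usual
    batches = [(x[i:i + batch_size], y[i:i + batch_size])
               for i in range(0, len(x), batch_size)]
    for i in range(0, len(batches), num_replicas):
        # Distribute a different batch to each replica; every replica
        # computes gradients using the same copy of (w, b)
        group = batches[i:i + num_replicas]
        grads = [replica_gradients(w, b, xb, yb) for xb, yb in group]
        # The CPU averages the gradients across replicas...
        grad_w = np.mean([g[0] for g in grads])
        grad_b = np.mean([g[1] for g in grads])
        # ...applies one gradient descent step, and the updated (w, b)
        # is what every replica uses on the next iteration
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)
```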
Now we're going to implement a graph with a model function and a loss function. The loss function will return a scalar value with the mean of all distances between our data and the model prediction.
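One plausible way to write this down, assuming the model is a line with parameters W and b and that "distance" means the squared difference (both are assumptions, since the original formulas are not shown here), is:

```latex
\hat{y}_i = W x_i + b,
\qquad
\mathrm{loss}(W, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```

Here N is the number of training points, x_i and y_i are the data, and the loss is a single scalar that training tries to minimize.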
Here we can see our linear regression model as a computing graph.