PyTorch is another deep learning library; it is essentially a fork of Chainer (a deep learning library written entirely in Python) combined with the capabilities of Torch. In short, it is Facebook's solution for merging Torch with Python.
Some advantages
Easy to debug and understand the code
Has as many types of layers as Torch (Unpool, Conv1D/2D/3D, LSTM, GRU)
Lots of loss functions
Can be considered a NumPy extension for GPUs
Faster than other "define-by-run" libraries, like Chainer and DyNet
Allows building networks whose structure depends on the computation itself (useful in reinforcement learning)
PyTorch Components
How it differs from Tensorflow/Theano
The major difference from TensorFlow is that PyTorch follows a "define-by-run" methodology while TensorFlow is "define-and-run": in PyTorch the graph is built as the code executes, so you can, for instance, change your model at run time and debug it with any Python debugger, whereas TensorFlow always requires a separate graph definition/build step. You can think of TensorFlow as more of a production tool and PyTorch as more of a research tool.
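As a minimal sketch of what define-by-run allows (the DynamicNet module below is hypothetical and not part of the original text), the forward pass can contain ordinary Python control flow, so the graph can differ from one call to the next:

import torch
import torch.nn as nn
from torch.autograd import Variable

# Hypothetical module: the number of times the hidden layer is applied
# depends on the input itself, so the graph is rebuilt on every call.
class DynamicNet(nn.Module):
    def __init__(self):
        super(DynamicNet, self).__init__()
        self.hidden = nn.Linear(10, 10)
        self.out = nn.Linear(10, 2)

    def forward(self, x):
        h = self.hidden(x)
        # Ordinary Python control flow inside the forward pass
        for _ in range(int(x.data.sum()) % 3):
            h = self.hidden(h)
        return self.out(h)

net = DynamicNet()
print(net(Variable(torch.rand(1, 10))))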
The Basics:
Here we will see how to create tensors and do some basic manipulation:
import torch
import numpy as np

# Create a tensor on torch
a = torch.rand(3, 3)

# Create a matrix on numpy and convert it to PyTorch
b_npy = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Convert from numpy to torch
b = torch.from_numpy(b_npy)
print(a)
print(b)

# Get a specific element
print(b[1, 1])

# Get a range of elements
print(b[1:None, 1:None])

# Set elements on the array
a[1:None, 1:None] = 0
print(a)
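One detail worth knowing (illustrated by the small sketch below, which is not part of the original snippet): torch.from_numpy does not copy the data, it shares memory with the source array, so modifying the numpy array also modifies the tensor:

import torch
import numpy as np

# The tensor returned by torch.from_numpy shares memory with the numpy array
b_npy = np.zeros((2, 2))
b = torch.from_numpy(b_npy)
b_npy[0, 0] = 7
print(b)  # the change made on the numpy side is visible on the torch side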
import torch
import numpy as np

# Define tensors on the GPU
a = torch.rand(2, 3).cuda()
b = torch.rand(2, 3).cuda()

# Define some operation (will execute on the GPU)
c = (a + b) * 2

# Print "c" contents and shape (size)
print(c)
print(c.size())
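To get results back on the CPU (for example to convert them to numpy), the tensor has to be copied out of GPU memory first. A minimal sketch, assuming a CUDA device is available:

import torch

# A CUDA tensor cannot be converted to numpy directly;
# copy it back to host memory with .cpu() first.
a = torch.rand(2, 3).cuda()
a_npy = a.cpu().numpy()
print(type(a_npy), a_npy.shape)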
Autograd and variables
Autograd is the PyTorch component responsible for backpropagation: as in TensorFlow, you only need to define the forward propagation. In this respect PyTorch's autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients.
We just need to wrap tensors in Variable objects; a Variable represents a node in the computational graph. Variables are not like TensorFlow placeholders: in PyTorch you place the values directly into the model. In short, to include a tensor in the graph, wrap it in a Variable.
Consider the following simple graph:
import torch
from torch.autograd import Variable

# Define scalars a=2, b=3, c=4
a = Variable(torch.ones(1, 1) * 2, requires_grad=True)
b = Variable(torch.ones(1, 1) * 3, requires_grad=True)
c = Variable(torch.ones(1, 1) * 4, requires_grad=True)

# Define the function "out" with parameters a, b, c
out = (a * b) + c
# out = torch.mul(a, b) + c
print('Value out:', out)

# Do the backpropagation
out.backward()

# Get dout/da (derivative of out w.r.t. a)
print('Derivative of out w.r.t to a:', a.grad)
print('Derivative of out w.r.t to b:', b.grad)
print('Derivative of out w.r.t to c:', c.grad)
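Since out = a*b + c with a=2, b=3, c=4, the gradients printed above should be dout/da = b = 3, dout/db = a = 2 and dout/dc = 1.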
Complete example
Here we combine these concepts and show how to train a CNN on the MNIST dataset:
# Import libraries
import torch
from torch.autograd import Variable
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F

# Hyper parameters
num_epochs = 5
batch_size = 50
learning_rate = 0.001

# MNIST dataset
train_dataset = dsets.MNIST(root='../data/', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = dsets.MNIST(root='../data/', train=False, transform=transforms.ToTensor())

# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# CNN model (2 conv layers); nn.Module is the base class of all neural networks
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc = nn.Linear(7*7*32, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

cnn = CNN()
cnn.cuda()
print(cnn)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)

# Train the model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images)
        labels = Variable(labels)
        images, labels = images.cuda(), labels.cuda()

        # Forward + backward + optimize
        optimizer.zero_grad()
        outputs = cnn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if (i+1) % 500 == 0:
            print('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.data[0]))

# Test the model
cnn.eval()  # Change model to 'eval' mode (BN uses moving mean/var)
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images)
    images, labels = images.cuda(), labels.cuda()
    outputs = cnn(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Test Accuracy of the model on the 10000 test images: %d%%' % (100 * correct / total))

# Save the trained model
torch.save(cnn.state_dict(), 'cnn.pkl')
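To reuse the saved weights later, a minimal sketch (assuming the CNN class definition and the 'cnn.pkl' file produced by the example above):

# Recreate the model and load the weights saved above
cnn = CNN()
cnn.load_state_dict(torch.load('cnn.pkl'))
cnn.eval()  # switch to evaluation mode before inference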