# PyTorch

## Introduction

PyTorch is another deep learning library; it began as a fork of Chainer (a deep learning library written entirely in Python) combined with the capabilities of Torch. In essence, it is Facebook's solution for merging Torch with Python.

### Some advantages

* Easy to debug and to understand the code
* Has as many types of layers as Torch (Unpool, Conv1/2/3D, LSTM, GRU)
* Lots of loss functions
* Can be considered a NumPy extension for GPUs
* Faster than other "define-by-run" libraries, such as Chainer and DyNet
* Allows building networks whose structure depends on the computation itself (useful in reinforcement learning)

### PyTorch Components

| Package               | Description                                            |
| --------------------- | ------------------------------------------------------ |
| torch                 | NumPy-like tensor library with GPU support              |
| torch.autograd        | Automatic differentiation for all torch operations      |
| torch.nn              | Neural network library integrated with autograd         |
| torch.optim           | Optimizers for torch.nn (Adam, SGD, RMSprop, etc.)      |
| torch.multiprocessing | Multiprocessing with memory sharing between tensors     |
| torch.utils           | DataLoader and other utility functions                  |
| torch.legacy          | Old code ported from Torch                              |
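
To see how these packages fit together, here is a minimal sketch of a single training step (illustrative only, with made-up data) that touches torch, torch.autograd, torch.nn and torch.optim:

```python
import torch
from torch.autograd import Variable
import torch.nn as nn

# torch: raw tensors (wrapped in Variables so autograd can track them)
x = Variable(torch.rand(10, 3))
y = Variable(torch.rand(10, 1))

# torch.nn: a one-layer model and a loss function
model = nn.Linear(3, 1)
criterion = nn.MSELoss()

# torch.optim: an optimizer over the model parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# torch.autograd: backward() computes all the gradients automatically
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```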

### How it differs from Tensorflow/Theano

The major difference from TensorFlow is that PyTorch follows a "define-by-run" methodology, while TensorFlow is "define-and-run": TensorFlow always requires a graph definition/build step before execution, whereas PyTorch builds the graph as the code runs. In practice this means you can, for instance, change your model at run-time and debug it with any Python debugger. You can think of TensorFlow as more of a production tool and PyTorch as more of a research tool.
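
Because the graph is rebuilt on every forward pass, a model can use ordinary Python control flow that depends on the data. The sketch below (a hypothetical toy model, not from the original text) repeats a layer a random number of times on each call:

```python
import random
import torch
from torch.autograd import Variable
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super(DynamicNet, self).__init__()
        self.layer = nn.Linear(4, 4)

    def forward(self, x):
        # Ordinary Python control flow: the graph is rebuilt on every
        # call, so its depth can change from one run to the next.
        for _ in range(random.randint(1, 3)):
            x = self.layer(x).clamp(min=0)
        return x

net = DynamicNet()
out = net(Variable(torch.rand(2, 4)))
print(out)
```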

### The Basics:

Here we will see how to create tensors and do some basic manipulation:

```python
import torch
import numpy as np

# Create a tensor on torch
a = torch.rand(3, 3)

# Create a matrix on numpy
b_npy = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Convert from numpy to torch
b = torch.from_numpy(b_npy)

print(a)
print(b)

# Get a specific element
print(b[1, 1])

# Get a range of elements (slicing works like NumPy)
print(b[1:, 1:])

# Set elements of the tensor
a[1:, 1:] = 0
print(a)
```

![](https://2109831662-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LvMRntv-nKvtl7WOpCz%2F-LvMRp9FltcwEeVxPYFs%2F-LvMRsYkkAInXYLH1a1h%2FResultPyTorch.png?generation=1575572714666255\&alt=media)
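
One detail worth knowing: `torch.from_numpy` shares memory with the source NumPy array (and `.numpy()` shares memory in the other direction), so mutating one is visible in the other. A quick illustration:

```python
import torch
import numpy as np

b_npy = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = torch.from_numpy(b_npy)

# The tensor and the array share the same underlying memory
b_npy[0, 0] = 100
print(b)          # the change is visible on the torch side

# Converting back also shares memory
print(b.numpy())
```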

Create tensors filled with ones or zeros:

```python
import torch

a = torch.ones(2,3)
b = torch.zeros(3,2)
print(a)
print(b)
```

![](https://2109831662-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LvMRntv-nKvtl7WOpCz%2F-LvMRp9FltcwEeVxPYFs%2F-LvMRsYqZT0-FzpNa7rw%2FOnesZeros.png?generation=1575572715064684\&alt=media)

Now we will do some computation on the GPU:

```python
import torch
import numpy as np

# Define tensors on the GPU
a = torch.rand(2, 3).cuda()
b = torch.rand(2, 3).cuda()

# Define some operation (will execute on the GPU)
c = (a + b) * 2

# Print "c" contents and shape(size)
print(c)
print(c.size())
```

![](https://2109831662-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LvMRntv-nKvtl7WOpCz%2F-LvMRp9FltcwEeVxPYFs%2F-LvMRsYsbhQbpB4GNiTB%2FResultGPU.png?generation=1575572713587789\&alt=media)
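
Since `.cuda()` fails on machines without a GPU, a common pattern is to guard it with `torch.cuda.is_available()`, and to bring results back to the CPU with `.cpu()` before converting to NumPy:

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(2, 3)

# Only move tensors to the GPU when one is actually available
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()

c = (a + b) * 2

# Tensors must come back to the CPU before calling .numpy()
print(c.cpu().numpy())
```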

### Autograd and variables

Autograd is the PyTorch component responsible for backpropagation: as in TensorFlow, you only need to define the forward pass. In this respect PyTorch autograd looks a lot like TensorFlow: in both frameworks a computational graph is defined, and automatic differentiation is used to compute gradients.

To include a tensor in the graph, we wrap it in a Variable object; a Variable represents a node in the computational graph. Variables are not like TensorFlow placeholders: in PyTorch you feed the values directly into the model.

Consider the following simple graph:

![](https://2109831662-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LvMRntv-nKvtl7WOpCz%2F-LvMRp9FltcwEeVxPYFs%2F-LvMRsYu0RznP3m174k4%2FPyTorchSimpleGraphSmall.png?generation=1575572715948023\&alt=media)

```python
import torch
from torch.autograd import Variable

# Define scalars a=2, b=3, c=4
a = Variable(torch.ones(1, 1) * 2, requires_grad=True)
b = Variable(torch.ones(1, 1) * 3, requires_grad=True)
c = Variable(torch.ones(1, 1) * 4, requires_grad=True)

# Define the function "out" of the parameters a, b, c
out = (a * b) + c
# Equivalent: out = torch.mul(a, b) + c
print('Value out:', out)

# Do the backpropagation
out.backward()

# Get the derivatives of out w.r.t. each input
print('Derivative of out w.r.t. a:', a.grad)
print('Derivative of out w.r.t. b:', b.grad)
print('Derivative of out w.r.t. c:', c.grad)
```

![](https://2109831662-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LvMRntv-nKvtl7WOpCz%2F-LvMRp9FltcwEeVxPYFs%2F-LvMRsYwngAPRKrKz9Io%2FAutoGradPytorch.png?generation=1575572713998126\&alt=media)
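
These values follow directly from out = (a*b) + c: the derivative of out with respect to a is b = 3, with respect to b it is a = 2, and with respect to c it is 1.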

### Complete example

Here we combine these concepts and train a CNN on the MNIST dataset:

```python
# Import libraries
import torch
from torch.autograd import Variable
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F

# Hyper Parameters
num_epochs = 5
batch_size = 50
learning_rate = 0.001

# MNIST Dataset
train_dataset = dsets.MNIST(root='../data/',
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='../data/',
                           train=False, 
                           transform=transforms.ToTensor())


# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)


# CNN Model (2 conv layer) nn.Module is the base class to all neural networks
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc = nn.Linear(7*7*32, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

cnn = CNN()
cnn.cuda()
print(cnn)

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)

# Train the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images)
        labels = Variable(labels)

        images, labels = images.cuda(), labels.cuda()

        # Forward + Backward + Optimize
        optimizer.zero_grad()
        outputs = cnn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if (i+1) % 500 == 0:
            # loss.data[0] reads the scalar loss (loss.item() in PyTorch >= 0.4)
            print('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.data[0]))


# Test the Model
cnn.eval()  # Change model to 'eval' mode (BN uses moving mean/var).
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images)
    images, labels = images.cuda(), labels.cuda()
    outputs = cnn(images)
    # torch.max returns (max values, argmax indices); take the indices
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Test Accuracy of the model on the 10000 test images: %d %%' % (100 * correct / total))

# Save the trained model parameters (state_dict only)
torch.save(cnn.state_dict(), 'cnn.pkl')
```
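
To reuse the trained model later, instantiate the same class and load the saved state dictionary back (a short sketch, reusing the CNN class and the 'cnn.pkl' file from the example above):

```python
# Recreate the architecture and load the trained weights
cnn = CNN()
cnn.load_state_dict(torch.load('cnn.pkl'))
cnn.eval()  # switch BatchNorm to inference mode before predicting
```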

### References:

* <http://pytorch.org/>
* <http://pytorch.org/docs/index.html>
* <https://hackernoon.com/how-is-pytorch-different-from-tensorflow-2c90f44747d6>
* <https://blog.paperspace.com/adversarial-autoencoders-with-pytorch/>
* <https://devblogs.nvidia.com/parallelforall/recursive-neural-networks-pytorch/>
* <http://blog.outcome.io/pytorch-quick-start-classifying-an-image/>
* <https://github.com/ritchieng/the-incredible-pytorch>
* <https://www.youtube.com/watch?v=nbJ-2G2GXL0>
* <https://www.youtube.com/watch?v=4RzoFWre44Y&t=7s>
* <https://www.youtube.com/watch?v=hiIqRUseouQ>
* <https://github.com/PythonWorkshop/Intro-to-TensorFlow-and-PyTorch/blob/master/PyTorch%20Tutorial.ipynb>
* <https://github.com/PythonWorkshop/Intro-to-TensorFlow-and-PyTorch>
* <https://github.com/pytorch/examples>
* <http://pytorch.org/tutorials/>
* <http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html>
* <http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/tutorials/tut4.pdf>
* <http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/slides/lec1.pdf>
* <https://discuss.pytorch.org/t/understanding-loss-function-gradients/771/5>
* <https://discuss.pytorch.org/t/visual-watcher-when-training-evaluating-or-tensorboard-equivalence/146/8>
* <https://github.com/jcjohnson/pytorch-examples>
* <https://discuss.pytorch.org/t/print-autograd-graph/692/8>
* <https://discuss.pytorch.org/t/print-autograd-graph/692>
* <https://github.com/szagoruyko/functional-zoo/blob/master/visualize.py>
* <https://github.com/szagoruyko/functional-zoo/blob/master/resnet-18-export.ipynb>
* <https://www.safaribooksonline.com/library/view/strata-hadoop/9781491976166/video302404.html>
* <https://discuss.pytorch.org/t/build-your-own-loss-function-in-pytorch/235>
* <http://pytorch.org/docs/notes/extending.html#extending-torch-autograd>
* <http://blog.gaurav.im/2017/04/24/a-gentle-intro-to-pytorch/>
* <https://stackoverflow.com/questions/41924453/pytorch-how-to-use-dataloaders-for-custom-datasets>
* <https://discuss.pytorch.org/t/saving-and-loading-a-model-in-pytorch/2610/3>
* <https://discuss.pytorch.org/t/load-a-saved-model/109>
* <https://discuss.pytorch.org/t/saving-torch-models/838>
* <https://discuss.pytorch.org/t/saving-custom-models/621/4>
* <https://discuss.pytorch.org/t/build-your-own-loss-function-in-pytorch/235/19>
* <https://github.com/pytorch/examples/blob/master/imagenet/main.py>
* <https://github.com/pytorch/examples/blob/master/mnist/main.py>
* <https://discuss.pytorch.org/t/discussion-about-datasets-and-dataloaders/296>
* <https://www.kaggle.com/mratsim/starting-kit-for-pytorch-deep-learning>
* <https://iamtrask.github.io/2017/01/15/pytorch-tutorial/>
