Artificial Intelligence

Yolo

This detector trades a little precision (improved in v2) for speed: it is a really fast detector. This chapter explains how it works and points to reference working code in TensorFlow.

Main idea

The idea of this detector is that you run the image through a CNN model and get the detections in a single pass. First the image is resized to 448x448, then it is fed to the network, and finally the output is filtered by a non-maxima suppression algorithm.

Model Yolo:

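The tiny version is composed of 9 convolution layers with leaky ReLU activations. Observe that after maxpool6 the 448x448 input image becomes a 7x7 feature map (each of the six max-pooling layers halves the resolution, and 448 / 2^6 = 7).

The output of this model is a tensor of shape (batch size, 7, 7, 30). For each of the 7x7 cells the following information is encoded:

  • 2 box definitions (each consisting of x, y, width, height and an "is object" confidence)

  • 20 class probabilities (only considered if the "is object" confidence is high)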

$$Tensor = S \times S \times (B \cdot 5 + C)$$

Where:

  • S: Tensor spatial dimension (7 in this case)

  • B: Number of bounding boxes per cell, each defined by (x, y, w, h, confidence)

  • C: Number of classes

  • $confidence = P_{object} \cdot IoU(pred, gt)$

Here "is object", or $P_{object}$, is the probability that a box contains an object of any class (as opposed to background). During training, if a particular cell is not over any object, we set "is object" to zero.
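As a quick check of the formula, plugging in the values used in this chapter (S = 7, B = 2, C = 20) gives the 7x7x30 output. The variable names below are only illustrative.

```python
S = 7   # grid size
B = 2   # bounding boxes per cell, 5 values each (x, y, w, h, confidence)
C = 20  # number of classes (Pascal VOC)

values_per_cell = B * 5 + C
output_shape = (S, S, values_per_cell)
total_values = S * S * values_per_cell

print(output_shape)   # (7, 7, 30)
print(total_values)   # 1470
```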

What this 7x7 tensor represents

This 7x7 tensor can be considered as a 7x7 grid laid over the input image, where each cell of the tensor holds the 2 box definitions and the 20 class probabilities.

In other words, each cell predicts one probability per class (20 values) and 2 bounding boxes.

Together with the "is object" confidence of each bounding box, which tells whether the box is over an object or not, this is enough to detect the class of the object.

The logic is that if there is an object on that cell, we decide which object it is by taking the class with the biggest probability value in that cell.
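The rule can be sketched for a single cell as follows. The layout of the 30 values (`[box 0, box 1, class probabilities]`) and the helper name `decode_cell` are assumptions made for illustration; real implementations may order the values differently. The class-specific score is taken as box confidence times class probability.

```python
import numpy as np

B, C = 2, 20  # boxes per cell and number of classes, as in this chapter

def decode_cell(cell):
    """Pick the most confident box and the most probable class of one cell.

    Assumed layout (hypothetical): [x, y, w, h, conf] * B, then C class probs.
    """
    boxes = cell[:B * 5].reshape(B, 5)       # one row per box
    class_probs = cell[B * 5:]               # C class probabilities
    best_box = int(np.argmax(boxes[:, 4]))   # box with the highest confidence
    best_class = int(np.argmax(class_probs)) # biggest class probability
    score = float(boxes[best_box, 4] * class_probs[best_class])
    return best_box, best_class, score

# Toy cell: box 0 is confident, class 14 is the most probable one
cell = np.zeros(B * 5 + C)
cell[0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]
cell[5:10] = [0.4, 0.6, 0.1, 0.1, 0.2]
cell[B * 5 + 14] = 0.8

best_box, best_class, score = decode_cell(cell)
print(best_box, best_class, round(score, 2))
```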

Filtering results

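At prediction time the model outputs 2 boxes for every one of the 7x7 cells (98 boxes in total), most of them with a low confidence. By using thresholding and non-maxima suppression we can filter out the boxes that are not valid detections.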
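A minimal sketch of the thresholding step on a whole 7x7x30 output, assuming the per-cell layout `[box 0, box 1, class probabilities]` and a hypothetical threshold of 0.2 (both are assumptions, not values taken from a particular implementation):

```python
import numpy as np

S, B, C = 7, 2, 20
threshold = 0.2  # hypothetical score threshold

# Toy network output: everything is zero except two cells
output = np.zeros((S, S, B * 5 + C))
output[3, 4, 4] = 0.9            # cell (3, 4), box 0 confidence
output[3, 4, B * 5 + 11] = 0.7   # cell (3, 4), class 11 probability
output[1, 1, 9] = 0.3            # cell (1, 1), box 1 confidence
output[1, 1, B * 5 + 2] = 0.5    # cell (1, 1), class 2 probability

# Box confidences, shape (S, S, B)
confidences = output[..., :B * 5].reshape(S, S, B, 5)[..., 4]
# Class probabilities, shape (S, S, C)
class_probs = output[..., B * 5:]
# Class-specific score for every (cell, box, class), shape (S, S, B, C)
scores = confidences[..., np.newaxis] * class_probs[:, :, np.newaxis, :]

# Each row is (grid row, grid column, box index, class index)
detections = np.argwhere(scores > threshold)
print(detections)
```

Only the box in cell (3, 4) survives (0.9 * 0.7 = 0.63); the one in cell (1, 1) scores 0.3 * 0.5 = 0.15 and is dropped. The remaining boxes would then go through non-maxima suppression.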

Training phase

Steps:

  • Find the cell that contains the center of the ground-truth bounding box. (Matching phase)

  • For that cell, check which of its bounding boxes overlaps more with the ground truth (IoU), then decrease the confidence of the bounding box that overlaps less. (Each bounding box has its own confidence)

  • Decrease the confidence of all bounding boxes from every cell that has no object. Don't adjust the box coordinates or class probabilities from those cells.
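The matching step can be sketched as below, assuming boxes are given in pixels as (x_min, y_min, x_max, y_max) on the 448x448 input. The function name `responsible_cell` is hypothetical; expressing the center as an offset inside the cell follows the parametrization described in the YOLO paper.

```python
def responsible_cell(box, image_size=448, S=7):
    """Return the grid cell containing the box center, plus the center
    offsets inside that cell (both in the range [0, 1))."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    cell_size = image_size / S  # 64 pixels for a 448x448 image and S = 7
    # Clamp so that a center on the right/bottom border stays inside the grid
    col = min(int(cx // cell_size), S - 1)
    row = min(int(cy // cell_size), S - 1)
    x_offset = cx / cell_size - col
    y_offset = cy / cell_size - row
    return row, col, x_offset, y_offset

# Ground-truth box with center (200, 300)
row, col, x_offset, y_offset = responsible_cell((100, 200, 300, 400))
print(row, col, x_offset, y_offset)  # 4 3 0.125 0.6875
```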

Pre-train

The paper mentions that, before training for object detection, the authors modified the network (adding average pooling, FC and softmax layers) and trained it for classification on the ImageNet dataset for about one week (until they got a good top-5 error). Afterwards they added more conv layers and the FC layer responsible for detection.

Other details

  • Pre-trained on ImageNet

  • Use lots of augmentation

  • Use SGD to train

  • Evaluated on Pascal VOC

  • 135 Epochs, batch size: 64

  • Momentum 0.9

  • Random scaling and translations of up to 20% of the original image size

  • Color exposure/saturation augmentation

Loss Function

Here is the multi-part loss function that we want to optimize. This loss function takes into account the following objectives:

  • Classification (20 classes)

  • Object/No object classification

  • Bounding box coordinates (x,y,height,width) regression (4 scalars)

Each of these sub-objectives uses a sum-squared error. In addition, the factors $\lambda_{coord}=5.0$ and $\lambda_{noobj}=0.5$ are used to rebalance the terms: $\lambda_{coord}$ increases the weight of the box coordinate loss, while $\lambda_{noobj}$ decreases the weight of the confidence loss for boxes that do not contain an object.

Some other points to observe:

  • The classification loss is not backpropagated if the cell has no object

  • The bounding box loss is backpropagated only for the box with the highest IoU (Intersect over Union) with the ground truth

The symbols used in the loss function are:

  • B: Number of bounding boxes per cell (2)

  • $x_{i}, y_{i}, w_{i}, h_{i}$: Box definition

  • $C_{i}$: Confidence ("is object") score in cell i

  • $p_{i}(c)$: Probability of class c in cell i

  • S: Grid size (7)

  • $\mathbb{1}_{i}^{obj}$: 1 if an object appears in cell i, 0 if it does not

  • $\mathbb{1}_{ij}^{obj}$: 1 if bounding box j from cell i is responsible for the prediction, 0 otherwise
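For reference, the multi-part loss as given in the YOLO paper can be written as follows. In the paper's notation $C_i$ is the box confidence and $p_i(c)$ the probability of class $c$ in cell $i$. The hatted symbols are the counterparts (prediction versus target) of the unhatted ones, and $\mathbb{1}_{ij}^{noobj}$ is 1 when box $j$ of cell $i$ is not responsible for any object.

```latex
\begin{aligned}
Loss ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
          \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
       &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
          \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
       &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 \\
       &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2 \\
       &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
```

The first two lines are the box coordinate regression, the next two are the object/no object confidence, and the last one is the classification. The square roots on width and height make an error of a given size count more for small boxes than for large ones.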

Intersect over Union (IoU)

IoU is a measure of how well a detected bounding box matches a ground-truth bounding box. It is normally used during both training and testing, by comparing how much the bounding box given by the prediction overlaps with the ground-truth (training/test data) bounding box.

Calculating the IoU is simple: we divide the area of overlap between the two boxes by the area of their union.

# Calculate Intersect over Union between boxes b1 and b2. Here each box is defined with 2 points:
# box(startX, startY, endX, endY). There are other definitions, e.g. box(x, y, width, height)
def calc_iou(b1, b2):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(b1[0], b2[0])
    yA = max(b1[1], b2[1])
    xB = min(b1[2], b2[2])
    yB = min(b1[3], b2[3])

    # compute the area of the intersection rectangle (the +1 treats the
    # coordinates as inclusive pixel indices); clamp at zero so that boxes
    # that do not overlap give an intersection of 0
    area_intersect = max(0, xB - xA + 1) * max(0, yB - yA + 1)

    # calculate the area of each box
    area_b1 = (b1[2] - b1[0] + 1) * (b1[3] - b1[1] + 1)
    area_b2 = (b2[2] - b2[0] + 1) * (b2[3] - b2[1] + 1)

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = area_intersect / float(area_b1 + area_b2 - area_intersect)

    # return the intersection over union value
    return iou

Another way to calculate the IoU with numpy

import numpy as np

def calc_iou(xy_min1, xy_max1, xy_min2, xy_max2):
    # Get areas
    areas_1 = np.multiply.reduce(xy_max1 - xy_min1)
    areas_2 = np.multiply.reduce(xy_max2 - xy_min2)

    # determine the (x, y)-coordinates of the intersection rectangle
    _xy_min = np.maximum(xy_min1, xy_min2) 
    _xy_max = np.minimum(xy_max1, xy_max2)
    _wh = np.maximum(_xy_max - _xy_min, 0)

    # compute the area of intersection rectangle
    _areas = np.multiply.reduce(_wh)

    # return the intersection over union value
    return _areas / np.maximum(areas_1 + areas_2 - _areas, 1e-10)
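As a worked example of the formula: two 2x2 boxes shifted by one unit overlap on a 1x1 square, so the IoU should be 1 / (4 + 4 - 1) = 1/7. The sketch below uses the same min-corner/max-corner convention as the numpy version; the helper name `iou_corners` and the box values are made up for illustration.

```python
import numpy as np

def iou_corners(xy_min1, xy_max1, xy_min2, xy_max2):
    """IoU of two boxes, each given by its min corner and max corner."""
    # Width and height of the intersection, clamped at zero when disjoint
    wh = np.maximum(np.minimum(xy_max1, xy_max2) - np.maximum(xy_min1, xy_min2), 0)
    inter = np.prod(wh)
    area1 = np.prod(xy_max1 - xy_min1)
    area2 = np.prod(xy_max2 - xy_min2)
    return inter / np.maximum(area1 + area2 - inter, 1e-10)

a_min, a_max = np.array([0.0, 0.0]), np.array([2.0, 2.0])
b_min, b_max = np.array([1.0, 1.0]), np.array([3.0, 3.0])  # overlaps a on a 1x1 square
c_min, c_max = np.array([5.0, 5.0]), np.array([6.0, 6.0])  # far away from a

iou_ab = iou_corners(a_min, a_max, b_min, b_max)
iou_ac = iou_corners(a_min, a_max, c_min, c_max)
iou_aa = iou_corners(a_min, a_max, a_min, a_max)

print(iou_ab)  # about 0.143 (1/7)
print(iou_ac)  # 0.0, no overlap
print(iou_aa)  # 1.0, identical boxes
```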

Non-Maxima Suppression (nms)

At prediction time (after training) you may have lots of box predictions around a single object. The NMS algorithm keeps the box with the highest confidence and filters out the other boxes that overlap with it by more than some IoU threshold.

Here we have an example with numpy and python:

def non_max_suppress(conf, xy_min, xy_max, threshold=.4):
    _, _, classes = conf.shape
    # List Comprehension
    # https://www.youtube.com/watch?v=HobjHIpLhZk
    # https://www.youtube.com/watch?v=Q7EYKuZJfdA
    boxes = [(_conf, _xy_min, _xy_max) for _conf, _xy_min, _xy_max in zip(conf.reshape(-1, classes), xy_min.reshape(-1, 2), xy_max.reshape(-1, 2))]

    # Iterate each class
    for c in range(classes):
        # Sort boxes
        boxes.sort(key=lambda box: box[0][c], reverse=True)
        # Iterate each box
        for i in range(len(boxes) - 1):
            box = boxes[i]
            if box[0][c] == 0:
                continue
            for _box in boxes[i + 1:]:
                # Take iou threshold into account
                if calc_iou(box[1], box[2], _box[1], _box[2]) >= threshold:
                    _box[0][c] = 0
    return boxes
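To see the effect on concrete numbers, here is a simplified single-class sketch of the same greedy idea. The names `iou` and `nms_single_class` and the box values are hypothetical, and it returns the indices of the kept boxes instead of zeroing the suppressed confidences.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / max(area_a + area_b - inter, 1e-10)

def nms_single_class(boxes, scores, threshold=0.4):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by >= threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    for idx in order:
        if all(iou(boxes[idx], boxes[k]) < threshold for k in keep):
            keep.append(int(idx))
    return keep

boxes = np.array([
    [10, 10, 50, 50],      # best box on the first object
    [12, 12, 52, 52],      # near duplicate of the first box (IoU about 0.82)
    [100, 100, 140, 140],  # a second, separate object
], dtype=float)
scores = np.array([0.9, 0.8, 0.7])

kept = nms_single_class(boxes, scores)
print(kept)  # [0, 2]
```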

Yolo v2

The Yolo detector was later improved in a second version (Yolo v2). Its main improvements are:

  • Faster

  • More accurate (73.4 mAP, mean average precision over all classes, on the Pascal VOC dataset)

  • Can detect up to 9000 classes (before it was 20)

What they did to improve:

  • Added Batchnorm

  • Pre-train on ImageNet at two scales, first 224x224 and then 448x448, and only afterwards train for detection.

  • Use anchor boxes like Faster R-CNN; the classification is now done per box shape instead of per grid cell.

  • Instead of choosing the box shapes manually, use K-means to get box shapes based on the data.

  • Train the network at multiple scales. As the network is now fully convolutional (no FC layer), this is easy to do.

  • Train on both ImageNet and MS-COCO.

  • Create a new mechanism to train on datasets that don't have detection data, by selecting which parts of the multi-part loss function to backpropagate.

  • Use WordTree to combine data from various sources, and a joint optimization technique to train simultaneously on ImageNet and COCO.
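The K-means step for picking box shapes can be sketched as below. The Yolo v2 paper describes clustering box widths and heights with the distance 1 - IoU(box, centroid) instead of the Euclidean distance, so that large boxes do not dominate. Everything else here (function names, the toy data, the random initialization, the mean as centroid update) is an assumption made for illustration.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IoU between (N, 2) box sizes and (K, 2) centroid sizes, assuming all
    boxes share the same top-left corner. Returns an (N, K) array."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0]) *
             np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    area_wh = wh[:, 0] * wh[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_wh[:, None] + area_c[None, :] - inter)

def kmeans_anchors(wh, k, iterations=100, seed=0):
    """Cluster (width, height) pairs using the distance 1 - IoU."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # The smallest 1 - IoU distance is the largest IoU
        assignment = np.argmax(wh_iou(wh, centroids), axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = wh[assignment == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# Toy dataset: three small boxes and three large, tall boxes (width, height)
wh = np.array([[10, 10], [12, 11], [11, 12],
               [100, 200], [110, 190], [105, 210]], dtype=float)

anchors = kmeans_anchors(wh, k=2)
anchors = anchors[np.argsort(anchors.prod(axis=1))]  # sort by area
print(anchors)  # one small anchor and one large, tall anchor
```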

References:

https://www.youtube.com/watch?v=NM6lrxy0bxs&t=617s
http://www.pyimagesearch.com/2015/02/16/faster-non-maximum-suppression-python/
https://github.com/leonardoaraujosantos/DeepLearningFramework/files/704078/1506.02640v5.pdf
https://github.com/leonardoaraujosantos/DeepLearningFramework/files/703055/1612.08242v1.pdf
https://github.com/pjreddie/darknet/blob/master/cfg/tiny-yolo.cfg
https://docs.google.com/presentation/d/1aeRvtKG21KHdD5lg6Hgyhx5rPq_ZOsGjG5rJ1HP7BbA/pub?start=false&loop=false&delayms=3000&slide=id.g137784ab86_4_1598
https://en.wikipedia.org/wiki/Jaccard_index
http://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
http://www.pyimagesearch.com/2014/11/17/non-maximum-suppression-object-detection-python/
http://vision.stanford.edu/teaching/cs231b_spring1415/syllabus.html
https://github.com/ruiminshen/yolo-tf
https://github.com/dshahrokhian/YOLO_tensorflow
https://github.com/hizhangp/yolo_tensorflow
https://github.com/nilboy/tensorflow-yolo
https://www.tensorflow.org/versions/r0.12/api_docs/python/image/working_with_bounding_boxes
https://www.tensorflow.org/api_docs/python/tf/image/non_max_suppression
http://www.pyimagesearch.com/2014/11/10/histogram-oriented-gradients-object-detection/
http://stackoverflow.com/questions/42879109/tensorflow-non-maximum-suppression
http://silverpond.com.au/2016/10/24/pedestrian-detection-using-tensorflow-and-inception.html
https://github.com/jrosebr1/imutils/blob/master/imutils/object_detection.py
https://pjreddie.com/darknet/yolov1/
https://github.com/DrewNF/Tensorflow_Object_Tracking_Video
https://github.com/moontree/yolo_tensorflow
https://github.com/subodh-malgonde/yolo
https://github.com/gliese581gg/YOLO_tensorflow
https://github.com/aleju/papers/blob/master/neural-nets/YOLO9000.md
https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py
https://github.com/allanzelener/YAD2K
https://github.com/longcw/yolo2-pytorch
https://arxiv.org/pdf/1612.08242.pdf
http://aimotion.blogspot.co.uk/2010/06/hi-all-it-has-been-while-since-my-last.html
https://github.com/tommy-qichang/yolo.torch/tree/master/yoloCriterion
https://github.com/marvis/pytorch-yolo2