SSD

Introduction

The SSD detector differs from other single-shot detectors in its use of multiple feature maps at different resolutions, which gives finer accuracy on objects of different scales (each deeper layer sees bigger objects).

The SSD normally starts with a VGG or ResNet pre-trained model that is converted into a fully convolutional network. Some extra conv layers are then attached, which help handle bigger objects. In principle the SSD architecture can be used on top of any deep base network.

One important point to notice is that after the image passes through the VGG network, the added conv layers produce feature maps of sizes 19x19, 10x10, 5x5, 3x3 and 1x1. These, together with the 38x38 feature map produced by VGG's conv4_3, are the feature maps used to predict bounding boxes.

Here conv4_3 is responsible for detecting the smallest objects, while conv11_2 is responsible for the biggest objects.
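In the SSD300 configuration from the paper, with 4 or 6 default boxes per cell on those feature maps, this gives 38·38·4 + 19·19·6 + 10·10·6 + 5·5·6 + 3·3·4 + 1·1·4 = 8732 default boxes per image.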

As shown in the diagram, some activations are "grabbed" from the network and passed to a specialized sub-network that acts as a classifier and localizer. During prediction, a non-maximum suppression algorithm filters the multiple boxes per object that may appear.
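For illustration, here is a minimal sketch of non-maximum suppression with NumPy (a generic version, not the exact routine used by any particular SSD implementation); it keeps the highest-scoring box and discards boxes that overlap it by more than a threshold:

import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS. boxes: (N,4) array of [xmin, ymin, xmax, ymax]; scores: (N,)."""
    order = scores.argsort()[::-1]          # indices sorted by score, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter)
        # Drop boxes whose IoU with the best box exceeds the threshold
        order = order[1:][iou <= iou_threshold]
    return keep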

Anchors (Priors or Default Boxes) concept

Anchors are a collection of boxes overlaid on the image at different spatial locations, scales and aspect ratios that act as reference points against the ground-truth boxes. The idea is similar to YOLO, where each cell of the activation map has multiple boxes.

A model is then trained to make two predictions for each anchor:

  1. A discrete class prediction for each anchor

  2. A continuous prediction of an offset by which the anchor needs to be shifted to fit the ground-truth bounding box

During training SSD matches ground-truth objects with _default boxes_ of different aspect ratios. Each element of the feature map (cell) has a number of default boxes associated with it. Any default box with an IoU (Jaccard index) greater than 0.5 is considered a match.

Consider the image above: the cat has 2 default boxes that match on the 8x8 feature map, but the dog has none. On the 4x4 feature map there is one box that matches the dog.

It is important to note that the boxes in the 8x8 feature map are smaller than those in the 4x4 feature map: SSD takes several feature maps, each responsible for a different scale of objects, which allows it to identify objects across a large range of scales.
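For illustration, a minimal sketch of this matching criterion in PyTorch (the jaccard helper and the toy boxes below are assumptions for the example, not the exact match function used in the training code further down):

import torch

def jaccard(box_a, box_b):
    """IoU (Jaccard index) between two sets of boxes, shapes (A, 4) and (B, 4)."""
    max_xy = torch.min(box_a[:, None, 2:], box_b[None, :, 2:])   # top-right corner of intersection
    min_xy = torch.max(box_a[:, None, :2], box_b[None, :, :2])   # bottom-left corner of intersection
    inter = (max_xy - min_xy).clamp(min=0).prod(dim=2)           # intersection area, shape (A, B)
    area_a = (box_a[:, 2] - box_a[:, 0]) * (box_a[:, 3] - box_a[:, 1])
    area_b = (box_b[:, 2] - box_b[:, 0]) * (box_b[:, 3] - box_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

# Toy example: one ground-truth box and two default boxes in [xmin, ymin, xmax, ymax] format
ground_truth_boxes = torch.tensor([[0.10, 0.10, 0.40, 0.40]])
default_boxes = torch.tensor([[0.05, 0.05, 0.35, 0.35],
                              [0.60, 0.60, 0.90, 0.90]])
overlaps = jaccard(ground_truth_boxes, default_boxes)    # shape (1, 2)
positive_mask = overlaps.max(dim=0)[0] > 0.5             # only the first default box matches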

For each default box on each cell the network outputs the following:

  • A probability vector of length c, where c is the number of classes plus one extra background class that indicates no object.

  • A vector with 4 elements (x, y, width, height) representing the offsets required to move the default box onto the real object.
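For illustration, a hedged sketch of what a pair of SSD prediction heads can look like in PyTorch (layer sizes and names here are assumptions): for a feature map with in_channels channels and k default boxes per cell, a 3x3 convolution produces k*4 localization outputs and k*num_classes confidence outputs at every cell.

import torch
import torch.nn as nn

num_classes = 21          # e.g. 20 object classes + background (assumption)
k = 4                     # default boxes per cell on this feature map (assumption)
in_channels = 512         # channels of the feature map, e.g. conv4_3 (assumption)

# One pair of prediction heads attached to a single feature map
loc_head = nn.Conv2d(in_channels, k * 4, kernel_size=3, padding=1)
conf_head = nn.Conv2d(in_channels, k * num_classes, kernel_size=3, padding=1)

feature_map = torch.randn(1, in_channels, 38, 38)   # e.g. conv4_3 output
loc = loc_head(feature_map)     # (1, k*4, 38, 38)  -> box offsets per default box
conf = conf_head(feature_map)   # (1, k*num_classes, 38, 38) -> class scores per default box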

Multibox Loss Function

During training we minimize a combined classification and regression loss.

As in YOLO, the SSD loss balances the classification objective and the localization objective.
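From the SSD paper, the combined loss is the sum of the confidence (classification) loss and the localization loss weighted by a factor α, averaged over the number N of matched default boxes:

L(x, c, l, g) = (L_conf(x, c) + α · L_loc(x, l, g)) / N

When N = 0 the loss is set to zero. This is the normalization applied at the end of the code below.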

Localization Loss
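The localization loss is a Smooth L1 loss between the predicted offsets and the encoded ground-truth offsets, computed only over the positive (matched) default boxes. For a single element x, smooth_L1(x) = 0.5·x² when |x| < 1 and |x| − 0.5 otherwise, so it behaves like L2 near zero and like L1 for large errors.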

Below is the forward propagation of this loss implemented in PyTorch:

def forward(self, predictions, targets):
        """Multibox Loss
        Args:
            predictions (tuple): A tuple containing loc preds, conf preds,
            and prior boxes from SSD net.
                conf shape: torch.size(batch_size,num_priors,num_classes)
                loc shape: torch.size(batch_size,num_priors,4)
                priors shape: torch.size(num_priors,4)
            targets (tensor): Ground truth boxes and labels for a batch,
                shape: [batch_size, num_objs, 5] (last idx is the label).
        """

        loc_data, conf_data, priors = predictions
        num = loc_data.size(0)
        num_priors = (priors.size(0))
        num_classes = self.num_classes

        # match priors (default boxes) and ground truth boxes
        loc_t = torch.Tensor(num, num_priors, 4)
        conf_t = torch.LongTensor(num, num_priors)
        for idx in range(num):
            truths = targets[idx][:,:-1].data
            labels = targets[idx][:,-1].data
            defaults = priors.data
            match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx)

        # Send localization and confidence to GPU if available
        if GPU:
            loc_t = loc_t.cuda()
            conf_t = conf_t.cuda()

        # wrap targets as Variables (Include on graph, to use autograd)
        loc_t = Variable(loc_t, requires_grad=False)
        conf_t = Variable(conf_t,requires_grad=False)

        pos = conf_t > 0
        num_pos = pos.sum()

        # Localization Loss (Smooth L1)
        # Shape: [batch,num_priors,4]
        pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
        loc_p = loc_data[pos_idx].view(-1,4)
        loc_t = loc_t[pos_idx].view(-1,4)
        loss_l = F.smooth_l1_loss(loc_p, loc_t, size_average=False)

        # Compute max conf across batch for hard negative mining
        batch_conf = conf_data.view(-1,self.num_classes)
        loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1,1))

        # Hard Negative Mining
        loss_c = loss_c.view(num, -1)
        loss_c[pos] = 0  # filter out pos boxes for now
        _,loss_idx = loss_c.sort(1, descending=True)
        _,idx_rank = loss_idx.sort(1)
        num_pos = pos.long().sum(1)
        num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
        neg = idx_rank < num_neg.expand_as(idx_rank)

        # Confidence Loss Including Positive and Negative Examples
        pos_idx = pos.unsqueeze(2).expand_as(conf_data)
        neg_idx = neg.unsqueeze(2).expand_as(conf_data)
        conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
        targets_weighted = conf_t[(pos+neg).gt(0)]
        loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False)

        # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N

        N = num_pos.data.sum()
        loss_l/=N
        loss_c/=N
        return loss_l,loss_c
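Notice the hard negative mining step: since most default boxes are background, the negatives are sorted by their confidence loss and only the hardest ones are kept, up to a 3:1 negative-to-positive ratio, which keeps the classification loss from being dominated by easy background boxes.

A rough sketch of how this loss is typically used in a training step (variable names here are assumptions; criterion is an instance of the module that owns the forward above):

# out = (loc_preds, conf_preds, priors) returned by the SSD network
out = net(images)
loss_l, loss_c = criterion(out, targets)
loss = loss_l + loss_c          # alpha = 1 in this implementation
optimizer.zero_grad()
loss.backward()
optimizer.step()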

Below is the same loss implemented in TensorFlow (notice that the code is a bit more involved):

def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        num_classes = lshape[-1]
        batch_size = lshape[0]

        # Flatten out all vectors!
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        for i in range(len(logits)):
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            fgscores.append(tf.reshape(gscores[i], [-1]))
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # Concatenate everything along the box dimension
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype

        # Compute positive matching mask...
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        n_positives = tf.reduce_sum(fpmask)

        # Hard negative mining...
        no_classes = tf.cast(pmask, tf.int32)
        predictions = slim.softmax(logits)
        nmask = tf.logical_and(tf.logical_not(pmask),
                               gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        nvalues = tf.where(nmask,
                           predictions[:, 0],
                           1. - fnmask)
        nvalues_flat = tf.reshape(nvalues, [-1])
        # Number of negative entries to select.
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)

        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
        max_hard_pred = -val[-1]
        # Final negative mask.
        nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
        fnmask = tf.cast(nmask, dtype)

        # Add cross-entropy loss.
        with tf.name_scope('cross_entropy_pos'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=gclasses)
            loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        with tf.name_scope('cross_entropy_neg'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=no_classes)
            loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        # Add localization loss: smooth L1, L2, ...
        with tf.name_scope('localization'):
            # Weights Tensor: positive mask + random negative.
            weights = tf.expand_dims(alpha * fpmask, axis=-1)
            loss = custom_layers.abs_smooth(localisations - glocalisations)
            loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name='value')
            tf.losses.add_loss(loss)

Classification Loss

The classification (confidence) loss is a softmax cross-entropy over the class confidences, computed on the matched (positive) default boxes plus the hard-mined negatives (with a 3:1 negative-to-positive ratio), as can be seen in both implementations above.

References:

https://arxiv.org/pdf/1611.10012.pdf
http://www.cs.unc.edu/~wliu/papers/ssd.pdf
http://silverpond.com.au/2017/02/17/how-we-built-and-trained-an-ssd-multibox-detector-in-tensorflow.html
http://www.shortscience.org/paper?bibtexKey=conf/eccv/LiuAESRFB16#qureai
https://github.com/weiliu89/caffe/tree/ssd
https://pt.slideshare.net/xavigiro/ssd-single-shot-multibox-detector
https://chatbotslife.com/towards-a-real-time-vehicle-detection-ssd-multibox-approach-2519af2751c
http://www.eccv2016.org/files/posters/O-1A-02.pdf
http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
https://github.com/zhreshold/mxnet-ssd
https://github.com/rykov8/ssd_keras
https://github.com/balancap/SSD-Tensorflow
https://github.com/amdegroot/ssd.pytorch
https://github.com/weiliu89/caffe/issues?utf8=%E2%9C%93&q=author%3Aleonardoaraujosantos%20
https://github.com/seann999/ssd_tensorflow