Linear Classification

A linear classifier makes its classification decision based on the value of a linear combination of the input characteristics. Imagine that the linear classifier will merge into its weights all the characteristics that define a particular class (like merging all samples of the class "cars" together).

This type of classifier works best when the problem is linearly separable.

$$f(\vec{x}, W, \vec{b}) = W\vec{x} + \vec{b}, \qquad f_i = \sum_{j} W_{ij} x_{j} + b_{i}\\ \text{x: input vector}\\ \text{W: weight matrix}\\ \text{b: bias vector}$$

The weight matrix will have one row for every class that needs to be classified, and one column for every element (feature) of x. On the picture above, each line is represented by a row in our weight matrix.
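As a minimal sketch of this score function (in Matlab/Octave, with made-up shapes: 4 input features and 3 classes; the values are random placeholders):

```matlab
% Hypothetical shapes: 3 classes, 4 input features
W = randn(3, 4);    % weight matrix: one row per class, one column per feature
x = randn(4, 1);    % input (feature) vector
b = randn(3, 1);    % bias vector: one entry per class

scores = W * x + b; % f(x, W, b): a 3x1 vector with one score per class
```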

Weight and Bias Effect

Changing the weights changes the angle of the decision line, while changing the bias moves the line left/right.
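A small sketch of that effect (hypothetical 2D weights; the decision boundary is the set of points where the score is zero):

```matlab
x1 = linspace(-5, 5, 100);

% Decision boundary w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1*x1 + b)/w2
line1 = -(1.0*x1 + 0)/1.0;  % w = [1, 1], b = 0: reference line
line2 = -(2.0*x1 + 0)/1.0;  % changing a weight rotates the line (new angle)
line3 = -(1.0*x1 + 2)/1.0;  % changing the bias shifts the line (same angle)

plot(x1, line1, x1, line2, x1, line3);
legend('w=[1,1], b=0', 'w=[2,1], b=0', 'w=[1,1], b=2');
```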

Parametric Approach

The idea is that our hypothesis/model has parameters that aid the mapping between the input vector and a specific class score. The parametric model has two important components:

  • Score Function: a function f(x, W, b) that maps our raw input vector to a vector of class scores

  • Loss Function: quantifies how well our current set of weights maps some input x to an expected output y; the loss function is used during training time (a toy example follows this list)
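As a toy illustration only (this is a hypothetical squared-error loss, not necessarily the loss used later in the text), a loss function takes the scores and the expected output and returns a single number:

```matlab
% Hypothetical squared-error loss between scores and a one-hot target
scores = [3.2; 5.1; -1.7];    % output of the score function f(x, W, b)
y      = [1; 0; 0];           % expected output: the correct class is the first

loss = sum((scores - y).^2);  % a single number; lower means a better mapping
```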

In this approach, the training phase will find a set of parameters that change the hypothesis/model so that it maps each input to the correct output class.

During the training phase, which consists of an optimisation problem, the weights (W) and bias (b) are the only things that we can change.

Now some important points about the diagram above (a shape-checking sketch follows the list):

  1. The input image x is stretched into a single-dimension vector; this loses spatial information

  2. The weight matrix will have one column for every element of the input

  3. The weight matrix will have one row for every element of the output (in this case, 3 labels)

  4. The bias will have one row for every element of the output (in this case, 3 labels)

  5. The loss will receive the current scores and the expected output for its current input X
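A sketch of those shapes (assuming a hypothetical 32x32x3 input image and 3 output labels; the weight values are random placeholders):

```matlab
img = rand(32, 32, 3);           % hypothetical input image
x = img(:);                      % (1) stretched into a 3072x1 vector; spatial info is lost

numLabels = 3;
W = randn(numLabels, numel(x));  % (2)(3) one column per input element, one row per label
b = randn(numLabels, 1);         % (4) one bias entry per label

scores = W * x + b;              % 3x1 score vector
% (5) `scores` and the expected output y are what the loss function receives
```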

Consider each row of W a kind of pattern match (template) for a specific class. The score for each class is calculated by doing an inner product between the input vector X and the specific row for that class, then adding the bias for that class. For example:

$$score_{cat} = [0.2(56) - 0.5(231) + 0.1(24) + 2.0(2)] + 1.1 = -96.8$$
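Reproducing that arithmetic (the weight row, input pixels, and bias are the values from the example above):

```matlab
x     = [56; 231; 24; 2];       % input pixel values from the example
W_cat = [0.2, -0.5, 0.1, 2.0];  % the row of W acting as the "cat" template
b_cat = 1.1;                    % bias entry for the cat class

score_cat = W_cat * x + b_cat   % inner product + bias = -96.8
```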

Example on Matlab

The image below reshapes the weights back into images; we can see from it that the training tries to compress into each row of W all the variants of the same class. (Check the horse with 2 heads.)
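A sketch of that visualization (assuming hypothetical CIFAR-10-like shapes, i.e. 32x32x3 inputs; the min/max rescaling just brings the weights into a displayable range):

```matlab
W = randn(10, 32*32*3);              % stand-in for a trained weight matrix

row  = W(1, :);                      % pick the row (template) of one class
tmpl = reshape(row, [32, 32, 3]);    % reshape it back to image dimensions
tmpl = (tmpl - min(tmpl(:))) / (max(tmpl(:)) - min(tmpl(:)));  % rescale to [0, 1]

imshow(tmpl);                        % displays the learned class "template"
```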

Bias trick

Some learning library implementations do a trick to consider the bias as part of the weight matrix: the bias becomes an extra column of W, and the input vector is extended with a constant 1. The advantage of this approach is that we can solve the linear classification with a single matrix multiplication.

$$f(\vec{x}, W) = W \cdot \vec{x}$$
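A minimal sketch of the trick (same random placeholder shapes as before), showing that the extended multiplication gives the same scores:

```matlab
W = randn(3, 4);  x = randn(4, 1);  b = randn(3, 1);

W_ext = [W, b];                % bias folded in as the last column of W
x_ext = [x; 1];                % input extended with a constant 1

scores       = W * x + b;      % original formulation
scores_trick = W_ext * x_ext;  % single matrix multiplication, same result
```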

Input and Features

The input vector, sometimes called the feature vector, is the input data that is sent to the classifier. As the linear classifier does not handle non-linear problems, it is the responsibility of the engineer to process this data and present it in a form that is separable by the classifier. The best-case scenario is to have a large number of features, each with a high correlation to the desired output and a low correlation between themselves.
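As a classic illustration of such preprocessing (an assumption of this sketch, not an example from the text): two concentric rings of points are not linearly separable in Cartesian coordinates, but after converting to polar coordinates the radius alone separates them linearly:

```matlab
theta = linspace(0, 2*pi, 100)';

inner = 1.0 * [cos(theta), sin(theta)];  % class 1: ring of radius 1
outer = 3.0 * [cos(theta), sin(theta)];  % class 2: ring of radius 3

% Feature transform to polar coordinates: the radius feature alone
% now separates the two classes with a simple (linear) threshold
r_inner = sqrt(sum(inner.^2, 2));        % all ~1
r_outer = sqrt(sum(outer.^2, 2));        % all ~3
```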
