Unsupervised Learning

As mentioned in previous chapters, unsupervised learning is about learning from data without label information. Here "learning" means discovering structure: for instance, you may want to know how many groups exist in your dataset, even if you don't know what those groups mean. We also use unsupervised learning to visualize a dataset, in order to gain some insight from the data.

Unlabeled data example

Consider the following dataset, $X \in \mathbb{R}^2$ (X has 2 features).
One type of unsupervised learning algorithm, called "clustering", is used to infer how many distinct groups exist in your dataset.
Here we still don't know what those groups mean, but we know that there are 4 groups that seem very distinct. In this case we chose a low-dimensional dataset, but in real life it could have hundreds or thousands of dimensions, e.g. $X \in \mathbb{R}^{784}$ for a grayscale 28x28 image.
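As a rough illustration of the clustering idea above, here is a minimal k-means sketch in plain Python. The synthetic 4-blob dataset, the helper names (`kmeans`, `farthest_point_init`, `inertia`) and the deterministic farthest-point initialization are assumptions made for this example, not something prescribed by the text. Printing the total squared distance to the nearest center (the "inertia") for several values of k gives a hint about how many groups the data contains.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def farthest_point_init(points, k):
    """Pick k spread-out starting centers (deterministic, illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (centers, labels)."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # nothing moved: converged
            break
        centers = new_centers
    return centers, labels

def inertia(points, centers):
    """Total squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Synthetic stand-in for the 2-feature dataset: 4 well-separated blobs of 50 points.
rng = random.Random(0)
true_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
          for cx, cy in true_centers for _ in range(50)]

for k in range(1, 7):
    centers, _ = kmeans(points, k)
    print(k, round(inertia(points, centers), 1))
```

On data like this the inertia should drop sharply up to k = 4 and then flatten out, which is one common heuristic (the "elbow") for guessing the number of groups. Note that k-means itself needs k as an input; it only tells us what the groups look like for a given k.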

Dimensionality Reduction

In order to improve classification response time (not prediction performance), and sometimes to visualize a high-dimensional dataset in 2D or 3D, we use dimensionality reduction techniques (e.g. PCA, t-SNE).
For example, the MNIST dataset is composed of 60,000 training examples of digits (0..9), each one with 784 dimensions. This high dimensionality is due to the fact that each digit is a 28x28 grayscale image.
It would be difficult to visualize this dataset directly, so one option is to reduce its dimensions to something that can be shown on a monitor (2D or 3D).
Here it is easy to observe that a classifier could have trouble differentiating the digits 1 and 7. Another advantage is that this gives us a hint about how good our current set of features is.
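To make the reduction step concrete, here is a minimal PCA sketch using NumPy's SVD. It does not load MNIST: the array `X` is synthetic data with 784 features generated from 2 hidden factors, and the helper `pca_project` is a name invented for this example. The same function could in principle be applied to flattened 28x28 images.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Returns the projected data and the fraction of variance each
    kept component explains.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)

# Synthetic stand-in for image data (assumption): 300 samples, 784 features,
# driven by only 2 hidden factors plus a little noise.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 784))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 784))

X_2d, explained = pca_project(X, n_components=2)
print(X_2d.shape)        # 2 coordinates per sample, ready for a scatter plot
print(explained.sum())   # share of the variance kept by the 2 components
```

Because this toy data is almost exactly 2-dimensional, two components keep nearly all of the variance. Real image data will usually keep much less with only 2 components, which is why the 2D plot is a visualization aid rather than a faithful copy of the data.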


We can also use neural networks to do dimensionality reduction. The idea is to have a neural network topology that approximates its input at the output layer. In the middle, this network (called an autoencoder) has a smaller layer. After training, the middle layer holds a compressed (lossy) version of the input.
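The following is a small sketch of that idea, under simplifying assumptions: it is a linear autoencoder (no activation functions) trained with plain gradient descent in NumPy, on toy data with 8 features generated from 2 hidden factors. The names `W_enc`, `W_dec`, `bottleneck` and the hyperparameters are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 samples, 8 features, generated from 2 hidden factors.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8))
X = X - X.mean(axis=0)   # center each feature
X = X / X.std()          # rescale so the overall variance is 1

n_samples, n_features = X.shape
bottleneck = 2           # size of the smaller middle layer

W_enc = 0.1 * rng.normal(size=(n_features, bottleneck))  # input -> code
W_dec = 0.1 * rng.normal(size=(bottleneck, n_features))  # code -> reconstruction

learning_rate = 0.1
losses = []
for step in range(3000):
    code = X @ W_enc               # compressed representation (middle layer)
    X_hat = code @ W_dec           # attempt to reproduce the input
    error = X_hat - X
    losses.append(np.mean(error ** 2))
    # Gradients of the mean squared reconstruction error.
    grad_out = 2.0 * error / error.size
    grad_dec = code.T @ grad_out
    grad_enc = X.T @ (grad_out @ W_dec.T)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

codes = X @ W_enc                  # 2 numbers per sample instead of 8
print(losses[0], losses[-1])
print(codes.shape)
```

No labels are used anywhere: the training target is the input itself. A practical autoencoder would typically add non-linear activations and more layers, which lets it compress data that does not lie on a flat subspace.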

Convolutional Neural Network Pre-training

As we don't need label information to train autoencoders, we can use them to pre-train our convolutional neural network. Later, we can start the supervised training with the weights initialized from the unsupervised training.
Some examples of this technique can be found here:

Data Manifold

Manifold learning pursues the goal of embedding data that originally lies in a high-dimensional space into a lower-dimensional space, while preserving characteristic properties. This is possible because, for any high-dimensional data to be interesting, it must be intrinsically low-dimensional. For example, images of faces might be represented as points in a high-dimensional space (let's say your camera has 5MP, so your images, considering each pixel consists of three values [r,g,b], lie in a 15M-dimensional space), but not every 5MP image is a face. Faces lie on a sub-manifold in this high-dimensional space. A sub-manifold is locally Euclidean, i.e. if you take two very similar points, for example the images of two identical twins, they will be close in Euclidean space.
For example, in the dataset above the data lives in a high-dimensional space, but the faces sit on a much lower-dimensional subspace (almost Euclidean). So on this subspace, things like distance have a meaning.
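A tiny numeric illustration of "locally Euclidean", using the unit circle as a stand-in for a manifold (a 1-D curve embedded in 2-D space). This is only an analogy chosen for the example, not the face dataset: for nearby points the straight-line distance and the distance measured along the curve almost agree, while for distant points they diverge.

```python
import math

def on_circle(theta):
    """Point on the unit circle: a 1-D manifold embedded in 2-D space."""
    return (math.cos(theta), math.sin(theta))

def chord_over_arc(delta):
    """Ratio of straight-line (Euclidean) distance to distance along the circle."""
    p, q = on_circle(0.0), on_circle(delta)
    straight = math.hypot(p[0] - q[0], p[1] - q[1])
    return straight / delta  # on the unit circle the arc length equals the angle

for delta in (0.01, 0.1, 1.0, math.pi):
    print(f"angle={delta:.2f}  euclidean/geodesic={chord_over_arc(delta):.4f}")
```

The ratio is very close to 1 for small angles and falls to 2/π (about 0.64) for two opposite points, so Euclidean distance is a trustworthy measure of similarity only in a small neighbourhood.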
As more features are added, the data distribution often stops being linear, so simpler linear techniques (e.g. PCA) may no longer be useful for dimensionality reduction. In those cases we need other techniques such as t-SNE, autoencoders, etc.
By the way, dimensionality reduction on non-linear manifolds is sometimes called manifold learning.
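To see why a linear method can struggle, the sketch below compares two toy datasets that are both intrinsically 1-dimensional: points on a straight line and points on a circle. It computes the share of variance captured by the first principal component directly from the 2x2 covariance matrix. The datasets and the helper `top_component_share` are assumptions made for this illustration.

```python
import math

def top_component_share(points):
    """Fraction of total variance captured by the first principal component (2-D data)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    a = sum((p[0] - mean_x) ** 2 for p in points) / n              # var(x)
    c = sum((p[1] - mean_y) ** 2 for p in points) / n              # var(y)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n  # cov(x, y)
    # Largest eigenvalue of the symmetric 2x2 covariance matrix [[a, b], [b, c]].
    top = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return top / (a + c)

ts = [i / 200 for i in range(200)]
line = [(t, 2 * t) for t in ts]                                    # flat 1-D structure
circle = [(math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))   # curved 1-D structure
          for t in ts]

print(top_component_share(line))
print(top_component_share(circle))
```

For the line, a single component captures essentially all of the variance. For the circle it captures only about half, even though one number (the angle) is enough to describe every point. Non-linear methods aim to recover that kind of hidden coordinate.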
Below we have a diagram that guides you depending on the type of problem:
Here is a comparison between t-SNE and PCA on the MNIST dataset: