Statistics and Probability

In this chapter we learn the basics of statistics. Statistics is all about converting data into useful information. It is a process made of the following components:

  • Gather data (Producing Data)

  • Summarize this data (exploratory data analysis)

  • Interpret data (Inference)

The process starts when we identify something that we want to learn about; we call this the population.

The problem is that quite often the population is so big that we cannot process it entirely, so we sample a subgroup of it, hoping that the sample captures the dynamics of whatever we want to learn about the population. We need to take care to sample in a way that actually captures these dynamics, for example by picking individuals at random so that no part of the population is systematically left out.
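The effect of how we sample can be sketched in a few lines of Python. This is only an illustration: the population is made-up data (ages drawn from an assumed normal distribution), and the names `population`, `random_sample` and `biased_sample` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 ages from a made-up distribution.
population = [random.gauss(40, 12) for _ in range(100_000)]

# Simple random sample: every individual is equally likely to be picked.
random_sample = random.sample(population, k=500)

# Biased sample: only the 500 youngest individuals.
biased_sample = sorted(population)[:500]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:.2f}")
print(f"random sample mean: {random_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")
```

The random sample should land close to the population mean, while the biased sample lands far from it even though both have the same size.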

Next we summarize this data; this step is called exploratory data analysis. Think of this step as a kind of feature extraction/dimensionality reduction (which deep learning does automatically).
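As a minimal sketch of what summarizing can look like, the snippet below reduces a list of values to a handful of numbers using only the Python standard library. The `ages` list is made-up data and the `summary` dictionary is just one possible choice of statistics.

```python
import statistics

# Made-up sample of ages, for illustration only.
ages = [23, 35, 31, 47, 52, 29, 41, 38, 60, 33]

# Reduce ten raw values to a few descriptive numbers.
summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```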

Now that we have the data filtered, we need to draw conclusions from it, and this is where probability helps us. This step is called inference. Inference is an educated guess about the population that we draw from the data available to us. One of the most popular approaches to inference nowadays is Bayesian inference.
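To make the idea of an educated guess concrete, here is a small sketch of a Bayesian update. The Beta-Binomial model, the uniform prior and the observed counts are all assumptions picked for illustration; the chapter does not prescribe a specific model. The helper `beta_posterior` is hypothetical.

```python
def beta_posterior(prior_a, prior_b, successes, failures):
    """Update a Beta(prior_a, prior_b) prior with observed counts."""
    return prior_a + successes, prior_b + failures

# Assumed prior: Beta(1, 1), i.e. every proportion is equally plausible.
# Assumed data: 7 of 10 sampled individuals have the trait of interest.
a, b = beta_posterior(1, 1, successes=7, failures=3)

# The posterior mean is our guess for the proportion in the population.
posterior_mean = a / (a + b)

print(f"posterior: Beta({a}, {b})")
print(f"estimated proportion: {posterior_mean:.3f}")
```

The estimate is pulled slightly from the raw 7/10 toward the prior, and more data would make the prior matter less.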

A big trend today is to just run a powerful algorithm (SVM, Random Forest, Naive Bayes, etc.) as soon as the data has been collected and processed. Before doing that, take a look at your data, for example with scatter plots.

Correlation measures how strongly two variables vary together.

During the summarizing phase we want to select variables that are strongly correlated with the quantity we want to learn about, but preferably not correlated with the other variables in the dataset.
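One common way to quantify this is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below assumes Pearson is the measure intended here; the `pearson` helper and all the data are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up data: the quantity we care about and three candidate variables.
target = [1, 2, 3, 4, 5, 6]
feature_a = [2, 4, 6, 8, 10, 12]               # moves exactly with target
feature_b = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]   # nearly a copy of feature_a
feature_c = [9, 4, 1, 1, 4, 9]                 # no linear relation to target

r_a = pearson(target, feature_a)
r_c = pearson(target, feature_c)
r_ab = pearson(feature_a, feature_b)

print(f"target vs feature_a:    {r_a:.3f}")
print(f"target vs feature_c:    {r_c:.3f}")
print(f"feature_a vs feature_b: {r_ab:.3f}")
```

Here `feature_a` is a good candidate, `feature_b` adds almost nothing once `feature_a` is selected, and `feature_c` shows that a correlation near zero only rules out a linear relationship, which is one more reason to look at a scatter plot.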

Some definitions

  • Data: Pieces of information about individuals (persons, objects, etc.) organized into variables.

  • Variable: Some characteristic of the individual (person, object, etc.)

  • Dataset: A list/table of variables recorded for a set of individuals

Type of variables

  • Categorical: One value out of a finite group of values (ex: Race could be White, Black, Asian, etc.)

  • Quantitative: A numeric value (ex: Age, Income, etc.)
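The two variable types call for different summaries: counts for categorical variables, numeric statistics for quantitative ones. The sketch below uses a made-up dataset and a hypothetical `variable_type` helper. The rule it applies (numbers are quantitative, everything else is categorical) is a simplifying assumption, since numeric codes such as postal codes are really categorical.

```python
from collections import Counter
import statistics

# Made-up dataset: one dict per individual, one key per variable.
dataset = [
    {"city": "Lisbon", "age": 34, "income": 42000.0},
    {"city": "Porto", "age": 51, "income": 58000.0},
    {"city": "Lisbon", "age": 27, "income": 31000.0},
    {"city": "Faro", "age": 45, "income": 47000.0},
]

def variable_type(values):
    """Naive rule: all-numeric columns are quantitative, others categorical."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"

# Turn the rows into columns so each variable can be inspected on its own.
columns = {name: [row[name] for row in dataset] for name in dataset[0]}
types = {name: variable_type(values) for name, values in columns.items()}

for name, values in columns.items():
    if types[name] == "categorical":
        print(name, "counts:", dict(Counter(values)))
    else:
        print(name, "mean:", statistics.mean(values))
```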
