Machine Learning Statistics
Statistics are tools to get answers to questions about data:
- What is Common?
- What is Expected?
- What is Normal?
- What is the Probability?
Inferential Statistics
Inferential statistics are methods for estimating properties of a population from a small sample:
You take data from a sample and make a prediction about the whole population.
For example, you can stand in a shop and ask a sample of 100 people if they like chocolate.
If 91 of them say yes, inferential statistics lets you estimate that about 91% of all shoppers like chocolate.
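The shop survey can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: it assumes 91 of the 100 people answered yes, that the sample was drawn at random, and that a normal approximation is acceptable. The function name `proportion_estimate` and the value `z=1.96` (roughly a 95% interval) are choices made for this example only.

```python
import math

def proportion_estimate(successes, n, z=1.96):
    """Return the sample proportion and an approximate confidence interval.

    Uses the normal approximation for a proportion (an assumption of this
    sketch); z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    # Standard error of a sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    # Clamp the interval to the valid range for a proportion
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Assumed survey result: 91 of 100 shoppers said they like chocolate
p_hat, low, high = proportion_estimate(91, 100)
print(f"estimate: {p_hat:.2f}, interval: {low:.3f} to {high:.3f}")
```

The interval is a reminder that a sample gives an estimate with uncertainty, not an exact figure for the whole population.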
Incredible Chocolate Facts
Nine out of ten people love chocolate.
50% of the US population cannot live without chocolate every day.
You use inferential statistics to draw conclusions about a whole population from a small sample of data.
Descriptive Statistics
Descriptive statistics summarize (describe) observations from a set of data.
Since we register every newborn baby, we can tell that 51 out of 100 are boys.
From these collected numbers, we can predict a 51% chance that a new baby will be a boy.
It is a mystery that the ratio is not 50%, as basic biology would predict. We only know that we have had this tilted sex ratio since the 17th century.
Note
Raw observations are only data. They are not real knowledge.
You use Descriptive Statistics to transform raw observations into data that you can understand.
Descriptive Statistics Measurements
Descriptive statistics are broken down into different measures:
Tendency (Measures of the Center)
- The Mean (the average value)
- The Median (the midpoint value)
- The Mode (the most common value)
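The three measures of the center can be computed with Python's standard `statistics` module. The list `values` below is made-up data used only for illustration.

```python
import statistics

# Hypothetical observations, chosen only for this example
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

With an even number of values, as here, the median is the average of the two middle values.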
Spread (Measures of Variability)
- Min and Max
- Standard Deviation
- Variance
- Skewness
- Kurtosis
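The measures listed above can be sketched with the standard library alone. This is one possible set of definitions, not the only one: the example treats the data as a whole population (so it uses `pvariance` and `pstdev`), and it computes skewness and kurtosis as standardized third and fourth central moments. Sample-based or bias-corrected versions give slightly different numbers. Skewness and kurtosis are usually read as describing the shape of the distribution rather than its width. The list `values` is made-up data.

```python
import statistics

# Hypothetical observations, treated here as a complete population
values = [2, 4, 4, 4, 5, 5, 7, 9]

minimum = min(values)
maximum = max(values)

mean = statistics.mean(values)
variance = statistics.pvariance(values)  # average squared deviation from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

n = len(values)
# Third and fourth central moments
m3 = sum((x - mean) ** 3 for x in values) / n
m4 = sum((x - mean) ** 4 for x in values) / n

skewness = m3 / std_dev ** 3  # 0 for symmetric data, positive for a longer right tail
kurtosis = m4 / std_dev ** 4  # a normal distribution has kurtosis 3

print("min:", minimum, "max:", maximum)
print("variance:", variance, "standard deviation:", std_dev)
print("skewness:", skewness, "kurtosis:", kurtosis)
```

Subtracting 3 from the kurtosis gives the "excess kurtosis" that some libraries report by default.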