Measures of Variability
Key Questions
-
Answer:
In the formula for a population standard deviation, you divide by the population size #N#, whereas in the formula for the sample standard deviation, you divide by #n-1# (the sample size minus one).
Explanation:
If #mu# is the mean of the population, the formula for the population standard deviation of the population data #x_{1},x_{2},x_{3},\ldots, x_{N}# is #sigma=sqrt{\frac{sum_{k=1}^{N}(x_{k}-mu)^{2}}{N}}#.
If #bar{x}# is the mean of a sample, the formula for the sample standard deviation of the sample data #x_{1},x_{2},x_{3},\ldots, x_{n}# is #s=sqrt{\frac{sum_{k=1}^{n}(x_{k}-bar{x})^{2}}{n-1}}#.
The reason for dividing by #n-1# is somewhat technical. Doing this makes the sample variance #s^{2}# a so-called unbiased estimator for the population variance #sigma^{2}#. In effect, if the population is very large and you draw many random samples of the same size #n# from it, the average of the resulting values of #s^{2}# will be very close to #sigma^{2}# (and, from a theoretical perspective, the mean of #s^{2}# as a "random variable" is exactly #sigma^{2}#).
The technicalities of why this is true involve a lot of algebra with summations and are usually not worth the time for beginning students.
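For readers who would rather check the unbiasedness claim numerically than work through the algebra, here is a minimal Python sketch. The data set and the function names are made up for illustration. Instead of drawing random samples, it averages over every possible ordered sample of size #n# taken with replacement from a tiny population. This is an assumption that stands in for "many random samples from a very large population".

```python
import math
from itertools import product


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def average_over_all_samples(population, n, estimator):
    """Average an estimator over every ordered sample of size n (with replacement)."""
    values = [estimator(sample) for sample in product(population, repeat=n)]
    return sum(values) / len(values)


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

sigma_squared = population_variance(population)
print("population variance:", sigma_squared)            # 4.0
print("population SD:", math.sqrt(sigma_squared))       # 2.0

# Dividing by n - 1 recovers the population variance on average (up to rounding).
print("average s^2, divisor n - 1:",
      average_over_all_samples(population, 3, sample_variance))

# Dividing by n instead gives an average that is too small, by a factor (n - 1) / n.
print("average with divisor n:",
      average_over_all_samples(population, 3, population_variance))
```

With samples of size 3, the second average comes out at about two thirds of #sigma^{2}#, which is the bias that the #n-1# divisor removes.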
-
Standard deviation is the most widely used.
The range simply gives the difference between the lowest and highest values, so a few extreme values will alter it excessively.
The standard deviation #sigma# tells you where most of the values will be: in a normal distribution about 68% of all values are within one standard deviation of the mean #mu#, and about 95% are within two standard deviations of the mean.
Example:
You have a filling machine that fills kilogram bags of sugar. It will not fill exactly #1000g# every time; the standard deviation is #10g#.
Then you know that about #68%# of the bags weigh between #990# and #1010g#, and about #95%# between #980# and #1020g#, a total span of #20g# or #40g# respectively.
Every now and again a bag will be far over-filled (say #1100g#) and sometimes a bag will end up empty (#0g#), so the range will be a total of #1100g#.
You may decide which of the two gives a better idea of the spread in this distribution.
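The numbers in the sugar-bag example can be reproduced with a short Python sketch. It assumes the fill weights follow a normal distribution with mean #1000g# and standard deviation #10g#. The list of individual bag weights is hypothetical and is only there to show how two extreme bags affect the range.

```python
from statistics import NormalDist

MEAN_G = 1000  # target fill weight from the example, in grams
SD_G = 10      # standard deviation from the example, in grams


def interval(k):
    """Weights k standard deviations below and above the mean."""
    return MEAN_G - k * SD_G, MEAN_G + k * SD_G


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    fills = NormalDist(mu=MEAN_G, sigma=SD_G)
    low, high = interval(k)
    return fills.cdf(high) - fills.cdf(low)


def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


for k in (1, 2):
    low, high = interval(k)
    print(f"within {k} SD: {low} g to {high} g, share {share_within(k):.3f}")

# Hypothetical fill weights: seven ordinary bags, one over-filled bag, one empty bag.
bags = [992, 1008, 1001, 997, 1011, 989, 1003, 1100, 0]
print("range of ordinary bags:", value_range(bags[:-2]))  # 22
print("range with the two extremes:", value_range(bags))  # 1100
```

The shares come out at roughly 0.683 and 0.954, which is where the rounded 68% and 95% figures come from. The range, by contrast, is set entirely by the two unusual bags.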
-
SD: it gives you a numerical value for the variation of the data.
Range: it gives you the maximal and minimal values of all the data.
Mean: a single value that represents the average of the data. It does not represent the data well in asymmetrical distributions, and it is influenced by outliers.
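The remark about outliers can be illustrated with a small Python sketch on made-up data. The helper name `share_below_mean` is hypothetical. It reports what fraction of the values lie below the mean, which should be near one half when the mean is a fair "typical" value.

```python
from statistics import mean


def share_below_mean(values):
    """Fraction of the values that are strictly smaller than their mean."""
    m = mean(values)
    return sum(1 for v in values if v < m) / len(values)


symmetric = [10, 11, 12, 13, 14]  # made-up, evenly spread data
skewed = symmetric + [100]        # the same data plus one outlier

for name, values in (("symmetric", symmetric), ("skewed", skewed)):
    print(name,
          "mean:", round(mean(values), 2),
          "range:", max(values) - min(values),
          "share below mean:", round(share_below_mean(values), 2))
```

A single outlier moves the mean from 12 to about 26.67, so five of the six values now sit below it, and the range grows from 4 to 90.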