Normal probability distribution question from a small sample?

Weights of golden retriever dogs are normally distributed. Samples of weights of golden retriever dogs, each of size n = 15, are randomly collected and the sample
means are found. Is it correct to conclude that the sample means cannot be treated as being from a normal distribution because the sample size is too small? Explain.

1 Answer
Jun 17, 2018

No. If the individual weights are normally distributed, then the sum, and hence the average, of a group of fifteen of them will be normally distributed as well.

Explanation:

By the central limit theorem, the distribution of the sum, and thus of the average, tends toward a normal distribution as #n# grows. Here we don't even need the limit: a sum of independent normal variables is exactly normal, so if the individual weights are normal, the sample means are normal too, regardless of #n#.
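A quick simulation can make this concrete. The sketch below (plain Python with NumPy and SciPy; the population mean of 30 kg and standard deviation of 4 kg are invented numbers, not part of the question) draws many samples of size #n = 15# from a normal distribution and checks that the sample means themselves look normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical population of golden retriever weights: normal, mean 30 kg, sd 4 kg.
mu, sigma, n = 30.0, 4.0, 15

# Draw 10,000 samples of size 15 and take each sample's mean.
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

# The means should center on mu with standard deviation sigma / sqrt(n) ...
print(sample_means.mean(), sample_means.std(ddof=1), sigma / np.sqrt(n))

# ... and a normality test should find no evidence against normality.
print(stats.shapiro(sample_means[:500]))  # Shapiro-Wilk on a subsample
```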

Where we might need something other than a normal distribution is hypothesis testing with small samples. There, we want the probability of observing, by chance alone, an average at least as extreme as the one we calculated, given a purported mean (the null hypothesis) that differs from it.

Say we're measuring the effect of a brand of dog food. We know the average weight of a Golden Retriever from other studies. We calculate the average weight of our fifteen dogs who have been eating that brand for, say, a year. We want to decide whether that average is "significantly different," as the jargon goes, i.e. whether this dog food makes dogs fat. (Or fatter or thinner; we can use a one-sided or a two-sided test.)

To do our test we need to know how many standard deviations our observed average lies from the purported mean, so we need a value to use as the standard deviation. If we can get a reliable estimate of #sigma# from other studies, we can use #sigma/sqrt{n}# as the standard deviation of the average and divide the difference between the observed average and the purported mean by it. We'd expect that quotient to be normally distributed.
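As a sketch, assuming #sigma# really is known from earlier studies (all the numbers below are invented for illustration), the calculation is just:

```python
import numpy as np
from scipy import stats

x_bar = 32.1   # observed average of our 15 dogs (hypothetical)
mu_0  = 30.0   # purported population mean (the null hypothesis)
sigma = 4.0    # standard deviation known from other studies
n     = 15

# Number of standard errors the observed average lies above the purported mean.
z = (x_bar - mu_0) / (sigma / np.sqrt(n))

# One-sided p-value from the normal distribution.
p = stats.norm.sf(z)
print(z, p)
```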

If we don't know the standard deviation, we have to estimate it from our limited amount of data. The sample mean is exactly the value that minimizes the calculated standard deviation, so estimating the standard deviation around the sample mean tends to make it too small. Dividing by #n-1# instead of #n# compensates for the one degree of freedom we used up from the data in calculating the average. The quotient that estimates the number of standard deviations separating the sample mean from the purported mean will then have a #t# distribution with #n-1# degrees of freedom.
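With only the sample to go on, the same calculation uses the sample standard deviation (note `ddof=1`, the divide-by-#n-1# correction) and the result is compared to a #t# distribution. The weights below are invented for illustration.

```python
import numpy as np

# Hypothetical weights (kg) of our 15 dogs.
weights = np.array([29.5, 31.2, 33.0, 30.8, 28.9, 32.4, 34.1, 30.0,
                    31.7, 29.8, 33.5, 30.6, 32.9, 31.1, 30.4])
mu_0 = 30.0               # purported mean under the null hypothesis
n = len(weights)

x_bar = weights.mean()
s = weights.std(ddof=1)   # divide by n - 1, not n, to correct the bias

# t statistic: estimated number of standard errors above the purported mean.
t_stat = (x_bar - mu_0) / (s / np.sqrt(n))
print(t_stat)             # compare to a t distribution with n - 1 df
```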

Once we have our #t#, we look up in the distribution, just as we would for a normal distribution, the probability of seeing the calculated #t.# If the #t# is big enough, that probability will be small enough, less than 5% or 1% or whatever significance level we've decided to accept, and we can conclude our average is "significantly greater" than the null hypothesis, i.e. this food makes Goldens fat.
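The lookup itself is one line with SciPy's #t# distribution; the #t# value below is just a stand-in for whatever the previous step produced.

```python
from scipy import stats

t_stat, n = 1.83, 15   # stand-in t statistic and sample size from the sketch above

# One-sided p-value: probability of seeing a t at least this large
# by chance, using n - 1 = 14 degrees of freedom.
p = stats.t.sf(t_stat, df=n - 1)
print(p)   # declare "significant" only if this falls below the chosen level (e.g. 0.05)
```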

For small #n# the #t# distribution has "fat tails" compared to the normal distribution, meaning large values of #t# are more likely than we'd expect from a normal distribution. Curb your enthusiasm when you see a big #t# with a small #n.#
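You can see the fat tails directly by comparing tail probabilities or critical values; a minimal sketch:

```python
from scipy import stats

# Probability of exceeding 2 "standard deviations" under each distribution.
print(stats.norm.sf(2.0))          # normal:    ~0.023
print(stats.t.sf(2.0, df=14))      # t, 14 df:  ~0.033  (fatter tail)

# Equivalently, the cutoff for a one-sided 2.5% tail is larger for the t.
print(stats.norm.ppf(0.975))       # ~1.96
print(stats.t.ppf(0.975, df=14))   # ~2.14
```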

In summary, we need a #t# distribution for hypothesis testing when the standard deviation is estimated from the sample itself, using the sample mean, and #n# is small.

Factoid: The original work that led to the t-test was testing batches of beer.