Normal probability distribution question from a small sample?
Weights of golden retriever dogs are normally distributed. Samples of weights of golden retriever dogs, each of size n = 15, are randomly collected and the sample
means are found. Is it correct to conclude that the sample means cannot be treated as being from a normal distribution because the sample size is too small? Explain
1 Answer
No, if the individual weights are normally distributed, the sum or average of groups of fifteen of them will be as well.
Explanation:
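By the central limit theorem, the distribution of the sum, and thus the average, of independent observations tends toward a normal distribution as the sample size grows. If the individual observations are already independent and normally distributed, we're already there: the sum and average of any number of them, fifteen included, are exactly normal.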
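A quick simulation can make this concrete. The sketch below draws many samples of fifteen from a normal population and checks that the sample means are centered on the population mean, have a spread of sigma / sqrt(n), and fall within one such spread about 68.3% of the time, as a normal distribution would. The population mean of 30 and standard deviation of 4 are made-up values for illustration only, not real golden retriever figures.

```python
import random
import statistics

def sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

# Hypothetical population values, chosen only for illustration.
MU, SIGMA, N = 30.0, 4.0, 15

means = sample_means(MU, SIGMA, N, trials=20_000)
center = statistics.fmean(means)
spread = statistics.stdev(means)
theory_spread = SIGMA / N ** 0.5  # standard deviation of the sample mean

# A normal distribution puts about 68.3% of its mass within one SD of its mean.
within_one = sum(abs(m - MU) <= theory_spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f} (theory {MU})")
print(f"SD of sample means:   {spread:.3f} (theory {theory_spread:.3f})")
print(f"share within one SD:  {within_one:.3f} (normal theory 0.683)")
```

The simulated values should land close to the theoretical ones, with small differences from run-to-run randomness if the seed is changed.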
Where we might need something other than a normal distribution is in hypothesis testing with small samples. There, what we generally want to calculate is the probability that, by chance alone, we would observe an average at least as far from a purported mean (the null hypothesis) as the one we calculated.
Say we're measuring the effect of a brand of dog food. We know the average weight of a Golden Retriever from other studies. We calculate the average weight of our fifteen dogs who've been eating that brand for, say, a year. We want to decide if that average is "significantly different," as the jargon goes, i.e. if this dog food makes dogs fat. (Or fat or thin; we can look at one-sided or two-sided tests.)
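To do our test we need to estimate how many standard deviations away our observed average is from the purported mean. So we need a value to use as the standard deviation. If we can get a reliable estimate of the population standard deviation from elsewhere, such as those other studies, we can use it directly and stay with the normal distribution.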
If we don't know the standard deviation, we have to estimate it from our limited amount of data. The calculated mean is the value that minimizes the calculated standard deviation, so we're likely to underestimate the standard deviation when we calculate it around the calculated mean rather than the true one. We get a biased estimate, which we can correct by subtracting one from our sample size n in the denominator, i.e. dividing the sum of squared deviations by n - 1 instead of n.
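The underestimate is easy to see in a simulation. The sketch below repeatedly draws fifteen values from a normal population with a known variance, then averages two estimates of that variance: one dividing by n (`statistics.pvariance`) and one dividing by n - 1 (`statistics.variance`). The population standard deviation of 4 is a made-up value for illustration. Strictly speaking, the n - 1 correction removes the bias in the variance; the standard deviation is its square root.

```python
import random
import statistics

def average_variance_estimates(sigma, n, trials, seed=1):
    """Average the divide-by-n and divide-by-(n-1) variance estimates over many samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_div_n += statistics.pvariance(sample)         # divides by n
        total_div_n_minus_1 += statistics.variance(sample)  # divides by n - 1
    return total_div_n / trials, total_div_n_minus_1 / trials

# Hypothetical population standard deviation and the sample size from the question.
SIGMA, N = 4.0, 15

biased, corrected = average_variance_estimates(SIGMA, N, trials=10_000)
true_variance = SIGMA ** 2

print(f"true variance:            {true_variance:.2f}")
print(f"average dividing by n:    {biased:.2f}")
print(f"average dividing by n-1:  {corrected:.2f}")
```

On average the divide-by-n estimate should come out near (n - 1)/n of the true variance, while the divide-by-(n - 1) estimate should sit close to the true value.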
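Once we have our estimated standard deviation s, we divide the difference between our observed average and the purported mean by s / √n, the estimated standard deviation of the average. That ratio is the t statistic.

For small n, the t statistic does not follow a normal distribution, because s itself varies from sample to sample. It follows a T distribution with n - 1 degrees of freedom, which has heavier tails than the normal and approaches it as n grows.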
In summary, we need a T distribution for hypothesis testing when we use a standard deviation that was estimated using the sample mean with a small sample size n.
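To see why this matters for fifteen dogs, here is a minimal simulation sketch. It generates samples for which the null hypothesis is true, computes the t statistic (observed average minus purported mean, divided by s / sqrt(n), with s computed using n - 1), and counts how often it exceeds a cutoff. The cutoff 1.96 is the usual two-sided 5% value for a normal distribution, and 2.145 is the standard table value for a T distribution with 14 degrees of freedom. The function names are just illustrative.

```python
import math
import random

def t_statistic(sample, mu0):
    """(sample mean - mu0) / (s / sqrt(n)), with s computed using n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n))

def false_alarm_rate(n, cutoff, trials, seed=2):
    """Share of null-true samples whose |t| exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The null hypothesis holds here: the true mean really is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / trials

# Both calls use the same seed, so they judge the same simulated samples.
rate_normal_cutoff = false_alarm_rate(15, 1.96, 40_000)
rate_t_cutoff = false_alarm_rate(15, 2.145, 40_000)

print(f"|t| > 1.96  (normal cutoff): {rate_normal_cutoff:.3f}")
print(f"|t| > 2.145 (T cutoff, 14 degrees of freedom): {rate_t_cutoff:.3f}")
```

With the normal cutoff, the test should raise a false alarm noticeably more often than the intended 5%; with the T cutoff it should come out close to 5%.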
Factoid: The original work that led to the t-test was testing batches of beer.