How does an outlier affect the mean of a population?
1 Answer
Outliers tend to "pull" the mean towards them. See explanation for more details and examples.
Explanation:
Outliers, loosely speaking, are values that lie so far from the bulk of a data set that they look suspect.
More technically speaking, a common rule of thumb treats as an outlier any data value that lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. To apply it, you calculate the lower quartile (Q1), median (Q2), upper quartile (Q3), and interquartile range (IQR = Q3 - Q1), and then compare each data point to the two cutoffs Q1 - 1.5 × IQR and Q3 + 1.5 × IQR.
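As a rough illustration of the 1.5 × IQR rule, here is a minimal Python sketch. The function name `iqr_outliers` and the sample values are hypothetical, not taken from the answer. The "inclusive" quartile method is an assumption as well: textbooks and software compute quartiles in slightly different ways, which can move the cutoffs a little.

```python
import statistics

def iqr_outliers(data):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    # Assumption: the "inclusive" quartile convention. Other conventions
    # give slightly different Q1 and Q3, and therefore different cutoffs.
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr
    upper_cutoff = q3 + 1.5 * iqr
    return [x for x in data if x < lower_cutoff or x > upper_cutoff]

# Hypothetical sample: six ordinary values and one extreme value.
sample = [2, 4, 5, 6, 7, 8, 90]
flagged = iqr_outliers(sample)
print(flagged)
```

For this sample the quartiles are Q1 = 4.5 and Q3 = 7.5, so the cutoffs are 0 and 12, and only 90 falls outside them.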
In any case, an outlier can dramatically affect the mean of a population as a measure of central tendency. Consider the following set of data (chosen to make calculations easy):
For this set, we can calculate the mean
Now, let us replace the value 6 in this set with an exaggerated value that would definitely be considered an outlier of this overly small data set:
We can see now how this affects the new mean
By changing a single value, we "pulled" the mean strongly in the direction of the large outlier we just created.
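The same effect can be checked numerically. The sketch below uses a hypothetical data set (not the one from the original example) in which the value 6 is replaced by 90. It also shows that the mean moves by exactly the size of the change divided by the number of values.

```python
# Hypothetical data, chosen so the arithmetic is easy to follow.
original = [1, 2, 3, 3, 3, 3, 6]

# Replace the value 6 with an exaggerated value, 90.
with_outlier = [90 if x == 6 else x for x in original]

mean_before = sum(original) / len(original)          # 21 / 7
mean_after = sum(with_outlier) / len(with_outlier)   # 105 / 7

# Changing one value by (90 - 6) shifts the mean by that amount divided by n.
expected_shift = (90 - 6) / len(original)

print(mean_before, mean_after, expected_shift)
```

Here one changed value out of seven moves the mean from 3 to 15, a shift of 84 / 7 = 12.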
We can see this effect again with an obviously contrived example:
Clearly, if every number in the set is 100, then no matter how many numbers there are, the mean is 100.
In this instance, we can examine the new calculated mean
Although it doesn't seem too impressive, changing these three values to a small value outlier has had the effect of "pulling" the mean of
It should be noted that in both of these examples I created, the mode (most common value) was unchanged as a result of swapping in an outlier, and the median (the "central" value) was unchanged as well. This demonstrates an "attractive" feature of the median as a measure of central tendency: it tends to be more insulated from wild swings that could be caused by outliers in the data.
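To see this robustness concretely, here is a small sketch comparing all three measures before and after introducing low outliers. The set of twenty values and the replacement value 0 are assumptions made for illustration, and `summarize` is a hypothetical helper.

```python
from statistics import mean, median, mode

def summarize(data):
    """Collect the three measures of central tendency for one data set."""
    return {"mean": mean(data), "median": median(data), "mode": mode(data)}

# Hypothetical contrived set: twenty values, all equal to 100.
all_hundreds = [100] * 20

# Replace three of the values with a small-valued outlier, 0.
with_low_outliers = [0] * 3 + [100] * 17

before = summarize(all_hundreds)
after = summarize(with_low_outliers)

print(before)
print(after)
```

The mean drops from 100 to 85, while the median and the mode both stay at 100.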