
Resistant measures

Observed variables often contain rogue outlier values that lie far away from the sample mean. Especially in small samples, outliers can bias the summary statistics introduced previously away from values that are representative of the majority of the sample.

This problem can be avoided either by eliminating or downweighting the outlier values in the sample (quality control), or by using statistics that are resistant to the presence of outliers. Note that the word robust should not be used to mean resistant, since in statistics it refers to insensitivity to the choice of probability model rather than to individual data values. Because the range is based on the extreme minimum and maximum values in the sample, it is a good example of a statistic that is not at all resistant to the presence of an outlier (and so should be interpreted very carefully!).

Resistant summary statistics can be obtained by using the sample quantiles (percentiles/fractiles). Quantiles are constructed by sorting (ranking) the data into ascending order to obtain a sequence of order statistics $\{x_{(1)},x_{(2)},\ldots,x_{(n)}\}$. The $p$'th quantile $q_p$ is then obtained by taking the $(1+(n-1)p)$'th order statistic $x_{(1+(n-1)p)}$ (or an average of neighbouring values if $1+(n-1)p$ is not an integer). For example, the quartiles of the height example are given by $q_0=161$ (minimum value), $q_{0.25}=171$ (lower quartile), $q_{0.5}=175$ (median), $q_{0.75}=180$ (upper quartile), and $q_1=190$ (maximum value).
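The quantile rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `quantile` and the data in `sample` are hypothetical (they are not the height data of the course), and the "average of neighbouring values" is taken here to be a linear interpolation weighted by the fractional part of $1+(n-1)p$, which is one common reading of that rule.

```python
import math

def quantile(data, p):
    """Return the p-th sample quantile using the rank 1 + (n - 1) * p."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)            # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    pos = 1 + (n - 1) * p        # 1-based rank, possibly non-integer
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo              # distance from the lower neighbour
    # Assumption: interpolate linearly between the two neighbouring
    # order statistics when the rank is not an integer.
    return xs[lo - 1] * (1 - frac) + xs[hi - 1] * frac

# Hypothetical sample, chosen only to show the interpolation step.
sample = [15, 1, 9, 3, 13, 5, 11, 7]
quartiles = [quantile(sample, p) for p in (0, 0.25, 0.5, 0.75, 1)]
print(quartiles)
```

With $n=8$ the rank for $p=0.25$ is $1+7\times 0.25=2.75$, so the lower quartile falls three quarters of the way from $x_{(2)}$ to $x_{(3)}$.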

Unlike the arithmetic mean, the median is not at all influenced by the exact value of the largest values in the sample and so provides a resistant measure of central location. Likewise, a resistant measure of scale can be obtained using the Inter-Quartile Range (IQR), given by the difference between the upper and lower quartiles $q_{0.75}-q_{0.25}$. In the asymptotic limit of large sample size ($n\rightarrow\infty$), for normally distributed variables, the sample median tends to the sample mean and the sample IQR tends to approximately 1.35 times the sample standard deviation. Resistant measures of skewness and kurtosis, such as those based on L-moments, also exist but are beyond the scope of this course. Refer to von Storch and Zwiers (1999) for more details.
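A small numerical check may help make the idea of resistance concrete. The sketch below uses Python's standard `statistics` module and two hypothetical samples (`clean` and `contaminated`, not taken from the course) that differ only in their largest value. The helper `iqr` is an illustrative name; it relies on `statistics.quantiles` with `method="inclusive"`, which places the quartiles at rank $1+(n-1)p$ with linear interpolation. The last lines evaluate the IQR of a standard normal distribution, which is the large-sample factor quoted above.

```python
import statistics

def iqr(data):
    """Inter-quartile range q_0.75 - q_0.25 (quartiles at rank 1 + (n - 1) * p)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

def data_range(data):
    """Range: maximum minus minimum (not resistant)."""
    return max(data) - min(data)

# Hypothetical samples: identical except that the largest value is an outlier.
clean = [1, 3, 5, 7, 9, 11, 13, 15]
contaminated = [1, 3, 5, 7, 9, 11, 13, 150]

for name, xs in (("clean", clean), ("contaminated", contaminated)):
    print(name,
          "mean:", statistics.mean(xs),
          "median:", statistics.median(xs),
          "range:", data_range(xs),
          "IQR:", iqr(xs))

# IQR of a standard normal distribution (standard deviation 1),
# i.e. the factor relating IQR to standard deviation for normal data.
std_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
normal_iqr_factor = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
print("normal IQR / sigma:", normal_iqr_factor)
```

In this example the single outlier shifts the mean and the range substantially, while the median and the IQR are unchanged.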


David Stephenson
2000-09-02