
Problem of multiplicity*

 


  
Figure 8.3: The binomial distribution for the multiplicity problem for the correlation maps in Fig. 8.1. The number of degrees of freedom in this case is N=20, as the SLP fields have been reconstructed from the 20 leading EOFs. [stats_uib_8_1.m]
\begin{figure}\centerline{
\epsfxsize=5in
\epsfysize=3in
\epsffile{figs/stats_uib_8-1c.eps}
}
\end{figure}

The confidence level of the hypothesis test for each correlation score (grid box) leaves some chance, p, of a false rejection of H0. How many false rejections can one expect to see if there are N such tests? This situation crops up in multivariate analysis, where N can be taken as the number of degrees of freedom. The number of false rejections follows a binomial distribution (Wilks, 1995, p. 151-157).
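
As a rough worked example, assume N=20 independent tests, each carried out at the 5% level (p=0.05); these values are taken from the caption of Fig. 8.3 and the 5% limit mentioned in this section. The mean of the binomial distribution then gives the expected number of false rejections, and the chance of seeing at least one follows from the probability of seeing none:

\begin{displaymath}
E[X] = N p = 20 \times 0.05 = 1, \qquad
P_r(X \ge 1) = 1 - (1 - p)^{N} = 1 - 0.95^{20} \approx 0.64
\end{displaymath}

So even when H0 holds for every test, about one spurious rejection is expected on average, and at least one occurs in roughly two out of three such analyses.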

Adjacent grid boxes are often correlated with one another, so there are far fewer than $n_x \times n_y$ degrees of freedom (DOF $ \ll n_x \times n_y$). The multiplicity analysis must therefore take N to be the DOF and not $n_x \times n_y$:


 \begin{displaymath}P_r(X=x) = \left( \begin{array}{c} N\\ x \end{array} \right) p^x ( 1 - p )^{N-x}
\end{displaymath} (8.1)
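
The binomial probabilities in Eq. (8.1) can be evaluated directly. The following is a minimal sketch in Python, not the original `stats_uib_8_1.m` script; the function names `binom_pmf` and `binom_tail` are hypothetical, and the values N=20 and p=0.05 are assumed from the Fig. 8.3 caption and the 5% limit used in this section.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), i.e. Eq. (8.1)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_tail(k, n, p):
    """P(X >= k): probability of at least k false rejections."""
    return sum(binom_pmf(x, n, p) for x in range(k, n + 1))

N = 20     # degrees of freedom (20 leading EOFs, assumed from Fig. 8.3)
p = 0.05   # per-test chance of a false rejection (assumed 5% level)

# Probability of exactly x false rejections among N tests
for x in range(6):
    print(f"P(X={x}) = {binom_pmf(x, N, p):.4f}")

# Probability of at least three false rejections by pure chance
print(f"P(X>=3) = {binom_tail(3, N, p):.4f}")
```

The tail probability is the quantity of interest when asking whether an observed number of significant scores could plausibly have arisen by chance alone.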

NB: The expected area was estimated from the c.d.f. of the number of EOFs that can be expected to have a correlation exceeding the 5% limit by pure chance. This number was divided by the total number of EOFs to give a fraction, and the expected area was calculated on the assumption (valid?) that this fraction is representative of the area of correlation. Although the area of significant correlation in Fig. 8.1 is about the same as that expected from a binomial distribution, the highest correlation scores are substantially higher than $r_{\mbox{\tiny crit}}$, and they tend to be located over northern Europe and northern Africa. If a higher confidence level were used (e.g. 99%), the expected area with correlation exceeding the confidence limit would be smaller. The ``clustering'' of correlation scores may mean that, if we did the same analysis for a sub-domain over northern Europe, the fractional area with significant correlation would be much higher. If the significant scores were more fragmented, a change of scale might not change the area ratio.
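
One possible reading of the expected-area estimate can be sketched as follows. This is an assumption, not the method of `stats_uib_8_1.m`: the note does not say which value is read off the c.d.f., so the sketch takes the smallest count x whose cumulative probability reaches 95%. The names `binom_cdf` and `chance_count` and the `level` parameter are hypothetical.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def chance_count(n, p, level=0.95):
    """Smallest x with P(X <= x) >= level (assumed reading of the c.d.f.)."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= level:
            return x
    return n  # fallback in case rounding keeps the c.d.f. just below level

N = 20     # number of EOFs, taken as the degrees of freedom
p = 0.05   # chance of exceeding the 5% limit under H0

x_chance = chance_count(N, p)
# Assumes the fraction of EOFs is representative of the fractional area
area_fraction = x_chance / N
print(x_chance, area_fraction)
```

Calling `chance_count(N, 0.01)` instead gives a smaller count, in line with the remark that a 99% confidence level reduces the expected area.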


David Stephenson
2000-09-02