Book Notes: Computer Age Statistical Inference -- Ch15 Multiple Testing

For the pdf slides, click here

Classical (Before Computer Age) Multiple Testing Corrections

Background and notations

  • Before computer age, multiple testing may only involve 10 or 20 tests. With the emerge of biomedical (microarray) data, multiple testing may need to evaluate several thousands of tests

  • Notations

    • N: total number of tests, e.g., number of genes.
    • zi: the z-statistic of the i-th test. Note that if we perform tests other than z-test, say a t-test, then we can use inverse-cdf method to transform the t-statistic into a z-statistic, like below zi=Φ1[Fdf(ti)], where Φ is the standard normal cdf, and F is a t distribution cdf.
    • I0: the indices of the true H0i, having N0 members. Usually, majority of hypotheses are null, so π0=N0/N is close to 1.
  • Hypotheses: standard normal vs normal with a non-zero mean H0i:ziN(0,1)H1i:ziN(μi,1) where μi is the effect size for test i

Example: the prostate data

  • A microarray data of

    • n=102 people, 52 prostate cancer patients and 50 normal controls
    • N=6033 genes
Histogram of 6033 z-values, with the scaled standard normal density curve in red

Figure 1: Histogram of 6033 z-values, with the scaled standard normal density curve in red

Bonferroni Correction

Classical multiple testing method 1: Bonferroni bound

  • For an overall significance level α (usually α=0.05), with N simultaneous tests, the Bonferroni bound rejects the ith null hypothesis H0i at individual significance level

piαN

  • Bonferroni bound is quite conservative!

    • For prostate data N=6033 and α=0.05, the p-value rejection cutoff is very small: pi8.3×106

Family-wise Error Rate

Classical multiple testing method 2: FWER control

  • The family-wise error rate is the probability of making even one false rejection FWER=P(reject any true H0i)

  • Bonferroni’s procedure controls FWER, i.e., Bonferroni bound is more conservative than FWER control FWER=P{iI0(piαN)}iI0P(piαN)=N0αNα

FWER control: Holm’s procedure

  1. Order the observed p-values from smallest to largest p(1)p(2)p(i)p(N)

  2. Reject null hypotheses H0(i) if p(i)Threshold(Holm's)=αNi+1

  • FWER is usually still too conservative for large N, since it was originally developed for N20

An R function to implement Holm’s procedure

## A function to obtain Holm's procedure p-value cutoff
holm = function(pi, alpha=0.1){
  N = length(pi)
  idx = order(pi)
  reject = which(pi[idx] <= alpha/(N - 1:N + 1))
  
  return(idx[reject])
}
## Download prostate data's z-values
link = 'https://web.stanford.edu/~hastie/CASI_files/DATA/prostz.txt'
prostz = c(read.table(link))$V1
## Convert to p-values
prostp = 1 - pnorm(prostz)

Illustrate Holm’s procedure on the prostate data

## Apply Holm's procedure on the prostate data
results = holm(prostp)
## Total number of rejected null hypotheses
r = length(results); r
## [1] 6
## The largest z-value among non-rejected nulls
sort(prostz, decreasing = TRUE)[r + 1]
## [1] 4.13538
## The smallest p-value among non-rejected nulls
sort(prostp)[r + 1]
## [1] 1.771839e-05

False Discovery Rates

False discovery proportion

  • FDR control is a more liberal criterion (compared with FWER), thus it has become standard for large N multiple testing problems.

  • False discovery proportion Fdp(D)={a/R,if R00,if R=0
    • A decision rule D rejects R out of N null hypotheses
    • a of those are false discoveries (unobservable)

False discovery rate

  • False discovery rates FDR(D)=E{Fdp(D)}

  • A decision rule D controls FDR at level q, if FDR(D)q

    • q is a prechosen value between 0 and 1

Benjamini-Hochberg FDR control

Benjamini-Hochberg FDR control

  1. Order the observed p-values from smallest to largest p(1)p(2)p(i)p(N)

  2. Reject null hypotheses H0(i) if p(i)Threshold(Dq)=qNi

  • Default choice q=0.1

  • Theorem: if the p-values are independent of each other, then the above procedure controls FDR at level q, i.e., FDR(Dq)=π0qq,where π0=N0/N

    • Usually, majority of the hypotheses are truly null, so π0 is near 1

An R function to implement Benjamini-Hochberg FDR control

## A function to obtain Holm's procedure p-value cutoff
bh = function(pi, q=0.1){
  N = length(pi)
  idx = order(pi)
  reject = which(pi[idx] <= q/N * (1:N))
  
  return(idx[reject])
}

Illustrate Benjamini-Hochberg FDR control on the prostate data

## Apply Holm's procedure on the prostate data
results = bh(prostp)
## Total number of rejected null hypotheses
r = length(results); r
## [1] 28
## The largest z-value among non-rejected nulls
sort(prostz, decreasing = TRUE)[r + 1]
## [1] 3.293507
## The smallest p-value among non-rejected nulls
sort(prostp)[r + 1]
## [1] 0.0004947302

Comparing Holm’s FWER control and Benjamini-Hochberg FDR control

  • In the usual range of interest, large N and small i, the ratio Threshold(Dq)Threshold(Holm's)=qα(1i1N)i increases with i almost linearly

  • The figure below is about the prostate data, with α=q=0.1

Question about the FDR control procedure

  1. Is controlling a rate (i.e., FDR) as meaningful as controlling a probability (of Type 1 error)?

  2. How should q be chosen?

  3. The control theorem depends on independence among the p-values. What if they’re dependent, which is usually the case?

  4. The FDR significance for one gene depends on the results of all other genes. Does this make sense?

An empirical Bayes view

Two-groups model

  • Each of the N cases (e.g., genes) is

    • either null with prior probability π0,
    • or non-null with probability π1=1π0
  • For case i, its z-value zi under Hij for j=0,1 has density fj(z), cdf Fj(z), and survival curve Sj(z)=1Fj(z)

  • The mixture survival curve S(z)=π0S0(z)+π1S1(z)

Bayesian false-discovery rate

  • Suppose the observation zi for case i is seen to exceed some threshold value z0 (say z0=3). By Bayes’ rule, the Bayesian false-discovery rate is Fdr(z0)=P(case i is nullziz0)=π0S0(z0)S(z0)

  • The “empirical” Bayes reflects in the estimation of the denominator: when N is large, S^(z0)=N(z0)N,N(z0)=#{ziz0}

  • An empirical Bayes estimate of the Bayesian false-discovery rate Fdr^(z0)=π0S0(z0)S^(z0)

Connection between Fdr^ and FDR controls

  • Since pi=S0(zi) and S^(z(i))=i/N, the FDR control Dq algorithm p(i)iNq becomes S0(z(i))S^(z(i))q, After rearranging the above formula, we have its Bayesian Fdr bounded Fdr^(z0)π0q

  • The FDR control algorithm is in fact rejecting those cases for which the empirical Bayes posterior probability of nullness is too small

Answer the 4 questions about the FDR control

  1. (Rate vs probability) FDR control does relate to the posterior probability of nullness

  2. (Choice of q) We can set q according to the maximum tolerable amount of Bayes risk of nullness, usually after taking π0=1 in

  3. (Independence) Most often the zi, and hence the pi, are correlated. However even under correlation, S^(z0) is still an unbiased estimator for S(z0), making Fdr^(z0) nearly unbiased for Fdr(z0).

    • There is a price to be paid for correlation, which increases the variance of S^(z0) and Fdr^(z0)
  4. (Rejecting one test depending on others) In the Bayes two-group model, the number of null cases zi exceeding some threshold z0 has fixed expectation Nπ0S0(z0). So an increase in the number of zi exceeding z0 must come from a heavier right tail for f1(z), implying a greater posterior probability of non-nullness Fdr(z0).

    • This emphasizes the “learning from the experience of others” aspect of empirical Bayes inference

Local False Discovery Rates

Local false discovery rates

  • Having observed test statistic zi equal to some value z0, we should be more interested in the probability of nullness given zi=z0 than ziz0

  • Local false discovery rate fdr(z0)=P(case i is nullzi=z0)=π0f0(z0)f(z0)

  • After drawing a smooth curve f^(z) through the histogram of the z-values, we get the estimate fdr^(z0)=π0f0(z0)f^(z0)

    • the null proportion π0 can either be estimated or set equal to 1

A fourth-degree log polynomial Poisson regression fit to the histogram, on the prostate data

  • Solid line is the local fdr^(z) and dashed lines are tail-area Fdr^(z)

  • 27 genes on the right and 25 one the left have fdr^(zi)0.2

The default cutoff for local fdr

  • The cutoff fdr^(zi)0.2 is equivalent to f1(z)f0(z)4π0π1

  • Assuming π00.9, this makes the factor factor quite large f1(z)f0(z)36 This is “strong evidence” against the null hypothesis in Jeffrey’s scale of evidence for the interpretation of Bayes factors

Relation between the local and tail-area fdr’s

  • Since Fdr(z0)=E(fdr(z)zz0) Therefore Fdr(z0)<fdr(z0)

  • Thus, the conventional significant cutoffs are Fdr^(z)0.1fdr^(z)0.2

Empirical Null

Empirical null

  • Large scale applications may allow us to empirically determine a more realistic null distribution than H0i:ziN(0,1)

  • In the police data, a N(0,1) curve is too narrow for the null. Actually, an MLE fit to central data gives N(0.10,1.402) as the empirical null

Empirical null estimation

  • The theoretical null ziN(0,1) is not completely wrong, but needs adjustment for the dataset at hand

  • Under the two-group model, with f0(z) normal but not necessarily standard normal f0(z)N(δ0,σ02), to compute the local fdr(z)=π0f0(z)/f(z), we need to estimate three parameters (δ0,σ0,π0)

  • Our key assumption is that π0 is large, say π00.9, and most of the zi near 0 are null.

  • The algorithm locfdr begins by selecting a set A0 near z=0 and assumes that all the zi in A0 are null

  • Maximum likelihood based on the numbers and values of zi in A0 yield the empirical null estimates (δ^0,σ^0,π^0)

References