Weak Convergence and Empirical Processes

with A. W. van der Vaart; March 1996

Review 6 Metrika 46. Peter Gaenssler, reviewer.


Vaart, Aad van der; Wellner, Jon A. Weak convergence and empirical processes. With applications to statistics. (English) [B] Springer Series in Statistics. New York, NY: Springer, xvi, 508 p. DM 74.00; oeS 540.20; sFr 65.50 (1996). [ISBN 0-387-94640-3]


In their Preface, the authors state that they aim to achieve three goals: the first is to give an exposition of certain modes of stochastic convergence and, in particular, of weak convergence following ideas of Hoffmann-Jorgensen (1984 and 1991) and Dudley (1985). The second is to use this weak convergence theory as background for an account of the major components of the modern theory of empirical processes indexed by classes of functions. The third goal is to illustrate the usefulness of modern weak convergence theory and modern empirical process theory for statistics by a wide variety of applications. Accordingly, the book consists of three parts, each of which is self-contained to a certain extent: Part 1, Stochastic Convergence; Part 2, Empirical Processes; Part 3, Statistical Applications. Notes at the end of each part contain useful comments on the literature supplementing the text. Most of the chapters in each part conclude with "problems and complements". An Appendix covers auxiliary results: inequalities (due to Ottaviani, L\'evy, Hoffmann-Jorgensen, Hoeffding), the contraction principle, Gaussian processes (exponential bounds, majorizing measures), Rademacher processes, isoperimetric inequalities for product measures, and some limit theorems.

Part 1 presents a thorough exposition of the theory of weak convergence for arbitrary, possibly nonmeasurable maps $\XX_n$ (defined on an underlying probability space and taking values in a metric space $D$) to a limit random element $\XX$ in $D$, which is assumed to be Borel measurable. (In view of the notation used by the authors in Part 2, I prefer to write here $\XX_n$ and $\XX$ instead of $X_n$ and $X$.) Denoting by $E^*$ outer expectations, $\XX_n$ is said to converge weakly to $\XX$ ($\XX_n \to \XX$) if $E^* f(\XX_n) \to Ef(\XX)$ as $n \to \infty$ for every bounded and continuous $f : D \to R$. Based on this definition, the theory of weak convergence runs more or less parallel to the classical theory treated in Billingsley (1968), including a portmanteau theorem, continuous mapping theorems, Prohorov's theorem, tightness and basic tools for establishing tightness, and weak convergence results for product spaces. Relationships with other modes of stochastic convergence, such as convergence in probability and almost sure convergence, are extended to nonmeasurable maps, leading to the concepts of "convergence in outer probability" and "outer almost sure convergence".
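
As a concrete illustration of the defining property $E^* f(\XX_n) \to Ef(\XX)$, the following minimal Monte Carlo sketch in Python (my own illustration, not taken from the book; the choice of $\XX_n$ as the standardized mean of uniform variables, the test function, and all names are purely illustrative, and measurability plays no role in this toy example) checks the convergence of $Ef(\XX_n)$ for a single bounded, continuous $f$.

    # Monte Carlo check of E f(X_n) -> E f(X) for one bounded, continuous f.
    # X_n is the standardized mean of n Uniform(0,1) variables, X ~ N(0,1);
    # by the central limit theorem, X_n converges weakly to X.
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        return np.exp(-x ** 2)                 # bounded and continuous

    def E_f_Xn(n, reps=100_000):
        u = rng.random((reps, n))
        xn = np.sqrt(12 * n) * (u.mean(axis=1) - 0.5)    # mean 0, variance 1
        return f(xn).mean()                    # Monte Carlo estimate of E f(X_n)

    E_f_X = f(rng.standard_normal(1_000_000)).mean()     # target value E f(X)
    for n in (1, 5, 25, 125):
        print(n, round(E_f_Xn(n), 4), "target:", round(E_f_X, 4))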

Part 2 is concerned with the theory of empirical processes, which has developed rapidly over the past 17 years since the groundbreaking work of Dudley (1978). This vigorous development has gone hand in hand with considerable progress in related areas of probability theory in Banach spaces (cf., e.g., Ledoux and Talagrand 1991). The authors are mostly concerned with empirical measures and empirical processes based on a sample of observations in a completely arbitrary sample space $({\cal X}, {\cal A})$. This aspect becomes important when statisticians are dealing, e.g., with "function"- or "picture"-valued data such as seismographs, noise level tracings, electrocardiograms, and high-dimensional biomedical data. For this, let $X_1, X_2, \ldots$ be random elements in $({\cal X}, {\cal A})$ with common distribution $P$ on ${\cal A}$, the $X_i$'s being defined canonically, i.e., as the coordinate projections. The empirical measure $\PP_n$ associated with $X_1, \ldots, X_n$ is defined as $\PP_n := n^{-1} \sum_{i\le n} \delta_{X_i}$, where $\delta_x$ denotes the Dirac measure at $x \in {\cal X}$, and the corresponding empirical process (of sample size $n$) is given by $\GG_n := \sqrt{n} (\PP_n - P)$. Modern empirical process theory views both $\PP_n$ and $\GG_n$ as stochastic processes indexed by subclasses ${\cal C}$ of ${\cal A}$ or classes ${\cal F}$ of measurable functions $f : {\cal X} \to R$. Thus $\GG_n f = \sqrt{n} (\PP_n f - Pf)$, where $Pf := \int f \, dP$ and $\PP_n f = n^{-1} \sum_{i\le n} f(X_i)$. If the sample space ${\cal X}$ equals the interval $I = [0,1]$ and $P$ is the uniform distribution on $I$, then, taking ${\cal F} = \{ 1_{[0,t]} : t \in I \}$, $\GG_n$ becomes the uniform empirical process $\UU_n = (\UU_n(t) : t \in I)$ with $\UU_n(t) = \sqrt{n} (\FF_n(t) - t)$ and $\FF_n(t) := n^{-1} \sum_{i\le n} 1\{X_i \le t\}$, $t \in I$, being the empirical distribution function. But, as pointed out by Chibisov (1965) and nicely explained by Billingsley (1968, pp. 150-163), neither $\FF_n$ nor $\UU_n$ can be viewed as random elements in the space $D[0,1]$ endowed with the Borel $\sigma$-field pertaining to the supremum norm $\| \cdot \|_{\infty}$. Thus the classical definition of weak convergence cannot be used for $\UU_n$ viewed as a random function with values in the metric space $(D[0,1], \| \cdot \|_{\infty})$, even though this is a very natural space to serve as sample space for the processes $\UU_n$. To deal with this difficulty, the theory presented in Part 1 comes into play, thus avoiding the use of Skorohod's metric on $D[0,1]$ and its extension to the multivariate case, where $I$ is replaced by $I^d$, $d > 1$. In view of the classical Glivenko-Cantelli theorem and Donsker's functional central limit theorem for $\UU_n$, one says that, within the general setting given before, a class ${\cal F}$ is a ($P$-)Glivenko-Cantelli class if $\| \PP_n - P \|_{\cal F} := \sup_{f \in {\cal F}} | \PP_n f - Pf |$ converges to zero "outer almost surely", and ${\cal F}$ is called a ($P$-)Donsker class if $\GG_n \to \GG$ in the Banach space $\DD \equiv l^{\infty}({\cal F}) := \{ z : {\cal F} \to R : \| z \|_{\cal F} := \sup_{f \in {\cal F}} |z(f)| < \infty \}$ endowed with the supremum norm $\| \cdot \|_{\cal F}$, where $\GG$ is the $P$-Brownian bridge. Whether a given class ${\cal F}$ is a ($P$-)Glivenko-Cantelli class or a ($P$-)Donsker class depends on the "size of ${\cal F}$".
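
To connect these notions with the familiar uniform case, the following short Python simulation (my own sketch, not taken from the book; the grid, sample sizes, and names are illustrative) computes the uniform empirical process $\UU_n$ on a grid, the Glivenko-Cantelli quantity $\sup_t |\FF_n(t) - t|$, which tends to zero, and the supremum $\| \UU_n \|_{\infty}$, which stays bounded in probability and has, in the limit, the distribution of the supremum of a Brownian bridge.

    # The uniform empirical process U_n(t) = sqrt(n) (F_n(t) - t) on a grid,
    # the Glivenko-Cantelli quantity sup_t |F_n(t) - t|, and the
    # Kolmogorov-Smirnov statistic ||U_n||_inf.
    import numpy as np

    rng = np.random.default_rng(1)
    grid = np.linspace(0.0, 1.0, 1001)

    def uniform_empirical_process(n):
        x = np.sort(rng.random(n))                       # X_1, ..., X_n i.i.d. Uniform(0,1)
        Fn = np.searchsorted(x, grid, side="right") / n  # empirical distribution function
        return np.sqrt(n) * (Fn - grid)                  # U_n evaluated on the grid

    for n in (100, 10_000, 1_000_000):
        Un = uniform_empirical_process(n)
        print(n,
              "sup|F_n - t| =", np.abs(Un).max() / np.sqrt(n),   # -> 0 (Glivenko-Cantelli)
              "||U_n||_inf =", np.abs(Un).max())                 # O_P(1) (Donsker)
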
A relatively simple way to cope with the size of ${\cal F}$ is via entropy numbers: the $\epsilon$-entropy of ${\cal F}$ is essentially the logarithm of the number of "balls" or "brackets" of size $\epsilon$ needed to cover ${\cal F}$. Sufficient conditions for a class ${\cal F}$ to be Glivenko-Cantelli or Donsker can be given in terms of the rate of increase of the $\epsilon$-entropy as $\epsilon$ tends to zero. The core of Part 2 consists of the results on Glivenko-Cantelli classes ${\cal F}$ in Chap. 2.4 and on Donsker classes ${\cal F}$ in Chap. 2.5. Some of the other highlights of Part 2 are as follows: Chapter 2.7 presents estimates of the bracketing numbers for classes ${\cal F}$ of smooth functions, sets with smooth boundaries, convex sets, monotone functions, and functions smoothly depending on a parameter. Coupled with the results of Chaps. 2.4 and 2.5, these estimates yield many more interesting Donsker classes. The viewpoint of Chap. 2.9 on multiplier central limit theorems is that, for a Donsker class ${\cal F}$, $\GG_n = n^{-1/2} \sum_{i\le n} (\delta_{X_i} - P) \equiv n^{-1/2} \sum_{i\le n} Z_i$ converges weakly to $\GG$ in $l^{\infty}({\cal F})$; the authors then pose the question: for what sequences of i.i.d. real-valued random variables $\xi_1, \xi_2, \ldots$, independent of the original data, does it follow that $n^{-1/2} \sum_{i\le n} \xi_i Z_i \to \GG$ in $l^{\infty}({\cal F})$? It is shown that these two statements are equivalent if the $\xi_i$ are centered, have variance $1$, and satisfy the $L_{2,1}$ condition $\| \xi_1 \|_{2,1} := \int_0^{\infty} (P(|\xi_1| > t))^{1/2} \, dt < \infty$. Conditional versions of this multiplier central limit theorem are also presented, which turn out to be basic in Part 3 for the development of limit theorems for the bootstrap empirical process. Various operations preserving the Donsker property are considered in Chap. 2.10. Such "permanence properties" provide a very effective method of verifying the Donsker property in statistical applications. Also in view of statistical applications, it is of interest to know whether, for a given class ${\cal P}$ of probability measures on a sample space $({\cal X}, {\cal A})$, a class ${\cal F}$ is a Glivenko-Cantelli class uniformly in $P \in {\cal P}$ and, correspondingly, whether ${\cal F}$ is a Donsker class uniformly in $P \in {\cal P}$; this is addressed in Chap. 2.8. Chap. 2.11 gives extensions of the classical Donsker theorem to sums $\sum_{i\le n} Z_{ni}$ of independent but not necessarily identically distributed processes $Z_{ni} = (Z_{ni}(f))_{f \in {\cal F}}$. Finally, Chap. 2.14 presents a detailed study of moment bounds, tail bounds, and exponential bounds for the supremum $\| \GG_n \|_{\cal F}$ of the empirical process $\GG_n$, which play an important role in Chaps. 3.2 and 3.4 on the limiting theory of $M$-estimators and maximum likelihood estimators, respectively. In this connection, Sect. 2.11.3, concerning classes ${\cal F}$ of functions changing with the sample size $n$, is also of importance (cf. Section 3.2.24).
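
The multiplier viewpoint of Chap. 2.9 can also be explored numerically. The sketch below (again my own illustration, under the assumptions of Uniform(0,1) data, the class ${\cal F} = \{ 1_{[0,t]} : t \in I \}$, and standard normal multipliers, which are centered, have variance $1$, and satisfy the $L_{2,1}$ condition) compares the supremum of the multiplier process $n^{-1/2} \sum_{i\le n} \xi_i Z_i$ with that of $\GG_n$ itself; both approximate the distribution of $\| \GG \|_{\cal F}$.

    # Multiplier process n^{-1/2} sum_i xi_i (f(X_i) - Pf) versus the empirical
    # process G_n, both indexed by F = {1[0,t]} and evaluated on a grid of t.
    import numpy as np

    rng = np.random.default_rng(2)
    grid = np.linspace(0.0, 1.0, 201)
    n, reps = 500, 2000

    sup_emp, sup_mult = [], []
    for _ in range(reps):
        x = rng.random(n)
        Z = (x[:, None] <= grid) - grid            # Z_i(f) = f(X_i) - Pf for f = 1[0,t]
        xi = rng.standard_normal(n)                # i.i.d. multipliers, independent of the data
        sup_emp.append(np.abs(Z.sum(axis=0)).max() / np.sqrt(n))
        sup_mult.append(np.abs((xi[:, None] * Z).sum(axis=0)).max() / np.sqrt(n))

    # the two supremum distributions should nearly agree (both approximate sup|G|)
    print("empirical :", np.quantile(sup_emp, [0.5, 0.9, 0.95]))
    print("multiplier:", np.quantile(sup_mult, [0.5, 0.9, 0.95]))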

Part 3 illustrates in an impressive way the power of the general empirical process theory through applications ranging from $M$-estimators, bootstrapping, permutation tests, tests of independence, the functional $\delta$-method (von Mises method), and contiguity theory up to infinite-dimensional versions of the H\'ajek convolution and asymptotic minimax theorems, which lead, e.g., to the asymptotic efficiency of the empirical measure in the nonparametric situation. The nonparametric bootstrap has become popular in statistics since its introduction by Efron (1979). The method is based on sampling from the empirical measure $\PP_n$. Given the original sample $X_1, \ldots, X_n$, let $\hat{X}_1, \ldots, \hat{X}_n$ be an i.i.d. sample from $\PP_n$. The bootstrap empirical measure and process are given by $\hat{\PP}_n := n^{-1} \sum_{i \le n} \delta_{\hat{X}_i}$ and $\hat{\GG}_n := \sqrt{n} ( \hat{\PP}_n - \PP_n )$, respectively. Letting $M_{ni}$ be the number of times that $X_i$ is "redrawn" from the original sample, $\hat{\PP}_n$ and $\hat{\GG}_n$ can also be written as $\hat{\PP}_n = n^{-1} \sum_{i \le n} M_{ni} \delta_{X_i}$ and $\hat{\GG}_n = n^{-1/2} \sum_{i\le n} (M_{ni} - 1) \delta_{X_i}$, where $(M_{n1}, \ldots, M_{nn})$ has a multinomial distribution with $n$ cells, $n$ trials, and success probability $n^{-1}$ for each of the $n$ cells. Since the variables $M_{n1}, M_{n2}, \ldots$ converge in distribution to a sequence of i.i.d. Poisson variables with mean $1$, the limit theory for $\hat{\GG}_n$ is closely linked to the conditional multiplier central limit theorems developed in Chap. 2.9. The present proofs of the two main bootstrap limit theorems due to Gin\'e and Zinn (1990) rely on this connection. It is shown that (for the processes $\hat{\GG}_n$, $\GG_n$, and $\GG$ indexed by ${\cal F}$) $\hat{\GG}_n \to \GG$ "in outer probability" if and only if $\GG_n \to \GG$, and that $\hat{\GG}_n \to \GG$ "outer almost surely" if and only if $\GG_n \to \GG$ and $P^* \|f - Pf\|_{\cal F}^2 < \infty$. Other bootstrap methods considered are based on exchangeable weights instead of the multinomial weights. One of the most important basic tools of large sample theory in statistics is the so-called $\delta$-method and its functional version, presented in Chap. 3.9, which is based on Hadamard-differentiable maps between metrizable topological vector spaces $\DD$ and $\EE$. The examples treated there include the Nelson-Aalen estimator from right-censored data, quantile and copula functions, the product integral, multivariate trimming, and $M$-functionals. When specialized to a Hadamard-differentiable map $\phi : \DD \equiv l^{\infty}({\cal F}) \to \EE$ for a Donsker class ${\cal F}$, the functional $\delta$-method gives weak convergence of $\sqrt{n} ( \phi ( \PP_n) - \phi (P))$ to $\phi^{\prime}(\GG)$. In view of the bootstrap limit theorems mentioned before, the functional $\delta$-method also yields weak convergence of $\sqrt{n} ( \phi (\hat{\PP}_n ) - \phi (\PP_n))$ to $\phi^{\prime} (\GG)$ "in outer probability" or "outer almost surely". This yields the consistency of the bootstrap for many statistical functionals, which constitutes the most important motivation for the study of the abstract bootstrap process.
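
For readers who want to see the multinomial-weight representation of $\hat{\GG}_n$ at work, here is a last minimal Python sketch (again my own, not the authors'; Uniform(0,1) data and the class ${\cal F} = \{ 1_{[0,t]} : t \in I \}$ are illustrative choices). It draws the weights $(M_{n1}, \ldots, M_{nn})$, forms $\hat{\GG}_n = n^{-1/2} \sum_{i\le n} (M_{ni} - 1) \delta_{X_i}$ given one fixed sample, and compares the conditional distribution of $\| \hat{\GG}_n \|_{\cal F}$ with the distribution of $\| \GG_n \|_{\cal F}$ over fresh samples, in the spirit of the Gin\'e-Zinn theorems.

    # Bootstrap empirical process via multinomial weights, indexed by F = {1[0,t]}.
    import numpy as np

    rng = np.random.default_rng(3)
    grid = np.linspace(0.0, 1.0, 201)
    n, reps = 500, 2000

    x = rng.random(n)                               # one fixed original sample X_1, ..., X_n
    ind = (x[:, None] <= grid).astype(float)        # f(X_i) for f = 1[0,t]

    sup_boot = []
    for _ in range(reps):
        M = rng.multinomial(n, np.full(n, 1.0 / n))                  # (M_n1, ..., M_nn)
        G_hat = ((M - 1)[:, None] * ind).sum(axis=0) / np.sqrt(n)    # bootstrap empirical process
        sup_boot.append(np.abs(G_hat).max())

    sup_emp = []                                    # ||G_n||_F over fresh samples, for comparison
    for _ in range(reps):
        y = rng.random(n)
        Gn = np.sqrt(n) * ((y[:, None] <= grid).mean(axis=0) - grid)
        sup_emp.append(np.abs(Gn).max())

    print("bootstrap:", np.quantile(sup_boot, [0.5, 0.9, 0.95]))
    print("empirical:", np.quantile(sup_emp, [0.5, 0.9, 0.95]))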

In summary, the goals of the authors cited above have been achieved with the present book in a most striking way, incorporating an immense amount of material from the journal literature up to 1995. Graduate students will find the work excellent, as will teachers preparing courses. I am convinced that this book will become a milestone, like Billingsley's (1968) book, in both probability and statistics, and that it therefore should be looked at by a very wide range of readers.

References

Billingsley, P. (1968). Convergence of probability measures. Wiley, New York.

Chibisov, D. M. (1965). An investigation of the asymptotic power of the tests of fit. Theory of Probability and Its Applications 10, 421-437.

Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probability 6, 899-929.