Data from contaminated sites often
contain some large positive outliers. These values are usually verified by
repeated chemical analysis. They may have a large effect on the arithmetic
mean and hence on estimates of the total contaminant load. Land use
planners are often concerned with this total load, especially if the ground
water is threatened by the contaminant. Such an estimate should be
unbiased. This paper considers methods of estimating confidence intervals
for the mean in the presence of a large positive outlier, and illustrates
them using copper levels from a contaminated site near Adelaide, South Australia.
The data consist of 76 samples
with a mean of 556 but a median of only 115. The highest value
was 18000 mg kg⁻¹. This outlier affects not only the mean and the
variance, but also the skewness and kurtosis. In our sample, G2, Fisher's
measure of kurtosis, had a value of 61. Use of the t distribution in this
case is therefore invalid. Robust estimates of central tendency typically
do not give an unbiased estimate of the mean.
We note that a simple bootstrap
technique is dominated by the number of times (1, 2, 3 or more) that the
outlier occurs in each bootstrap sample. The bootstrap distribution is thus
multimodal and can be disjoint. This disjoint distribution means that a small
change in the quantile (e.g. a change from 0.97 to 0.98) could lead to a
large change in the resultant confidence limit. We therefore explore
a method of data sharpening followed by a bootstrap technique to obtain
confidence intervals.
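
The multimodality described above can be reproduced with a small simulation. The sketch below is illustrative only: the data are synthetic (75 moderate values drawn uniformly between 50 and 300, plus one value of 18000), not the copper data from the study, and the function name and parameters are assumptions made for this example. It groups bootstrap means by the number of times the outlier was resampled.

```python
import random
import statistics
from collections import defaultdict

def bootstrap_means_by_outlier_count(data, outlier, n_boot=2000, seed=0):
    """Return {k: [bootstrap means]} where k is how often `outlier` was drawn."""
    rng = random.Random(seed)
    n = len(data)
    groups = defaultdict(list)
    for _ in range(n_boot):
        # Ordinary nonparametric bootstrap: draw n values with replacement.
        resample = [rng.choice(data) for _ in range(n)]
        k = resample.count(outlier)
        groups[k].append(statistics.fmean(resample))
    return dict(groups)

# Synthetic stand-in for the real data (an assumption, not the study's values).
data_rng = random.Random(1)
data = [data_rng.uniform(50, 300) for _ in range(75)] + [18000.0]

groups = bootstrap_means_by_outlier_count(data, outlier=18000.0)
for k in sorted(groups):
    means = groups[k]
    # Columns: outlier count, number of resamples, smallest and largest mean.
    print(k, len(means), round(min(means)), round(max(means)))
```

Each value of k produces its own cluster of bootstrap means, and the clusters are separated by gaps, so a quantile that falls near a gap can move sharply when the nominal level changes slightly.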
The sharpening is achieved by minimising
a penalty function that increases slowly with the distance of the observations
from the mean, subject to the constraints that the arithmetic mean is unaffected
and that the variance is reduced to some pre-assigned fraction (say,
half) of the original variance. Confidence intervals can then be
obtained on the sharpened values using a bootstrap technique, and those
confidence intervals can be back-transformed to the original scale.
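
The two constraints on the sharpened values can be illustrated as follows. The abstract does not specify the penalty function, so this sketch does not implement the author's method. It uses plain linear shrinkage toward the mean, which is the solution when the penalty is assumed to be the sum of squared displacements of the observations, and it only demonstrates that the mean is preserved while the variance is cut to the chosen fraction. The function name and the example values are hypothetical.

```python
import math
import statistics

def shrink_to_variance_fraction(values, fraction=0.5):
    """Shrink values linearly toward their mean so that the variance becomes
    `fraction` times the original while the mean stays unchanged."""
    m = statistics.fmean(values)
    scale = math.sqrt(fraction)  # scaling deviations by sqrt(f) scales variance by f
    return [m + scale * (v - m) for v in values]

# Hypothetical values with one large positive outlier.
values = [40.0, 90.0, 115.0, 130.0, 250.0, 18000.0]
sharpened = shrink_to_variance_fraction(values, fraction=0.5)

# Check both constraints numerically.
mean_kept = math.isclose(statistics.fmean(sharpened), statistics.fmean(values),
                         rel_tol=1e-9)
var_ratio = statistics.pvariance(sharpened) / statistics.pvariance(values)
print(mean_kept, round(var_ratio, 6))
```

A different penalty would give a nonlinear adjustment, but any admissible solution has to pass the same two checks.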
RAYMOND CORRELL
Mathematical & Information Sciences,
CSIRO
PMB#2 Glen Osmond
Adelaide 5064, Australia
ray.correll@cmis.csiro.au
A Statistical Test for the Evaluation
of Signal Species
as Indicators of Key Biotopes
for Endangered Species
Anders Grimvall, Stig Danielsson,
Markus Malm, and Stefan Stark
Forests and other widespread ecosystems are believed to contain specific biotopes that play a crucial role in the protection of endangered species. Hence, it is of great interest to develop procedures that enable identification of such key biotopes. One of the strategies that has been adopted is based on inventories of so-called signal species, i.e. species that are easy to detect and that indicate favourable environments for species that are red-listed because they are rare or endangered. This article describes how the results of an inventory of both signal species and red-listed species in a number of biotopes can be used to assess whether the presence of red-listed species is linked to specific patterns in the presence of signal species. The data set observed in such an inventory may be regarded as outcomes of binary variables, one for each combination of species and biotope. The prediction problem addressed is then a matter of selecting a suitable subset of signal species and a suitable binary function of the variables representing presence or absence of the selected signal species. We propose a procedure in which the class of permissible predictors comprises all binary variables that can be expressed as increasing functions of one, two or three other binary variables. Furthermore, we propose that a permutation test be employed to test whether the fit of predicted values to observed values is statistically significant. A case study of mosses and lichens in forest ecosystems in Sweden is used to illustrate the methods proposed.
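
A minimal sketch of such a permutation test is given below. Several details are assumptions made for illustration and are not taken from the abstract: the fit statistic is the number of biotopes in which predicted and observed presence agree, the predictor class is restricted to AND and OR rules over one, two or three signal species (both are increasing functions, but the full class contains others), and the presence/absence data are hypothetical. The best-fitting predictor is reselected for every permutation so that the p-value accounts for the selection step.

```python
import itertools
import random

def best_fit(signal, red):
    """Largest number of biotopes where an AND/OR rule over 1-3 signal
    species agrees with the red-listed presence vector `red`."""
    best = 0
    for size in (1, 2, 3):
        for combo in itertools.combinations(signal, size):
            for rule in (all, any):  # AND and OR are increasing functions
                predicted = [int(rule(column)) for column in zip(*combo)]
                agreement = sum(p == r for p, r in zip(predicted, red))
                best = max(best, agreement)
    return best

def permutation_p_value(signal, red, n_perm=999, seed=0):
    """Permute red-listed presence across biotopes and compare best fits."""
    rng = random.Random(seed)
    observed = best_fit(signal, red)
    shuffled = list(red)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_fit(signal, shuffled) >= observed:
            at_least_as_good += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_perm + 1)

# Hypothetical inventory: 4 signal species observed in 12 biotopes.
signal = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
]
red = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

observed, p_value = permutation_p_value(signal, red)
print(observed, p_value)
```

In this toy data set the red-listed species occurs exactly where the first or the second signal species occurs, so the best rule agrees in all 12 biotopes.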
ANDERS GRIMVALL
Department of Mathematics
Linköping University
Linköping 58183, Sweden
angri@mai.liu.se
Estimation of Soil Ingestion via
Semiparametric Bayes Methods
John V. Tsimikas and Edward J.
Stanek
Exposure to soil is assessed based
on mass-balance soil ingestion studies. In these studies a sample of individuals
is followed over a period of time and their daily soil ingestion is estimated
from measurements of trace element intake from food and trace element
output observed in fecal samples. Many trace elements may be used
in one study. Given knowledge of the transit time between intake
and output of the trace element, one can reliably estimate the daily amounts
of soil ingested by individuals in the study.
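
The mass-balance idea can be sketched as follows. The abstract does not state the estimator, so the formula used here is an assumption: the excess of fecal output over food intake of a trace element, with intake lagged by the transit time, divided by the concentration of that element in soil. The function name, the one-day lag, the units, and the numbers are all hypothetical.

```python
def daily_soil_ingestion(food_intake, fecal_output, soil_conc, lag_days=1):
    """Estimate daily soil ingestion (g/day) for one trace element.

    food_intake, fecal_output: daily amounts of the element (micrograms/day).
    soil_conc: concentration of the element in soil (micrograms per gram).
    lag_days: assumed transit time between intake and output.
    """
    estimates = []
    for day in range(lag_days, len(fecal_output)):
        # Output on `day` is matched with intake `lag_days` earlier.
        excess = fecal_output[day] - food_intake[day - lag_days]
        estimates.append(excess / soil_conc)
    return estimates

# Hypothetical three-day record for one subject and one element.
food = [100.0, 120.0, 110.0]
fecal = [130.0, 150.0, 170.0]
estimates = daily_soil_ingestion(food, fecal, soil_conc=500.0, lag_days=1)
print(estimates)
```

With several trace elements, each element yields its own series of daily estimates for the same subject, which is what makes a model with subject-specific effects a natural next step.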
Linear random coefficient models
provide a natural framework for the analysis of such data, the crucial
parameter to be estimated being the upper 5th or 10th percentile of the distribution
of subject-specific soil ingestion over a fixed period of time. An estimate
arises easily if one assumes normality of the subject-specific random effects
in the model. A semiparametric alternative to the standard linear random
effects model is the Bayesian nonparametric hierarchical model involving
Dirichlet priors or Dirichlet process mixtures (Escobar and West, 1995,
1998; Ibrahim and Kleinman, 1998).
We apply and extend
these methods to the estimation of subject and population exposure
parameters based on short multiple time series of trace element excretion.
We discuss how these methods yield more reliable estimates of soil ingestion
exposure distributions, which serve as the foundation for many environmental
risk assessments.
JOHN V. TSIMIKAS
Department of Mathematics and Statistics
University of Massachusetts at Amherst
1442 LGRT
Amherst, MA 01003, USA
tsimikas@math.umass.edu