In this practicum we will investigate a dataset studied in Craigmile et al. [2006]. Here is their description of the dataset: “In recent years there has been a great deal of interest in monitoring nitrogen levels in the Chesapeake Bay, the largest estuary in the United States. Parts of six states make up the Chesapeake Bay watershed: Delaware, Maryland, New York, Pennsylvania, Virginia, and West Virginia, as well as the District of Columbia. Increased nitrogen levels increase algal growth. The blanketing effect of algal growth, along with the decomposition of these algae, in turn decreases the amount of dissolved oxygen in the water. As a consequence, increasing nitrogen levels pose a substantial threat to the wildlife and ecology of the Bay and the surrounding watershed. In a recent agreement between the Chesapeake Bay Program partners, every state is mandated to regulate the nitrogen levels (both from tributaries and from the air) in their part of the Bay (Chesapeake Bay Program Partnership, 2000). We consider a spatial dataset of 49 measurements of the surface-nitrogen concentration (in mg/l) taken in May 1995 throughout the Chesapeake Bay. This same dataset was used in De Oliveira and Ecker [2002] for hotspot detection.”

Load libraries

library(geoR)
library(downloader)
source("http://www.stat.washington.edu/peter/591/labs/qqplots.R")

Read in the data, storing as a data frame, and the boundary coordinates

download("http://www.stat.washington.edu/peter/PASI/ChesapeakeBay.zip", "CheasapeakeBay.zip", 
    mode = "wb")
unzip("CheasapeakeBay.zip", exdir = "./")
ches <- read.table("ChesapeakeBay/chesbay.txt", header = T)
names(ches)

boundary <- read.table("ChesapeakeBay/boundary.txt", header = TRUE)

Spatial exploratory data analysis

Plot the locations/values of the nitrogren samples.

plot(boundary, xlab = "Easting", ylab = "Northing", col = "gray40", type = "l", 
    asp = TRUE)


points(ches$easting, ches$northing)

## Show block 1 for later upscaling
rect(390, 4100, 410, 4150, border = "blue", lwd = 2)

## Show block 2
rect(390, 4150, 410, 4200, border = "blue", lwd = 2)

The set of spatial locations are shown in this figure. In environmental monitoring we are often interested in monitoring a quantity summarized over specific regions. Suppose we wish to predict the average log nitrogen concentrations in the two blue areas in the figure. Before we can do this wee need to carry out a full geostatistical analysis of the data.

1. Exploratory analysis

Carry out an exploratory spatial analysis on the log nitrogen values. Make sure that you investigate the following:

  1. Demonstrate that there are spatial trends. Predict the trends naively with ordinary least squares using the easting and/or the northing coordinates.

  2. After you detrend your data, produce a semivariogram for the residuals using the variog R function. Explain what spatial dependence you see.

  3. Is the spatial dependence isotropic? (hint: use the variog4 function). If the variogram is not isotropic, consider a transformation of the coordinate system to make the residuals closer to isotropy.

  4. Fit an exponential covariance function using weighted least squares (variofit) to thetransformed coordinate-system residuals.

Note that the exponential covariance structure is clearly an approximation, but will be sufficient for our analysis.

2. Estimating the spatial model parameters via likelihood methods

Suppose that tx are your transformed x coordinates, ty are your transformed y coordinates and trend.resids are your residuals. We can then create a geodata object for our residuals.

## Create a geodata object for the residuals,
Rmy.resids <- cbind(tx, ty, trend.resids)
Rnitro <- as.geodata(Rmy.resids, coords.col = 1:2, data.col = 3)

Now, assuming Gaussianity for the residuals (is this reasonable?), we estimate the parameters using maximum likelihood:

ml.fit <- likfit(Rnitro, cov.model = "exp", fix.nugget = FALSE, nugget = 1, 
    ini.cov.pars = c(1, 1))
summary(ml.fit)

We could also use restricted maximum likelihood:

reml.fit <- likfit(Rnitro, cov.model = "exp", fix.nugget = FALSE, lik.method = "REML", 
    nugget = 1, ini.cov.pars = c(1, 1))
summary(reml.fit)

Supposing that emp.var.final is your final empirical variogram, here is a comparison of the model fits.

plot(emp.var.final)
lines(ml.fit, lty = 1, col = "blue")
lines(reml.fit, lty = 2, col = "red")

How do the two model fits compare? These model fits are only for the residuals from the ordinary least squares fit to the spatial trend. We can fit the model to the original dataset. This is a bit more involved.

First we set up a new geo.data object

my.data <- cbind(tx, ty, log.nitro)
Onitro <- as.geodata(my.data, coords.col = 1:2, data.col = 3)

You need to fill in your trend model here after the tilde ~

your.trend.model <- ~ 

Then we estimate the covariance parameters using REML

trend.reml.fit <- likfit(Onitro, cov.model = "exp", fix.nugget = FALSE, trend = trend.spatial(your.trend.model), 
    lik.method = "REML", nugget = 1, ini.cov.pars = c(1, 1))
summary(trend.reml.fit)

Question 1: Write down the statistical model that you are fitting to the log surface nitrogen concentrations, and describe the model summaries.

3. Spatial prediction (Kriging)

In this section, we will demonstrate predictions of the residual field (without the trend component). We first need to set up a grid of locations that we will predict at.

px <- seq(360, 440, length = 30)
py <- seq(4080, 4390, length = 100)
pred.sites <- expand.grid(px, py)


# Show these prediction locations on a map.
par(mfrow = c(1, 1))
plot(boundary, xlab = "Easting", ylab = "Northing", col = "gray40", type = "l")
points(pred.sites, pch = 4, cex = 0.5)

Is there something strange about the choice of predicted locations? (How might you change the choice of predicted locations?) We need to make sure that we rescale these predicted locations, before we get geoR to produce our predictions (to match with the spatial corrdinate system in our model for covariances).

## Define your transformed x and y coodinates (fill in the blanks by
## transforming pred.sites[,1] and pred.sites[,2]).
ptx <- pty <- 
## To krige we have to remember to rescale the locations
predicted.sites <- cbind(ptx, pty)

After rescaling, we can calculate the ordinary kriging predictor and its standard error (SE) using the function krige.conv (conv stands for “conventional”).

## Let us start by predicting the residual field
krige <- krige.conv(Rnitro, loc = predicted.sites, krige = krige.control(cov.model = reml.fit$cov.model, 
    cov.pars = reml.fit$cov.pars, nugget = reml.fit$nugget))

Here is some code to explore these predictions graphically.

## Now produce some pretty pictures of the predictions.
par(mfrow = c(1, 2))

## Plot the axes
plot(boundary, xlab = "Easting", ylab = "Northing", type = "n", main = "Prediction")

## Show the predicted values, along with a contour map to indicate the values
image(px, py, matrix(krige$pred, length(px)), add = T)
contour(px, py, matrix(krige$pred, length(px)), add = T, nlevels = 10)

## Add the boundaries
lines(boundary, col = "gray40")

## And the locations of the nitrogren samples over the top.
points(ches$easting, ches$northing)

## Show the boundaries in gray.
plot(boundary, xlab = "Easting", ylab = "Northing", type = "n", main = "SE of Prediction")

## Show the SE of the predicted values, along with a contour map to indicate
## the values
image(px, py, matrix(sqrt(krige$krige.var), length(px)), add = T)
contour(px, py, matrix(krige$krige.var, length(px)), add = T, nlevels = 10)

## Add the boundaries
lines(boundary, col = "gray40")

## Plot the locations of the nitrogren samples over the top.
points(ches$easting, ches$northing)

Where do we predict that the residual log nitrogen process has higher values? Where do we predict it has lower values? What spatial patterns do you see in the uncertainty?

If you have time, investigate how you might include the trend component in your prediction.

### Define the trend for the predictions using the columns of predicted.sites
your.predicted.trend.model <- ~
## Kriging, with a trend
krige <- krige.conv(Onitro, loc = predicted.sites, krige = krige.control(type.krige = "OK", 
    trend.d = trend.spatial(your.trend.model), trend.l = trend.spatial(your.predicted.trend.model), 
    beta = trend.reml.fit$beta, cov.model = trend.reml.fit$cov.model, cov.pars = trend.reml.fit$cov.pars, 
    nugget = trend.reml.fit$nugget))

4. Predicting blocks–an example of change of support

We are now in a position to predict at a diff erent spatial scale–the block level.

Question 2. Try to predict the average log nitrogen level in the two blocks we defined in the beginning.

The coordinates are:

  1. 390–410 E, 4100–4150 N and
  2. 390–410 E, 4150–4200 N.

If you are pressed for time, consider only predicting the detrended log nitrogen level.

Produce standard errors for your predictions and try to estimate the covariance between the two blocks.

Question 3. How would you go about predicting the average nitrogen level in the blocks?

References

P. F. Craigmile, N. Cressie, T. J. Santner, and Y. Rao (2006) A loss function approach to identifying environmental exceedances. Extremes, 8(3):143–159.

V. De Oliveira and M. Ecker (2002) Bayesian hot spot detection in the presence of a spatial trend: application to total nitrogen concentration in Chesapeake Bay. Enivronmetrics, 13:85–101

P. J. Diggle and P. J. Ribeiro Jr (2007) Model Based Geostatistics. Springer, New York.

P. J. Ribeiro Jr and P. J. Diggle. geoR: a package for geostatistical analysis. R-NEWS, 1: 14–18, June 2001.