1. Introduction

In this lab we will look at an example of models for Spatio-temporal data. The lab is based on the model presented in Lindstrom et al. (2013) and the associated R-package (http://CRAN.R-project.org/package=SpatioTemporal).

Other R packages that handle spatio-temporal models and data are summarised in the relevant task view (http://CRAN.R-project.org/view=SpatioTemporal).

2. Model and Theory

We are interested in models of the form \[\begin{equation} y(s,t) = \mu(s,t) + \nu(s,t), \label{eqn:model_decomp} \end{equation}\]

where \(y(s,t)\) denotes the spatio-temporal observations, \(\mu(s,t)\) is the structured mean field, and \(\nu(s,t)\) is the space-time residual field.

The mean field is modelled as \[\begin{equation} \mu(s,t) = \sum_{l=1}^L \gamma_l M_l(s,t) + \sum_{i=1}^{m} \beta_i(s) f_i(t), \label{eqn:mean_model} \end{equation}\]

where the \(M_l(s,t)\) are spatio-temporal covariates; \(\gamma_l\) are coefficients for the spatio-temporal covariates; \(\{f_i(t)\}_{i=1}^m\) is a set of (smooth) temporal basis functions, with \(f_1(t)\equiv 1\); and the \(\beta_i(s)\) are spatially varying coefficients for the temporal functions.

The \(\beta_i(s)\)-coefficients in are treated as spatial fields with a universal kriging structure, allowing the temporal structure to vary between locations: \[\begin{equation} \beta_i(s) \sim N(X_i \alpha_i, \Sigma_{\beta_i}(\theta_i)) \ \quad \text{ for } i=1,\ldots,m , \label{eqn:beta_fields} \end{equation}\]

where \(X_i\) are \(n\,\times\,p_i\) design matrices, \(\alpha_i\) are \(p_i\,\times\,1\) matrices of regression coefficients, and \(\Sigma_{\beta_i}(\theta_i)\) are \(n\,\times\,n\) covariance matrices. The \(X_i\) matrices often contain geographical covariates and we denote this component a ``land use’’ regression (LUR). This structure allows for different covariates and covariance structures in the each of the \(\beta_i(s)\) fields; the fields are assumed to be a priori independent of each other.

The residual space-time field, \(\nu(s,t)\), is assumed to be independent in time with stationary, parametric spatial covariance \[\begin{equation} \nu(s,t) \sim N ( 0, \underbrace{\begin{bmatrix} \Sigma_\nu ^1(\theta_\nu) & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \Sigma_\nu^T(\theta_\nu) \end{bmatrix}}_{\Sigma_\nu(\theta_\nu)} ), \label{eqn:nu_fields} \end{equation}\] or \[\begin{align*} \nu(s,t) &\sim N(0, {\Sigma_\nu ^t(\theta_\nu)} ) \ \quad \text{ for }\ t=1,\ldots,T & &\text{and} & \nu(s_1,t_1) &\perp \nu(s_2,t_2),\ t_1 \neq t_2. \end{align*}\]

The temporal independence in \(\nu(s,t)\) is based on the assumption that the temporal basis functions, \(\{f_i(t)\}_{i=1}^m\) account for for the temporal correlation in data.

2.1 Smooth Temporal Functions

The objective of the smooth temporal basis functions, \(f_i(t)\), is to capture the temporal variability in the data. These functions can either be specified as {eterministic functions}, or obtained as {moothed singular vectors}. See Fuentes et al. (2006) for details..

To derive the \(m-1\) smoothed singular vectors (\(m-1\) since \(f_1(t)\equiv 1\)) we first construct the data matrix.
Note that the code switches the order of the space and time dimensions from the notation above, so that the first (row) index is time and the second (col) index represents sites, so that \(D\) is \(T \times N\). \[\begin{align} D(t,s) = \begin{cases} y(t,s), & \text{if the observation } y(t,s) \text{ exists}, \\ \text{NA}, & \text{otherwise,} \end{cases} \label{eqn:data_matrix} \end{align}\]

and fill in missing observations using the algorithm described by Fuentes et al. (2006):

  • Step 0. Center and scale each column (to mean zero, variance one) and compute the mean of all available observations for each time-point, \(u_1(t)\). Missing values in \(D(t,s)\) are then imputed using fitted values from a linear regression where each column of \(D(t,s)\) is regressed onto \(u_1\). For this step to be well defined the data matrix must have at least one observation in each row and column.

  • Step 1. Compute the SVD (singular value decomposition) of the new data matrix with the missing values imputed.

  • Step 2. Do regression of each column of the new data matrix on the first \(m-1\) orthogonal basis functions from Step 1. The missing values are then replaced by the fitted values of this regression.

  • Step 3. Repeat from Step 1 until convergence; convergence being measured by the change in the imputed values between iterations.

Having imputed the missing values in \(D(t,s)\) we then use splines to smooth the leading \(m-1\) singular vectors, i.e. the \(m-1\) first columns of \(U\) in the SVD: \(D = U S V^T\). Cross-validation can be used to determine the number of smooth temporal basis functions needed.

3. The data

As an example we’ll study \(NO_x\) data from Los Angeles, first we load the relevant libraries and the data

data(mesa.data.raw, package = "SpatioTemporal")

and study the available data:


The raw-data object contains a dataframe of observations (\(\log\) of \(NO_x\) concentrations)

options(digits = 4)

a dataframe of spatial covariates for each location


and a dataframe of spatio-temporal covariates (predictions of \(NO_x\) from a traffic model).

The first step is to collect the data into a -object, this matches columns in the observations matrix to the spatial covariates in

mesa.data <- createSTdata(mesa.data.raw$obs, mesa.data.raw$X, SpatioTemporal = list(lax.conc.1500 = mesa.data.raw$lax.conc.1500))

Given this object we can also study the times and locations at which observations where obtained.

plot(mesa.data, "loc")

Try to locate monitors that have moved, or have short sampling periods. Also note that we have two types of monitors; the FIXED monitors are only sampled during the second half of the period.

We can also study the locations of the monitoring stations in Los Angeles.%

plot(mesa.data$covars$long, mesa.data$covars$lat, pch = c(24, 25)[mesa.data$covars$type], 
    bg = c("red", "blue")[mesa.data$covars$type], xlab = "Longitude", ylab = "Latitude")
map("county", "california", col = "#FFFF0055", fill = TRUE, add = TRUE)
legend("bottomleft", c("AQS", "FIXED"), pch = c(24, 25), bty = "n", pt.bg = c("red", 

5. \(\beta\)–fields

Given smooth temporal trends we fit each of the times series of observations to the smooth trends and extract the regression coefficients

beta <- estimateBetaFields(mesa.data)

In the full spatio-temporal model these \(\beta\)–fields are modelled using geographic covariates. Selection of covariates is done by comparing these fields to the available covariates, e.g. for the intercept or “constant” term in the regressions, \(\beta_0\),

par(mfrow = c(2, 2), mgp = c(2.5, 0.75, 0), mar = c(4, 5, 2, 1))
plot(mesa.data$covars$long, beta$beta[, 1], ylab = "beta for constant \ntemporal basis", 
    xlab = "Longitude")
plot(mesa.data$covars$lat, beta$beta[, 1], ylab = "beta for constant \ntemporal basis", 
    xlab = "Latitude")
plot(mesa.data$covars$km.to.coast, beta$beta[, 1], ylab = "beta for constant \ntemporal basis", 
    xlab = "Distance to coast")
plot(mesa.data$covars$log10.m.to.a1, beta$beta[, 1], ylab = "beta for constant \ntemporal basis", 
    xlab = "Distance to major road")

However, linear model building or covariate selection (or dimension reduction by, for example, PLS) is outside the scope of the lab. For now we just look at the spatial distribution of the regression coefficients for the set of covariates provided. Here we look at the spatial distribution of the fitted regression coefficients for each of the fixed sites (AQS and MESA).

data <- cbind(mesa.data$covars[, c("long", "lat")], beta$beta)
Palette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_color_gradientn(colours = Palette(100))
plot1 <- ggplot(data = data, aes(x = long, y = lat, colour = const)) + geom_point() + 
plot2 <- ggplot(data = data, aes(x = long, y = lat, colour = V1)) + geom_point() + 
plot3 <- ggplot(data = data, aes(x = long, y = lat, colour = V2)) + geom_point() + 
grid.arrange(plot1, plot2, plot3, nrow = 2)

We will keep these values so that we can (eventually) compare them to the results from the full model fitting rather than these simple multiple regressions.

6. Estimating the model

Study the available covariates,


and create a model with three covariates for the temporal intercept, one covariate for the two temporal trends, and a spatio-temporal covariate;

LUR <- list(~log10.m.to.a1 + s2000.pop.div.10000 + km.to.coast, ~km.to.coast, 
ST <- "lax.conc.1500"

exponential covariances for the \(\beta\) and \(\nu\)-fields, with different nugget in the \(\nu\)-field for the two types of monitors, “AQS” and “FIXED” (the latter being MESA study monitors).

cov.beta <- list(covf = "exp", nugget = FALSE)
cov.nu <- list(covf = "exp", nugget = ~type, random.effect = FALSE)

To understand the options for the covariance specifications, particulary the “random.effect” argument (even though we are not using it), see the help file for the function makeSigmaNu. We also specify which covariates to use as locations for our observations. Note that this object includes both rectangular coordinates, “x” and “y”, as well as longitude and latitude.

locations <- list(coords = c("x", "y"), long.lat = c("long", "lat"), others = "type")

Given these specifications, we create a model-object.

mesa.model <- createSTmodel(mesa.data, LUR = LUR, ST = ST, cov.beta = cov.beta, 
    cov.nu = cov.nu, locations = locations)

Given the model we setup initial values for the optimisation. Here we’re using two different starting points

dim <- loglikeSTdim(mesa.model)
x.init <- cbind(c(rep(2, dim$nparam.cov - 1), 0), c(rep(c(1, -3), dim$m + 1), 
    -3, 0))
rownames(x.init) <- loglikeSTnames(mesa.model, all = FALSE)

We are now ready to estimate the model. However this takes a rather long time. (It took over 4 min on my not-so-fast desktop iMac at home.)


start_time <- Sys.time()
# est.mesa.model <- estimate(mesa.model, x.init, hessian.all = TRUE)
end_time <- Sys.time()
print(end_time - start_time)

Instead we load the precomputed results.



6.1 Evaluating the results

Having estimated the model we studying the results, taking special note of the initial status message that indicates if the optimisation has converged.

options(digits = 3)

Plot the estimated parameters and approximate confidence intervals. Note which parameters have the smallest confidence intervals, any idea why?

par <- coef(est.mesa.model)
par(mfrow = c(1, 1), mar = c(13, 2.5, 0.5, 0.5))
plotCI(par$par, uiw = 1.96 * par$sd, ylab = "", xlab = "", xaxt = "n")
abline(h = 0, col = "grey")
axis(1, 1:dim(par)[1], rownames(par), las = 2)

Having estimated the model parameters we can now compute the conditional expectations of the observed locations and latent \(\beta\)-fields (This should take at most 1 minute)

EX <- predict(mesa.model, est.mesa.model, pred.var = TRUE)

The predictions can be used to extend the shorter time-series to predictions covering the entire period. To illustrate we plot predictions and observations for 4 different locations (note that ID can be given either as an index number or using the full name of each location).

par(mfrow = c(4, 1), mar = c(2.5, 2.5, 2, 0.5))
plot(EX, ID = 1, STmodel = mesa.model, pred.var = TRUE)
plot(EX, ID = 10, STmodel = mesa.model, pred.var = TRUE)
plot(EX, ID = 17, STmodel = mesa.model, pred.var = TRUE)
plot(EX, ID = "L002", STmodel = mesa.model, pred.var = TRUE)

Alternatively we can also study the predictions due to different parts of the model. Look at the help files for the functions plot.predictSTmodel and predict.STmodel to learn what the arguments pred.type refer to.

par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2, 0.5))
plot(EX, ID = 10, STmodel = mesa.model, pred.var = TRUE, lwd = 2)
plot(EX, ID = 10, pred.type = "EX.mu", col = "green", add = TRUE, lwd = 2)
plot(EX, ID = 10, pred.type = "EX.mu.beta", col = "blue", add = TRUE, lwd = 2)
plot(EX, ID = 17, STmodel = mesa.model, pred.var = TRUE, lwd = 2)
plot(EX, ID = 17, pred.type = "EX.mu", col = "green", add = TRUE, lwd = 2)
plot(EX, ID = 17, pred.type = "EX.mu.beta", col = "blue", add = TRUE, lwd = 2)

e.g. just due to the linear regression (mean value part) for the \(\beta\)-fields, the universall kriging for the \(\beta\)-fields, or the full model including the \(\nu\)-fields.

Question 1: The obsevations are in red. Explain in terms of the model notation what the green, blue, and black curves represent.

We also compare the \(\beta\)-fields obtained from the full model with those previously computed by individually fitting each times series of observations to the smooth trends.

par(mfcol = c(2, 2), mar = c(4.5, 4.5, 2, 0.5))
for (i in 1:3) {
    plotCI(x = beta$beta[, i], y = EX$beta$EX[, i], pch = NA, uiw = 1.96 * sqrt(EX$beta$VX[, 
        i]), main = colnames(EX$beta$EX)[i], xlab = "Empirical estimate", ylab = "Spatio-Temporal Model")
    plotCI(x = beta$beta[, i], y = EX$beta$EX[, i], pch = NA, uiw = 1.96 * beta$beta.sd[, 
        i], err = "x", add = TRUE)
    points(beta$beta[, i], EX$beta$EX[, i], pch = 19, cex = 1, col = "red")
    abline(0, 1)

Question 2: Comment on these plots. What are the notable differences between the two different sets of regression parameter estimates?

7. Cross-validation — If you have time

A cross-validation (CV) study is a simple but good way of evaluating model performance. First we define 10 CV groups, and study the number of observations in each group

Ind.cv <- createCV(mesa.model, groups = 10, min.dist = 0.1)

And illustrate the location of sites that belong to the same CV groups

I.col <- sapply(split(mesa.model$obs$ID, Ind.cv), unique)
I.col <- apply(sapply(I.col, function(x) mesa.model$locations$ID %in% x), 1, 
    function(x) if (sum(x) == 1) which(x) else 0)
plot(mesa.model$locations$long, mesa.model$locations$lat, pch = 23 + floor(I.col/max(I.col) + 
    0.5), bg = I.col)
map("county", "california", add = TRUE)

Here sites that share the same symbol and colour belong to the same CV group.

The CV functions, and , will leave out observations marked by the current CV-groups number in the vector . For the first CV-groupd only observations such that are used for parameter estimation, predictions are then done for the observations with given observations in and the estimated parameters.

Estimated parameters and predictions for the 10-fold CV are obtained using the following. However, The cross-validation computations take a really long time (over 29 minutes on my not-so-fast iMac at home).


start_time <- Sys.time()
est.cv.mesa <- estimateCV(mesa.model, x.init, Ind.cv)  #pred.cv.mesa <- predictCV(mesa.model, est.cv.mesa, LTA = TRUE)
end_time <- Sys.time()
print(end_time - start_time)

We load the precomputed results instead.



7.1 Evaluating the results

First we examine the parmeter estimates.


Noting that the estimates for all 10 CV-groups have converged. We then compare the parameter estimates with those obtained when using all the data to fit the model.

par(mfrow = c(1, 1), mar = c(13, 2.5, 0.5, 0.5), las = 2)
boxplot(est.cv.mesa, plot.type = "all")
points(coef(est.mesa.model)$par, col = 2, pch = 19)

To assess the models predictive ability we plot a couple of predicted timeseries (with \(95\%\) confidence intervals), and the left out observations (in red).

par(mfcol = c(4, 1), mar = c(2.5, 2.5, 2, 0.5))
plot(pred.cv.mesa, ID = 1)
plot(pred.cv.mesa, ID = 5)
plot(pred.cv.mesa, ID = 13)
plot(pred.cv.mesa, ID = 18)

or investigate how much each part of the model contributes to the predictions

par(mfrow = c(1, 1), mar = c(4, 4, 3, 1))
plot(pred.cv.mesa, ID = "60371601", xlab = "", ylab = "NOx (log ppb)", main = "Predictions for 60371601", 
    lty = c(1, NA), lwd = 2, pch = c(NA, 19), cex = 0.75)
plot(pred.cv.mesa, ID = "60371601", pred.type = "EX.mu", lty = 4, lwd = 2, col = "blue", 
    add = TRUE)
plot(pred.cv.mesa, ID = "60371601", pred.type = "EX.mu.beta", lty = 2, lwd = 2, 
    col = "green", add = TRUE)
legend("topright", c("Observations", "Predictions", "Contribution from beta", 
    "Contribution from mean", "95% CI"), bty = "n", lty = c(NA, 1, 2, 4, NA), 
    lwd = c(NA, 2, 2, 2, NA), pch = c(19, NA, NA, NA, 15), pt.cex = c(0.75, 
        NA, NA, NA, 2.5), col = c("red", "black", "green", "blue", "grey"))

We can also compute the root mean squared error, \(R^2\), and coverage of \(95\%\) confidence intervals for the predictions.


Another option is to do a scatter plot of the left out data against the predicted (points colour-coded by site)

par(mfcol = c(1, 1), mar = c(4.5, 4.5, 2, 0.5))
plot(pred.cv.mesa, "obs", ID = "all", pch = c(19, NA), cex = 0.25, lty = c(NA, 
    2), col = c("ID", "black", "grey"), xlab = "Observations", ylab = "Predictions", 
    main = "Cross-validation NOx (log ppb)")
abline(0, 1)

Question 3: This plot suggest that some sites have notably bad fits (e.g., the green dots). Of course, some site must be “worst”. Feel free to examine the fits at all the sites using commands like \(plot(pred.cv.mesa, ID=1)\), as used above. Identify on a map where one (or two or three) of these sites are. Can you explain why these sites have poor fits?

Question 4 (Challenge for extra credit!): Question 1 asked you to explain the different colored lines, which allows you to comment on the contribution of the different parts of the model to the overall prediction. There are universal kriging spatial predictions involved for each of the \(\beta\) fields and simple kriging predictions for the spatio-temporal residual field \(\nu\). We have not generated any graphical or diagnostic evaluation of the fitted spatial covariances underlying these krigings. The model has provided estimates of the spatial covariance parameters for these fields (in \(est.mesa.model\$res.best\$par.cov\)).

4(a) For the case of the coefficient of the constant term, \(beta_0(s)\), compute and plot an empirical variogram and the fitted exponential model. Hint: consider residuals derived from the estimates in the object \(EX\$beta\$mu[,1]\) and \(EX\$beta\$EX[,1]\) from the fitting in Section 6. It might be easiest to do this using geoR.

4(b) For the residual field \(nu(s,t)\), a plot showing empirical correlations vs fitted ones. (Covariances are complicated by the fact that there are different nuggets for the AQS and FIXED (MESA) monitors) Hint: consider a spatial covariance matrix computed from the residuals of the observed data matrix \(D\) and the fit of the mean model in \(EX\$EX.mu\). Any comments?


Fuentes, M., Guttorp, P., and Sampson, P. D. (2006), Using transforms to analyze space-time processes, in Statistical Methods for Spatio-Temporal Systems, eds. Finkenstädt, B., Held, L., and Isham, V., Chapman & Hall/CRC, pp. 77–150.

Lindström, J., Szpiro, A. A., Sampson, P. D., Oron, A., Richards, M., Larson, T., and Sheppard, L. (2013), A Flexible Spatio-Temporal Model for Air Pollution with Spatial and Spatio-Temporal Covariates, Environ. Ecol. Stat., 1352-8505, 1–23.