Housing-DataAnalysis.r

visualize.tps.warps.R

deformloglik.l1.R

draw2.R

pluieNA.novdec.csv

pluieStations.Practicum.csv

part1.R

precip.RData

RandomFields.R

RandomFieldsSolutions.R

names.csv

distances.csv

t-max.csv

Wind

windGustNetherlands.RData

netherlands.RData

oneSimFinalModel.RData

finalFit.RData

modelSelection.R

predictionSpatialGEV.R

spatialGEV.R

simulation.R

pairwiseLlik.R

leastSquares.R

SpatialTrends2.R

SpatialTrends.R

fmadogram.R

There are 8 covariates:

Column 2: sqft = area in 100 square feet (roughly 9.3 square meters)

Column 3: age = age of house

Column 4: bedrooms = number of bedrooms

Column 5: vacant_lot = lot vacant when sold (1 if true)

Column 6: large_lot = large lot (1 if true)

Column 7: dist_freeway = distance to Interstate freeway in miles

Column 8: lat = latitude

Column 9: long = longitude

Suggested mean values (fitted values from an OLS regression of log price on the covariates) are in column 10.

The locations for which covariance predictions are to be made have NA in columns 1 and 10.
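As a rough illustration of how suggested mean values of this kind could be produced, the sketch below fits an OLS regression of log price on the eight covariates and stores the fitted values. The data are simulated stand-ins, not the housing data; the column names follow the list above (with large_lot for column 6), and the names housing, log_price and mean_ols are assumptions made for this example.

```r
# Simulated stand-in for the housing table (NOT the real data).
set.seed(1)
n <- 200
housing <- data.frame(
  sqft         = runif(n, 8, 40),                 # area in units of 100 square feet
  age          = runif(n, 0, 80),
  bedrooms     = sample(1:6, n, replace = TRUE),
  vacant_lot   = rbinom(n, 1, 0.05),
  large_lot    = rbinom(n, 1, 0.20),
  dist_freeway = runif(n, 0, 10),
  lat          = runif(n, 33.5, 34.5),
  long         = runif(n, -118.5, -117.5)
)
# Hypothetical response on the log scale
housing$log_price <- 11 + 0.03 * housing$sqft - 0.004 * housing$age +
  rnorm(n, sd = 0.3)

# Prediction locations carry NA in the response, as described in the text
pred_rows <- 1:20
housing$log_price[pred_rows] <- NA

# OLS of log price on the covariates; lm() drops rows with NA response
fit <- lm(log_price ~ sqft + age + bedrooms + vacant_lot + large_lot +
            dist_freeway + lat + long, data = housing)

# Store fitted values only where the response was observed
obs <- !is.na(housing$log_price)
housing$mean_ols <- NA_real_
housing$mean_ols[obs] <- fitted(fit)

# Residuals are what a spatial covariance model would then be fitted to
resid_ols <- housing$log_price - housing$mean_ols
```

The prediction locations keep NA in both the response and the fitted-value column, matching the layout described above.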

1. First, to eliminate most of the issues of small-scale temporal correlation, the data have been averaged to a 2-week time scale, which happens to be the time scale for most of the analyses being conducted by the MESA Air study at the University of Washington. (The main reason for this time scale in the MESA Air study is that supplemental monitoring data, not being provided here, are from instruments that are sampling 2-week time periods.) The two-week averages are then log-transformed (you could argue whether this is the best choice for these data, but it works well enough and it is what MESA Air investigators have been using).

2. The log 2-week average time series at each monitoring site were detrended using an empirical smooth SVD approach described in a couple of papers, including the two listed here. See attached figures for the nature of the detrending.

a. Guttorp, P., Fuentes, M., Sampson, P.D. (2006). Using transforms to analyze space-time processes. In: Statistical Methods for Spatio-Temporal Systems, B. Finkenstadt, L. Held, V. Isham, Eds., CRC/Chapman and Hall, pp. 77-150.

b. Sampson, P.D., Szpiro, A.A., Sheppard, L., Lindstrom, J., Kaufman, J.D. (2011). Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data. Atmospheric Environment, 45, 6593-6606.

3. The detrended, mean zero time series (residuals from the fitted smooth trends) are the basis for the computation of a 52 x 52 covariance matrix.

4. An empirical covariance matrix was computed using the EM algorithm to deal with missing data.
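Steps 1 to 3 can be sketched in R as follows. This is a simplified stand-in on simulated, complete data, not the smooth SVD procedure of the cited papers: it averages daily values over 14-day blocks, takes logs, smooths the leading left singular vector with a spline to get a single temporal basis function, and regresses each site on that basis. The number of sites, the single basis function, the spline degrees of freedom and all object names are assumptions made for this example.

```r
# Simulated daily concentrations at 6 sites over 52 two-week periods (NOT real data)
set.seed(42)
n_days   <- 14 * 52
n_sites  <- 6
day      <- seq_len(n_days)
seasonal <- 0.4 * cos(2 * pi * day / 365.25)
daily <- sapply(seq_len(n_sites), function(j)
  exp(2.5 + seasonal + rnorm(n_days, sd = 0.3)))

# Step 1: average over consecutive 14-day blocks, then log-transform
block  <- (day - 1) %/% 14 + 1
avg2wk <- apply(daily, 2, function(x) tapply(x, block, mean))
log2wk <- log(avg2wk)                       # 52 x 6 matrix

# Step 2: temporal basis from the SVD of the column-centered matrix,
# smoothed in time (one basis function and df = 8 are arbitrary choices)
centered <- scale(log2wk, center = TRUE, scale = FALSE)
sv       <- svd(centered)
t_idx    <- seq_len(nrow(log2wk))
n_basis  <- 1
basis <- sapply(seq_len(n_basis), function(k)
  smooth.spline(t_idx, sv$u[, k], df = 8)$y)

# Site-specific trend: regress each site's series on intercept + basis
trend_fit <- lm(log2wk ~ basis)
resid_mat <- residuals(trend_fit)           # detrended, mean-zero series

# Step 3: empirical covariance of the residual series
S <- cov(resid_mat, use = "pairwise.complete.obs")
```

With real monitoring data the matrix has gaps, so the SVD step needs some form of imputation and the covariance step needs a method such as EM rather than a plain call to cov().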

Notes:

Ten of the 52 monitoring sites have been reserved for validation purposes. The figure shows the location of the sites for analysis/modeling (red) and the sites for validation (blue). The aim of the modeling is to predict covariances among the validation sites and between the validation and analysis sites. Validation sites were selected by repeatedly sampling 10 sites at random until I observed a configuration with sites covering most of the geographic span of the 52 sites, but not including spatially extreme sites.
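The selection described above was made by eye. An automated version of the same idea might look like the sketch below, which keeps drawing 10 of 52 sites until the draw avoids the most extreme sites and spans enough of the region. The coordinates are simulated, and the two criteria (no site on the convex hull, at least 60% of the longitude and latitude range covered) are assumptions made for this example, not the rule actually used.

```r
# Simulated site coordinates (NOT the real monitoring locations)
set.seed(7)
n_sites <- 52
coords <- cbind(long = runif(n_sites, -119.5, -116.5),
                lat  = runif(n_sites, 33, 35))

# Assumed definition of "extreme": sites on the convex hull
extreme <- chull(coords)

# Assumed definition of "covering most of the span": the sample spans
# at least `frac` of the full range in both coordinates
coverage_ok <- function(idx, frac = 0.6) {
  span <- function(x) diff(range(x))
  span(coords[idx, "long"]) >= frac * span(coords[, "long"]) &&
    span(coords[idx, "lat"]) >= frac * span(coords[, "lat"])
}

# Rejection sampling: redraw until both criteria hold
max_tries <- 10000
valid_idx <- NULL
for (i in seq_len(max_tries)) {
  cand <- sample(n_sites, 10)
  if (!any(cand %in% extreme) && coverage_ok(cand)) {
    valid_idx <- sort(cand)
    break
  }
}
if (is.null(valid_idx)) stop("no acceptable configuration found")

# Remaining 42 sites are used for analysis/modeling
anal_idx <- setdiff(seq_len(n_sites), valid_idx)
```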

These data have not been analyzed for nonstationary covariance structure in the MESA Air study. The sites provided here cover a larger region of southern California than is the focus (around L.A.) for MESA Air.

There are missing data, as in almost all environmental monitoring datasets, but we selected sites that were mostly complete, having no more than 110 missing observations out of 339 2-week averages for these 13 years. A few sites were eliminated for having batches of highly suspicious observations that inflated variances and deflated spatial correlations.
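The completeness screen amounts to counting missing values per site and keeping sites at or below the threshold. A minimal sketch, using the 110-of-339 threshold from the text and a simulated matrix whose missing counts are chosen only for illustration (the object names are assumptions):

```r
# Simulated 339 x 8 matrix of 2-week averages with a known number of
# missing values per candidate site (NOT the real data)
set.seed(3)
n_obs     <- 339
n_missing_true <- c(5, 20, 60, 110, 111, 150, 200, 0)
x <- sapply(n_missing_true, function(k) {
  v <- rnorm(n_obs)
  v[sample(n_obs, k)] <- NA
  v
})
colnames(x) <- paste0("site", seq_along(n_missing_true))

# Count missing observations per site and apply the threshold from the text
n_missing <- colSums(is.na(x))
keep      <- n_missing <= 110
x_kept    <- x[, keep, drop = FALSE]
```

The screen for batches of suspicious observations mentioned above is a separate, judgment-based step and is not covered by this sketch.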

An empirical covariance matrix was computed by the EM algorithm to deal with missing data.
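For readers unfamiliar with the approach, the sketch below shows a textbook EM iteration for the mean and covariance of a multivariate normal sample with missing entries. It assumes rows (time points) are independent and values are missing at random, and it returns the maximum likelihood estimate (divisor n rather than n - 1). It is not necessarily the implementation used to produce the covariance matrix provided here; the function name em_cov and the simulated test data are assumptions made for this example.

```r
em_cov <- function(x, max_iter = 200, tol = 1e-8) {
  n <- nrow(x); p <- ncol(x)
  # Starting values from the available data
  mu    <- colMeans(x, na.rm = TRUE)
  sigma <- diag(apply(x, 2, var, na.rm = TRUE), p)
  for (iter in seq_len(max_iter)) {
    sum_x  <- numeric(p)
    sum_xx <- matrix(0, p, p)
    for (i in seq_len(n)) {
      xi <- x[i, ]
      m  <- is.na(xi); o <- !m
      ci <- matrix(0, p, p)     # conditional covariance of the missing part
      if (any(m)) {
        if (any(o)) {
          # E-step: regress missing on observed components
          b <- sigma[m, o, drop = FALSE] %*% solve(sigma[o, o, drop = FALSE])
          xi[m] <- mu[m] + as.vector(b %*% (xi[o] - mu[o]))
          ci[m, m] <- sigma[m, m, drop = FALSE] - b %*% sigma[o, m, drop = FALSE]
        } else {
          # Entire row missing: use the current marginal moments
          xi[m] <- mu[m]
          ci[m, m] <- sigma[m, m]
        }
      }
      sum_x  <- sum_x + xi
      sum_xx <- sum_xx + tcrossprod(xi) + ci
    }
    # M-step: update mean and covariance from expected sufficient statistics
    mu_new    <- sum_x / n
    sigma_new <- sum_xx / n - tcrossprod(mu_new)
    delta <- max(abs(sigma_new - sigma))
    mu <- mu_new; sigma <- sigma_new
    if (delta < tol) break
  }
  list(mean = mu, cov = sigma, iterations = iter)
}

# Usage on simulated data with entries removed at random (NOT the real data)
set.seed(11)
n <- 300
true_cov <- matrix(c(1, .6, .3, .6, 1, .5, .3, .5, 1), 3)
x_full <- matrix(rnorm(n * 3), n, 3) %*% chol(true_cov)
x_miss <- x_full
x_miss[sample(length(x_miss), 150)] <- NA
fit_em <- em_cov(x_miss)
```

Unlike pairwise-complete covariances, the EM estimate is a single positive semi-definite matrix, which matters when the result feeds a spatial covariance model.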

A separate dataset will be provided for consideration of possible nonstationary spatial covariance for the spatial (not spatio-temporal) dataset of long-term mean concentrations. Geographic covariates will be provided for specification of a mean model.

The current data for analysis are provided in the R workspace PASI.NO2.anal.RData. It contains the following objects. Dates and monitoring site ids are provided in the dimnames of these objects.
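One way to inspect a workspace like this without cluttering the global environment is to load it into its own environment and list what it contains. The sketch below writes a small stand-in workspace to a temporary directory so that it runs anywhere; the file demo_workspace.RData and the object demo_resid are hypothetical and say nothing about the actual contents of PASI.NO2.anal.RData.

```r
# Build a hypothetical stand-in workspace (NOT the real file)
demo_file  <- file.path(tempdir(), "demo_workspace.RData")
demo_resid <- matrix(rnorm(12), nrow = 4, ncol = 3,
                     dimnames = list(c("2000-01-05", "2000-01-19",
                                       "2000-02-02", "2000-02-16"),
                                     c("siteA", "siteB", "siteC")))
save(demo_resid, file = demo_file)

# Load into a separate environment; load() returns the object names
ws     <- new.env()
loaded <- load(demo_file, envir = ws)

# Report class, dimensions and dimnames of each loaded object
for (nm in loaded) {
  obj <- get(nm, envir = ws)
  cat(nm, ":", class(obj)[1], "with dim", paste(dim(obj), collapse = " x "), "\n")
  str(dimnames(obj))
}
```

For the real data, point load() at PASI.NO2.anal.RData instead of demo_file; the dates and site ids should then appear in the dimnames printed by the loop.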

Here are 200 independent reps of this simulation:

simulated_grid_repsWM.txt

simulated_irreg_repsWM.txt