This chapter describes methods for exploring a new spatial dataset. Exploratory data analysis (EDA) methods can help the analyst to arrive at an early understanding of the part of the variability in a dataset that it might be possible to account for. The reader is assumed to be familiar with the basic principles and techniques of EDA, so the focus here is on specialist EDA techniques that are relevant to exploring the properties of small area spatial data and that support spatial data modelling. The chapter discusses techniques for univariate small area spatial data, starting with the role of mapping, followed by techniques for checking for spatial trends, spatial heterogeneity in the mean and global spatial dependence, and finally techniques for detecting heterogeneity in the spatial dependence structure and event clusters. Exploratory techniques for examining relationships between two or more variables are then discussed, recognising that an association between two variables need not be constrained to the same area nor be constant across the study region. Tests for overdispersion and zero-inflation in small area count data are also included.
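As an illustration of what a check for global spatial dependence can look like, the sketch below computes Moran's I, a widely used global statistic. It is a minimal example under stated assumptions, not necessarily the form used in the chapter: the function name morans_i, the four-area layout and the binary contiguity matrix W are all hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one variable and a spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the overall mean
    n = x.size                # number of areas
    s0 = w.sum()              # sum of all spatial weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Hypothetical study region: four areas in a row (1-2-3-4),
# where areas sharing a border are neighbours (weight 1).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

trend = np.array([1.0, 2.0, 3.0, 4.0])        # similar values adjacent
alternating = np.array([1.0, 4.0, 1.0, 4.0])  # dissimilar values adjacent

i_trend = morans_i(trend, W)              # 1/3, positive dependence
i_alternating = morans_i(alternating, W)  # -1.0, negative dependence
```

Positive values indicate that neighbouring areas tend to have similar values and negative values indicate the opposite. In practice the statistic would be compared with a reference distribution, for example one obtained by random permutation of the values over the areas.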
Estimating a large set of parameters reliably is the goal of many small area analyses. This chapter and the next (Chapter 8) present a family of Bayesian hierarchical models defined at the unit level, where outcome values are reported at the individual (household) level. Bayesian hierarchical models assign a common prior distribution to all the area-specific parameters. This hierarchical structure on the parameters gives rise to a distinctive feature known as information borrowing: the ability to borrow (or share) information across areas when estimating the area-specific parameters. This feature allows a large set of parameters to be estimated in a way that addresses the issues arising from heterogeneity and data sparsity. We start by introducing the Newcastle household-level income data, the illustrative example that runs through both this chapter and the next, and describe four strategies for modelling the area-specific parameters. Two non-hierarchical models are first constructed and applied to the income data in order to illustrate their shortcomings (strategies 1 and 2). Then a Bayesian hierarchical model with the so-called “exchangeable” structure on the area-specific parameters, a modelling structure that allows information to be shared globally within the study region, is presented (strategy 3).
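The effect of global information borrowing can be sketched with a simple normal-normal model in which the area means share a common prior, theta_i ~ N(mu, tau2), and each area's sample mean has sampling variance sigma2 / n_i. The example below assumes, purely for illustration, that mu, tau2 and sigma2 are known; in a full Bayesian analysis they would be given priors and estimated. The function name and all numerical values are hypothetical and are not taken from the Newcastle data.

```python
import numpy as np

def exchangeable_posterior_means(area_means, area_sizes, mu, tau2, sigma2):
    """Posterior means of area parameters under an exchangeable normal prior.

    Assumes mu (prior mean), tau2 (between-area variance) and
    sigma2 (within-area variance) are known.
    """
    ybar = np.asarray(area_means, dtype=float)
    n = np.asarray(area_sizes, dtype=float)
    # Weight on the area's own data: close to 1 when n is large.
    weight = tau2 / (tau2 + sigma2 / n)
    return weight * ybar + (1.0 - weight) * mu

# Hypothetical example: one sparse area (1 household) and
# one data-rich area (100 households).
raw_means = np.array([10.0, 30.0])
sizes = np.array([1, 100])
shrunk = exchangeable_posterior_means(raw_means, sizes,
                                      mu=20.0, tau2=4.0, sigma2=16.0)
# shrunk[0] is 18.0: the sparse area is pulled strongly towards mu.
# shrunk[1] is about 29.6: the data-rich area barely moves.
```

The sparse area borrows most of its estimate from the other areas through the common prior, while the well-sampled area relies mainly on its own data.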
Following on from Chapter 7, this chapter presents a number of Bayesian spatial hierarchical models for modelling a set of small area parameters based on the idea of local information borrowing (strategy 4). Local information borrowing is based on the property of spatial dependence: data values close together in geographical space tend to be more alike than data values that are further apart. Imposing this dependence property on the parameters helps to strengthen parameter estimation further. To implement local information borrowing, models incorporating spatial dependence need to be constructed. Various spatial models for localized information sharing are presented, all of them involving some form of the conditional autoregressive (CAR) modelling structure. The intrinsic conditional autoregressive (ICAR) and the proper CAR (pCAR) models are described. Locally adaptive spatial smoothing models, which allow the elements of the spatial weights matrix to be estimated from the data, are described, as is the Besag-York-Mollié (BYM) model, which combines an exchangeable model (strategy 3) with the ICAR model so that information borrowing is carried out both globally and locally. Using the Newcastle household-level income data, this chapter provides insights into the application of these different modelling options.
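The local borrowing idea behind the CAR structure can be sketched as follows. Under the ICAR model with binary weights, the conditional distribution of one area's parameter given all the others is normal, centred on the average of its neighbours, with variance inversely proportional to the number of neighbours. The code assumes one common parameterisation, with precision matrix proportional to D - W for the ICAR model and D - rho * W for the pCAR model; the chapter's own notation may differ, and the function names and the four-area layout are hypothetical.

```python
import numpy as np

def car_precision(W, rho=1.0):
    """Precision matrix (up to a variance scale) D - rho * W.

    rho = 1 gives the ICAR structure; |rho| < 1 gives a proper CAR.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))   # diagonal matrix of neighbour counts
    return D - rho * W

def icar_conditional(theta, W, i, sigma2=1.0):
    """Mean and variance of theta[i] given the other areas (ICAR)."""
    W = np.asarray(W, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n_i = W[i].sum()                  # number of neighbours of area i
    mean = W[i] @ theta / n_i         # average over neighbours only
    return mean, sigma2 / n_i

# Hypothetical region: four areas in a row (1-2-3-4).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
theta = np.array([1.0, 2.0, 3.0, 4.0])

cond_mean, cond_var = icar_conditional(theta, W, i=1)  # (2.0, 0.5)
Q_icar = car_precision(W)             # rows sum to zero: singular
Q_pcar = car_precision(W, rho=0.9)    # positive definite
```

The ICAR precision matrix is singular, which is why the ICAR prior is improper, whereas restricting rho to lie strictly between -1 and 1 in this parameterisation yields a valid joint distribution.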
This chapter presents four applications of the Bayesian hierarchical modelling approach that tackle a range of substantive problems at the area level in the social and public health sciences. In the process we demonstrate how, within the Bayesian approach to inference, certain statistical challenges arising from the modelling of spatial data can be addressed. In the first application, the aim is to identify the covariates that explain why some areas of a city are classified as high intensity crime areas (HIAs) whilst others are not. In the second, the aim is to assess the relationship between exposure to nitrogen oxide and stroke mortality at the small area level. The third application is an analysis of small area counts of new cases of malaria in a small region of India. The fourth application aims to model the spatial variation, at the small area scale, in the reported cases of violent sexual assault in Stockholm. Each case study is included because it presents particular statistical challenges. These challenges include handling missing data, dealing with incompatible spatial units, handling overdispersion and zero inflation when modelling small area count data, dealing with spatially autocorrelated missing covariates, allowing for spatial heterogeneity in model parameters and providing reliable small area estimates.
Spatial econometric models are a suite of likelihood-based models that adapt the standard normal linear regression model in order to address two of the fundamental challenges associated with spatial data, namely spatial dependence and spatial heterogeneity, and their implications for model specification, parameter estimation and hypothesis testing. Spatial econometric models pay particular attention to assessing spatial spillover and the associated spatial feedback effects. The chapter describes the spatial lag model (SLM), the spatially-lagged covariates model (SLX), the spatial error model (SEM) and the spatial Durbin model (SDM). An application demonstrates the form(s) of spatial spillover that each model captures and discusses issues in interpreting covariate effects, distinguishing between the direct and indirect (or exogenous) effects of a covariate on the outcome. Computational issues that arise when fitting some of the spatial econometric models to observed data are described. Finally, we compare this group of spatial econometric models with the hierarchical models discussed in Chapters 7, 8 and 9.
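The distinction between direct and indirect effects can be made concrete for the spatial lag model, y = rho * W y + X beta + e, whose reduced form is y = (I - rho * W)^(-1) (X beta + e). A unit change in covariate k in one area therefore affects outcomes in every area through the matrix (I - rho * W)^(-1) * beta_k. The sketch below computes the commonly used average direct, indirect and total impact summaries from that matrix; the function name, the row-standardised four-area weights matrix and the parameter values are hypothetical, and the chapter's own summaries may be defined differently.

```python
import numpy as np

def slm_impacts(W, rho, beta_k):
    """Average direct, indirect and total impacts of one covariate
    under a spatial lag model with weights W and lag parameter rho."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Effect of a unit change in area j's covariate on area i's outcome.
    S = np.linalg.inv(np.eye(n) - rho * W) * beta_k
    direct = np.trace(S) / n     # own-area effect, including feedback
    total = S.sum() / n          # average row sum
    indirect = total - direct    # spillover to and from other areas
    return direct, indirect, total

# Hypothetical region: four areas in a row, row-standardised weights.
W_binary = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]], dtype=float)
W_row = W_binary / W_binary.sum(axis=1, keepdims=True)

direct, indirect, total = slm_impacts(W_row, rho=0.5, beta_k=2.0)
# With row-standardised weights, total equals beta_k / (1 - rho) = 4.0.
# direct exceeds beta_k because of feedback through neighbouring areas.
```

The example shows why the regression coefficient alone does not summarise a covariate's effect in the SLM: the average direct effect exceeds beta_k because changes feed back through neighbours, and the remainder of the total effect is spillover.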
Two applications of spatial econometric modelling are presented and discussed. The first application aims to evaluate evidence of spatial spillover effects in voting outcomes aggregated to the local authority district level. The second application tests for price competition effects between individual petrol retail outlets in a large city. In both examples, interest focuses on estimating interaction effects between places, so both lend themselves to the spatial econometric approach, in which models are expressed as a series of N simultaneous equations, one equation for each spatial unit. Both applications give rise to the problem of endogeneity. In both applications, the outcome variable is assumed to be normally distributed. However, the outcome data in both applications show features that may not satisfy the normality assumption, and alternative approaches are discussed. For each application, the background to the problem is described, followed by the data, the modelling issues and an exploratory analysis of the data. The results from the modelling are followed by a summary of some of the key statistical findings.