An adaptive spatio-temporal smoothing model for estimating trends and step changes in disease risk

Statistical models used to estimate the spatio-temporal pattern in disease risk from areal unit data represent the risk surface for each time period with known covariates and a set of spatially smooth random effects. The latter act as a proxy for unmeasured spatial confounding, whose spatial structure is often characterised by a spatially smooth evolution between some pairs of adjacent areal units while other pairs exhibit large step changes. This spatial heterogeneity is not consistent with existing global smoothing models, in which partial correlation exists between all pairs of adjacent spatial random effects. Therefore we propose a novel space-time disease model with an adaptive spatial smoothing specification that can identify step changes. The model is motivated by a new study of respiratory and circulatory disease risk across the set of Local Authorities in England, and is rigorously tested by simulation to assess its efficacy. Results from the England study show that the two diseases have similar spatial patterns in risk, and exhibit a number of common step changes in the unmeasured component of risk between neighbouring local authorities.


Introduction
Disease risk exhibits spatio-temporal variation due to many factors, including changing levels of environmental exposures and differences in the prevalence of risk- †Email address for correspondence: alastair.rushworth@glasgow.ac.uk arXiv:1411.0924v1 [stat.AP] 4 Nov 2014 inducing behaviours such as smoking. Data about disease risk are typically obtained in the form of population level summaries for administrative geographical units, such as local authorities or counties, and the spatial pattern in risk is presented in the form of a choropleth map. Such disease maps enable public health scientists and epidemiologists to quantify the spatial pattern in disease risk across a region of study, allowing financial resources and public health interventions to be targeted at areas at highest risk. Disease maps are routinely published by health agencies worldwide, such as the cancer e-Atlas (http://www.ncin.org.uk/cancer_ information_tools/eatlas/) by Public Health England and the weekly influenza maps (http://www.cdc.gov/flu/weekly/usmap.htm) produced by the Centres for Disease Control and Prevention in the USA. In addition to their use in allocating health service resources, such maps allow the scale of health inequalities between rich and poor communities and their underlying drivers to be quantified. For example, a 2014 report by the Office for National Statistics (ONS) in the UK estimates that average healthy life expectancy differs by nineteen years between communities with the highest and the lowest levels of deprivation (Office for National Statistics, 2014). Such large inequalities exacerbate socioeconomic divisions in society, and health costs may be increased due to higher disease prevalence in the most disadvantaged regions.
The disease maps presented by health agencies display raw disease rates, which are contaminated by sampling variation and do not allow statements to be made about the probability that the risk in an area exceeds a certain threshold (exceedence probabilities, see Richardson et al., 2004). Therefore a range of statistical models have been developed for these disease data, which represent the risk surface with known covariates and a set of spatially smooth random effects. The latter act as a proxy for capturing unmeasured spatial confounding, where spatial structure is induced by using a special case of a Gaussian Markov Random field (GMRF, Rue and Held, 2005) prior, known as the Conditional Autoregressive (CAR, Besag et al., 1991) prior. For data that contain repeated spatial measurements over time, spatial GMRF priors have been extended to incorporate spatio-temporal structure, and prominent examples include Bernardinelli et al. (1995), Knorr-Held (2000), MacNab and Dean (2001) and Ugarte et al. (2010).
These GMRF-based models assume the random effects are globally spatially smooth, in the sense that a single parameter governs the spatial autocorrelation in disease risk between all pairs of geographically adjacent areal units. In practice however, this residual or unexplained spatial structure is often characterised by a spatially smooth evolution between some pairs of adjacent areal units, while other pairs exhibit large step changes. The identification of such step changes in the unexplained component of risk is known as Wombling following the seminal article by Womble (1951), and can provide a number of epidemiological insights. Firstly, it allows the delineation of clusters of areal units that exhibit unexplained elevated risks compared with neighbouring areas, which enables health resources and public health interventions to be specifically targeted at areas in greatest need. Secondly, it enables the elucidation of unknown etiological factors: by providing detailed insight into the spatial structure of confounding, potential risk factors can be more easily identified that could be driving unexplained risk. Existing global smoothing models do not support step change detection as part of model fitting, which may result in oversmoothing in regions where strong local disparities exist, leading to biased estimation of their associated disease risks. This problem is analogous to specifying a single smoothing parameter to estimate a non-linear function using semi-parametric regression, when the underlying signal exhibits varying levels of smoothness. A range of spatially adaptive smoothing priors have been proposed to address these limitations for purely spatial data, including Green and Richardson (2002), Lu and Carlin (2005), Lu et al. (2007), Lawson et al. (2012), Lee and Mitchell (2013), Wakefield and Kim (2013) and .
However, very few spatially adaptive smoothing models have been developed for spatio-temporal disease data, with an exception being Lee and Mitchell (2014) who propose an iterative fitting algorithm using Integrated Nested Laplace Approximations (INLA). A spatio-temporal setting has the advantage of temporal replication of the spatial surface, which is likely to improve the estimation in such highly complex models. However, as the temporal replication increases so does the computational complexity, due to the increased numbers of data points and parameters. Therefore the contribution of this paper is the development of a new spatially adaptive GMRF model for spatio-temporal disease mapping data, which can be viewed as both an adaptive smoother and a model for the detection of step changes in unexplained risk. The model builds on the purely spatial approach of Ma et al. (2010), and does not make any simplifying assumptions about the step change structure unlike . Additionally, unlike existing methods in this field, our model is freely available to others via the R package CARBayesST, making this research reproducible. The methodological development is motivated by a new study of respiratory and circulatory disease in England, UK, which according to the World Health Organisation (WHO) are two of the largest causes of death worldwide (www.who.int/mediacentre/factsheets/fs310/en/). The remainder of this paper is structured as follows. In Section 2 the motivating data set of respiratory and circulatory hospital admissions in England between 2001 and 2010 is presented, while in Section 3 the literature on spatio-temporal disease mapping and adaptive spatial smoothing is reviewed. Section 4 proposes a new space-time GMRF model for adaptive smoothing, which is comprehensively tested by simulation in Section 5. In Section 6 the proposed model is applied to the motivating application, while the paper concludes in Section 7 with a discussion of the results and suggestions for future research.

Motivating case study
Our methodological development is motivated by a new study of circulatory and respiratory disease risk in England, which have International Classification of Disease tenth revision (ICD-10) codes I00-I99 and J00-J99 respectively. Hospital admission records from the Health and Social Care Information Centre (www.hscic.gov.uk) were analysed at the UK Met Office to provide counts of hospital admissions by local authority, where the primary diagnosis was circulatory or respiratory disease and where the method of admission was as an emergency. The resulting data are annual counts of circulatory and respiratory hospital admissions for each of the N = 323 Local Authorities (LA) in England between 2001 and 2010. The expected number of hospital admissions was calculated for each year and LA to adjust for their differing population sizes and demographic structures, and internal standardisation was used based on England-wide rates. Monte Carlo permutation tests are all significant at the 5% level. However, Figure 1 also highlights that these unexplained spatial structures exhibit step changes, which are not compatible with a global spatial smoothing model. Therefore the aims of this analysis are twofold. Firstly, we want to produce the best estimate of the spatio-temporal patterns in circulatory and respiratory disease risks, so that the extent of the health inequalities in these two diseases can be identified.
Secondly, we wish to estimate the locations of the step changes in the unexplained risk surface, so that the geographical extent of clusters of excessively high unexplained risks regions can be identified and investigated for possible causes. These goals are likely to be best achieved by an adaptive smoothing model such as that proposed in Section 4, but before we present our model we provide a review of the existing literature.

Spatio-temporal disease mapping
The study region is typically composed of N non-overlapping areal units indexed by i ∈ {1, . . . , N }, for which data are observed for j ∈ {1, . . . , T } time periods.
These data comprise the observed and expected numbers of disease cases, and for the population living in area i during time period j are denoted by (Y ij , E ij ) respectively.
Here, disease risk is represented by R ij , where R ij = 1.2 corresponds to a 20% increased risk of disease compared with the expected number of disease cases (based on national disease rates) E ij . The natural log of R ij is modelled by a vector of p known covariates x ij with parameters β = (β 1 , . . . , β p ), and spatially and temporally structured random effects φ ij . A Bayesian approach to estimation is taken in these hierarchical models, based on Markov chain Monte Carlo (McMC) simulation.
GMRF priors are commonly used to induce spatial smoothness in the random effects (φ ij , φ kj ), via a binary N × N adjacency matrix W . Element w ik = 1 if areas i and k share a common border (denoted i ∼ k) and w ik = 0 otherwise (denoted i k), while w ii = 0 for all i. Numerous GMRF priors have been developed for purely spatial random effects (φ 1 , . . . , φ N ), and the proposal by Leroux et al. (2000) has an attractive full conditional decomposition for f (φ i |φ −i ), given by Here the conditional expectation of φ i is a weighted average of the random effects in adjacent areal units, which spatially smooths their values. The level of spatial smoothing is controlled by ρ, and if ρ = 1 equation (2) reduces to the intrinsic autoregressive model proposed by Besag et al. (1991), while if ρ = 0 the random effects have identical and independent normal prior distributions. The joint distribution for (φ 1 , . . . , φ N ) corresponding to these full conditionals is a zero-mean multivariate Gaussian distribution, with variance σ 2 vector of ones and I is the N × N identity matrix.

Non-adaptive spatio-temporal models
Many spatio-temporal extensions of spatial GMRF models have been developed in the disease mapping literature, with the first being Bernardinelli et al. (1995) who model R ij with linear time-trends that have spatially varying slopes and intercepts.
A further generalisation was proposed by MacNab and Dean (2001), in which the time index is projected onto a set of spline basis functions, allowing spatially-varying non-linear temporal trends. In contrast, Knorr-Held (2000) introduced a decomposition of R ij into spatial and temporal main effects and an interaction, with all terms being modelled by GMRF priors. More recently, Rushworth et al. (2014) utilise the autoregressive decomposition where φ j = (φ 1j , . . . , φ N j ). They combine the likelihood model (1) with Leroux CAR priors for each φ j , with φ 1 being modelled by (2), while temporal autocor- Temporal autocorrelation is controlled via α, with α = 0 corresponding to temporal independence while α = 1 corresponds to a multivariate random walk prior. A uniform prior on the unit interval is placed on α, while the priors outlined in (2) are placed on the remaining hyperparameters.

Adaptive spatial smoothing models
The global nature of the spatial autocorrelation induced by (2) for purely spatial random effects (φ 1 , . . . , φ N ) can be seen from their theoretical partial autocorrelations, which are given by Under model (2) and the spatio-temporal extension (3), random effects for all pairs of neighbouring areal units (for which w ik = 1) will be partially autocorrelated, and the strength of that partial autocorrelation will be controlled by ρ. Thus as ρ will typically be close to one (the spatial residual surfaces are autocorrelated as described in Section 2), a pair of adjacent areas exhibiting substantially different levels of unexplained risk will have those risks wrongly smoothed towards each other, masking the step change to be identified.
This has prompted the development of spatially adaptive smoothing models, which are flexible enough to capture both smoothness and step changes in the random effects surface. However, almost all of these models have been developed for purely spatial data, and the extension to the spatio-temporal domain is one of the contributions of this paper. Brewer and Nolan (2007)  (2014) treat the non-zero elements of the adjacency matrix W as random variables, rather fixing them equal to one. Equation (4) then implies that spatially adjacent random effects (φ i , φ k ) can be partially autocorrelated or conditionally independent, depending on the estimated value of w ik . It is this latter approach that we extend to the spatio-temporal domain in this paper, and we treat w ik as random variables on the unit interval in common with Brezger et al. (2007) but utilise a second stage CAR prior in common with Ma et al. (2010) to achieve the adaptive smoothing.

Methodology
Here we present a novel spatially adaptive smoothing model for spatio-temporal data, which allows step changes to occur between adjacent areal units in the unexplained component of risk while treating their locations as unknown.
Step change detection is achieved by modelling the elements in W corresponding to adjacent random effects as random variables on the unit interval, rather than assuming they equal one. For example, if the posterior mean of w ik is close to zero then (4) implies that the corresponding random effects are close to conditionally independent, which indicates strong evidence of a step-change in risk. Our proposed model is one of the first adaptive (localised) smoothing models for spatio-temporal data, and is im-

Level 1 -Likelihood and random effects model for
The first level of the model for (Y ij , φ ij ) is similar to that proposed by Rushworth et al. (2014), and is given by The only difference from the model proposed by Rushworth et al. (2014) is that the GMRF prior proposed by Leroux et al. (2000) is replaced by the intrinsic GMRF prior, which has the simplification that ρ = 1. This simplification is enforced because the additional flexibility offered by the Leroux prior is redundant when adaptive (local) smoothing is permitted via modelling W , as is the case here. In particular, attempting to estimate both ρ and W could result in high posterior correlation and multimodality, because the random effects are spatially independent if either ρ = 0 or all elements of W equal zero. To avoid rankdeficiency of the precision matrix Q(W, ) and subsequent problems with matrix inversion, the adjusted specification Q(W, ) = diag(W 1) − W + I is used, where I is added to ensure that Q(W, ) is diagonally dominant and hence invertible. This invertibility condition is required because in the second level of the model described below, elements in W are treated as random variables, necessitating the evaluation of the normalised prior density of f (φ j |φ j−1 ). Sensitivity to the value of was checked in an initial modelling step, and was found not to affect estimation until was increased to a relatively large value, such as > 10 −2 . Therefore in this paper we set = 10 −7 .

Level 2 -Adjacency model for elements in W
Our methodological contribution extends the model of Rushworth et al. (2014) by treating the elements of W that correspond to adjacent areal units as unknown parameters to be estimated, rather than assuming they are fixed at one. These parameters are collectively denoted by the vector w + = {w ik |i ∼ k} of length N W = 1 T W 1/2, while the remaining elements of W that correspond to non-adjacent areal units remain fixed at zero. Equation (4) shows that under the intrinsic model (when ρ = 1) if w ik ∈ w + is close to one then partial autocorrelation and hence smoothing is induced between the spatially adjacent φ ij and φ kj for all time periods j. Conversely, if w ik is estimated as close to zero then φ ij and φ kj are close to conditionally independent for all time periods j, and no such spatial smoothing is enforced. In the latter case, a step change is said to exist in the random effects surface between areal units (i, k) for all time periods j. Thus the weight of evidence for a step change between areal units (i, k) is based on the posterior distribution where Y denotes the vector of all data points. Specifically, we follow Lu and Carlin (2005) and quantify the evidence for a step change using the posterior probability of w ik being less than 0.5. We model w ik ∈ w + as a set of continuous random variables on the unit interval [0, 1] as suggested by Brezger et al. (2007), rather than as binary random variables as suggested by Ma et al. (2010) and Lee and Mitchell (2013). This is because a continuous domain for w + allows the direct application of a second stage GMRF prior, avoiding the need for a discrete prior such as the Ising model for which no polynomial time algorithm exists to compute its normalising constant. As we model each w ik ∈ [0, 1] we propose a GMRF prior on the logit scale, v + = log (w + /(1 − w + )), which has the back transformation w + = exp(v + )/(1 + exp(v + )). The GMRF prior we propose for v + is a multivariate Gaussian distribution with a constant mean of µ, a constant variance of τ 2 , and a precision matrix defined by the GMRF prior proposed by Leroux et al. (2000) for first level random effects. This second stage GMRF prior requires us to specify an adjacency structure for the elements in v + , and here two elements v ik , v rs ∈ v + are defined as adjacent (denoted ik ∼ rs) if the geographical borders they represent in the study region share a common vertex. Using this notation, the GMRF prior we propose for v + and its hyperpriors are given by: τ 2 ∼ Uniform(0, 10000), ρ ∼ Uniform(0, 1).
Writing the joint prior for v + in this form highlights the role of ρ, which controls the extent to which step changes appear spatially clustered together and join at common vertices, or whether they are independently scattered around the study region.
When ρ ≈ 1 the random variable v ik , which controls the existence of a step change between areal units (i, k), is smoothed spatially towards adjacent v rs via the penalty ik∼rs (v ik − v rs ) 2 , which thus induces spatially clustered step changes. In contrast, when ρ = 0 each random variable v ik is smoothed non-spatially towards the overall mean value µ by the penalty vik∈v + (v ik − µ) 2 , which does not encourage spatial clustering of step changes in the unexplained component of risk.

In order to avoid problems of numerical under and over flow in implementing the
McMC algorithm when transforming between v + and w + , the sample space for each v ik ∈ v + is truncated to the interval [−q, q]. Here we set q = 15, because it avoids these numerical problems while allowing w + ik to have an effective domain of [0.000000306, 0.9999997], which is close to the intended [0, 1] interval. The prior mean µ for v + is fixed in (7), to ensure that the induced prior on the untransformed w + scale is consistent with our prior beliefs about the prevalence of step changes.
Specifically, given the level of spatial autocorrelation evident in the residuals shown in Figure 1, and the associated Moran I statistics reported in Section 2, one would expect there to be relatively few step changes in the random effects surface. In order to be consistent with this preference we have to choose µ > 0, as choosing µ < 0 implies a marginal mean for w ik of less than exp(0)/(1 + exp(0)) = 0.5, which thus favours step changes a-priori.
However, Figure 2 shows that the induced prior distribution for w ik depends on τ 2 as well as µ, with the left and right panels showing µ = 0 and µ = 15 respectively for various values of τ . The left panel shows that when µ = 0 the prior density for w ik can have a mode at 0.5, which is incongruous with our prior beliefs because higher probability density is associated with moderate values of w ij compared to w ij close to 1. Some initial simulations confirmed that setting µ = 0 leads to spurious step changes being identified. In contrast, when µ = 15, as shown in the right panel of Figure 2, the prior assigns high prior probability to w ik ≈ 0 or w ik ≈ 1 or both, with little prior probability in between. The ratio of the densities at {0, 1} depends on τ , so that when τ is small, almost all prior mass is concentrated around w ik = 1, hence strongly discouraging boundaries. In contrast, as τ increases the prior becomes more symmetric and 'U' shaped, with equal point masses at 0 and 1 expressing ambivalence about the presence or absence of step changes. Thus fixing µ = 15 ensures that clear step change decisions, that is w ik close to zero or one, are preferred over ambiguous values such as w ik = 0.5.

Inference
The model proposed here is implemented in the freely available R package CAR-BayesST, which can be downloaded from the CRAN website (http://cran.r-project.

org). The McMC algorithm we use is a combination of Gibbs and Metropolis-
Hastings steps, and the high dimensionality of the parameter space means that the algorithm has a number of features that reduce the computational burden.
These features are described in the supplementary material accompanying this paper, which also includes sample R code to apply the model to simulated data.

Simulation study
In this section we comprehensively test the performance of the proposed model under a range of scenarios, which differ in the amount of temporal replication of the data, the size of the step changes and the prevalence of the disease. The relative performances of three models are compared in this study, the first of which is the global Model (2) a-priori treats each w ik ∈ w + independently, and therefore does not encourage configurations in which step changes cluster together. This comparison allows the assessment of the relative merits of spatial and non-spatial smoothing of the step changes.

Data generation and study design
Data are generated for the N = 323 local authorities that comprise mainland England, which is the motivating application described in Section 3. Our primary focus in this study is to assess the ability of the models to (i) estimate the spatio-temporal pattern in disease risk, and (ii) perform Wombling, that is the identification of step changes in risk between neighbouring areas. For simplicity no covariates are included in this study, so that the step changes identified in the random effects also correspond to the risk surface. Disease count data are generated from a Poisson loglinear model, where the expected numbers of cases E ij are altered to assess model performance for diseases with different underlying prevalences. Simulated disease data for England are generated for T consecutive time periods, which is also altered to assess its impact on model performance. The log-risk surfaces are generated from a multivariate Gaussian distribution, whose precision matrix is defined by the intrinsic CAR prior and hence produces spatially smooth surfaces. To simulate spatial step changes in risk, a piecewise constant mean surface is specified for the random effects, which is displayed in the left panel of Figure 3. Lighter shaded areas exhibit a mean risk level of 1 while the darker shaded areas have a mean risk level of a, and the black lines correspond to the locations of true step changes. Different values of a are considered in this study, to assess the ability of the models to detect different sized step changes. An example realisation of this surface is shown in the right panel of Figure 3 for a = 2, where the clusters of high-risk areas are evident. To ensure the true risk surface is not identical for all time periods, independent random noise is added to the risk in each areal unit for each time period. The scenarios considered in this study are summarised in Table 1, which shows that we consider

Results
One hundred data sets were generated under each of the 9 scenarios shown in Table   1, and Models (1) -(3) were fitted in turn. Inference for each model was based on 30,000 McMC samples following a burn-in period of 20,000 samples, after which convergence was assessed to have been reached. The quality of the estimation of the spatio-temporal pattern in disease risk was quantified by the root-mean squared error (RMSE) of the estimated risk surface, that is RMSE= 1 N T i,j (R ij −R ij ) 2 , as well as by the coverage probabilities of the 95% credible intervals. Receiveroperating characteristic (ROC) curves were computed to quantify the accuracy of the step change detection, which were based on each models sensitivity and speci- Table 2. Median root mean-squared error (RMSE) and 95% credible interval coverages associated with the fitted risks for each model and each of the 9 simulation scenarios. ficity at identifying true step changes. These statistics were based on comparing E[w ij |Y] to a threshold value p * , where if E[w ij |Y] < p * a step change was identified where as for the converse no step change was declared. The value of p * was varied from 0 to 1 at intervals of 0.01, and the ROC curve is a plot of sensitivity against specificity. However, for ease of presentation the Area Under the Curve (AUC) is presented here rather than the full ROC curve, and an AUC of one corresponds to perfect step change identification. We note that AUC was only computed for Models (2) and (3), as Model (1) has no mechanism for step change detection. Table 2 shows the RMSE and credible interval coverages associated with each model across the nine simulation scenarios, from which a number of patterns emerge.
Firstly, models (2) and (3) outperform the existing non-adaptive approach in terms of RMSE in all but the a = 1 scenario, in which no step changes exist and the RMSE values are hence similar. Model (2) outperforms model (3) in terms of RMSE in almost all cases, suggesting that the spatial smoothing imposed on the adjacency relationships w + is sub-optimal compared with assuming each element w ik ∈ w + is a-priori independent. For all models RMSE decreases as both the number of time periods T and disease prevalence E increases, which is due to an increase in the amount of data. The coverage probabilities for all models are generally close to their nominal 95% levels, with the exception being the a = 1 scenario in which the adaptive smoothing models have coverages of just below 89%.  Table 3 displays the median AUC statistic across the set of ROC curves calculated for the 100 simulated data sets from each scenario. The numbers in brackets are the tenth percentile of that distribution, and give a summary of the variation in the AUC statistics across the 100 simulated data sets. An exception to this is the a = 1 scenario, which instead displays the specificity because as the risk surface is spatially smooth there are no true step changes to identify. For model (2) the median AUC values is close to the maximal value of 1, indicating very accurate step change identification. The exception to this occurs when T = 1 which is when step change detection is based on only one realistaion of the spatial surface.
the specificity for the a = 1 scenario for model (3) with ρ estimated, which has a median specificity of only 0.5594. The relatively poorer performance of model (3) compared with model (2) is consistent across all scenarios and is also evident in the tenth percentile results, and re-enforces the RMSE and coverage results displayed in Table 2. This poorer performance is because model (3) forces the step changes to be spatially smooth, thus inducing a set of false positives that are spatially close to the real boundaries. The other main result from Table 3 is that the AUC increases as the number of time periods T increases, which is due to an increase in the amount of temporal replication in the data.

Results of the England case study
This section presents the results of the England circulatory and respiratory disease case study introduced in Section 2, where JSA is included as a covariate in all models.
We apply two models to each data set, the global smoothing model of Rushworth et al. (2014) (Model (1)) and the adaptive smoothing model proposed here with the simplification that ρ = 0 (Model (2)). We do not apply the full adaptive model which estimates ρ because the simulation study showed it produced poorer results compared to Model (2). Inference for both models is based on 50,000 posterior samples, which are collected after a burn-in period of a further 50,000 samples.
In analysing these data our goals are to: (i) estimate the spatio-temporal pattern in disease risk to quantify the extent of health inequalities; and (ii) estimate the location of any step changes in the unexplained spatial risk structure, which will assist in the identification of unmeasured confounders.  Table 4. The table shows that the adaptive smoothing model (2) fits the data better than the global smoothing model (1) for both diseases, with reductions of around 350 (1%) in both cases.

Model fit and risk estimation
Additionally, Table 4 shows that model (2) has a substantially smaller number of effective parameters (p D ) than (1), so that it is achieving this improvement in fit whilst having an improvement in parsimony. While model (2) has a more complex specification than model (1), its ability to identify step changes permits the GMRF component to smooth more strongly elsewhere in the spatial surface. This stronger smoothing over the remaining random effects is visible in Table 4 from the smaller variance estimates for σ 2 , which in turn implies greater level of penalisation of φ N ×T and a reduction in the overall effective number of parameters p D . Finally, the temporal autocorrelation as estimated by the parameter α is high and consistent between models and different disease data, with posterior median estimates ranging For circulatory disease a similar pattern is evident, with an estimated difference between highest and lowest risk of 1.39 in 2001 and 1.54 in 2010.  England that incorporates Manchester and Yorkshire, even after adjusting for JSA.

Step change identification
It is striking that these features are largely consistent between the two diseases, so that although the estimated risks have different overall magnitudes, they exhibit very similar spatial patterns. Public health professionals can use these results to identify potential risk factors for disease, by searching for risk factors that exhibit step changes in the same locations as those exhibited in Figure 4.

Discussion
In this paper a new study of the spatio-temporal structure of circulatory and respiratory disease risk in England is presented, with the goal of understanding the extent of health inequalities and whether the data present evidence of disparities in disease risk between pairs of adjacent regions. Consequently, a new spatially adaptive smoothing model for disease risk was developed, that can estimate the location and strength of step-changes in disease risk. The model is a spatially adaptive extension to the class of GMRF prior distributions, and is one of the first models for step change identification in spatio-temporal disease risk. Freely available software via the CARBayesST package for R is provided to allow others to apply our model to their own data, and this is one of the first R packages for spatio-temporal disease mapping.
The simulation study in Section 5 established the superiority of our model over existing global smoothing alternatives, in terms of both risk estimation and the quantification of uncertainty in disease risk. Our model was also successful at recovering the locations of known step changes in simulated data, with AUC statistics close to one for a range of different scenarios. These AUC statistics were higher if the step changes were assumed to be independent in space, because a-priori assuming spatial clustering resulted in false step changes being identified close to real step changes. Thus existing global smoothing models are sub-optimal for space time disease mapping in two respects. Firstly, they smooth over such step changes leading to poorer estimation of disease risk, and secondly they cannot identify such step changes which themselves provides etiological evidence about potential unmeasured risk factors.
Section 6 described the application of the new model to the England hospital admissions data, from which strong evidence of step changes in the unexplained component of risk was found. Better model fit with a smaller number of effective parameters p D was also observed compared to the global smoothing alternatives, which was achieved because increased levels of smoothing were possible in locations where step changes were not present. Thus existing models without this adaptive smoothing capability may overfit some data sets, by imposing too weak a spatial smoothing constraint due to the presence of step changes in risk. A striking association was found between the fitted risks and identified step changes between circulatory and respiratory disease, perhaps indicating the influence of the same unobserved risk factor (after allowing for socio-economic deprivation by JSA). Therefore in future work we will try and identify such unmeasured confounders, to see if the they are indeed common to both diseases.
Another avenue for future work is to use the model in an ecological regression context, where the effect of an exposure on disease risk is of primary interest rather than the spatio-temporal pattern in disease risk. The efficacy of adaptive smoothing models in this context may be to reduce spatial confounding between the random effects and the covariates as suggested by Clayton et al. (1993), and environmental factors such as air pollution would be a natural context for such work. A further avenue of future work is to consider spatio-temporal models for multiple diseases, such as circulatory and respiratory disease, simultaneously, thus allowing between disease correlations in step change locations to be utilised in the model.

Acknowledgements
This work was funded by the Engineering and Physical Sciences Research Council (EPSRC), via grant EP/J017442/1.