How does temperature vary over time?: evidence on the stationary and fractal nature of temperature fluctuations

The paper analyses temperature data from 96 selected weather stations worldwide and from reconstructed northern hemisphere temperature data covering the last two millennia. Using a non-parametric test, we find that the stationarity hypothesis is not rejected by the data. Subsequently, we investigate further properties of the data by means of a statistical model known as the fractional Gaussian noise (FGN) model. Under stationarity, the FGN model follows from the fact that the observed data are temporal aggregates of data generated at a finer (basic) timescale, where aggregation is taken over a 'large' number of basic units. The FGN process exhibits long-range dependence. Several tests show that both the reconstructed and most of the observed data are consistent with the FGN model.


Introduction
Debate about whether or not a change in temperature levels is taking place has been fierce over the past few decades. It seems to be widely accepted now that there has been a systematic change in the temperature process in recent years. However, this is not as easy to demonstrate as might be expected. Recorded series of temperatures typically exhibit local trends and cycles that sometimes seem to persist for up to several decades. The difficulty of settling this matter relates to the fact that the lengths of the available observed time series of temperature data are limited. Few of the recorded temperature series are longer than 250 years. Although 250 years may seem a long time, it is not when it comes to examining properties such as stationarity and long-range dependence.
Recently, there have been attempts to reconstruct temperature data from other sources. One important data set has been obtained by Moberg et al. (2005a), who have reconstructed annual temperatures from 1979 back to 1 AD based on information from ice core drillings, tree rings, lake sediments and other sources. The advantage of the reconstructed data sets is that they cover quite a long time range. However, since they are reconstructions one cannot rule out the possibility that the extent of measurement error might be substantial.
Whereas existing physical models provide good short-term temperature forecasts, they are not reliable for forecasts more than a few days ahead. In the absence of a physical model that is capable of explaining weather dynamics precisely over a longer time range, an interesting question is whether temperature fluctuations are consistent with outcomes of an underlying stationary stochastic mechanism (process), and what the features of such a stochastic process might be. In this paper we address several challenges. First, we investigate whether the temperature process is stationary by applying a non-parametric test on observed as well as reconstructed data. Second, we apply theoretical arguments to justify a specific stochastic model for the temperature process. Third, we estimate the parameters of the model by using both observed and reconstructed data. Finally, we apply different approaches to test whether the observed and reconstructed data are consistent with our model.
With a few exceptions, previous studies have applied univariate statistical tools for analysing temperature time series without any a priori restrictions derived from theoretical considerations. When the choice between statistical models is based primarily on goodness-of-fit criteria, choosing between specifications that yield virtually the same fit but entail different interpretations and produce substantially different out-of-sample predictions is highly problematic. This difficulty is illustrated by discussions in the recent literature (see, for example, Mills (2010) and the references therein) about whether the temperature process exhibits systematic trends, and it is not likely to be resolved by statistical tests alone. If a precise physical theory were available, one might imagine applying a combination of a priori physical assumptions and statistical modelling techniques, as has been done by, for example, Aldrin et al. (2012), Humlum et al. (2013), Schmith et al. (2012) and Solheim et al. (2011). However, such a strategy has serious drawbacks because existing physical theories are not sufficiently precise to provide reliable quantitative relationships that explain observed temperature fluctuations. Climate models are rough deterministic representations of very complex phenomena and so cannot be expected to give precise answers. Since they are deterministic, they are also not very helpful for choosing between competing statistical formulations.
Motivated by the results from the non-parametric stationarity tests, which show that stationarity is not rejected in most cases, we take a theoretical approach that draws on recent advances in the theory of stochastic processes to justify the model structure before the model is confronted with the data. Our approach is simple in the sense that no explanatory variable is introduced and no deeper explanation of the temperature variations is provided. The justification is based on a key feature of the data, namely that the observed temperature series are obtained as (or can be interpreted as) temporal aggregates of data generated at a finer (basic) timescale, where temporal aggregation is taken over a 'large' number of basic units. For example, weekly temperatures are obtained as averages of daily (average) temperatures, monthly temperatures as averages of weekly (average) temperatures, etc. This feature is very important because it has a crucial bearing on the structure of the statistical model that is supposed to have generated these data. Under stationarity and specific regularity conditions, it follows from Giraitis et al. (2012), pages 77 and 79, that the observed temperature data have, approximately, the properties of a so-called fractional Gaussian noise (FGN) process (Mandelbrot and van Ness, 1968; Mandelbrot and Wallis, 1968, 1969). Recall that the stationarity assumption is reasonable since the first-stage non-parametric tests show that stationarity is not rejected by the reconstructed, and most of the observed, temperature series. As will be explained later, other tests of the properties of the model (including stationarity) are also applied. As demonstrated by Mandelbrot and Wallis (1969), the FGN model allows for highly irregular patterns of observations. It implies a specific structure of the serial correlation pattern. Thus, our approach avoids the ad hoc parameterizations of the serial correlation structure that are typical in statistical time series analysis.
In our case, where the timescale is based on either 'month' or 'year' as the unit, although the basic timescale may be 'day' (or fractions of a day), we can safely say that temporal aggregation is taken over a large number of units at the basic scale.
We have used data from 96 weather stations and data reconstructed by Moberg et al. (2005a, 2006) to test the stationarity hypothesis and to estimate and test the FGN model. We have used different estimation methods and tests. These tests provide convincing evidence that the FGN model is consistent with the major part of the data. What is particularly striking is that the reconstructed data also appear to be consistent with the FGN model. The paper is organized as follows. In Section 2 the data are described. Section 3 contains a discussion of the justification of the modelling assumptions. In Section 4 we discuss methods for estimation and testing, whereas Section 5 contains results from the estimation and testing. Section 6 is devoted to an analysis of the reconstructed data from the last two millennia (Moberg et al., 2005b).
Additional results are given in four on-line supplementary appendices. The first file contains appendices with further descriptions of the data, estimation results and figures. The second file contains the R code that was used to produce the results in the first file. The third file contains the estimation results of the paper together with the R code that was used to produce them. The fourth file contains all the functions that were developed for this paper by Mariachiara Fortuna, written as plain text.

Data
Data on observed temperatures were obtained from various sources (Hov Moen; www.rimfrost.no). These sources are the National Aeronautics and Space Administration Goddard Institute for Space Studies, the European Climate Assessment and Dataset and national meteorological institutes, such as the Swedish Meteorological and Hydrological Institute and the Norwegian Meteorological Institute. The data, certified by the National Aeronautics and Space Administration, comprise time series of temperatures for 1258 weather stations in cities or towns in more than 100 countries. The time series are available as yearly and monthly figures. The lengths of the time series vary greatly across stations. Some, such as Uppsala, contain data for 290 years, with monthly data from 1722 to 2012, apart from some missing observations. Other time series are shorter than two decades. Some of the time series have several periods of missing data. After various selection criteria were applied, we ended up with a sample consisting of 96 time series from 32 countries. The time series were selected on the basis of the quality of the data, such as the length and the number of missing observations. More information about the selection process is available in appendix F in the first on-line resource file.
The first on-line resource file contains plots of the temperature data for nine selected weather stations (Fig. B1) as well as summary information for the 96 weather stations (Table B1). The nine weather stations were selected because they have the longest temperature series with the fewest interruptions. We have used annual data as well as monthly data adjusted for seasonal variations. Under the Gaussian hypothesis, seasonality can affect only the monthly mean, the variance and possibly the auto-correlation function. By assuming that the auto-correlation function does not depend on seasonal variations, we have adjusted temperatures for seasonal variations by subtracting the respective monthly means from the observations and dividing the observations by the respective monthly standard deviations. The plots of the temperature series show strong cycles and local trends (Fig. B1 and appendix E in the first on-line resource file). (It is likely that there are measurement errors in these temperature series. Some errors are due to the location of the measurement sites, which initially were often to be found in central urban areas. For example, the thermometer that was used for temperature recordings in New York City was previously in Central Park and only recently moved outside the city. It has been documented that temperatures tend to rise with increasing urbanization. There may also have been variations over time in the quality of the thermometers that were used.) We have also analysed the data obtained by Moberg et al. (2005a, 2006). They reconstructed temperatures for the northern hemisphere from 1 AD to 1979 by using data from borehole drillings, tree rings and lake sediments: see Moberg et al. (2005a, b) for a detailed discussion and description of their data; see also the discussion by Moberg (2012).
These data show considerable variations over time: there are several cycles, with a high swing occurring from 1000 to 1100 and a low swing occurring from 1500 to 1600: see Fig. 4 in Section 6.
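The seasonal adjustment described above (subtract each calendar month's mean and divide by its standard deviation) can be sketched in a few lines. The code below is our own illustration on synthetic data, not the authors' code; the function name `deseasonalize` is hypothetical.

```python
import numpy as np

def deseasonalize(temps, months):
    """Standardize each observation by the mean and standard deviation of its
    calendar month, i.e. the seasonal adjustment described in the text."""
    temps = np.asarray(temps, dtype=float)
    months = np.asarray(months)
    adjusted = np.empty_like(temps)
    for m in range(1, 13):
        mask = months == m
        mu = temps[mask].mean()
        sd = temps[mask].std(ddof=1)
        adjusted[mask] = (temps[mask] - mu) / sd
    return adjusted

# Synthetic example: 50 years of monthly data with a strong seasonal cycle.
rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 50)
seasonal = 10.0 * np.sin(2 * np.pi * (months - 1) / 12)
temps = seasonal + rng.normal(0.0, 2.0, size=months.size)
adj = deseasonalize(temps, months)
```

By construction, each month of the adjusted series has sample mean 0 and sample standard deviation 1, so any remaining structure reflects serial dependence rather than seasonality.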

Fractional Brownian motion and fractional Gaussian noise
Let {X(t), t ≥ 0} denote the (observed) temperature process, which we view as a stochastic process in continuous time t. In this section we provide a more precise formulation of our model and its justification. In what follows, let N denote the set of non-negative integers and let [t] denote the integer part of t.
Property 1. The process {X(t), t ≥ 0} is generated as averages of observations at a finer (basic) timescale, i.e.

X(t) = (1/m) {X̄(m(t − 1) + 1) + X̄(m(t − 1) + 2) + ... + X̄(mt)},

where {X̄(q), q ∈ N} denotes the process defined at the basic timescale.
Suppose, for example, that the unit of the basic timescale is 'day' and that t indexes 'month'. Then X̄(q) is the temperature recorded on day q, m ≈ 30, and X(t) is defined as the temperature of month t. Here, it is understood that the enumeration of the days goes from 1 to mT, where T is the length of the time series on the timescale that corresponds to the observed data, which in this case is month. Alternatively, if instead t indexes year, then X(t) is defined as the temperature of year t. In our case, property 1 is simply a formalized statement of how the observed time series (suitably adjusted for seasonal variations) have been constructed.
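Property 1 can be made concrete in code. The sketch below (our own illustration, with hypothetical synthetic data) averages a daily series in consecutive blocks of m ≈ 30 units to produce a 'monthly' series:

```python
import numpy as np

def aggregate(basic, m):
    """Average a basic-timescale series in consecutive blocks of m units,
    as in property 1: X(t) = (1/m) * sum of Xbar(q) over block t."""
    basic = np.asarray(basic, dtype=float)
    T = basic.size // m                      # number of complete blocks
    return basic[:T * m].reshape(T, m).mean(axis=1)

rng = np.random.default_rng(1)
daily = rng.normal(10.0, 3.0, size=365 * 4)  # four years of synthetic daily data
monthly = aggregate(daily, 30)               # m ~ 30 basic units per 'month'
```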
Hypothesis 1. The process {X̄(q), q ∈ N} defined at the basic timescale is weakly stationary with constant mean.
Note, however, that in this paper we do not assume at the outset that the temperature process is weakly stationary. Thus, hypothesis 1 is a property which we shall test.
To provide an a priori theoretical justification for our model, namely the FGN process, consider the partial sums of the basic-scale process. If the process {X̄(q), q ∈ N} is stationary and satisfies some regularity conditions, then it follows from proposition 4.4.1, page 77, and proposition 4.4.4, page 79, in Giraitis et al. (2012) that the process {V(t), t ≥ 0}, obtained as the suitably normalized limit of the centred partial sums of {X̄(q)} up to time [mt] as m → ∞, is fractional Brownian motion (FBM): see theorem 3 in Appendix A. The FBM process is Gaussian and has the property known as self-similarity. Recall that a self-similar stochastic process (in continuous time) {V(t), t ≥ 0} has the property that the joint distribution of (V(bt_1), V(bt_2), ..., V(bt_n)) is equal to the joint distribution of (b^H V(t_1), b^H V(t_2), ..., b^H V(t_n)) for any set of time periods (t_1, t_2, ..., t_n), for any integer n and any positive constant b, where H satisfies 0 < H < 1 (Samorodnitsky and Taqqu (1994), chapter 7). The parameter H is the so-called Hurst index, named after the British engineer Harold Edwin Hurst. One way of explaining the self-similar feature (or fractal feature in a stochastic context) is to say that the distribution of the average process (normalized to have zero mean) up to time bt is, apart from a change of scale, the same as the distribution of the average process up to time t. In other words, the time span over which the average is taken is not essential for the qualitative properties of the probability law of the process. Thus, under self-similarity, a change in the input scale by a factor b will leave the process invariant up to a change in the 'output' scale by the factor b^H.

The FGN process is defined as the increment process of FBM, i.e. X(t) = V(t) − V(t − 1) for t = 1, 2, .... It follows immediately from this definition that, when m is 'large', property 1 implies that the observed process is approximately FGN provided that it is weakly stationary. The FGN process has autocovariance function given by

γ(k) = (σ²/2) (|k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H}),   (3.1)

where σ² = var{X(1)}. We see that the autocovariance function is determined by σ and the Hurst index H.
Note that equation (3.1) implies that the auto-correlation function is determined by only one unknown parameter, H. It follows from equation (3.1) that when H = 0.5 the autocovariance of the FGN is 0 at all non-zero lags and FGN reduces to a process with independent realizations. One can prove (see Samorodnitsky and Taqqu (1994), chapter 7) that, when H > 0.5,

γ(k) ~ σ² H(2H − 1) k^{2H−2} as k → ∞.   (3.2)

Expression (3.2) shows that the auto-correlation function decreases slowly (asymptotically) according to a power function. This means that FGN exhibits long-range dependence. In contrast, conventional time series models typically have auto-correlation functions that decay exponentially (or as linear combinations of exponentials). Moreover, since expression (3.2) implies that the auto-correlation function becomes independent of the timescale, we conclude that the FGN process is approximately self-similar. Fig. 1 provides an intuitive illustration of the self-similar feature of the temperature process for Paris. Fig. 1(a) shows the total series of annual recorded temperatures for Paris, from 1757 to 2009 (253 years). Fig. 1(b) shows the monthly temperatures over 253 months, suitably rescaled. We note that although the two graphs are different they nevertheless appear to have quite similar patterns (the fractal feature). (Beran et al. (2013) and Mandelbrot and van Ness (1968) have discussed aspects of the fractal feature of self-similar processes and FGN.)

Now we turn to another property of the FGN process that will be useful for interpreting the reconstructed temperature data. The reconstructed temperature data that were obtained by Moberg et al. (2005a, b) are based on several sources of data. A key question is therefore whether or not a linear combination of self-similar processes is approximately a self-similar process. To this end, let W = r_1 W_1 + r_2 W_2, where W_1 and W_2 are independent self-similar processes with Hurst indices H_j, and r_j, j = 1, 2, are weights. We have the following result.
Theorem 1. Suppose that W = r_1 W_1 + r_2 W_2, where r_1 and r_2 are constants and W_1 and W_2 are independent self-similar processes with Hurst indices H_1 and H_2 respectively, with H_1 > H_2. Then {W(bt) b^{−H_1}, t ≥ 0} converges weakly towards r_1 W_1 as b → ∞. The proof of theorem 1 is given in Appendix A. The result of theorem 1 means that a linear combination of independent self-similar processes is approximately self-similar, with Hurst index equal to the largest Hurst index of the component processes. Accordingly, the sum of FGN processes will also be approximately FGN, with Hurst index equal to the largest Hurst index of the original processes.
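A quick variance calculation illustrates why the largest Hurst index dominates in theorem 1: for a self-similar process, var{W_j(t)} = t^{2H_j} var{W_j(1)}, so after rescaling var{W(bt)} by b^{−2H_1} the contribution of the component with the smaller Hurst index vanishes as b grows. A numerical sketch with hypothetical weights and Hurst indices (unit-variance components at t = 1):

```python
import numpy as np

# Hypothetical parameters with H1 > H2.
H1, H2 = 0.8, 0.6
r1, r2 = 0.7, 0.3

def var_W(t):
    """var{W(t)} for W = r1*W1 + r2*W2 with independent self-similar
    components: r1^2 * t^(2*H1) + r2^2 * t^(2*H2)."""
    return r1**2 * t**(2 * H1) + r2**2 * t**(2 * H2)

# Rescaling by b^(-2*H1), the W2 contribution decays like b^(2*(H2 - H1)),
# so the limit is the variance contributed by r1*W1 alone.
bs = np.array([1e2, 1e4, 1e6])
rescaled = var_W(bs) / bs**(2 * H1)
```

The rescaled variance decreases monotonically towards r_1², consistent with the weak convergence stated in the theorem.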

Methods of statistical inference
In the previous section we presented theoretical support for FGN as a model representation of the temperature process under the hypothesis of stationarity (hypothesis 1). Thus, it is of fundamental importance to test hypothesis 1. The theoretical result in support of the FGN model is valid only for large samples (an asymptotic result), so it is also of interest to test the properties of the FGN model even when hypothesis 1 is found to hold. There are several methods that can be applied to analyse long memory processes; see Beran et al. (2013). In this paper we have used a non-parametric test (Cho, 2016) for stationarity, a method based on the empirical characteristic function, the Whittle method and the wavelet lifting method (Knight et al., 2017).

Method based on the characteristic function
In this section we outline a specific regression approach based on the empirical characteristic function that is used for estimation and for carrying out graphical tests of whether the observed temperature process is consistent with the FGN model. This method is an extension of Koutrouvelis (1980) and Koutrouvelis and Bauer (1982) to the time series setting. In the time series setting an important question is whether the empirical characteristic function is consistent for the corresponding theoretical characteristic function. Feuerverger (1990) has proved strong consistency of the empirical characteristic function in stationary time series under a specific condition on the autocovariance function. This condition does not, however, hold for long memory time series. It is therefore of interest to establish consistency of the empirical characteristic function in our setting. The following theorem yields strong consistency of the empirical characteristic function for long memory Gaussian processes.
Theorem 2. Assume that {Z(t), t ∈ N} is a Gaussian process in discrete time with the property that |cov{Z(s), Z(t)}| ≤ K|t − s|^δ for all s, t ∈ N, where K > 0 and δ < 0 are constants, and let ψ(s, s + T; λ) denote the mean characteristic function, i.e. the average of E[exp{iλZ(t)}] over t = s + 1, ..., s + T. Then the corresponding empirical characteristic function converges almost surely to ψ(s, s + T; λ) as T → ∞.

The proof of theorem 2 is given in Appendix A. Theorem 2 shows that the empirical characteristic function for Gaussian processes is a strongly consistent estimator of the mean characteristic function even when the process {Z(t), t ∈ N} is non-stationary. Regarding the FGN process, it follows from expression (3.2) with δ = 2H − 2 that the condition of theorem 2 holds. In the stationary case ψ(s, s + T; λ) reduces to ψ(s, s + T; λ) = E[exp{iλZ(s)}].

Let {Y(t), t ∈ N} denote the aggregate process in discrete time defined by Y(t) = X(1) + X(2) + ... + X(t), and consider the characteristic function φ(t − s; λ) = E(exp[iλ{Y(t) − Y(s)}/√|t − s|]). The reason why we divide the exponent above by √|t − s| is that this enables us to calculate reliable estimates of the corresponding empirical characteristic function for higher values of λ than otherwise. Define the corresponding empirical counterpart of the characteristic function above by

φ̂(k; λ) = {1/(T − k)} Σ_{s=1}^{T−k} exp[iλ{Y(s + k) − Y(s)}/√k],   k = t − s.   (4.1)

As explained in Appendix A, equation (4.1) can be rewritten in terms of sums of cosines and sines (equation (4.2)), which is more convenient than formula (4.1) for calculating numerical values. Note next that if Z is a normally distributed random variable with mean b and standard deviation σ, it is well known that for any real or complex c we have

E{exp(cZ)} = exp(bc + σ²c²/2).   (4.3)

When {Y(t), t ∈ N} is FBM, it follows by using equation (4.3) and the self-similarity property that

log{−log |φ̂(t − s; λ)|} = log(σ²/2) + 2 log |λ| + (2H − 1) log |t − s| + ε(s, t),   (4.6)

where ε(s, t) is an error term that is small when the number of observations is large. Note that the right-hand side of equation (4.6) is linear in log |t − s| and in log |λ|. Thus, equation (4.6) implies that H and σ can be estimated by regression analysis by treating log |t − s| and log |λ| as independent variables, with suitable values of t − s and λ.
Koutrouvelis (1980) has given a look-up table with recommended values of λ and t − s. Kogon and Williams (1998) proposed the use of 10 equally spaced points, ranging from λ = 0.1 to λ = 1. Because the normal distribution is symmetric around zero, this is equivalent to using values from λ = 1 to λ = 10. In our estimation we have used t_j − s_j = 1, 2, ..., 20 for the nine selected cities that are reported below, and t_j − s_j = 1, 2, ..., 10 for the remaining weather stations. As regards λ, we have chosen the points λ_j = 0.1, 0.2, ..., 1, for j = 1, 2, ..., 10. If the temperature process is not stationary, log{−log |φ̂(t − s; λ)|} will in general not be approximately linear in log |t − s| and log |λ|. When {Y(t), t ≥ 0} is a self-similar process with stationary stable increments, one can show in a similar way that

log{−log |φ̂(t − s; λ)|} = α log τ + α log |λ| + α(H − 1/2) log |t − s| + ε(s, t),   (4.7)

where τ is the scale parameter and α ∈ (0, 2] is the index parameter representing tail thickness in the stable distribution. Thus, if the slope that is associated with log |λ| is found to be less than 2, this indicates that the process is not Gaussian but stable. Under the stationarity hypothesis let μ = E{X(t)}. In Appendix A we derive the estimator μ̂ of μ, given in equation (A.10), based on the characteristic function approach. Kogon and Williams (1998) have examined the properties of the Koutrouvelis method further in the context of estimating the parameters of stable distributions and found that it performs quite well. However, to the best of our knowledge the characteristic-function-based method outlined above has not previously been applied to the estimation of self-similar processes. The characteristic function approach can also be used to carry out graphical tests. This is done by plotting the expression on the left-hand side of equation (4.6) (or (4.7)). If the FGN model is correct this plot will be linear in log |t − s| and in log |λ|, with the coefficient that is associated with log |λ| equal to 2.
When producing the plots we have used the same values of λ and t − s as during the estimation.
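The regression in equation (4.6) can be sketched end to end on simulated data. The sketch below (our own illustration, not the authors' implementation) uses the simplest case, H = 0.5, for which the increments are independent Gaussian variables and the aggregate process Y is an ordinary random walk; the fitted slope on log|λ| should then be near 2 and the slope on log|t − s| near 2H − 1 = 0.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=20000)   # FGN with H = 0.5 is white noise
Y = np.cumsum(X)                       # aggregate process Y(t)

lags = np.arange(1, 11)                # t - s = 1, ..., 10
lams = np.arange(1, 11) * 0.1          # lambda = 0.1, 0.2, ..., 1.0

rows, lhs = [], []
for k in lags:
    incr = (Y[k:] - Y[:-k]) / np.sqrt(k)        # {Y(t) - Y(s)} / sqrt|t - s|
    for lam in lams:
        phi = np.mean(np.exp(1j * lam * incr))  # empirical CF, as in (4.1)
        lhs.append(np.log(-np.log(np.abs(phi))))
        rows.append([1.0, np.log(lam), np.log(k)])

# Least squares regression of log(-log|phi|) on log|lambda| and log|t - s|.
A = np.array(rows)
coef, *_ = np.linalg.lstsq(A, np.array(lhs), rcond=None)
intercept, slope_lam, slope_lag = coef
H_hat = (slope_lag + 1.0) / 2.0        # from (4.6): slope on log|t - s| = 2H - 1
```

With long-memory data (H > 0.5) the same regression recovers H from the coefficient of log|t − s|; only the simulation step would change.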
Recall that even if all one-dimensional marginal distributions are normal it does not necessarily follow that the corresponding joint distribution is multivariate normal. The tests based on the empirical characteristic function that were discussed above can be extended to test whether the joint distribution of the temperatures at several points in time is multivariate normal.
We have conducted a series of bootstrap simulations to check whether or not the distributions of the estimates Ĥ, σ̂ and μ̂ that are obtained by the characteristic function procedure are asymptotically normal and unbiased. For this we have computed the corresponding Q-Q plots, which are obtained by bootstrapping based on 1000 simulated time series of length 2000 (Figs D1 and D2 in the first on-line resource file). These plots seem to be almost perfectly linear and clearly indicate that the estimates are normally distributed. However, the estimates of H seem to be downward biased when H is greater than 0.8.
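The logic of such a normality check can be sketched with a deliberately simple estimator. The code below is our own illustration: it uses the aggregated-variance method as a stand-in for the characteristic function procedure (which is more involved), replicates the estimate over many simulated series, and measures Q-Q linearity of the replicated estimates against normal quantiles.

```python
import numpy as np
from statistics import NormalDist

def agg_var_H(x, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Aggregated-variance estimate of H: for a self-similar series the
    variance of block means scales like m^(2H - 2), so the slope of
    log(variance) against log(block size) identifies H."""
    x = np.asarray(x, dtype=float)
    logs_m, logs_v = [], []
    for m in block_sizes:
        T = x.size // m
        means = x[:T * m].reshape(T, m).mean(axis=1)
        logs_m.append(np.log(m))
        logs_v.append(np.log(means.var(ddof=1)))
    slope = np.polyfit(logs_m, logs_v, 1)[0]
    return 1.0 + slope / 2.0

# Parametric bootstrap under H = 0.5 (white noise): do the replicated
# estimates look normally distributed?
rng = np.random.default_rng(5)
H_hats = np.array([agg_var_H(rng.normal(size=2000)) for _ in range(200)])
probs = (np.arange(1, 201) - 0.5) / 200
normal_q = np.array([NormalDist().inv_cdf(p) for p in probs])
corr = np.corrcoef(np.sort(H_hats), normal_q)[0, 1]   # Q-Q plot linearity
```

A correlation close to 1 between the ordered estimates and the normal quantiles corresponds to the "almost perfectly linear" Q-Q plots described in the text.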

Test of goodness of fit
We have also used a χ²-type statistic, here called the Q-statistic, to examine whether the data are consistent with the FGN model. (Note that the Ljung-Box test cannot be used: according to Chen and Deo (2004), the distribution of this test is not known when the data exhibit long-range dependence.) Let {Z(t), t ∈ N} denote a Gaussian stochastic process with zero mean and let Z_T = (Z(T), ..., Z(2), Z(1))′ be the corresponding random Gaussian vector. Let Ω_T(H) be the covariance matrix of Z_T when {Z(t), t ∈ N} is FGN. In this case it is well known that Z_T′ Ω_T(H)^{−1} Z_T is χ² distributed with T degrees of freedom, and therefore Z_T′ Ω_T(H)^{−1} Z_T has the same distribution as a sum of T squared independent standard normal variables. The Q-statistic is obtained by standardizing this quadratic form by the mean T and standard deviation √(2T) of the χ² distribution (equation (4.8)), so that, when H is known, Q_T(H) is asymptotically standard normally distributed as T → ∞.
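A sketch of the Q-statistic follows, assuming the standardization Q_T(H) = {Z′ Ω_T(H)^{−1} Z − T}/√(2T), which is the natural normalization of a χ² variable with T degrees of freedom; the paper's equation (4.8) is not reproduced verbatim here. The sanity check uses H = 0.5, for which Ω_T(H) reduces to the identity matrix.

```python
import numpy as np

def fgn_cov_matrix(T, H, sigma2=1.0):
    """Covariance matrix Omega_T(H) of T consecutive FGN observations,
    built from the autocovariance in equation (3.1)."""
    k = np.abs(np.subtract.outer(np.arange(T), np.arange(T))).astype(float)
    return 0.5 * sigma2 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                           + np.abs(k - 1) ** (2 * H))

def q_statistic(z, H):
    """Standardized quadratic form (assumed form of the paper's Q-statistic):
    (z' Omega^{-1} z - T) / sqrt(2T), approximately N(0, 1) under the model."""
    T = z.size
    quad = z @ np.linalg.solve(fgn_cov_matrix(T, H), z)
    return (quad - T) / np.sqrt(2 * T)

# Under H = 0.5 FGN is white noise, so Omega is the identity and the
# statistic should fall well inside a standard normal range.
rng = np.random.default_rng(3)
z = rng.normal(size=500)
Q = q_statistic(z, 0.5)
```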

Non-parametric testing of stationarity
Several non-parametric tests of stationarity are available. As mentioned above, we have applied the test that was developed recently by Cho (2016). In this method the test statistic is obtained by measuring and maximizing the difference in the second-order structure over pairs of randomly drawn intervals. Cho (2016) proved asymptotic normality of the statistic under specific conditions. These conditions are satisfied for Gaussian and many non-Gaussian processes. Cho (2016) demonstrated that his test has good finite sample performance on both simulated and real data. (We have chosen to apply the test of Cho (2016) because it seems to have good properties and is easy to use; R code is available.) For the monthly time series, stationarity is rejected for between about 14 and 18 of the 96 series at the 5% significance level. (The number of rejections varies slightly from one run to another because the intervals are drawn randomly, so different subsamples are selected each time that the test procedure is applied.) In contrast, for the annual time series, stationarity is rejected for only one series (Djupivogur, Iceland) at the 5% significance level. Note that the monthly and annual data cover the same time range. The reason why stationarity is rejected for some of the monthly series (compared with the annual series) might be that those series are quite long, or, alternatively, that the control for seasonality is insufficient (Tables C6 and C7 in appendix C in the first on-line resource file).
We have also applied the stationarity test of Cho (2016) to the Moberg data (Moberg et al., 2005a, 2006). We find that stationarity is not rejected at the 5% significance level (Table C5 in appendix C in the first on-line resource file).

Inference based on the maximum likelihood, the characteristic function regression and the wavelet lifting methods
In this section we report inference results for nine selected weather stations. These were chosen because they are among those with the longest series and the fewest missing observations (appendix F in the first on-line resource file). The estimates for the remaining stations are given in Tables C1 and C2 in appendix C in the first on-line resource file. Recall that when the time series are Gaussian processes the likelihood function is fully identified by the autocovariance function and the mean. Recall also that the FGN model is approximately invariant under the choice of timescale, i.e. whether one uses annual or monthly observations, the structure of the auto-correlation function is approximately the same (with the same H). In Table 1 we show estimates of H based on the characteristic function approach, H_C, and estimates obtained by the Whittle method, H_W: see Beran (1994), chapter 6.1. The Whittle method, which is based on an approximation of the likelihood function, has been found to perform well (Rea et al., 2013). (For long time series the determinant of the corresponding autocovariance matrix of the underlying stochastic process may be close to zero. In such cases exact maximum likelihood estimation often does not work, which is the case in our application. Rea et al. (2013) have studied the performance of several estimators of long memory processes and have found that the Whittle method is among the better estimators of those examined. Abry et al. (2000) have found, however, that the Whittle estimator may be heavily affected by trends.) The estimates of the standard errors of the characteristic function estimates are obtained by bootstrap simulations. For this we have used 1000 simulations of FGN time series of length 2000 (appendix D in the first on-line resource file). A striking feature of these estimates is that the Hurst index does not vary much across weather stations.
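For readers who want a runnable starting point, the sketch below implements a semiparametric local Whittle estimator of H, a simpler relative of the parametric Whittle method (Beran, 1994) used in the paper. It exploits the fact that the spectral density of a long memory process behaves like G·λ^{1−2H} near frequency zero. All names are ours, and the sanity check uses white noise (H = 0.5).

```python
import numpy as np

def local_whittle_H(x, m=None):
    """Local Whittle estimate of the Hurst index: minimize the local Whittle
    objective over H, using the periodogram at the first m Fourier
    frequencies, where the spectral density is ~ G * lam^(1 - 2H)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(n ** 0.65)                    # bandwidth: frequencies used
    j = np.arange(1, m + 1)
    lam = 2 * np.pi * j / n
    fft = np.fft.fft(x - x.mean())
    I = (np.abs(fft[1:m + 1]) ** 2) / (2 * np.pi * n)  # periodogram

    def objective(H):
        G = np.mean(I * lam ** (2 * H - 1))   # profile estimate of G
        return np.log(G) - (2 * H - 1) * np.mean(np.log(lam))

    grid = np.linspace(0.01, 0.99, 197)
    return grid[np.argmin([objective(H) for H in grid])]

rng = np.random.default_rng(4)
H_hat = local_whittle_H(rng.normal(size=5000))  # white noise: true H = 0.5
```

The grid search keeps the sketch dependency-free; a production implementation would use a proper one-dimensional optimizer and the full parametric FGN spectral density.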
The estimates based on monthly data are quite precise because we have long time series. We note that the maximum likelihood estimates, and the estimates obtained by the characteristic function regression procedure, are quite similar. Since our method for seasonal adjustment may not be optimal, this may produce additional 'noise' in the data. One might therefore expect estimates based on annual data to yield higher estimates of H because the presence of random noise might weaken the serial dependence in the seasonally adjusted monthly temperature series. This is in fact what we find, namely that the estimates of H based on annual data often are significantly higher than the estimates based on monthly data.
In Tables D1, D2 and D3 (appendix D in the first on-line resource file) we also report results from bootstrap simulations of the properties of various estimators. These results show that the characteristic function regression estimator seems to be downward biased when H is greater than 0.8, whereas the Whittle estimator appears to be unbiased even for large H, such as H = 0.95. We note that for values of H greater than 0.8 all the estimators seem to underestimate σ.
To test for normality of the time series, it is possible to use the Kolmogorov-Smirnov test (Beran (1994), page 199). In this paper we have instead used the characteristic function approach that was outlined above. Recall that under the assumptions of theorem 3 in Appendix A the temperature process is asymptotically Gaussian. It is still of interest to test for normality since we do not know to what extent the assumptions are met in finite samples and with finite m. Under the FGN hypothesis it follows from equation (4.7) that if we select values λ_j as described above and plot the left-hand side of equation (4.7) against log(λ_j), the plot will be linear with the slope (tail index α) close to 2. In Fig. 2 we show the corresponding plots for selected cities. From Fig. 2 we see that the plots are almost perfectly linear, with most slopes between 1.99 and 2.01. In one case (Basel) the plot is linear with slope equal to 1.98, which indicates a stable distribution (with slightly heavier tails than a normal distribution). More results are given in appendix E in the first on-line resource file.
As discussed above, the characteristic function regression approach can also be applied to check whether the data indicate that {Y(t), t ≥ 0} is self-similar with stationary increments, and whether the increments are Gaussian. We have plotted the left-hand side of equation (4.7), for t − s = 1, 2, ..., 10, against log |t − s|. The resulting plots are shown in Fig. 2 for the cities selected. Additional results are given in appendix E in the first on-line resource file. We note that in most cases the plots are approximately linear. (There is a problem with this test because the calculation of the empirical characteristic function yields imprecise results when λ is large.) We have subsequently applied the Q-statistic defined in equation (4.8) to test whether the FGN model is consistent with the temperature data. Recall that when H is known the test statistic Q_T(H) is asymptotically standard normally distributed, which implies that the corresponding acceptance region at the 5% significance level is (−1.96, 1.96). In Table 2 we present results for the nine cities selected.
We see from Table 2 that, when H is estimated by the characteristic function regression method, the model passes the test for all nine weather stations, whereas in two cases (Paris and Milan) the models that were estimated by the Whittle maximum likelihood method are rejected. For the total data set we have found that in only two cases (Buenos Aires, Argentina, and Cap Otway, Australia) out of the 96 time series are the models that were estimated by the characteristic function regression method inconsistent with the data, whereas in eight cases the models that were estimated by the Whittle maximum likelihood method are inconsistent with the data (Tables C1 and C2 in appendix C in the first on-line resource file). When H is replaced by its corresponding estimate Ĥ (say), the distribution of Q_T(Ĥ) is not known. To investigate this we have conducted a series of simulation experiments under the assumption that the FGN model is true. These simulations indicate that, when H is estimated by the Whittle method, the distribution of Q_T(Ĥ) is stable with zero mean and maximally skew to the right. It follows that, when we compute confidence intervals based on the unconditional (stable) distribution of Q_T(Ĥ), the model is rejected in only one case (Adelaide) (Fig. D3 in appendix D in the first on-line resource file).
We have also applied the wavelet lifting method to estimate H on the basis of the annual data. Table C8 in appendix C in the first on-line resource file contains estimates of H and of the Q_T(Ĥ)-statistic. According to Table C8 the model is rejected for 36 out of the 96 time series. Fig. C1 in appendix C in the first on-line resource file shows that the wavelet lifting method tends to produce lower estimates of H than does the Whittle method, which might explain the relatively high proportion of rejections.
Some of the observed time series suffer from missing observations. To investigate whether the distribution of Q_T(H) is affected by missing observations we have carried out 1000 simulations of FGN processes with H = 0.8 and length 2000, with 70 missing and randomly dispersed data points. This amount of missing data is typical for the 96 selected time series. The plots in Fig. D5 in appendix D in the first on-line resource file show that missing data have no effect in this case.
When looking at some of the observed temperature graphs, which exhibit local trends of considerable length, it may seem puzzling that a stationary model such as FGN, with only three parameters, can generate such behaviour. Fig. 3 shows simulated FGN series; the time unit in Fig. 3 may be 1 year, 10 years or 100 years. However, the corresponding amplitudes will be affected by a change in time unit: if, for example, the timescale is changed from 1 year to 10 years, the corresponding standard deviation is found by dividing the standard deviation based on annual data by 10^(1−H). Recall also that most of the estimates of H based on annual data lie in the interval (0.7, 0.9). With H = 0.7 we see from Fig. 3 that from about 625 to about 720 time units there seems to be a decreasing trend, whereas from about 260 to about 330 time units there is an increasing trend. When H = 0.8 and H = 0.9 this type of pattern is more pronounced; in these cases the local trends may be several hundred time units long. To investigate whether it is possible to detect departures from stationarity, given that the 'core' stationary process is FGN, we have conducted a simulation experiment. We have simulated 180 years of the following process: during the first 120 years the temperature process is assumed to be FGN with zero mean and unit variance; during the last 60 years it is assumed to be FGN plus a linear trend with positive slope, starting at 0 in year 120. We used the test based on the Q-statistic to see how steep the trend must be before the FGN hypothesis is rejected. It turns out that, when H = 0.7, the trend must be equivalent to an increase of at least about 2.16 °C over 60 years before the departure from stationarity can be detected. When H > 0.7 the slope had to be even steeper for departures from the FGN model to be revealed by the Q-test.
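The rescaling rule for a change of time unit can be checked directly. The sketch below (our own illustration; the unit annual standard deviation and the aggregation factor of 10 are arbitrary inputs) divides an annual standard deviation by m^(1−H) when moving to an m-times coarser time unit.

```python
def aggregated_sd(sd_annual: float, m: int, H: float) -> float:
    """SD of an FGN series after switching to an m-times coarser time unit."""
    return sd_annual / m ** (1.0 - H)

# Moving from a 1-year to a 10-year time unit, starting from a unit annual SD:
for H in (0.7, 0.8, 0.9):
    print(H, round(aggregated_sd(1.0, 10, H), 3))
```

The higher H is, the less the amplitude shrinks under aggregation, which is why the long local trends in Fig. 3 survive at coarse timescales.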
Thus, this result indicates that the Q-test might have low power against alternatives in the class of models that can be decomposed into an FGN process plus a deterministic trend. This is hardly surprising given the properties of the FGN process: as H increases, the FGN model exhibits increasingly complex patterns, with long stochastic trends and cycles, as seen from the graphs in Fig. 3. These are, however, local features that disappear in the long run.

Empirical analysis based on two millennia of temperature constructions
In this section we discuss how the FGN model fits the reconstructed data (Moberg et al., 2005a, b). Fortunately, the argument that was used to support the hypothesis that the temperature process is FGN under stationarity also applies in this case. Recall that the reconstructions of Moberg et al. (2005a, b) were based on temperature proxies obtained from several sources, such as tree rings, lake sediments and ice core borehole data. Given how temperatures are reconstructed from tree rings and ice core samples, these data yield annual temperatures that may be interpreted as temporal aggregates of data generated at a finer (continuous) timescale. Consequently, it seems reasonable to assume that property 1 continues to hold in this case, even if the observed data contain measurement errors.
Hence, provided that the observed reconstructed series is generated by a stationary process, it follows that it must be approximately an FGN process. Fig. 4 displays the reconstructed temperatures from 1 AD to 1979. Recall that, when applying Cho's stationarity test, the hypothesis of stationarity was not rejected. We have applied the characteristic function regression approach and the Whittle maximum likelihood estimation procedure, as well as tests based on the graphical method. Both the plots of the normality test and the plots of the self-similarity test are close to being perfectly linear and are therefore consistent with the FGN hypothesis: see Fig. 5. The estimate of H by the characteristic function regression method is Ĥ = 0.917. Recall that, according to the simulations in Table D2 in appendix D in the first on-line resource file, the characteristic function regression estimator underestimates H when H > 0.8. The Whittle method yields Ĥ = 0.990 with standard error 0.015. By using a test based on the unconditional Q-statistic we find, however, that both the characteristic function estimate and the Whittle estimate imply that the FGN model is rejected. It turns out that only when H is close to 0.95 is the value of Q_T(H) within the confidence interval (−1.96, 1.96); thus only when H is close to 0.95 is the FGN model not rejected, which suggests that Ĥ = 0.95 might be the better estimate. Recall that, when H is replaced by the Whittle estimate Ĥ, the distribution of Q_T(Ĥ) is stable with zero mean and maximally skew to the right, with α = 1.99 when H = 0.7 and H = 0.8, and α = 1.96 when H = 0.9 (Table D3 in appendix D in the first on-line resource file). When Ĥ = 0.95 the 95% confidence interval for Q_T(Ĥ) is approximately (−6, 12). Computing Q for H = 0.94, 0.95, 0.96 yields the values −5.27, −0.94 and 5.60 respectively, which are all within the confidence interval (−6, 12).
In Fig. 6 we have plotted the empirical and the FGN auto-correlation functions (with H = 0.95) by using the approximation formula (A.15). This formula is a modified version of the formula that was provided by Hosking (1996), which does not seem to perform well for high values of H, whereas the formula in expression (A.15) seems to work well when H is close to 0.95 (appendix A and Fig. D4 in appendix D in the first on-line resource file). We have also provided a confidence band with 2 standard deviations on either side. We see from Fig. 6 that, when H = 0.95, the FGN model tends to underpredict the empirical auto-correlations of the Moberg data, but the differences between the theoretical and empirical auto-correlations are less than 2 standard deviations except for the auto-correlation of lag 1. The standard deviations are obtained by the bootstrap method, based on 1000 simulated time series from the FGN model with length 2000. From the auto-correlation plot we see that the auto-correlation of lag 1 is significantly higher than that predicted by the FGN model with H = 0.95. One possible explanation for the high auto-correlation of lag 1 is measurement error. To realize this, suppose that the reconstructed temperature observations can be expressed as the true temperature values plus measurement error, and that the measurement error process {ξ(t), t ≥ 0} (say) is independent of the true temperature process. If the auto-correlation function of the true temperature process is denoted ρ_k, that of the measurement error process is denoted γ_k, and the corresponding variances are denoted σ_X² and σ_ξ², then the auto-correlation of the observed process, ρ̃_k, is given by ρ̃_k = (σ_X² ρ_k + σ_ξ² γ_k)/(σ_X² + σ_ξ²). So, if γ_1 is very high whereas γ_k ≅ ρ_k for k > 1, the auto-correlations of lag 1 of the true FGN model and of the observed reconstructed temperature process will differ, whereas the auto-correlations at higher lags will be approximately equal.
Fig. 6. Empirical and theoretical (FGN model) auto-correlation plots.
The Whittle method may not be robust with respect to departures from the assumed model, especially when the Hurst index is close to 1 (Abry et al., 2000). To check whether the Whittle estimation method is sensitive to a very high auto-correlation of lag 1 we have estimated the FGN model by using only half of the sample, namely the even and the odd years respectively. Since the Whittle method is based on the spectral representation of the model, we needed to transform the spectral density of the FGN model to the case where the time unit of the observed series is twice the unit of the original series, while maintaining the model defined at the original time unit. By using the fact that the autocovariance function is the Fourier transform of the spectral density it follows easily that, if g(ω) is the spectral density of the FGN process, then 0.5 g(0.5ω) will be the spectral density for the model in the case where either the even or the odd years of observations are used. When using the odd years we obtained the estimate Ĥ = 0.919 with standard error 0.021, and when using the even years we obtained Ĥ = 0.932 with standard error 0.022. The last estimate differs from H = 0.95 by only about 1 standard error. These estimation results clearly demonstrate that the Whittle method is not robust against this type of departure from the maintained model.
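The measurement-error mechanism discussed above can be made concrete. The following toy calculation (our own; the variance ratio and the lag-1 noise correlation are invented for illustration, not estimated from the Moberg data) treats the observed series as the true process plus independent noise, so the observed auto-correlation is a variance-weighted average of the two auto-correlation functions.

```python
def observed_acf(rho_k: float, gamma_k: float, var_true: float, var_noise: float) -> float:
    """Lag-k auto-correlation of (true process + independent measurement error)."""
    return (var_true * rho_k + var_noise * gamma_k) / (var_true + var_noise)

# FGN-like true auto-correlations with H = 0.95 (asymptotic formula):
H = 0.95
def rho(k: int) -> float:
    return H * (2 * H - 1) * k ** (2 * H - 2)

# Noise that is strongly correlated at lag 1 only (e.g. through interpolation)
# inflates the observed lag-1 auto-correlation and leaves higher lags intact.
print(round(observed_acf(rho(1), 0.99, 1.0, 0.5), 3))   # lag 1: pulled upwards
print(round(observed_acf(rho(5), rho(5), 1.0, 0.5), 3)) # lag 5: equals rho(5)
```

When γ_k coincides with ρ_k the weighted average collapses to ρ_k, which is why only the lag-1 value is distorted in this scenario.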
Next, we put forward a possible explanation for why the estimated Hurst index is found to be so high. The argument goes as follows: if the measurement error process and the true (latent) temperature process are independent and additive stationary processes, then it follows from property 1, as discussed above, that they are both approximately FGN processes. Hence, theorem 1 implies that the observed process is also approximately FGN, with Hurst parameter equal to the maximum of the Hurst parameters of the true process and the measurement error process. The measurement error process might be highly persistent because of systematic biases in the physical methodology linking the observed tree ring and borehole data to the unobserved temperature process. In addition, Moberg et al. (2005a) used wavelet transform techniques to standardize, smooth and interpolate the observed proxies from the respective sources to obtain temperature reconstructions. As a result, it may be that the Hurst exponent of the measurement error process is higher than that of the latent temperature process, so that, according to theorem 1, the Hurst index of the reconstructed series will be higher than the Hurst index of the latent temperature process. The use of interpolation may also be one reason why the auto-correlation of lag 1 is particularly high in these data. Mills (2007) has also analysed the data set of Moberg et al. (2005a). As in our analysis, he found that these data are consistent with a model that has long memory properties and can be represented by an auto-regressive fractionally integrated moving average process that is both stationary and mean reverting. He also showed that, if we allow for a smoothly varying underlying trend function, then the long memory disappears, implying that recent centuries have been characterized by trend temperatures moving upwards.
Thus, his analysis provides another example of the difficulty of discriminating between competing models by statistical arguments alone: see Mills (2010).

Concluding remarks
In this paper we have provided evidence that the FGN process is a good representation of the temperature process, with the Hurst parameter ranging from about 0.63 to 0.95. Both theoretical arguments and the empirical tests that were carried out support this claim. A key feature of the FGN model is long-range dependence, which implies that temperature series may exhibit long (non-systematic) trends and cycles. The most extreme case is found in the reconstructed data of Moberg et al. (2005a), where some trends last for several hundred years (Fig. 4) but still turn out to be consistent with the FGN model. Furthermore, long-range dependence implies that temperature levels are highly persistent: for example, when the Hurst parameter is as high as 0.9, the correlation between temperatures 100 years apart is about 0.29. A consequence of this feature of the temperature process is that we cannot establish reliable claims about systematic changes in temperature levels by using simple statistical trend analyses or ad hoc statistical models.
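The figure of about 0.29 quoted here can be checked with the asymptotic FGN auto-correlation ρ_k ≈ H(2H − 1)k^(2H−2), the approximation invoked via equation (3.2) in appendix A.6:

```python
# Correlation between temperatures 100 years apart for H = 0.9, using the
# asymptotic FGN auto-correlation rho_k ~ H(2H-1) k^(2H-2).
H, k = 0.9, 100
rho_k = H * (2 * H - 1) * k ** (2 * H - 2)
print(round(rho_k, 2))  # -> 0.29
```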
However, it cannot be ruled out that temperature data coupled with other types of data or information, and models based on geophysical processes, might result in a different picture. Hence, it may be that a systematic change in the temperature levels is under way but that our statistical methods are unable to distinguish such changes from natural temperature variation.
Finally, we have conducted a simulation experiment based on a modification of the FGN model that allows for a linear trend with a positive slope during the last 60 years. The goal was to determine how steep the trend must be before stationarity is rejected by the test based on the Q-statistic.
By Wold's decomposition theorem, a stationary process {U(t), t ∈ N} can be written as U(t) = ξ_t + Σ_{k=1}^∞ a_k ε_{t−k}, where {ξ_t, t ∈ N} is a deterministic process, {ε_t, t ∈ N} is a white noise process and {a_k} is a sequence with the property Σ_{k=1}^∞ a_k² < ∞. If {U(t), t ∈ N} is purely stochastic then ξ_t is equal to a constant. Recall also that a function L(x) (say) is slowly varying at ∞ if it has the property that lim_{x→∞} L(tx)/L(x) = 1 for every t > 0; see Resnick (1987), page 13.
Theorem 3. Suppose that {X(q), q ∈ N} is a causal linear process satisfying the conditions of proposition 4.4.1, page 77, of Giraitis et al. (2012), of which theorem 3 is a slightly modified version; the proof is also given there. Provided that H > 0.5 (which is the case in our analysis), proposition 4.4.4, page 79, in Giraitis et al. (2012) states that {τ_m^{−1} S_m(t), t ≥ 0} converges weakly, in the uniform metric, towards standard fractional Brownian motion (FBM). The FBM has autocovariance function given by

cov{B_H(s), B_H(t)} = (1/2)(s^{2H} + t^{2H} − |t − s|^{2H}).    (A.1)

A.3. Details of the regression approach based on the empirical characteristic function
Here, we provide the derivation and justification of the regression procedure based on the empirical characteristic function. Let {Y(t), t ≥ 0} be a self-similar process with stationary stable increments. Define the characteristic function ϕ(t − s; λ), for real λ, by

ϕ(t − s; λ) = E[exp{iλ(Y(t) − Y(s))}],

where i = √(−1). Define the corresponding empirical counterpart of this characteristic function by

ϕ̂(t − s; λ) = {1/(T − t + s)} Σ_{r=1}^{T−t+s} exp[iλ{Y(r + t − s) − Y(r)}].    (A.6)

Clearly, under self-similarity with stationary increments it follows that E{ϕ̂(t − s; λ)} = ϕ(t − s; λ), which shows that the empirical characteristic function defined in equation (A.6) is an unbiased estimator of the corresponding theoretical characteristic function.
From equation (A.6) it follows readily that |ϕ̂(t − s; λ)| estimates |ϕ(t − s; λ)|. Since {Y(t), t ≥ 0} is self-similar with stationary stable increments, it follows that

log{−log|ϕ(t − s; λ)|} = α log|λ| + αH log|t − s| + α log σ.

Thus, an estimator of μ can be obtained as follows (Koutrouvelis, 1980).

A.5. Maximum likelihood and the Whittle estimator
Under the assumption that the temperature series is a Gaussian process we can also apply the method of maximum likelihood for estimation. Specifically, given the FGN model, the autocovariance function depends on only two parameters, namely H and σ. Let Ω(H) be the matrix with elements Ω(H)_{jk} = E[{X(j) − μ}{X(k) − μ}]/σ², where μ = E{X(t)}. When T is large it may be complicated to compute the likelihood function. However, Whittle has demonstrated (see Beran (1994)) that the likelihood can be approximated, and this approximate likelihood converges to the exact likelihood as T increases without bound.
Given H it follows readily from equation (

A.6. Estimation of the auto-correlation function
In the presence of long-range dependence the usual estimator of the auto-correlation function may be seriously biased even if long time series are available. Let r_k be the usual estimator of the auto-correlation ρ_k,

r_k = Σ_{t=1}^{T−k} {X(t) − X̄_T}{X(t + k) − X̄_T} / Σ_{t=1}^{T} {X(t) − X̄_T}²,

where X̄_T is the sample mean. Under the condition ρ_k ≅ c k^{−γ} for large values of k, where c is a positive constant and 0 < γ < 1/2, Hosking (1996) obtained

E(r_k) − ρ_k ≅ −2(1 − ρ_k)c / {(1 − γ)(2 − γ)T^γ}.    (A.12)

In our setting it follows from equation (3.2) that Hosking's condition is satisfied with c = H(2H − 1) and γ = 2 − 2H. The approximation (A.12) becomes better with increasing sample size T. Hence, by inserting these values into approximation (A.12) we obtain

E(r_k) − ρ_k ≅ (ρ_k − 1) / T^{2−2H}.    (A.13)

Expression (A.13) can be applied to obtain an asymptotically unbiased estimator of ρ_k. Let

ρ̂_k = (r_k + T^{2H−2}) / (1 + T^{2H−2}).    (A.14)

Then it follows from expressions (A.13) and (A.14) that E(ρ̂_k) ≅ ρ_k, which means that ρ̂_k is approximately an unbiased estimator of ρ_k. However, by conducting simulation experiments we have found that formula (A.14) does not produce unbiased estimates when H is large (Fig. D4 in appendix D in the first on-line resource file). Specifically, we have found that when H = 0.95 the estimator

ρ̂_k = (r_k + 0.9) / 1.9    (A.15)

yields approximately unbiased estimates.
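A minimal sketch of the bias correction in expression (A.14) (our own illustration; the raw estimate r_k = 0.20 and the values of T and H are arbitrary) shows how large the additive term T^(2H−2) can be and how the correction shifts a raw estimate:

```python
def corrected_acf(r_k: float, T: int, H: float) -> float:
    """Approximately unbiased ACF estimate from the raw estimate r_k, eq. (A.14)."""
    b = T ** (2 * H - 2)
    return (r_k + b) / (1 + b)

# With T = 2000 and H = 0.9 the downward bias of r_k is far from negligible:
T, H = 2000, 0.9
print(round(T ** (2 * H - 2), 3))           # additive term, about 0.219
print(round(corrected_acf(0.20, T, H), 3))  # raw 0.20 corrected to about 0.344
```

Because the additive term grows with H, the correction matters most exactly where long-range dependence is strongest.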

A.7. Bootstrap estimation and graphical test for the distribution of Q_T(Ĥ) when Ĥ is stochastic
When Ĥ is stochastic, the distribution of Q_T(Ĥ) is not known. However, we can apply the approach of Koutrouvelis (1980) to conduct a graphical test of the hypothesis that Q_T(Ĥ) follows a stable distribution when Ĥ is normally distributed. For this, let Ĥ_j be the bootstrap estimate obtained from the jth simulated sample, j = 1, 2, …, M. For given real λ, define the empirical characteristic function

ψ(λ) = (1/M) Σ_{j=1}^{M} exp{iλ Q_T(Ĥ_j)}.

Similarly to equation (4.7), it follows from Koutrouvelis (1980) that, when Q_T(Ĥ_j), j = 1, 2, …, M, are independent and identically distributed according to a stable distribution,

log{−log|ψ(λ)|} = α log|λ| + log(σ^α) + ε(λ),    (A.16)

where ε(λ) is an error term that vanishes as M increases without bound, α is the index parameter and σ is the scale parameter; |ψ(λ)| can conveniently be computed by using the relationship

|ψ(λ)| = √[{(1/M) Σ_{j=1}^{M} cos(λ Q_T(Ĥ_j))}² + {(1/M) Σ_{j=1}^{M} sin(λ Q_T(Ĥ_j))}²].

Similarly to Section 4.1, we can use regression analysis to estimate α and σ. Furthermore, equation (A.16) can be used to conduct graphical tests of the hypothesis that the Q-statistic is stable.