An Introduction to Applications of Wavelet Benchmarking with Seasonal Adjustment

Prior to adjustment, accounting conditions between national accounts data sets are frequently violated. Benchmarking is the procedure used by economic agencies to make such data sets consistent. It typically involves adjusting a high frequency time series (e.g. quarterly data) so it becomes consistent with a lower frequency version (e.g. annual data). Various methods have been developed to approach this problem of inconsistency between data sets. This paper introduces a new statistical procedure, namely wavelet benchmarking. Wavelet properties allow high and low frequency processes to be analysed jointly, and we show that benchmarking can be formulated and approached succinctly in the wavelet domain. Furthermore, the time and frequency localization properties of wavelets are ideal for handling more complicated benchmarking problems. The versatility of the procedure is demonstrated using simulation studies in which we provide evidence that it substantially outperforms currently used methods. Finally, we apply this novel method of wavelet benchmarking to official Office for National Statistics (ONS) data.


Introduction
National statistics institutes (NSIs) such as the UK's Office for National Statistics (ONS) are responsible for collecting and analysing economic data, e.g. national accounts data and labour data (Cholette and Dagum (2006), chapter 1). Data sets that are collected by such agencies are typically adjusted for a variety of reasons. Benchmarking (the focus of this paper) is an adjustment procedure that is used to make measurements from the same statistical process across different periodicities consistent. Since national accounts data must satisfy specific accounting conditions, benchmarking has important applications. It is well documented for example that unmodified quarterly gross domestic product (GDP) data are not consistent with their annual GDP version (i.e. the quarterly totals do not sum to the corresponding annual value). Since data sets of different periodicities (temporal resolutions) are often collected from different sample surveys and compiled differently, such discrepancies occur naturally as a result of survey errors.
In many cases, for example, a larger sample is used for the less frequent survey; hence the lower frequency series is typically more reliable than its corresponding high frequency version. The aim of benchmarking is to adjust the high frequency series so that it becomes consistent with the lower frequency version while preserving short-term fluctuations. The low frequency and adjusted high frequency time series are referred to as the benchmark and benchmarked series respectively.
Benchmarking can be considered as a subclass of signal extraction problems. The current literature can be classified as providing either numerical or model-based solutions. Denton (1971) approached benchmarking by using a numerical method based on quadratic minimization, with a user-specified penalty function defining the minimization procedure. Cholette and Dagum (1994) expressed benchmarking in terms of a stochastic regression model; hence a regression-type solution is provided. Owing to difficulties in estimating parameters, NSIs typically simplify the benchmarking method that was originally proposed by Cholette and Dagum. However, since it has a regression setting, confidence intervals can be obtained and so uncertainty about point estimates can be quantified. In practice, NSIs often implement methods which make simplifying assumptions to allow easier estimation and greater transparency of the model.
In this paper, we present a new non-parametric methodology for benchmarking. It is based on the natural idea that the time series can be decomposed into different timescale components, and these components are subsequently used to constrain the high frequency series. Wavelets (Daubechies, 1992) provide a natural time-frequency decomposition and can adapt to local conditions in the time series. This is important in the macroeconomic time series routinely analysed by the ONS. Wavelets extend the ideas of Fourier decompositions by removing the assumption of stationarity in the time series. By combining data sets from different wavelet decomposition levels, and making use of the unbalanced Haar (UH) decomposition (Fryzlewicz, 2007) to account for the non-dyadic nature of the analysis, our proposed method can reconstruct a benchmarked series with high frequency components that still satisfy the low frequency constraints.
Outliers and abrupt structural changes are commonplace in observed time series. Current methods provide global benchmarking solutions; hence volatile regions of the high frequency series have the potential to introduce artefacts in the benchmarked series. The time-frequency localization properties of wavelets (Percival and Walden (2000), page 59) provide a local solution to benchmarking and thus overcome such a problem.
In addition, NSIs frequently publish a seasonally adjusted version of the high frequency series. Seasonal adjustment is another procedure that is applied to data to remove unwanted effects (Findley, 2005), but care must be taken when combining seasonal adjustment and benchmarking. Along with adjustments for calendar effects (e.g. trading day effects) a version of benchmarking must be applied so that both the original and the seasonally adjusted high frequency series satisfy the benchmark constraint. We show that, by using a suitable seasonal model, wavelet benchmarking and seasonal adjustment can be combined within the same framework.
The paper proceeds as follows. Section 2 provides an introduction to current benchmarking methods and a short introduction to wavelets. Section 3 describes the process of benchmarking in the wavelet domain. Additional issues which require consideration such as thresholding and seasonal adjustment are also discussed. In Section 4 wavelet benchmarking is applied to a variety of simulated data and official ONS data. Section 5 concludes the paper. Details on the simulation implementations are given in Appendices A-C.
The data that are analysed in the paper and the programs that were used to analyse them can be obtained from http://wileyonlinelibrary.com/journal/rss-datasets

Background
Benchmarking is frequently required by the ONS. A variety of benchmarking methods have been proposed in Denton (1971), Cholette and Dagum (1994), Durbin and Quenneville (1997), Quenneville et al. (2013) and Di Fonzo and Marini (2012), to name only a few. Currently the Cholette and Dagum method is the preferred method of benchmarking within the ONS (Brown et al., 2012). However, this can also incorporate Denton benchmarking, so we shall consider these two approaches and provide a comparison of wavelet benchmarking with them. Consider the following introductory example. A quarterly GDP time series needs to be benchmarked to an annual GDP time series; typically the annual series is less noisy than its quarterly version. To simplify the benchmarking procedure many NSIs assume that such an annual series is not contaminated with noise, and hence the high periodicity series must equate to the lower periodicity series when suitably aggregated (binding benchmarking). Throughout this paper the above example of quarterly-annual benchmarking is used to provide a concrete description; however, the methodology is applicable to general periodicity relationships. For completeness the following discussion expresses benchmarking in a general and more formal way.
Let $Y^H_{T,t}$ and $Y^L_{T,s}$ describe the true evolution of high (i.e. quarterly) and low (i.e. annual) frequency time series. The disturbance terms $\epsilon^H_t$ and $\epsilon^L_s$ contaminate these time series, with $Y^H_{O,t}$ and $Y^L_{O,s}$ denoting the observed noisy versions respectively. This is summarized as follows:

$Y^H_{O,t} = Y^H_{T,t} + \epsilon^H_t$, $\quad Y^L_{O,s} = Y^L_{T,s} + \epsilon^L_s$, $\quad Y^L_{T,s} = g_s(Y^H_T)$.

Here $g_s(\cdot)$ represents some function linking the unobserved true low frequency series to the unobserved true high frequency series. Often $g_s(\cdot)$ is a summation over a small range. We write $Y^H_T = (Y^H_{T,1}, \ldots, Y^H_{T,n})$ and $Y^L_T = (Y^L_{T,1}, \ldots, Y^L_{T,m})$, and $f = n/m$ denotes the aggregation order between the two series. In the setting of quarterly to annual binding benchmarking (binding benchmarking assumes that the low frequency series is non-noisy, i.e. $\epsilon^L_s = 0$), the constraint is $Y^L_{O,s} = \sum_{t=f(s-1)+1}^{fs} Y^H_{T,t}$. A benchmarking procedure can then be expressed as a linear transformation of the observed data,

$\hat{Y}^H_T = A\,(Y^{H\prime}_O, Y^{L\prime}_O)'. \quad (2.1)$

Conditionally on the benchmarking procedure that is implemented, additional information summarizing statistical features, such as the time series correlation structure or estimates of model parameters, may be present. In particular, for the parametric or non-parametric approach, the matrix $A$ is respectively explicitly or implicitly data dependent. This paper considers a particular type of benchmarking, namely binding benchmarking of flow variables; in the example of quarterly to annual benchmarking the sum of the four quarterly values must equal the corresponding value from the annual series. Other types of benchmarking exist, such as ensuring that beginning-of-period values are equal between two series. Implementation of these benchmarking methods simply requires a different specification of the benchmarking matrix $A$.

Denton method

Denton (1971) benchmarking, which was the first widely used benchmarking procedure, is based on the principle of movement preservation. This ensures that the benchmarked high frequency series $\hat{Y}^H_T$ evolves similarly to the observed series $Y^H_O$ (i.e. $\hat{Y}^H_T$ is approximately a level shift of, or proportionate to, $Y^H_O$, depending on which Denton method is implemented).
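For the flow case, the aggregation $g_s(\cdot)$ is simply a block sum and can be encoded as a 0-1 matrix; this matrix plays the role of $B$ in the Denton method and $J$ in the Cholette and Dagum method below. A minimal sketch (function name hypothetical, not from the paper):

```python
import numpy as np

def aggregation_matrix(n, f):
    """0-1 matrix summing each block of f high frequency values into one
    low frequency value (flow aggregation); n must be a multiple of f."""
    m = n // f
    B = np.zeros((m, n))
    for s in range(m):
        B[s, s * f:(s + 1) * f] = 1.0
    return B

# Quarterly to annual example: n = 8 quarters, f = 4, m = 2 years.
B = aggregation_matrix(8, 4)
quarterly = np.arange(1.0, 9.0)   # toy quarterly series 1, 2, ..., 8
annual = B @ quarterly            # binding benchmark values
assert np.allclose(annual, [10.0, 26.0])
```

The binding constraint of equation (2.1) is then simply $B\hat{Y}^H_T = Y^L_O$.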
As described in Cholette and Dagum (2006), chapter 6, the Denton method has the following underlying model for discrete data:

$Y^H_{O,t} = Y^H_{T,t} + \epsilon_t, \quad (2.2)$

$\sum_{t=p_{s,1}}^{p_{s,4}} j_{s,t}\, Y^H_{T,t} = Y^L_{O,s}, \quad (2.3)$

with equation (2.3) giving the binding benchmarking constraint. In quarterly to annual binding benchmarking $j_{s,t} = 1$, with $p_{s,1}$ and $p_{s,4}$ representing the beginning and end quarters corresponding to year $s$ respectively. Two primary variants of Denton benchmarking are additive and proportional differencing, best suited to additive and multiplicative time series respectively. (Although the simulations generated in Section 4 have an additive form, the proportional Denton variant is also considered since it is the most commonly used version of Denton benchmarking.) Additive first differencing keeps the discrepancy between the benchmarked and original series, $\hat{Y}^H_{T,t} - Y^H_{O,t}$, as close as possible to a constant by minimizing the objective function

$\sum_{t=2}^{n} \{(\hat{Y}^H_{T,t} - Y^H_{O,t}) - (\hat{Y}^H_{T,t-1} - Y^H_{O,t-1})\}^2 \quad (2.4)$

subject to the benchmark constraint

$B\hat{Y}^H_T = Y^L_O \quad (2.5)$

being satisfied. Denton (1971) devised a solution based on Lagrangian optimization: the minimizer $\hat{Y}^H_T$, together with a vector of Lagrange multipliers, solves a linear system built from the first-difference matrix $D$ and the aggregation matrix $B$, where $j$ and $0$ are $f = n/m$-dimensional column vectors taking values 1 and 0 respectively. In the aforementioned example $j = (1, 1, 1, 1)'$, $B$ annualizes the quarterly series and $D$ calculates the differences between the error terms $\epsilon_t$. The matrix $G$ defined by this system can be decomposed into constituent components (Cholette and Dagum, 2006). It is important to note that working with the matrix $D$ from equation (2.7) means that the modified Denton benchmarking solution as described in Cholette and Dagum (2006) is considered, which avoids the initial condition of the original Denton method and often results in a more satisfactory solution.
It is possible to specify equation (2.4) in terms of higher order additive differences between the original and adjusted series. For example $\sum_{t=h+1}^{n} (\Delta^h \hat{Y}^H_{T,t} - \Delta^h Y^H_{O,t})^2$ corresponds to the $h$th-order additive model, with $\Delta^h$ being the $h$th difference operator and values outside the adjustment range being defined as $Y^H_{O,t} = Y^H_{T,t}$, $t = 0, -1, \ldots, 1-h$. Section 4 implements additive Denton benchmarking with $h = 1$ (we also considered $h = 2$, but the results were not useful in practice, so $h = 2$ was not considered further) and the proportional Denton variant (matrix $D$ is replaced by $D\,\mathrm{diag}(Y^H_O)^{-1}$) with $h = 1$. Hereafter such benchmarked series are referred to as the Denton$_{a,1}$ and Denton$_{p,1}$ series.
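The modified additive first-difference Denton solution amounts to an equality-constrained quadratic programme, which can be solved directly through its Karush-Kuhn-Tucker system. A sketch of one possible implementation (not the paper's code; names hypothetical):

```python
import numpy as np

def denton_additive(y, a, f):
    """Modified additive first-difference Denton benchmarking (h = 1):
    minimise sum_t [(x_t - y_t) - (x_{t-1} - y_{t-1})]^2 subject to the
    binding constraint that each block of f values of x sums to the
    corresponding benchmark in a."""
    n, m = len(y), len(a)
    B = np.kron(np.eye(m), np.ones(f))      # aggregation (annualizing) matrix
    D = np.diff(np.eye(n), axis=0)          # (n-1) x n first-difference matrix
    Q = D.T @ D
    # Karush-Kuhn-Tucker system of the equality-constrained quadratic programme.
    K = np.block([[2.0 * Q, B.T], [B, np.zeros((m, m))]])
    rhs = np.concatenate([2.0 * Q @ y, a])
    return np.linalg.solve(K, rhs)[:n]

y = np.array([10., 11., 12., 13., 12., 11., 12., 14.])  # observed quarterly
a = np.array([50., 52.])                                 # annual benchmarks
x = denton_additive(y, a, f=4)
assert np.allclose(x.reshape(2, 4).sum(axis=1), a)       # constraint holds
```

The KKT matrix is nonsingular because the only vectors annihilated by $D'D$ are constants, which do not satisfy the homogeneous constraint.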
Although it is not computationally demanding and requires only basic assumptions on the structural form of the time series being analysed, the Denton method occasionally performs poorly. This is evident in time series which evolve unconventionally; for example consider a time series containing a small number of extreme data points. In this case, the Denton method would adjust a disproportionate number of data points. As mentioned in Section 1, one motivation for considering wavelets is that their time-frequency localization properties can help to overcome this problem.

Cholette and Dagum method
The Cholette and Dagum (1994) benchmarking method is based on stochastic regression. However, its original structure is typically simplified in NSI applications. As such, the description that was provided in Quenneville et al. (2003) is discussed in what follows. For completeness, the methodology that was proposed by Cholette and Dagum can be found in Appendix B. The simplified model is

$Y^H_O = Y^H_T + C\epsilon^H, \quad (2.12)$

$J Y^H_T = Y^L_O. \quad (2.13)$

Here $\epsilon^H$ is a zero-mean Gaussian process with covariance matrix $\sigma^2_H \Omega_H$, where $\Omega_H$ is the autocorrelation matrix of an autoregressive AR(1) process with parameter $\rho$, and $\sigma^2_H$ is a nuisance parameter which does not require estimation. The matrix $C$ is an $n \times n$ matrix with weights $c_t \propto |Y^H_{O,t}|^\lambda$ on the main diagonal and 0 elsewhere. Typically the parameter value for $\lambda$ is set to 0, 1/2 or 1. The Cholette and Dagum approach used by the ONS sets $\lambda = 0$, resulting in the further simplification that $C = I$. In equation (2.13), $J$ is an annualizing matrix equivalent to the matrix $B$ from the Denton method. Equation (2.13) has the interpretation that the sum of the subannual time periods from the high frequency time series $Y^H_T$ must equal the corresponding annual value from the low frequency time series $Y^L_O$ (this enforces the benchmarking constraint).
When we assume that the benchmark solution is characterized by equations (2.12) and (2.13), we obtain

$\hat{Y}^H_T = Y^H_O + K (Y^L_O - J Y^H_O), \quad K = \Omega_H J' (J \Omega_H J')^{-1}. \quad (2.16)$

Equation (2.16) has the interpretation that the estimated high frequency time series $\hat{Y}^H_T$ is a linear combination of the observed noisy high frequency time series $Y^H_O$ and a scaled difference of the observed non-noisy low frequency time series and the annualized version of the noisy observed high frequency time series. The matrix $K$ determines the scaling effect. This solution can be expressed in a form that is consistent with equation (2.1) as

$\hat{Y}^H_T = (I - KJ, \; K)\,(Y^{H\prime}_O, Y^{L\prime}_O)'. \quad (2.17)$

As discussed in Appendix B, the ONS uses values of $\rho = 0.8$ and $\rho = 0.8^3$ for monthly and quarterly to annual benchmarking respectively.
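The simplified Cholette and Dagum solution with $\lambda = 0$ (so $C = I$) can be sketched as follows, assuming the standard AR(1) autocorrelation matrix $\Omega_H$ with entries $\rho^{|i-j|}$; the function names are hypothetical:

```python
import numpy as np

def ar1_corr(n, rho):
    """AR(1) autocorrelation matrix with entries rho^|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def cholette_dagum(y, a, f, rho):
    """Simplified Cholette-Dagum benchmarking (lambda = 0, so C = I):
    Y_hat = y + Omega J' (J Omega J')^{-1} (a - J y)."""
    n, m = len(y), len(a)
    J = np.kron(np.eye(m), np.ones(f))     # annualizing matrix
    Omega = ar1_corr(n, rho)
    K = Omega @ J.T @ np.linalg.inv(J @ Omega @ J.T)
    return y + K @ (a - J @ y)

y = np.array([10., 11., 12., 13., 12., 11., 12., 14.])
a = np.array([50., 52.])
x = cholette_dagum(y, a, f=4, rho=0.8 ** 3)   # quarterly-to-annual rho
assert np.allclose(np.kron(np.eye(2), np.ones(4)) @ x, a)
```

Because $J K = I$, the benchmarked series satisfies the constraint exactly whatever the value of $\rho$.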

Wavelets
Stationarity underpins many time series methods, yet this assumption is often unreasonable. The time-frequency localization of wavelets enables segmentation of data across various frequency and time levels, thus providing a framework in which high (i.e. quarterly) and low (i.e. annual) frequency series can be analysed jointly. Intuitively, wavelets perform frequency analysis over localized time segments, producing a joint time-frequency analysis, in a related (but not identical) vein to windowed Fourier analysis. A more formal definition is given in the next section.
Although wavelets have facilitated recent advances in time series analysis, e.g. alternative modelling of non-stationary processes (Nason et al., 2000), their primary use lies in non-parametric regression, where they are used to remove noise from a statistical process (Donoho and Johnstone, 1994). Subsequent sections show that combining a strict benchmarking step with a thresholding (denoising) step produces a benchmarking procedure which can outperform those currently used.

Unbalanced Haar wavelets
The remainder of this section discusses UH wavelets (Fryzlewicz, 2007). Observed data sets are typically non-dyadic in length (i.e. $n \neq 2^J$ for any $J \in \mathbb{N}$). UH wavelets are a generalization of Haar wavelets (Daubechies, 1992) and enable the transformation of such non-dyadic data sets into the wavelet domain. Whereas discontinuities in Haar basis functions occur in the middle of their support (Fig. 1), UH basis functions have discontinuities at arbitrary locations (Fig. 2). Consequently high and low frequency data sets with arbitrary lengths or factor differences can be jointly analysed.

Fig. 2. Example of a UH wavelet segmentation for a time series of length t = 600 for frequency scales -1, 0, 1 and 2 (for illustration the vertical axes have been rescaled to -1 and 1; the exact scales can be determined by equations (2.18) and (2.20)).

The father wavelet coefficient describes the average behaviour of the time series. Mother wavelet coefficients $w(j, k)$, $j \geq 0$, provide information describing local features. Larger values of $j$ make the region of the time series considered narrower, whereas $k$ determines the position considered on the timescale. The following equation resynthesizes the original series $Y$ from the set of wavelet coefficients:

$Y_t = \sum_{j} \sum_{k} w(j, k)\, \phi_{j,k}(t), \quad t = 1, \ldots, n. \quad (2.23)$

Equation (2.23) expresses $Y$ as a weighted linear combination of the elementary father wavelet and the mother wavelets across various frequency and translation levels, with weights given by the corresponding wavelet coefficients. This allows reconstruction of the benchmarked series after wavelet analysis.
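Equation (2.23) can be illustrated in the dyadic Haar case by building the orthonormal transform matrix $W$ explicitly and verifying that $Y = W'w$ recovers the series. A small sketch (recursive construction; not the paper's code):

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix for dyadic n. Row 0 is the father
    wavelet; the remaining rows are mother wavelets at successive scales."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    W = np.vstack([np.kron(H, [1.0, 1.0]),                 # smooth part
                   np.kron(np.eye(n // 2), [1.0, -1.0])])  # detail part
    return W / np.sqrt((W ** 2).sum(axis=1, keepdims=True))

W = haar_matrix(8)
Y = np.array([10., 11., 12., 13., 12., 11., 12., 14.])
w = W @ Y              # analysis: wavelet coefficients w(j, k)
Y_rec = W.T @ w        # synthesis, as in equation (2.23)
assert np.allclose(W @ W.T, np.eye(8))   # W is orthonormal
assert np.allclose(Y_rec, Y)             # perfect reconstruction
```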

Methodology
In this section we discuss the selection of wavelet bases that are used to facilitate benchmarking. Elementary wavelet benchmarking is introduced along with an application to simulated data. Finally the additional issue of thresholding and its integration with seasonal adjustment is considered.

Wavelet basis selection

Forming a basis for non-dyadic data by using unbalanced Haar wavelets
This paper segments data by using a basis that is similar to the traditional Haar basis; only a limited number of UH wavelets are used to construct it. The formation of such a basis is outlined as follows. At each iteration the positive region of the mother wavelet being considered is segmented into a daughter wavelet with the largest possible dyadic region and a non-dyadic positive region. Its negative region is segmented into a daughter wavelet with positive and negative regions of equal length (Haar segmentation).
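For reference, an unbalanced Haar mother wavelet with an arbitrary break point can be constructed so that it has zero mean and unit norm, the property that the segmentation above relies on. A sketch (the amplitudes follow from the zero-mean and unit-norm conditions; names hypothetical):

```python
import numpy as np

def unbalanced_haar(n, start, b, end):
    """Unbalanced Haar mother wavelet on positions start..end (1-indexed) of a
    length n vector, with break point b: positive on [start, b], negative on
    (b, end]. Amplitudes are chosen to give zero mean and unit norm."""
    psi = np.zeros(n)
    p = b - start + 1            # length of the positive region
    q = end - b                  # length of the negative region
    psi[start - 1:b] = np.sqrt(q / (p * (p + q)))
    psi[b:end] = -np.sqrt(p / (q * (p + q)))
    return psi

psi = unbalanced_haar(6, 1, 2, 6)    # break after t = 2 on a length 6 support
assert abs(psi.sum()) < 1e-12        # zero mean
assert abs((psi ** 2).sum() - 1.0) < 1e-12   # unit norm
```

Setting the break point at the midpoint of a dyadic support recovers the classical Haar wavelet as a special case.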

Creating a benchmarking basis
The set of break points {b 0,1 , b 1,1 , b 1,2 , : : :} determines the UH wavelet basis. Benchmarking requires the bases for low and high frequency processes to be comparable.
The set of break points $L_{BP}$ is selected by the method described in Section 3.1.1. Break points in $H_{BP}$ on frequency levels that overlap with those of $L_{BP}$ are defined so that the two bases coincide on these levels; the remaining break points can be chosen arbitrarily, as they exist on frequency levels that are not affected by elementary benchmarking. To maintain consistency, the procedure in Section 3.1.1 is used. The sets $L_{BP}$ and $H_{BP}$ provide the foundation that is required to perform elementary wavelet benchmarking.

Elementary wavelet benchmarking
The construction of the wavelet functions $\phi^Q_{j,k}(\cdot)$ and $\phi^A_{j,k}(\cdot)$, defined on the sets $\{1, \ldots, n\}$ and $\{1, \ldots, m\}$ respectively, is discussed in Section 2.3.1. Let $w^Q(j, k)$ and $w^A(j, k)$ denote the noisy and non-noisy wavelet coefficients from the quarterly and annual time series respectively. $J$ is the highest frequency level of the annual time series; the quarterly time series has two additional frequency levels.
Quarterly wavelet coefficients existing on the lower frequency levels of the wavelet domain have corresponding annual wavelet coefficients with similar interpretations, i.e. coefficients existing on frequency levels $j = -1, \ldots, J$. A comparison of the elementary quarterly and annual father wavelet coefficients illustrates this:

$w^Q(-1, 1) = n^{-1/2} \sum_{t=1}^{n} Y^H_{T,t} = n^{-1/2} \sum_{s=1}^{m} Y^L_{O,s} = f^{-1/2}\, w^A(-1, 1)$

(since the annual GDP series is non-noisy). Elementary wavelet benchmarking is expressed in a form that is consistent with equation (2.1) as follows. Although computationally inefficient, the wavelet transform can be expressed as an orthogonal matrix; see Nason (2008), chapter 2, for details. Suppose that $W_Q$ and $W_A$ transform the quarterly and annual series into the wavelet domain respectively:

$y^Q_O = W_Q Y^H_O, \quad y^A_T = W_A Y^L_O.$
The non-noisy annual wavelet coefficients $y^A_T$ can be incorporated in the noisy quarterly wavelet coefficients $y^Q_O$ by replacing each quarterly coefficient on the overlapping frequency levels by its rescaled annual counterpart:

$\hat{y}(j, k) = c^{-1}\, y^A_T(j, k), \quad j = -1, \ldots, J; \qquad \hat{y}(j, k) = y^Q_O(j, k) \text{ otherwise.} \quad (3.3)$

As seen from equation (3.3), in the example of quarterly to annual benchmarking $c = 2$. The benchmarked series $\{\hat{Y}^H_{T,t}\}_{t=1}^{n}$ is then calculated by applying the inverse wavelet transform, which decomposes it into a signal component $Y^{A,Q}_{T,t}$, reconstructed from the annual coefficients, and a noisy component, reconstructed from the remaining quarterly coefficients:

$\hat{Y}^H_{T,t} = Y^{A,Q}_{T,t} + e_t. \quad (3.4)$
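Elementary wavelet benchmarking can be illustrated on a dyadic toy example: transform both series with Haar matrices, overwrite the quarterly coefficients on the overlapping levels by the rescaled annual coefficients, and resynthesize. A sketch (not the paper's code) in which, for quarterly to annual benchmarking, the overlapping coefficients are the two coarsest and $c = \sqrt{f} = 2$:

```python
import numpy as np

def haar(n):
    """Orthonormal Haar transform matrix for dyadic n."""
    if n == 1:
        return np.array([[1.0]])
    H = haar(n // 2)
    W = np.vstack([np.kron(H, [1.0, 1.0]),
                   np.kron(np.eye(n // 2), [1.0, -1.0])])
    return W / np.sqrt((W ** 2).sum(axis=1, keepdims=True))

f = 4                    # quarterly to annual aggregation order
c = np.sqrt(f)           # scaling between the two transforms, c = 2
yq = np.array([10., 11., 12., 13., 12., 11., 12., 14.])  # noisy quarterly
ya = np.array([50., 52.])                                # non-noisy annual

wq = haar(8) @ yq        # quarterly wavelet coefficients
wa = haar(2) @ ya        # annual wavelet coefficients
# Overwrite the two coarsest quarterly coefficients (father wavelet and the
# coarsest mother wavelet) with their rescaled annual counterparts.
wq[:2] = wa / c
y_bench = haar(8).T @ wq            # resynthesis, as in equation (2.23)
assert np.allclose(y_bench.reshape(2, 4).sum(axis=1), ya)  # constraint holds
```

The finer quarterly coefficients are untouched, so the short-term movements of the observed series are preserved exactly.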

Thresholding
In addition, thresholding the noisy component of the benchmarked series $\{\hat{Y}^H_{T,t}\}_{t=1}^{n}$ produces a more reliable benchmarked series, as it removes spurious noise.
Technical details of thresholding are available in Vidakovic (1999), chapter 6; however, two features of the error term are important. Firstly, its structure: in real data sets, random components typically exhibit some form of auto-correlation, so independently and identically distributed Gaussian noise is not appropriate. The simulations in this paper therefore use an auto-regressive moving average ARMA(1,1) process to generate disturbance terms. Consequently, wavelet coefficients on a given frequency level are correlated; hence thresholding based on Stein's unbiased risk estimator (SURE) (Percival and Walden (2000), chapter 10) is used. Secondly, we use the method in Percival (1995) to estimate the error variance across different frequency levels.

Thresholding framework
Suppose that $Y_{O,t} = Y_{T,t} + \epsilon_t$, $t = 1, \ldots, n$, is observed, with $\epsilon_t$ being an error term. Transforming $Y_O = (Y_{O,1}, \ldots, Y_{O,n})'$ into the wavelet domain by using the orthonormal matrix $W$, we have

$w = W Y_O, \quad (3.6)$

where $w = (w_1, \ldots, w_n)'$. Typically either hard or soft thresholding (Donoho and Johnstone, 1994) is used. In this paper we use soft thresholding, with estimates obtained as follows:

$\hat{w}_i = \mathrm{sgn}(w_i)\,(|w_i| - \lambda)\,1(|w_i| > \lambda), \quad i = 1, \ldots, n, \quad (3.7)$

where $\lambda$ denotes the threshold value (a parameter depending on the noise variance), $\mathrm{sgn}(\cdot)$ is the sign function and $1(\cdot)$ is the indicator function.

If the magnitude of an observed wavelet coefficient is greater than $\lambda$ it is shrunk in magnitude by $\lambda$; otherwise it is set to 0. As mentioned above, $\lambda$ is estimated by SURE; such an estimator depends on both the series length and the variance of the noise term. In particular, Percival's estimator based on the maximal overlap discrete wavelet transform is used to estimate the variance of the wavelet coefficients; Percival (1995) discussed this in more detail. Consequently $\lambda$ is a data-dependent parameter, i.e. $\lambda = \lambda(Y_O)$. Using equation (3.7), estimates of the true wavelet coefficients are obtained as

$\hat{w} = \mathrm{diag}(z_1, \ldots, z_n)\, w = Z_w w, \quad (3.9)$

where the diagonal elements of $Z_w$ are $z_i = (1 - \hat{\lambda}/|w_i|)\,1(|w_i| > \hat{\lambda})$, with $\hat{\lambda}$ being a threshold estimate. An estimate of the unobserved true series $Y_T$ is now obtained as

$\hat{Y}_T = W' \hat{w}. \quad (3.10)$
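Soft thresholding as in equation (3.7) is a one-line operation; here is a sketch with a fixed illustrative threshold (the paper's SURE-based, data-dependent choice of $\lambda$ is omitted):

```python
import numpy as np

def soft_threshold(w, lam):
    """Soft thresholding, equation (3.7): coefficients with |w_i| <= lam are
    set to zero; larger ones are shrunk towards zero by lam."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([3.0, -0.4, 1.2, -2.5, 0.1])
w_hat = soft_threshold(w, 1.0)
assert np.allclose(w_hat, [2.0, 0.0, 0.2, -1.5, 0.0])
```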

Alternative seasonal model
As seen in Section 4, in many cases the noisy high frequency series is seasonally adjusted (Durbin and Koopman (2001), chapter 3) before benchmarking or thresholding, with the seasonal component reintroduced afterwards. The seasonal component is unknown and hence must be estimated. The time series data that are studied in this paper are represented in state space form (Durbin and Koopman (2001), chapter 3), as this allows most generic models that are used in seasonal adjustment to be represented in one form. To estimate the seasonal component we apply the Kalman smoother (Durbin and Koopman (2001), chapter 4). A stochastic seasonal model of the following form is often used:

$\gamma_{t+1} = -\sum_{j=1}^{f-1} \gamma_{t+1-j} + \omega_t. \quad (3.11)$

However, under this model the zero-sum constraint on the seasonal component is violated (i.e. $\sum_{j=1}^{f} \gamma_{t+1-j} = \omega_t \neq 0$). Consequently the benchmark constraint will no longer be satisfied once the seasonal estimate has been reintroduced in the series. Therefore the following representation, which allows the seasonal process to vary stochastically while ensuring that the zero-sum constraint is satisfied, is considered:

$\gamma_{j,t+1} = \gamma_{j,t} + \omega_{j,t}, \quad j = 1, \ldots, f. \quad (3.12)$

In equation (3.12) any season $j$ within a given year $t + 1$ takes the value $\gamma_{j,t+1}$, which is equal to its value from the previous year $\gamma_{j,t}$ plus a disturbance term. One way to ensure that the seasonally adjusted series satisfies the benchmark constraint is to define an appropriate correlation structure,

$\mathrm{var}(\omega_t) = \sigma^2_\omega \{I_f - (1/f)\, 1_{f \times 1} 1_{f \times 1}'\}, \quad (3.13)$

between the components of $\omega_t = (\omega_{1,t}, \ldots, \omega_{f,t})'$. The sum of each year's $f$ seasons is then constant, i.e. $\sum_{j=1}^{f} \gamma_{j,t+1} = \sum_{j=1}^{f} \gamma_{j,t}$ holds. Imposing the above correlation structure results in $E(1_{f \times 1}' \omega_t) = 0 = \mathrm{var}(1_{f \times 1}' \omega_t)$. This, along with the initialization condition $\sum_{j=1}^{f} \gamma_{j,0} = 0$, forces the benchmark constraint to hold. To maintain consistency, other components from structural time series models (i.e. trend, slope and error components) are represented in a similar form to equation (3.12).
Such models are known as periodic structural time series; more information is provided in Tripodis and Penzer (2004).
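The claim that the covariance structure $\mathrm{var}(\omega_t) = \sigma^2_\omega\{I_f - (1/f)\,1\,1'\}$ preserves the zero-sum constraint can be checked numerically: drawing $\omega_t$ by projecting white noise with the idempotent matrix $I_f - (1/f)\,1\,1'$ gives disturbances that sum to zero exactly, so the seasonal state retains a zero sum under equation (3.12). A sketch:

```python
import numpy as np

f = 4                                        # seasons per year (quarterly)
P = np.eye(f) - np.ones((f, f)) / f          # I_f - (1/f) 1 1', idempotent
rng = np.random.default_rng(0)

gamma = np.array([1.0, -0.5, 0.25, -0.75])   # initial seasons, summing to 0
for t in range(100):
    omega = P @ rng.normal(size=f)   # disturbance with covariance prop. to P
    gamma = gamma + omega            # seasonal transition, equation (3.12)

# The zero-sum constraint is preserved at every step.
assert abs(gamma.sum()) < 1e-10
```

Since $P$ is symmetric and idempotent, $\mathrm{cov}(Pz) = PP' = P$ for white noise $z$, matching the stated covariance up to the scale $\sigma^2_\omega$.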

Data analysis
We now consider the application of wavelet benchmarking to simulated data and ONS data. The advantages of a wavelet approach to benchmarking discussed in previous sections are supported by diagnostic measures of performance. Since simulated time series are additive, additive methods of benchmarking have been used. However, analogous results hold for multiplicative time series. An algorithmic summary of wavelet benchmarking can be found in Appendix A to aid interpretation of the steps. Initialization and parameter values that were used to generate subsequent simulations are available from http://wileyonlinelibrary.com/journal/ rss-datasets, along with the code to reproduce the simulations.

Revision metric for benchmarking
Subsequent sections assess benchmarking methods by using a number of metrics; mean-squared error (MSE), a revision metric and a growth rate metric.
The MSE metric assesses the performance of simulations, but real data sets require an alternative metric since the true high frequency series is unobserved. As mentioned earlier, since published economic data impact decisions made by policy makers, producing a stable benchmarked series is important. Therefore, when current data sets are revised or new data become available, adjustments to a benchmarked series should be minor. In particular, the effect on the later regions of the benchmarked series is most important, since these points describe the most recent economic conditions. The following metric measures the sensitivity of the later regions of a benchmarked series when the observed high and low frequency series are adjusted.
Consider quarterly to annual GDP benchmarking; the observed quarterly and annual series are extended with $p$ additional years of data and the benchmarking procedure is reapplied after each extension. We compare each of these new benchmarked series with the original benchmarked series $\{\hat{Y}^Q_{T,t}\}_{t=1,\ldots,n}$ over their common time points. In subsequent sections, the mean of these differences, for a suitably chosen $p$ which will depend on the length of the data, will be referred to as the revision metric. In particular, this metric will be 0 for both the original series and elementary wavelet benchmarking; in this case additional data have no effect on the estimated high frequency series at earlier time points.

Growth rate metric for benchmarking
Ideally, benchmarked series published by NSIs would additionally preserve the movements of the true original high frequency series. Hence the growth rate metric now described is used to measure the ability of a particular benchmarking method to preserve the movements of a series. Denote by $\{Y^H_{T,t}\}_{t=1}^{n}$ and $\{\hat{Y}^H_{T,t}\}_{t=1}^{n}$ the true high frequency series and the benchmarked series respectively. The metric compares the period-on-period growth rates of the two series:

$\sum_{t=2}^{n} \left( \frac{\hat{Y}^H_{T,t}}{\hat{Y}^H_{T,t-1}} - \frac{Y^H_{T,t}}{Y^H_{T,t-1}} \right)^2. \quad (4.2)$

When considering simulated data this growth rate can be calculated exactly. However, when analysing real data sets $Y^H_{T,t}$ is unobserved and so is replaced by $Y^H_{O,t}$. As shown in the simulations, wavelet benchmarking outperforms currently used methods when the growth rate metric compares the true unobserved data and the benchmarked data. However, when noisy observed data are compared with benchmarked data, the proportional Denton method minimizes the growth rate metric. This occurs as a result of the proportional Denton method's definition: it minimizes the sum of squares between the observed and benchmarked series. Since structural time series model simulations provide a good representation of the economic time series being analysed, comparisons are made with the true 'unobserved' series. However, if NSIs believe that noisy observations are the best indicator in terms of explaining fluctuations, then the proportional Denton method provides the optimal solution in terms of this metric.
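A generic movement-preservation measure of this kind compares period-on-period growth rates of the two series; the sketch below is one plausible form and may differ in detail from the paper's equation (4.2):

```python
import numpy as np

def growth_rate_discrepancy(y_true, y_bench):
    """Sum of squared differences between period-on-period growth rates
    of the true and benchmarked series (a generic movement-preservation
    measure; the paper's exact equation (4.2) may differ in detail)."""
    g_true = y_true[1:] / y_true[:-1]
    g_bench = y_bench[1:] / y_bench[:-1]
    return float(np.sum((g_bench - g_true) ** 2))

y_true = np.array([100., 102., 105., 103.])
y_bench = np.array([101., 102.5, 104., 103.5])
d = growth_rate_discrepancy(y_true, y_bench)
assert d > 0.0
assert growth_rate_discrepancy(y_true, y_true) == 0.0
```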

Dyadic quarterly and annual data
Wavelet benchmarking is applied to simulated quarterly-annual GDP time series. For simplicity, the data are dyadic, allowing the Haar transform to be applied. Univariate structural time series models (Durbin and Koopman (2001), chapter 3) adequately describe many economic processes and so are used to generate the simulations; in particular, they do not conform to any of the chosen methodologies, providing a comparison that is not biased towards any of the underlying benchmarking methods. 500 simulated data sets were generated by the structural time series model in Appendix C. (Parameter and initialization values used to generate the simulations in this section can be found at http://wileyonlinelibrary.com/journal/rss-datasets.) Since in certain circumstances performing elementary wavelet benchmarking is sufficient (i.e. small survey error), it is included in the data analysis.
Equation (3.4) decomposed the benchmarked quarterly series into a signal component ($Y^{A,Q}_{T,t}$) and a noisy component. As mentioned previously, thresholding the noisy component should produce a more reliable series. However, the structural form of the quarterly time series needs to be considered. Its seasonal component exists primarily in the high frequency regions of the wavelet domain. Thresholding has a tendency to interpret such subtle and localized features as noise; consequently thresholding inadvertently removes the seasonal component.
Removing the seasonal component before benchmarking or thresholding and reintroducing it afterwards offers one solution, as seen in Section 3.4. Therefore we used wavelet benchmarking with seasonal adjustment for analysis of the simulations.
MSE, revision metric values (with p = 3, corresponding to three additional years and approximately 5% of the data) and growth rate metric values for the different benchmarking methods are summarized in Table 2. (To calculate the average revision and growth rate metric values the median across all 500 simulations is recorded; when the mean is used a small number of simulations can disproportionately affect the average. Note also that, when additional data are introduced, the data sets are no longer dyadic; hence a traditional Haar basis is no longer appropriate to transform the data from the time to the wavelet domain, and a UH basis was used instead.) Wavelet benchmarking clearly outperforms all of the methods discussed so far, as illustrated by its MSE values being lower than the other benchmarking methods' corresponding values. In terms of revisions, elementary wavelet benchmarking produces a benchmarked series which is not revised when new data become available. The revision metric value also implies that wavelet benchmarking outperforms currently used methods in terms of producing a stable benchmarked series. (The same results were also found qualitatively for other values of p; the data are not shown.) The growth rate metric also indicates that wavelet benchmarking outperforms the other benchmarking methods in terms of preserving movements in the true unobserved series.

Comparison with current methods
In this section and Section 4.5, four additional sets of simulations were generated by using different parameter values. The results of these different simulations along with parameter values can be found at http://wileyonlinelibrary.com/journal/rss-datasets. These results support the conclusion that wavelet benchmarking outperforms currently used methods.
Simulated data from Section 4.3 relied on the unrealistic assumption that both data sets have dyadic length. This assumption can be relaxed and now non-dyadic monthly and quarterly data sets are analysed. Furthermore, the monthly series has a periodicity of 3, resulting in a non-dyadic relationship between these two data sets. As in Section 4.3 the model that is specified by equations (C.1)-(C.7) in Appendix C is used to generate the high frequency monthly data. Once again 500 simulations were generated. The MSE, revision metric (p = 3) and growth rate metric values for various benchmarking methods are recorded in Table 3. Fig. 3 shows a boxplot comparing MSE values of the observed series with the benchmarked series.
Results from Table 3 and Fig. 3 are consistent with the results from Section 4.3. Elementary wavelet benchmarking performs similarly to currently used benchmarking methods, while wavelet benchmarking offers further improvements. As would be expected, wavelet benchmarking outperforms elementary wavelet benchmarking in terms of the MSE for each of the 500 simulations. The revision metric once again implies that wavelet benchmarking produces a more stable benchmarked series in terms of revisions compared with currently used methods. The growth rate metric shows that wavelet benchmarking is the best method in terms of preserving movements in the true series. Such evidence suggests that wavelet benchmarking significantly outperforms currently used methods implemented by NSIs.
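For concreteness, the two series-comparison metrics can be sketched generically as follows; these are illustrative forms only, and the paper's exact definitions (e.g. the lag structure of the revision metric) are given elsewhere and may differ.

```python
import numpy as np

def mse(benchmarked, truth):
    """Mean squared error between a benchmarked series and the true series."""
    return np.mean((benchmarked - truth) ** 2)

def growth_rate_discrepancy(benchmarked, truth):
    """Compare period-on-period growth rates of the two series;
    smaller values mean movements in the true series are better preserved."""
    g_b = np.diff(benchmarked) / benchmarked[:-1]
    g_t = np.diff(truth) / truth[:-1]
    return np.mean((g_b - g_t) ** 2)

b = np.array([100.0, 102.0, 105.0])
t = np.array([100.0, 101.0, 106.0])
err = growth_rate_discrepancy(b, t)
```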

Comparison with current methods (shorter series)
Previous examples used simulated time series with lengths longer than those typically seen in practice; hence the performance of wavelet benchmarking for shorter series is of interest. The same structural time series model as defined by equations (C.1)-(C.7) in Appendix C was used to generate data. The quarterly and monthly time series considered have respective lengths of 10 and 30. Once again 500 simulations were generated, with results summarized in Table 4 and Fig. 4. As expected, wavelet benchmarking outperforms both Denton and Cholette and Dagum benchmarking; however, the improvements from wavelet benchmarking are reduced. This is reflected by comparing the MSE values recorded in Table 3 and Table 4. For shorter time series the revision metric (here with p = 1, given the short length of the series) shows that wavelet benchmarking produces more stable benchmarked series compared with currently used methods. In the shorter series, wavelet benchmarking is also the best method for preserving movements in the true unobserved series, as seen by the growth rate metric.

Benchmarking a time series with outliers
As mentioned in Section 1 the time-frequency localization properties of wavelets provide a local solution of benchmarking and thus reduce the potential for artefacts to be introduced when dealing with series with outliers and abrupt structural changes. The 500 simulations from Section 4.4 are reconsidered, but within these series an outlier and level shift were introduced in the observed high frequency series (but not in the true unobserved high frequency series). Initialization and parameter values (including outlier values and times of occurrence) can be found at http://wileyonlinelibrary.com/journal/rss-datasets.
The results of these simulations are recorded in Table 5. Table 5 shows that when outliers are present wavelet benchmarking outperforms currently used methods in terms of MSE, revision metric and growth rate metric readings. This results from the ability of wavelets to isolate local features of a time series.
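A contamination of this kind can be mimicked as follows; all values, positions and the series itself are illustrative, not the simulation settings from the paper (those are available from the data sets link above).

```python
import numpy as np

rng = np.random.default_rng(0)
true_series = np.cumsum(rng.normal(size=120))        # stand-in for the unobserved series
observed = true_series + rng.normal(scale=0.5, size=120)  # survey-error-contaminated version

outlier_time, outlier_size = 40, 8.0                 # illustrative position and magnitude
shift_time, shift_size = 80, 5.0

observed[outlier_time] += outlier_size               # additive outlier at one time point
observed[shift_time:] += shift_size                  # permanent level shift from shift_time on
```

Note that, as in the simulations described above, the contamination enters only the observed high frequency series; the true series is left untouched so that the metrics still measure recovery of the uncontaminated signal.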

Official Office for National Statistics data
As mentioned previously benchmarking is frequently applied to time series originating from national accounts. The set of national accounts includes GDP, which is a measure of the value of goods and services produced in a geographic area for a particular period (Office for National Statistics, 2012). Data that are used to construct estimates of GDP come from a range of surveys and administrative data sources measuring the value of goods and services produced by different areas of the economy. There are three different measures of GDP (Office for National Statistics, 2012), and each of these can be broken down into different components measuring different areas of the economy. Benchmarking methods are typically applied early in the process of estimating GDP at a level of detail where the series may be disclosive and are therefore not published because of concerns over confidentiality. However, benchmarking is not only applied in national accounts and could be used in other areas of official statistics. For example, the ONS published data on total turnover from the Annual Business Survey (ABS) (Office for National Statistics, 2010a), and also a higher frequency estimate of total turnover from the Monthly Business Survey (Office for National Statistics, 2010b). Estimates from the monthly survey could be used as an indicator variable and benchmarked to ABS data to give a high frequency estimate that is consistent with the ABS. In the following sections we assess the performance of benchmarking methods by using a set of quarterly and annual series from a component of GDP (Section 4.7.1) and a set of monthly and annual series from the Monthly Business Survey and ABS (Section 4.7.2).

Component of UK gross domestic product data
The following section investigates the application of various benchmarking methods to data from UK national accounts. In particular one component of GDP data is considered. For confidentiality (this is a low level component of GDP) the component cannot be named, but the data themselves are available from http://wileyonlinelibrary.com/journal/rss-datasets. Fig. 5 shows the results of applying quarterly to annual benchmarking to this component of GDP data. In Fig. 5 the Denton and Cholette and Dagum versions of the benchmarked series perform similarly. The output of wavelet benchmarking is similar to that of currently used methods; however, in some time periods wavelet benchmarking performs better at preserving movements in the observed quarterly series. One such time period is from 2004, quarter 1, to 2005, quarter 1. This is due to the localized nature of a wavelet benchmarking solution.
In the time period from 2007, quarter 1, to 2008, quarter 1, the observed quarterly time series seems to exhibit a structural break, most likely a result of the economic recession which began in 2007. By creating a wavelet basis that considers the structure of the observed time series, wavelet benchmarking has the ability to offer further improvements in terms of ensuring that movements in the original quarterly time series are maintained in the formation of the benchmarked quarterly time series. In this paper wavelet bases are determined solely by the length of the observed time series; future work could incorporate the structure of these time series during the selection of wavelet bases.

Table 6 records the revision and growth rate metric values for the various benchmarking methods. Wavelet benchmarking produces a stable benchmarked series and on the whole performs similarly to currently used methods. For this example the maximum lag length p = 3 was used, since this corresponds to 20% of the length of the observed series. However, results were not qualitatively different for smaller values of p.

Summary of benchmarking 39 Office for National Statistics monthly to annual time series
This section considers the mass application of benchmarking to 39 ONS time series. Monthly to annual benchmarking is considered; however, these time series originate from business surveys as opposed to being components of GDP. For these time series a lag value of p = 1 is used: the length of the annual time series was 6, which restricted the length of the lag considered.
From Table 7 it can be seen that, across the 39 different business survey time series, Cholette and Dagum and proportionate Denton benchmarking have the lowest revision (excluding elementary wavelet benchmarking) and growth rate metrics respectively. However, the differences between these various benchmarking methods are relatively small. Although wavelet benchmarking no longer conclusively outperforms currently used methods, it still produces stable benchmarked estimates. Fig. 6 illustrates the application of benchmarking to one of the 39 time series that were considered. In particular, it contains information on the turnover of companies manufacturing soft drinks and bottled water. As can be seen, all the benchmarking methods perform relatively similarly.

Discussion
Benchmarking is a problem that is frequently encountered by NSIs; this paper has provided an introduction to wavelet-based solutions. Wavelet-based benchmarking consists of a nonparametric and a parametric step. The first step involved introducing true information from the benchmark series into the noisy observed high frequency series via the wavelet domain. Afterwards high frequency wavelet coefficients were thresholded to remove any remaining noise. However, the structural form of time series being analysed had to be considered; in particular the seasonal component is often incorrectly identified as noise and inadvertently removed. Consequently periodic structural time series models were used to adjust the high frequency series seasonally while ensuring that the benchmark constraint was satisfied. After thresholding the seasonally adjusted high frequency series the estimated seasonal component was reintroduced to form the final benchmarked series.
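The thresholding step summarized above can be sketched with a generic soft-threshold rule using the universal threshold of Donoho and Johnstone; this is an illustrative choice rather than necessarily the exact rule used in the paper.

```python
import numpy as np

def soft_threshold(coeffs, sigma, n):
    """Shrink wavelet detail coefficients towards zero; coefficients smaller
    in magnitude than the threshold (mostly noise) are set exactly to zero."""
    lam = sigma * np.sqrt(2.0 * np.log(n))       # universal threshold
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

# Small coefficients are removed as noise; large ones (signal) survive, shrunk by lam
d = np.array([0.1, -0.2, 5.0, 0.05, -4.0])
shrunk = soft_threshold(d, sigma=0.3, n=64)
```

This also illustrates why the seasonal component must be extracted first: seasonal variation produces moderate-sized detail coefficients that a rule like this would otherwise shrink away as if they were noise.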
To illustrate wavelet benchmarking both simulated and real data sets were analysed. Simulation studies showed that wavelet benchmarking outperformed currently used methods, whereas the real data showed that elementary wavelet benchmarking has some useful properties in practice.
There are several areas which could be considered to extend work on wavelet benchmarking. First, seasonal adjustment could be performed in the wavelet domain, thus allowing the entire benchmarking problem to be considered in the wavelet domain. Secondly, by forcing the benchmarked series to be consistent with the benchmark series there is an implicit and unrealistic assumption that the benchmark series is not contaminated with noise. This assumption can be relaxed; both high and low frequency processes can be treated as noisy. Benchmarking can then be described as optimally combining both high and low frequency processes to create a benchmarked series, with weights depending on the variances of the distortion terms ε^H_t and ε^L_t. This introduces the additional complexities of estimating the distortion terms' variances along with determining an optimal estimator. Although NSIs rarely consider non-binding benchmarking, it does have applications to areas outside national accounts. Furthermore, since national accounts data sets originate from sample surveys, an argument could be made that non-binding benchmarking is more appropriate. However, this would require a substantial change in the manner in which NSIs process such data.
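For two noisy measurements of the same annual total, the optimal combination reduces to inverse variance weighting; the following is a minimal sketch under the simplifying assumption of known, independent noise variances (the function and its inputs are hypothetical illustrations).

```python
import numpy as np

def precision_weighted(y_high, y_low, var_h, var_l):
    """Combine two noisy measurements of the same quantity;
    each is weighted by the inverse of its noise variance."""
    w_h, w_l = 1.0 / var_h, 1.0 / var_l
    return (w_h * y_high + w_l * y_low) / (w_h + w_l)

# Two noisy estimates of the same annual total: one from aggregated high
# frequency data, one from the annual benchmark survey.
combined = precision_weighted(102.0, 100.0, var_h=4.0, var_l=1.0)
```

Binding benchmarking is recovered as the limit var_l → 0, where the combination collapses onto the benchmark; non-binding benchmarking corresponds to any strictly positive var_l.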
Wavelet benchmarking can also be extended to situations where multiple constraints must be satisfied. One such example occurs when a time series is classified according to periodicity and geographical location. Benchmark constraints need to be satisfied on both individual and aggregate levels; wavelet benchmarking could facilitate this also.
The selection of wavelet bases needs to be considered in greater detail. This paper constructed such bases solely from the length of the observed time series. Although this is a reasonable starting point for an introduction to wavelet benchmarking, bases which incorporate the structure of observed time series could be used in future work. It should also be noted that, although Denton and Cholette and Dagum benchmarking naturally provide a framework for extrapolation, this is currently not considered in the case of benchmarking via wavelets and would be of considerable interest to explore in future research. Finally, the ONS performs benchmarking on a large number of time series and would therefore require a method of wavelet benchmarking which can be used in a mass production setting.

Appendix B: Cholette and Dagum benchmarking method
This section provides a comprehensive description of Cholette and Dagum (1994) benchmarking. It is based on three stochastic equations. Equation (B.1) decomposes the observed quarterly process into its true unobserved quarterly process (Y^Q_T = Zδ + θ) and deterministic (Hb) and stochastic (ε^H) disturbance terms. Typically H is a vector of 1s and b is a constant column vector, forming a bias term capturing the average difference between the observed quarterly (Y^Q_O) and annual (Y^A_O) series. Z is an n × p matrix of known regressors and δ is a p × 1 vector of unknown coefficients modelling calendar effects. Although θ may be modelled by a variety of statistical processes, in Cholette and Dagum benchmarking it typically has an autoregressive integrated moving average structure; this is discussed below. Equation (B.2) decomposes the observed annual series Y^A_O into its true unobserved annual series Y^A_T = JZδ + Jθ and a disturbance term ε^L. J is an annualizing matrix equivalent to matrix B from the Denton method. The disturbance component ε^L is assumed to be Gaussian noise.
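The annualizing matrix J (summation of each year's quarters) and the generalized least squares form of the solution in equation (B.7) can be illustrated numerically; the design matrix, covariance and coefficients below are arbitrary illustrations rather than the Cholette-Dagum regression itself.

```python
import numpy as np

def annualizing_matrix(n_years, per_year=4):
    """J has one row per year; each row sums that year's high frequency values."""
    return np.kron(np.eye(n_years), np.ones(per_year))

def gls(X, Ve, y):
    """alpha-hat = (X' Ve^{-1} X)^{-1} X' Ve^{-1} y, the form of equation (B.7)."""
    Vi = np.linalg.inv(Ve)
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)

J = annualizing_matrix(3)                  # 3 years of quarterly data -> shape (3, 12)
annual = J @ np.arange(1.0, 13.0)          # annual totals of the quarters 1..12

# Sanity check of the GLS formula on a noise-free illustrative regression
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
Ve = np.diag(rng.uniform(0.5, 2.0, size=20))   # heteroscedastic noise covariance
alpha_hat = gls(X, Ve, X @ np.array([1.0, -2.0, 0.5]))
```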
Matrix S in equation (B.3) transforms the stochastic component θ into a stationary time series; this requires estimating the order of the seasonal and non-seasonal differencing operators. Set θ_t = υ_t + γ_t + ε_t, with υ_t being approximately linear, i.e. υ_t ≈ a + bt, γ_t capturing quarterly seasonality and ε_t being Gaussian random noise. In this scenario, making θ stationary requires the block diagonal differencing matrix given in equation (B.6), where block(·, ·, ·) denotes a block diagonal matrix. Cholette and Dagum (1994) provided the following solution:

α̂ = (X′ V_e^{-1} X)^{-1} X′ V_e^{-1} y.     (B.7)

The benchmarked estimate is given by β̂ = X*α̂, where X* = (0  Z  I_n); this expresses the solution in a form that is consistent with equation (3) when the bias term (Hb) is removed. Since NSIs usually implement binding benchmarking, ε^L = 0. Finally, ε^H is modelled as an AR(1) process. For practical implementation of Cholette and Dagum benchmarking, Cholette and Dagum (2006), chapter 3, recommended setting the AR(1) parameter value between 0.7 and 0.9 for monthly series and between 0.7^3 and 0.9^3 for quarterly series. For monthly and quarterly time series, the ONS uses parameter values of 0.8 and 0.8^3 respectively (these parameter values can be varied if necessary) (Brown et al., 2012). Naturally such adjustments can in some cases have a negative effect on the accuracy of the benchmarking process. In particular the ONS uses the following form of the Cholette-Dagum model: