Generalized additive models for location, scale and shape
Abstract
Summary. A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random‐effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.
1. Introduction
The quantity of data collected and requiring statistical analysis has been increasing rapidly over recent years, allowing the fitting of more complex and potentially more realistic models. In this paper we develop a very general regression‐type model in which both the systematic and the random parts of the model are highly flexible and where the fitting algorithm is sufficiently fast to allow the rapid exploration of very large and complex data sets.
Within the framework of univariate regression modelling techniques the generalized linear model (GLM) and generalized additive model (GAM) hold a prominent place (Nelder and Wedderburn (1972) and Hastie and Tibshirani (1990) respectively). Both models assume an exponential family distribution for the response variable y in which the mean μ of y is modelled as a function of explanatory variables and the variance of y, given by V(y)=φ v(μ), depends on a constant dispersion parameter φ and on the mean μ, through the variance function v(μ). Furthermore, for an exponential family distribution both the skewness and the kurtosis of y are, in general, functions of μ and φ. Hence, in the GLM and GAM models, the variance, skewness and kurtosis are not modelled explicitly in terms of the explanatory variables but implicitly through their dependence on μ.
Another important class of models, the linear mixed (random‐effects) models, which provide a very broad framework for modelling dependent data particularly associated with spatial, hierarchical and longitudinal sampling schemes, assume normality for the conditional distribution of y given the random effects and therefore cannot model skewness and kurtosis explicitly.
The generalized linear mixed model (GLMM) combines the GLM and linear mixed model, by introducing a (usually normal) random‐effects term in the linear predictor for the mean of a GLM. Bayesian procedures to fit GLMMs by using the EM algorithm and Markov chain Monte Carlo methods were described by McCulloch (1997) and Zeger and Karim (1991). Lin and Zhang (1999) gave an example of a generalized additive mixed model (GAMM). Fahrmeir and Lang (2001) discussed GAMM modelling using Bayesian inference. Fahrmeir and Tutz (2001) discussed alternative estimation procedures for the GLMM and GAMM. The GLMM and GAMM, although more flexible than the GLM and GAM, also assume an exponential family conditional distribution for y and rarely allow the modelling of parameters other than the mean (or location) of the distribution of the response variable as functions of the explanatory variables. Their fitting often depends on Markov chain Monte Carlo or integrated (marginal distribution) likelihoods (e.g. Gaussian quadrature), making them highly computationally intensive and time consuming, at least at present, for large data sets where the model selection requires the investigation of many alternative models. Various approximate procedures for fitting a GLMM have been proposed (Breslow and Clayton, 1993; Breslow and Lin, 1995; Lee and Nelder, 1996, 2001a, b). An alternative approach is to use nonparametric maximum likelihood based on finite mixtures; Aitkin (1999).
In this paper we develop a general class of univariate regression models which we call the generalized additive model for location, scale and shape (GAMLSS), where the exponential family assumption is relaxed and replaced by a very general distribution family. Within this new framework, the systematic part of the model is expanded to allow not only the mean (or location) but all the parameters of the conditional distribution of y to be modelled as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random‐effects terms. The model fitting of a GAMLSS is achieved by either of two different algorithmic procedures. The first algorithm (RS) is based on the algorithm that was used for the fitting of the mean and dispersion additive models of Rigby and Stasinopoulos (1996a), whereas the second (CG) is based on the Cole and Green (1992) algorithm.
Section 2 formally introduces the GAMLSS. Parametric terms in the linear predictors are considered in Section 3.1, and several specific forms of additive terms which can be incorporated in the predictors are considered in Section 3.2. These include nonparametric smooth function terms, using cubic splines or smoothness priors, random‐walk terms and many random‐effects terms (including terms for simple overdispersion, longitudinal random effects, random‐coefficient models, multilevel hierarchical models and crossed and spatial random effects). A major advantage of the GAMLSS framework is that any combinations of the above terms can be incorporated easily in the model. This is discussed in Section 3.3.
Section 4 describes specific families of distributions for the dependent variable which have been implemented in the GAMLSS. Incorporating censored data and centile estimation are also discussed there. The RS and CG algorithms (based on the Newton–Raphson or Fisher scoring algorithm) for maximizing the (penalized) likelihood of the data under a GAMLSS are discussed in Section 5. The details and justification of the algorithms are given in Appendices B and C respectively. The inferential framework for the GAMLSS is considered in Appendix A, where alternative inferential approaches are considered. Model selection, inference and residual diagnostics are considered in Section 6. Section 7 gives five practical examples. Section 8 concludes the paper.
2. The generalized additive model for location, scale and shape
2.1. Definition
The p parameters θT=(θ1,θ2,…,θp) of a population probability (density) function f(y|θ) are modelled here by using additive models. Specifically the model assumes that, for i=1,2,…,n, observations yi are independent conditional on θi, with probability (density) function f(yi|θi), where θiT=(θi1,θi2,…,θip) is a vector of p parameters related to explanatory variables and random effects. (If covariate values are stochastic or observations yi depend on their past values then f(yi|θi) is understood to be conditional on these values.)
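Let $y^{\mathrm T}=(y_1,y_2,\ldots,y_n)$ be the vector of the response variable, of length n. Also, for k=1,2,…,p, let $g_k(\cdot)$ be a known monotonic link function relating $\theta_k$ to explanatory variables and random effects through an additive model given by

$$g_k(\theta_k)=\eta_k=X_k\beta_k+\sum_{j=1}^{J_k} Z_{jk}\gamma_{jk}, \tag{1}$$

where $\theta_k$ and $\eta_k$ are vectors of length n, e.g. $\theta_k^{\mathrm T}=(\theta_{1k},\theta_{2k},\ldots,\theta_{nk})$, $\beta_k$ is a parameter vector of length $J'_k$, $X_k$ is a known design matrix of order $n\times J'_k$, $Z_{jk}$ is a fixed known $n\times q_{jk}$ design matrix and $\gamma_{jk}$ is a $q_{jk}$‐dimensional random variable. We call model (1) the GAMLSS.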
The vectors γjk for j=1,2,…,Jk could be combined into a single vector γk with a single design matrix Zk; however, formulation (1) is preferred here as it is suited to the backfitting algorithm (see Appendix B) and allows combinations of different types of additive random‐effects terms to be incorporated easily in the model (see Section 3.3).
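If, for k=1,2,…,p, $J_k=0$ then model (1) reduces to a fully parametric model given by

$$g_k(\theta_k)=\eta_k=X_k\beta_k. \tag{2}$$

If $Z_{jk}=I_n$, where $I_n$ is an n × n identity matrix, and $\gamma_{jk}=h_{jk}=h_{jk}(x_{jk})$ for all combinations of j and k in model (1), this gives

$$g_k(\theta_k)=\eta_k=X_k\beta_k+\sum_{j=1}^{J_k} h_{jk}(x_{jk}), \tag{3}$$

where the $x_{jk}$, for j=1,2,…,$J_k$ and k=1,2,…,p, are vectors of length n, $h_{jk}$ is an unknown function of the explanatory variable $X_{jk}$ and $h_{jk}=h_{jk}(x_{jk})$ is the vector of evaluations of $h_{jk}$ at $x_{jk}$.

The first two population parameters θ1 and θ2 in model (1) are usually characterized as location and scale parameters, denoted here by μ and σ, whereas the remaining parameter(s), if any, are characterized as shape parameters, although the model may be applied more generally to the parameters of any population distribution. For many families of population distributions a maximum of two shape parameters ν (=θ3) and τ (=θ4) suffice, giving the model

$$\begin{aligned}
g_1(\mu)&=\eta_1=X_1\beta_1+\sum_{j=1}^{J_1} Z_{j1}\gamma_{j1},\\
g_2(\sigma)&=\eta_2=X_2\beta_2+\sum_{j=1}^{J_2} Z_{j2}\gamma_{j2},\\
g_3(\nu)&=\eta_3=X_3\beta_3+\sum_{j=1}^{J_3} Z_{j3}\gamma_{j3},\\
g_4(\tau)&=\eta_4=X_4\beta_4+\sum_{j=1}^{J_4} Z_{j4}\gamma_{j4}.
\end{aligned} \tag{4}$$

The GAMLSS model (1) is more general than the GLM, GAM, GLMM or GAMM in that the distribution of the dependent variable is not limited to the exponential family and all parameters (not just the mean) are modelled in terms of both fixed and random effects.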
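As an illustration only, the following Python sketch evaluates the log‐likelihood of a fully parametric two‐parameter special case of model (1): a normal response with one linear predictor for μ and one for σ. The identity link for μ, the log‐link for σ, the function names and the toy data are all assumptions made for the example; they are not taken from the GAMLSS software.

```python
import math


def linear_predictor(X, beta):
    """Return eta = X beta, with the design matrix X given as a list of rows."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


def normal_loglik(y, X_mu, beta_mu, X_sigma, beta_sigma):
    """Log-likelihood of a normal model with separate predictors for mu and sigma.

    Assumed links (illustrative choices): identity for mu, log for sigma.
    """
    eta_mu = linear_predictor(X_mu, beta_mu)            # g1(mu) = mu
    eta_sigma = linear_predictor(X_sigma, beta_sigma)   # g2(sigma) = log(sigma)
    total = 0.0
    for y_i, mu_i, eta_i in zip(y, eta_mu, eta_sigma):
        sigma_i = math.exp(eta_i)                       # invert the log link
        total += (-0.5 * math.log(2.0 * math.pi) - math.log(sigma_i)
                  - 0.5 * ((y_i - mu_i) / sigma_i) ** 2)
    return total


# Toy data: an intercept and one covariate in both predictors.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
X = [[1.0, x_i] for x_i in x]
ll = normal_loglik(y, X, [1.0, 2.0], X, [0.0, 0.0])
```

Because each distribution parameter has its own design matrix and coefficient vector, letting σ depend on x only requires changing the second coefficient of its predictor.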
2.2. Model estimation
Crucial to the way that additive components are fitted within the GAMLSS framework is the backfitting algorithm and the fact that quadratic penalties in the likelihood result from assuming a normally distributed random effect in the linear predictor. The resulting estimation uses shrinking (smoothing) matrices within a backfitting algorithm, as shown below.
Assume in model (1) that the γjk have independent (prior) normal distributions with $\gamma_{jk}\sim N_{q_{jk}}(0,G_{jk}^{-})$, where $G_{jk}^{-}$ is the (generalized) inverse of a qjk × qjk symmetric matrix Gjk=Gjk(λjk), which may depend on a vector of hyperparameters λjk, and where if Gjk is singular then γjk is understood to have an improper prior density function proportional to $\exp(-\tfrac{1}{2}\gamma_{jk}^{\mathrm T}G_{jk}\gamma_{jk})$. Subsequently in the paper we refer to Gjk rather than to Gjk(λjk) for simplicity of notation, although the dependence of Gjk on hyperparameters λjk remains throughout.
The assumption of independence between different random‐effects vectors γjk is essential within the GAMLSS framework. However, if, for a particular k, two or more random‐effect vectors are not independent, they can be combined into a single random‐effect vector and their corresponding design matrices Zjk into a single design matrix, to satisfy the condition of independence.
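For fixed hyperparameters λjk, the βk and the γjk are estimated within the GAMLSS framework by maximizing a penalized likelihood function lp given by

$$l_p=l-\frac{1}{2}\sum_{k=1}^{p}\sum_{j=1}^{J_k}\gamma_{jk}^{\mathrm T}G_{jk}\gamma_{jk}, \tag{5}$$

where $l=\sum_{i=1}^{n}\log f(y_i|\theta_i)$ is the log‐likelihood function of the data given θi for i=1,2,…,n. This is equivalent to maximizing the extended or hierarchical likelihood defined by

$$l_h=l_p+\frac{1}{2}\sum_{k=1}^{p}\sum_{j=1}^{J_k}\{\log|G_{jk}|-q_{jk}\log(2\pi)\}.$$

Maximizing lp leads to a shrinking (smoothing) matrix Sjk, which is applied to partial residuals to update the estimate of the additive predictor Zjkγjk within a backfitting algorithm, given by

$$S_{jk}=Z_{jk}(Z_{jk}^{\mathrm T}W_{kk}Z_{jk}+G_{jk})^{-1}Z_{jk}^{\mathrm T}W_{kk} \tag{6}$$

for j=1,2,…,Jk and k=1,2,…,p, where Wkk is a diagonal matrix of iterative weights. Different forms of Zjk and Gjk correspond to different types of additive terms in the linear predictor ηk.

The hyperparameters λ can be fixed or estimated. In Appendix A.2 we propose four alternative methods of estimation of λ which avoid integrating out the random effects.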
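To make the shrinkage concrete, the sketch below works through one special case, assuming that the shrinking matrix has the penalized weighted least squares form S = Z(ZᵀWZ + G)⁻¹ZᵀW. For a one‐factor random effect (Z an incidence matrix, G = λI), ZᵀWZ is diagonal and the update reduces to a shrunken weighted mean of the partial residuals within each factor level. The function name and the data are hypothetical.

```python
def shrunken_level_effects(levels, residuals, weights, lam):
    """Penalized weighted least squares estimates of one-factor random effects.

    With an incidence matrix Z and G = lam * I, the solution of
    (Z'WZ + G) gamma = Z'W eps is, for each level t,
    sum(w_i * eps_i) / (sum(w_i) + lam) over the observations in level t.
    """
    sum_w = {}
    sum_w_eps = {}
    for level, eps, w in zip(levels, residuals, weights):
        sum_w[level] = sum_w.get(level, 0.0) + w
        sum_w_eps[level] = sum_w_eps.get(level, 0.0) + w * eps
    return {level: sum_w_eps[level] / (sum_w[level] + lam) for level in sum_w}


levels = ["a", "a", "a", "b"]
partial_residuals = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 1.0]

unpenalized = shrunken_level_effects(levels, partial_residuals, weights, lam=0.0)
penalized = shrunken_level_effects(levels, partial_residuals, weights, lam=1.0)
```

With λ = 0 the estimates are the plain level means. With λ = 1 the level with a single observation is shrunk towards 0 by a factor of 1/2, whereas the level with three observations is shrunk by 3/4, which is the usual behaviour of a normal random effect.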
2.3. Comparison of generalized additive models for location, scale and shape and hierarchical generalized linear models
Lee and Nelder (1996, 2001a) developed hierarchical generalized linear models. In the notation of the GAMLSS, they use, in general, extended quasi‐likelihood to approximate the conditional distribution of y given θ=(μ,φ), where μ and φ are mean and scale parameters respectively, and any conjugate distribution for the random effects γ (parameterized by λ). They model predictors for μ, φ and λ in terms of explanatory variables, and the predictor for μ also includes random‐effects terms. Lee and Nelder (1996, 2001a) assumed independent random effects, whereas Lee and Nelder (2001b) relaxed this assumption to allow correlated random effects.
However, extended quasi‐likelihood does not provide a proper distribution which integrates or sums to 1 (and the integral or sum cannot be obtained explicitly, varies between cases and depends on the parameters of the model). In large samples this has been found to lead to serious inaccuracies in the fitted global deviance, even for the gamma distribution (see Stasinopoulos et al. (2000)), resulting potentially in a misleading comparison with a proper distribution. It is also quite restrictive in the shape of distributions that are available for y given θ, particularly for continuous distributions where it is unsuitable for negatively skew data, or for platykurtic data or for leptokurtic data unless positively skewed. In addition, hierarchical generalized linear models allow neither explanatory variables nor random effects in the predictors for the shape parameters of f(y|θ).
3. The linear predictor
3.1. Parametric terms
In the GAMLSS (1) the linear predictors ηk, for k=1,2,…,p, comprise a parametric component Xkβk and additive components Zjkγjk, for j=1,…,Jk. The parametric component can include linear and interaction terms for explanatory variables and factors, polynomials, fractional polynomials (Royston and Altman, 1994) and piecewise polynomials (with fixed knots) for variables (Smith, 1979; Stasinopoulos and Rigby, 1992).
Non‐linear parameters can be incorporated into the GAMLSS (1) and fitted by either of two methods:
- (a) the profile or
- (b) the derivative method.
In the profile fitting method, estimation of non‐linear parameters is achieved by maximizing their profile likelihood. An example of the profile method is given in Section 7.1 where the age explanatory variable is transformed to x=age^ξ where ξ is a non‐linear parameter. In the derivative fitting method, the derivatives of a predictor ηk with respect to non‐linear parameters are included in the design matrix Xk in the fitting algorithm; see, for example, Benjamin et al. (2003). Lindsey (http://alpha.luc.ac.be/jlindsey/) has also considered modelling parameters of a distribution as non‐linear functions of explanatory variables.
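The profile method can be sketched in a few lines. The example below is a deliberately simplified stand‐in for a GAMLSS fit: it assumes a normal response with constant variance and a mean that is linear in age^ξ, so that for each fixed ξ the remaining parameters have closed‐form estimates and the profile log‐likelihood is available up to a constant. The data are simulated and the grid of ξ values is an arbitrary choice.

```python
import math


def profile_loglik(xi, age, y):
    """Profile log-likelihood of xi (up to a constant) in y = b0 + b1 * age**xi + e.

    The errors e are assumed normal with constant variance; b0, b1 and the
    variance are maximized out by ordinary least squares for the given xi.
    """
    x = [a ** xi for a in age]
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((x_i - x_bar) ** 2 for x_i in x)
    sxy = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y_i - b0 - b1 * x_i) ** 2 for x_i, y_i in zip(x, y))
    return -0.5 * n * math.log(rss / n)


# Simulated data with true xi = 0.5 and a small deterministic perturbation.
age = [float(a) for a in range(1, 21)]
y = [2.0 + 3.0 * math.sqrt(a) + 0.01 * (-1) ** i for i, a in enumerate(age)]

grid = [k / 10 for k in range(1, 21)]          # xi = 0.1, 0.2, ..., 2.0
best_xi = max(grid, key=lambda xi: profile_loglik(xi, age, y))
```

In practice the inner fit for each ξ would be a full model fit, and the grid search could be replaced by a one‐dimensional optimizer.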
3.2. Additive terms
The additive components Zjkγjk in model (1) can model a variety of terms such as smoothing and random‐effect terms as well as terms that are useful for time series analysis (e.g. random walks). Different additive terms that can be included in the GAMLSS will be discussed below. For simplicity of exposition we shall drop the subscripts j and k in the vectors and matrices, where appropriate.
3.2.1. Cubic smoothing splines terms
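With cubic smoothing spline terms we assume that the functions h(t) in the additive terms are arbitrary twice continuously differentiable functions and we maximize a penalized log‐likelihood, given by l subject to penalty terms of the form $\lambda\int_{-\infty}^{\infty}h''(t)^2\,\mathrm{d}t$. Following Reinsch (1967), the maximizing functions h(t) are all natural cubic splines and hence can be expressed as linear combinations of their natural cubic spline basis functions Bi(t) for i=1,2,…,n (de Boor, 1978; Schumaker, 1993), i.e. $h(t)=\sum_{i=1}^{n}\delta_iB_i(t)$. Let h=h(x) be the vector of evaluations of the function h(t) at the values x of the explanatory variable X (which are assumed to be distinct for simplicity of exposition). Let N be an n × n non‐singular matrix containing as its columns the n‐vectors of evaluations of functions Bi(t), for i=1,2,…,n, at x. Then h can be expressed by using coefficient vector δ as a linear combination of the columns of N by h=Nδ. Let Ω be the n × n matrix of inner products of the second derivatives of the natural cubic spline basis functions, with (r,s)th entry given by

$$\omega_{rs}=\int B_r''(t)\,B_s''(t)\,\mathrm{d}t.$$

The penalty is then given by the quadratic form

$$Q(h)=\lambda\int_{-\infty}^{\infty}h''(t)^2\,\mathrm{d}t=\lambda\delta^{\mathrm T}\Omega\delta=\lambda h^{\mathrm T}N^{-\mathrm T}\Omega N^{-1}h=\lambda h^{\mathrm T}Kh,$$

where $K=N^{-\mathrm T}\Omega N^{-1}$ is a known penalty matrix that depends only on the values of the explanatory vector x.

The model can be formulated as a random‐effects GAMLSS (1) by letting γ=h, Z=In, $K=N^{-\mathrm T}\Omega N^{-1}$ and G=λK, so that $h\sim N_n(0,\lambda^{-1}K^{-})$, a partially improper prior (Silverman, 1985). This amounts to assuming complete prior uncertainty about the constant and linear functions and decreasing uncertainty about higher order functions; see Verbyla et al. (1999).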
3.2.2. Parameter‐driven time series terms and smoothness priors
First assume that an explanatory variable X has equally spaced observations xi, i=1,…,n, sorted into the ordered sequence x(1)<…<x(i)<…<x(n) defining an equidistant grid on the x‐axis. Typically, for a parameter‐driven time series term, X corresponds to time units such as days, weeks, months or years. First‐ and second‐order random walks, denoted as rw(1) and rw(2), are defined respectively by h[x(i)]=h[x(i−1)]+ɛi and h[x(i)]=2 h[x(i−1)]−h[x(i−2)]+ɛi with independent errors $\varepsilon_i\sim N(0,\lambda^{-1})$ for i>1 and i>2 respectively, and with diffuse uniform priors for h[x(1)] for rw(1) and, in addition, for h[x(2)] for rw(2). Let h=h(x); then $D_1h\sim N_{n-1}(0,\lambda^{-1}I)$ and $D_2h\sim N_{n-2}(0,\lambda^{-1}I)$, where D1 and D2 are (n−1) × n and (n−2) × n matrices giving first and second differences respectively. The above terms can be included in the GAMLSS framework (1) by letting Z=In and G=λK so that $\gamma=h\sim N(0,\lambda^{-1}K^{-})$, where K has a structured form given by $K=D_1^{\mathrm T}D_1$ or $K=D_2^{\mathrm T}D_2$ for rw(1) or rw(2) respectively; see Fahrmeir and Tutz (2001), pages 223–225 and 363–364. (The resulting quadratic penalty $\lambda h^{\mathrm T}Kh$ for rw(2) is a discretized version of the corresponding cubic spline penalty term $\lambda\int_{-\infty}^{\infty}h''(t)^2\,\mathrm{d}t$.) Hence many of the state space models of Harvey (1989) can be incorporated in the GAMLSS framework.
The more general case of a non‐equally spaced variable X requires modifications to K (Fahrmeir and Lang, 2001), where X is any continuous variable and the prior distribution for h is called a smoothness prior.
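The structure of K for the equally spaced case is easy to examine numerically. The sketch below builds the difference matrices and takes K = DᵀD, which is the form implied by the statements D1h ∼ N(0, λ⁻¹I) and D2h ∼ N(0, λ⁻¹I); the function names are illustrative. The same construction with rth differences of a coefficient vector gives the kind of penalty that is used for penalized splines.

```python
def difference_matrix(n, order):
    """Return the (n - order) x n matrix of order-th differences as a list of rows."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    for _ in range(order):
        # Each pass replaces the rows by differences of consecutive rows.
        D = [[b - a for a, b in zip(D[i], D[i + 1])] for i in range(len(D) - 1)]
    return D


def penalty_matrix(n, order):
    """Return K = D'D for differences of the given order."""
    D = difference_matrix(n, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(n)] for i in range(n)]


def quadratic_form(K, h):
    """Return h'Kh."""
    m = len(h)
    return sum(h[i] * K[i][j] * h[j] for i in range(m) for j in range(m))


K1 = penalty_matrix(5, 1)   # rw(1): tridiagonal
K2 = penalty_matrix(5, 2)   # rw(2): pentadiagonal

constant = [3.0] * 5
linear = [0.0, 1.0, 2.0, 3.0, 4.0]
```

Here `quadratic_form(K1, constant)` and `quadratic_form(K2, linear)` are both 0, so K is singular: rw(1) leaves the level of h unpenalized and rw(2) leaves both level and slope unpenalized, which corresponds to the diffuse priors on the initial values.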
3.2.3. Penalized splines terms
Smoothers in which the number of basis functions is less than the number of observations but in which their regression coefficients are penalized are referred to as penalized splines or P‐splines; see Eilers and Marx (1996) and Wood (2001). Eilers and Marx (1996) used a set of q B‐spline basis functions in the explanatory variable X (whose evaluations at the values x of X are the columns of the n × q design matrix Z in equation (1)). They suggested the use of a moderately large number of equally spaced knots (i.e. between 20 and 40), at which the spline segments connect, to ensure enough flexibility in the fitted curves, but they imposed penalties on the B‐spline basis function parameters γ to guarantee sufficient smoothness of the resulting fitted curves. In effect they assumed that $D_r\gamma\sim N_{q-r}(0,\lambda^{-1}I)$ where Dr is a (q−r) × q matrix giving rth differences of the q‐dimensional vector γ. (The same approach was used by Wood (2001), but with a cubic Hermite polynomial basis rather than a B‐spline basis. He also provided a way of estimating the hyperparameters by using generalized cross‐validation (Wood, 2000).) Hence, in the GAMLSS framework (1), this corresponds to letting G=λK so that $\gamma\sim N(0,\lambda^{-1}K^{-})$ where $K=D_r^{\mathrm T}D_r$.
3.2.4. Other smoothers
Other smoothers can be used as additive terms, e.g. the R implementation of a GAMLSS allows local regression smoothers, loess; Cleveland et al. (1993).
3.2.5. Varying‐coefficient terms
Varying‐coefficient models (Hastie and Tibshirani, 1993) allow a particular type of interaction between smoothing additive terms and continuous variables or factors. They are of the form r h(x) where r and x are vectors of fixed values of the explanatory variables R and X. It can be shown that they can be incorporated easily in the GAMLSS fitting algorithm by using a smoothing matrix in the form of equation (6) in the backfitting algorithm, with Z=In, $K=N^{-\mathrm T}\Omega N^{-1}$ and G=λK as in Section 3.2.1 above, but, assuming that the values of R are distinct, with the diagonal matrix of iterative weights W multiplied by $\mathrm{diag}(r_1^2,r_2^2,\ldots,r_n^2)$ and the partial residuals ɛi divided by ri for i=1,2,…,n.
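The rescaling of the weights and partial residuals can be seen from a short argument, given here as a sketch under the assumption that the backfitting step for this term minimizes a penalized weighted sum of squares in the partial residuals, with iterative weights $w_i$ and all $r_i\neq 0$:

```latex
\begin{aligned}
\sum_{i=1}^{n} w_i\,(\varepsilon_i - r_i h_i)^2 + \lambda\, h^{\mathrm T} K h
  &= \sum_{i=1}^{n} w_i r_i^2 \left(\frac{\varepsilon_i}{r_i} - h_i\right)^2
     + \lambda\, h^{\mathrm T} K h .
\end{aligned}
```

The right‐hand side is the criterion for an ordinary smoothing term in h with weights $w_i r_i^2$ and working values $\varepsilon_i/r_i$, so the same smoother can be reused once the weights are multiplied by $r_i^2$ and the partial residuals are divided by $r_i$.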
3.2.6. Spatial (covariate) random‐effect terms
Besag et al. (1991) and Besag and Higdon (1999) considered models for spatial random effects with singular multivariate normal distributions, whereas Breslow and Clayton (1993), Lee and Nelder (2001b) and Fahrmeir and Lang (2001) considered incorporating these spatial terms in the predictor of the mean in GLMMs. In model (1) the spatial terms can be included in the predictor of one or more of the location, scale and shape parameters. For example consider an intrinsic autoregressive model (Besag et al., 1991), in which the vector of random effects for q geographical regions γ=(γ1,γ2,…,γq)T has an improper prior density that is proportional to $\exp(-\tfrac{1}{2}\lambda\gamma^{\mathrm T}K\gamma)$, denoted $\gamma\sim N_q(0,\lambda^{-1}K^{-})$, where the elements of the q × q matrix K are given by kmm=nm where nm is the total number of regions adjacent to region m and kmt=−1 if regions m and t are adjacent, and kmt=0 otherwise, for m=1,2,…,q and t=1,2,…,q. This model has the attractive property that, conditional on λ and γt for t≠m, then $\gamma_m\sim N\{\sum\gamma_t/n_m,\,(\lambda n_m)^{-1}\}$, where the summation is over all regions which are neighbours of region m. This is incorporated in a GAMLSS by setting Z=Iq and G=λK.
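The matrix K of the intrinsic autoregressive model can be assembled directly from a list of neighbours. The sketch below does this for a hypothetical map of four regions in a row and checks two standard properties: the quadratic form γᵀKγ equals the sum of squared differences over adjacent pairs, and the conditional distribution of one effect given the rest has mean equal to the average of its neighbours and variance 1/(λnm). The function names and the values of γ and λ are illustrative.

```python
def iar_penalty_matrix(neighbours):
    """K with k_mm = number of neighbours of m and k_mt = -1 if m and t are adjacent."""
    q = len(neighbours)
    K = [[0.0] * q for _ in range(q)]
    for m, adjacent in enumerate(neighbours):
        K[m][m] = float(len(adjacent))
        for t in adjacent:
            K[m][t] = -1.0
    return K


def conditional_mean_and_variance(m, gamma, neighbours, lam):
    """Mean and variance of gamma[m] given lam and the effects of all other regions."""
    n_m = len(neighbours[m])
    mean = sum(gamma[t] for t in neighbours[m]) / n_m
    return mean, 1.0 / (lam * n_m)


# Four regions on a line, 0-1-2-3 (a hypothetical adjacency structure).
neighbours = [[1], [0, 2], [1, 3], [2]]
gamma = [0.5, -0.2, 0.1, 0.4]

K = iar_penalty_matrix(neighbours)
quad = sum(gamma[m] * K[m][t] * gamma[t] for m in range(4) for t in range(4))
pairwise = sum((gamma[m] - gamma[t]) ** 2
               for m in range(4) for t in neighbours[m] if m < t)
mean_1, var_1 = conditional_mean_and_variance(1, gamma, neighbours, lam=2.0)
```

Each row of K sums to 0, so K is singular and the prior is improper with respect to the overall level of γ.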
3.2.7. Specific random‐effects terms
Lee and Nelder (2001b) considered various random‐effect terms in the predictor of the mean in GLMMs. Many specific random‐effects terms can be incorporated in the predictors in model (1) including the following.
- (a) An overdispersion term: in model (1) let Z=In and γ∼Nn(0,λ−1In); then this provides an overdispersion term for each observation (i.e. case) in the predictor.
- (b) A one‐factor random‐effect term: in model (1) let Z be an n × q incidence design matrix (for a q‐level factor) defined by elements zit=1 if the ith observation belongs to the tth factor level, and otherwise zit=0, and let γ∼Nq(0,λ−1Iq); then this provides a one‐factor random‐effects model.
- (c) A correlated random‐effects term: in model (1), since γ∼N(0,G−), correlated structures can be applied to the random effects by a suitable choice of the matrix G, e.g. first‐ or second‐order random walks, first‐ or second‐order autoregressive, (time‐dependent) exponential decaying and compound symmetry correlation models.
3.3. Combinations of terms
Any combinations of parametric and additive terms can be combined (in the predictors of one or more of the location, scale or shape parameters) to produce more complex terms or models.
3.3.1. Combinations of random‐effect terms
3.3.1.1. Two‐level longitudinal repeated measurement design. Consider a two‐level design with subjects as the first level, where yij for i=1,2,…,nj are repeated measurements at the second level on subject j, for j=1,2,…,J. Let η be a vector of predictor values, partitioned into values for each subject, i.e. $\eta^{\mathrm T}=(\eta_1^{\mathrm T},\eta_2^{\mathrm T},\ldots,\eta_J^{\mathrm T})$ of length $n=\sum_{j=1}^{J}n_j$. Let Zj be an n × qj design matrix (for random effects γj for subject j) having non‐zero values for the nj rows corresponding to subject j, and assume that the γj are all independent with $\gamma_j\sim N_{q_j}(0,G_j^{-1})$, for j=1,2,…,J. (The Zj‐matrices and random effects γj for j=1,2,…,J could alternatively be combined into a single design matrix Z and a single random vector γ.)
3.3.1.2. Repeated measures with correlated random‐effects terms. In Section 3.3.1.1, set qj=nj and set the non‐zero submatrix of Zj to be the identity matrix Inj, for j=1,2,…,J. This allows various covariance or correlation structures in the random effects of the repeated measurements to be specified by a suitable choice of matrices Gj, as in point (c) in Section 3.2.7.
3.3.1.3. Random‐ (covariate) coefficients terms. In Section 3.3.1.1 for j=1,2,…,J, set qj=q and Gj=G, i.e. γj∼Nq(0,G−1), and set the non‐zero submatrix of the design matrices Zj suitably by using the covariate(s). This allows the specification of random (covariate) coefficient models.
3.3.1.4. Multilevel (nested) hierarchical model terms. Let each level of the hierarchy be a one‐factor random‐effect term as in point (b) in Section 3.2.7.
3.3.1.5. Crossed random‐effect terms. Let each of the crossed factors be a one‐factor random‐effect term as in point (b) in Section 3.2.7.
3.3.2. Combinations of random effects and spline terms
There are many useful combinations, e.g. combining random (covariate) coefficients and cubic smoothing spline terms in the same covariate.
3.3.3. Combinations of spline terms
For example, combining cubic smoothing spline terms in different covariates gives the additive model; Hastie and Tibshirani (1990).
4. Specific families of population distribution f(y|θ)
4.1. General comments
The population probability (density) function f(y|θ) in model (1) is deliberately left general with no explicit conditional distributional form for the response variable y. The only restriction that the R implementation of a GAMLSS (Stasinopoulos et al., 2004) has for specifying the distribution of y is that the function f(y|θ) and its first (and optionally expected second and cross‐) derivatives with respect to each of the parameters of θ must be computable. Explicit derivatives are preferable but numerical derivatives can be used (resulting in reduced computational speed). Table 1 shows a variety of one‐, two‐, three‐ and four‐parameter distributions that the authors have successfully implemented in their software. Johnson et al. (1993, 1994, 1995) are the classic references on distributions and cover most of the distributions in Table 1. More information on those distributions which are not covered is provided in Section 4.2. Clearly Table 1 provides a wide selection of distributions from which to choose, but to extend the list to include other distributions is a relatively easy task. For some of the distributions that are shown in Table 1 more than one parameterization has been implemented.
| Number of parameters | Distribution |
|---|---|
| Discrete, one parameter | Binomial |
| | Geometric |
| | Logarithmic |
| | Poisson |
| | Positive Poisson |
| Discrete, two parameters | Beta–binomial |
| | Generalized Poisson |
| | Negative binomial type I |
| | Negative binomial type II |
| | Poisson–inverse Gaussian |
| Discrete, three parameters | Sichel |
| Continuous, one parameter | Exponential |
| | Double exponential |
| | Pareto |
| | Rayleigh |
| Continuous, two parameters | Gamma |
| | Gumbel |
| | Inverse Gaussian |
| | Logistic |
| | Log‐logistic |
| | Normal |
| | Reverse Gumbel |
| | Weibull |
| | Weibull (proportional hazards) |
| Continuous, three parameters | Box–Cox normal (Cole and Green, 1992) |
| | Generalized extreme family |
| | Generalized gamma family (Box–Cox gamma) |
| | Power exponential family |
| | t‐family |
| Continuous, four parameters | Box–Cox t |
| | Box–Cox power exponential |
| | Johnson–Su original |
| | Reparameterized Johnson–Su |


Quantile residuals (Section 6.2) are obtained easily provided that the cumulative distribution function (CDF) can be computed, and centile estimation is achieved easily provided that the inverse CDF can be computed. This applies to the continuous distributions in Table 1 which transform to simple standard distributions, whereas the CDF and inverse CDF of the discrete distributions can be computed numerically, if necessary.
Censoring can be incorporated easily in a GAMLSS. For example, assume that an observation is randomly right censored at value y; then its contribution to the log‐likelihood l is given by log {1−F(y|θ)}, where F(y|θ) is the CDF of y. Hence, the incorporation of censoring requires functions for computing F(y|θ) and also its first (and optionally expected second and cross‐) derivatives with respect to each of the parameters (θ1,θ2,…,θp) in the fitting algorithm. This has been found to be straightforward for the distributions in Table 1 for which an explicit form for the CDF exists. Similarly, truncated distributions are easily incorporated in a GAMLSS.
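A minimal sketch of how a right‐censored observation enters the log‐likelihood is given below. It uses an exponential response with mean μ purely because its CDF is explicit, F(y|μ) = 1 − exp(−y/μ); the function name and the data are hypothetical.

```python
import math


def exponential_loglik(times, censored, mu):
    """Log-likelihood for exponential responses with mean mu.

    Uncensored observations contribute log f(y); right-censored observations
    contribute log{1 - F(y)}.
    """
    total = 0.0
    for y, is_censored in zip(times, censored):
        if is_censored:
            total += -y / mu                    # log{1 - F(y)} = -y / mu
        else:
            total += -math.log(mu) - y / mu     # log f(y)
    return total


times = [2.0, 5.0, 3.0, 10.0]
censored = [False, False, True, True]

# For this distribution the maximizer has a closed form:
# total observed time divided by the number of uncensored observations.
mu_hat = sum(times) / censored.count(False)
```

Ignoring the censoring indicator and treating all four times as exact would give the smaller estimate 20/4 = 5 instead of 20/2 = 10.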
4.2. Specific distributions
Many three‐ and four‐parameter families of continuous distribution for y can be defined by assuming that a transformed variable z, obtained from y, has a simple well‐known distribution.
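The Box–Cox normal family for y>0 which was used by Cole and Green (1992), denoted by BCN(μ,σ,ν), assumes that z has a standard normal distribution N(0,1) with mean 0 and variance 1, where

$$z=\begin{cases}\dfrac{(y/\mu)^{\nu}-1}{\sigma\nu}, & \nu\neq 0,\\[1ex] \dfrac{\log(y/\mu)}{\sigma}, & \nu=0.\end{cases} \tag{7}$$

Cole and Green (1992) were the first to model all three parameters of a distribution as nonparametric smooth functions of a single explanatory variable.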
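The link between a transformation to normality, z‐scores and centiles can be sketched as follows, assuming that expression (7) is the Box–Cox transformation in the Cole and Green (1992) parameterization, z = {(y/μ)^ν − 1}/(σν) for ν ≠ 0 and z = log(y/μ)/σ for ν = 0. The function names and parameter values are illustrative, and the sketch ignores the slight truncation of z that is implied by y > 0.

```python
import math
from statistics import NormalDist


def bcn_z(y, mu, sigma, nu):
    """z-score of y under the assumed Box-Cox transformation."""
    if nu != 0:
        return ((y / mu) ** nu - 1.0) / (sigma * nu)
    return math.log(y / mu) / sigma


def bcn_centile(p, mu, sigma, nu):
    """100p% centile of y, obtained by inverting the transformation at z_p."""
    z_p = NormalDist().inv_cdf(p)
    if nu != 0:
        return mu * (1.0 + sigma * nu * z_p) ** (1.0 / nu)
    return mu * math.exp(sigma * z_p)


mu, sigma, nu = 10.0, 0.1, 0.5       # hypothetical fitted values at one covariate value
median = bcn_centile(0.5, mu, sigma, nu)
upper = bcn_centile(0.9, mu, sigma, nu)
z_upper = bcn_z(upper, mu, sigma, nu)
```

In this parameterization μ is the median of y, and applying `bcn_z` to an observation gives a residual of the kind that is described in Section 6.2 for continuous responses.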
The generalized gamma family for y>0, as parameterized by Lopatatzidis and Green (2000), denoted by GG(μ,σ,ν), assumes that z has a gamma GA(1,σ²ν²) distribution with mean 1 and variance σ²ν², where z=(y/μ)^ν, for ν>0.


The Student t‐family for −∞<y<∞ (e.g. Lange et al. (1989)), denoted by TF(μ,σ,ν), assumes that z has a standard t‐distribution with ν degrees of freedom, where z=(y−μ)/σ.
The four‐parameter Box–Cox t‐family for y>0, denoted by BCT(μ,σ,ν,τ), is defined by assuming that z given by expression (7) has a standard t‐distribution with τ degrees of freedom; Rigby and Stasinopoulos (2004a).
The Box–Cox power exponential family for y>0, denoted BCPE(μ,σ,ν,τ), is defined by assuming that z given by expression (7) has a standard power exponential distribution; Rigby and Stasinopoulos (2004b). This distribution is useful for modelling (positive or negative) skewness combined with (lepto or platy) kurtosis in continuous data.
The Johnson–Su family for −∞<y<∞, denoted by JSU0(μ,σ,ν,τ) (Johnson, 1949), is defined by assuming that z=ν+τ sinh⁻¹{(y−μ)/σ} has a standard normal distribution. The reparameterized Johnson–Su family, denoted by JSU(μ,σ,ν,τ), has mean μ and standard deviation σ for all values of ν and τ.
5. The algorithms
Two basic algorithms are used for maximizing the penalized likelihood that is given in equation (5). The first, the CG algorithm, is a generalization of the Cole and Green (1992) algorithm (and uses the first and (expected or approximated) second and cross‐derivatives of the likelihood function with respect to the parameters θ). However, for many population probability (density) functions f(y|θ) the parameters θ are information orthogonal (since the expected values of the cross‐derivatives of the likelihood function are 0), e.g. location and scale models and dispersion family models, or approximately so. In this case the simpler RS algorithm, which is a generalization of the algorithm that was used by Rigby and Stasinopoulos (1996a, b) for fitting mean and dispersion additive models (and does not use the cross‐derivatives), is more suited. The parameters θ are fully information orthogonal for only the negative binomial, gamma, inverse Gaussian, logistic and normal distributions in Table 1. Nevertheless, the RS algorithm has been successfully used for fitting all the distributions in Table 1, although occasionally it can be slow to converge. Note also that the RS algorithm is not a special case of the CG algorithm, as explained in Appendix B.
The object of the algorithms is to maximize the penalized likelihood function lp, given by equation (5), for fixed hyperparameters λ. The details of the algorithms are given in Appendix B, whereas the justification that the CG algorithm maximizes the penalized likelihood lp, given by equation (5), is provided in Appendix C. The justification for the RS algorithm is similar.
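The idea of cycling over the distribution parameters without using cross‐derivatives can be illustrated on a toy problem. The sketch below is not the RS algorithm of Appendix B: it assumes a normal response with constant μ (identity link) and constant σ (log‐link), has no additive terms and hence no backfitting, and simply alternates one Fisher scoring step for each parameter. The function name, the number of cycles and the data are arbitrary choices.

```python
import math


def fit_normal_constant(y, n_cycles=30):
    """Alternate Fisher scoring steps for mu and log(sigma) in a normal model.

    Each step uses only the score and expected information of the parameter
    being updated; no cross-derivatives are involved.
    """
    n = len(y)
    mu, eta_sigma = 0.0, 0.0            # very simple starting values (sigma = 1)
    for _ in range(n_cycles):
        sigma2 = math.exp(2.0 * eta_sigma)
        # mu step: score = sum(y - mu) / sigma^2, expected information = n / sigma^2.
        mu += sum(y_i - mu for y_i in y) / n
        # log(sigma) step: score = sum((y - mu)^2 / sigma^2 - 1), expected information = 2n.
        score = sum((y_i - mu) ** 2 / sigma2 - 1.0 for y_i in y)
        eta_sigma += score / (2.0 * n)
    return mu, math.exp(eta_sigma)


y = [4.1, 5.3, 6.2, 3.8, 5.9, 4.7]
mu_hat, sigma_hat = fit_normal_constant(y)
```

For this model μ and σ are information orthogonal, and the iteration converges to the sample mean and the maximum likelihood estimate of σ (the standard deviation with divisor n).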
The algorithms are implemented through the option method of the function gamlss() within the R package GAMLSS (Stasinopoulos et al., 2004), where a combination of both algorithms is also allowed. The major advantages of the two algorithms are
- (a) the modular fitting procedure (allowing different model diagnostics for each distribution parameter),
- (b) easy addition of extra distributions,
- (c) easy addition of extra additive terms and
- (d) easily found starting values since they only require initial values for the θ‐ rather than for the β‐parameters.
The algorithms have generally been found to be stable and fast using very simple starting values (e.g. constants) for the θ‐parameters.
Clearly, for a specific data set and model, the (penalized) likelihood can potentially have multiple local maxima. This is investigated by using different starting values and has generally not been found to be a problem in the data sets that were analysed, possibly because of the relatively large sample sizes that were used.
Singularities in the likelihood function that are similar to those that were reported by Crisp and Burridge (1994) can potentially occur in specific cases within the GAMLSS framework, especially when the sample size is small. The problem can be alleviated by appropriate restrictions on the scale parameter (penalizing it for going close to 0).
6. Model selection
6.1. Statistical modelling
Let ℳ={𝒟,𝒢,𝒯,λ} represent the GAMLSS, where
- (a) 𝒟 specifies the distribution of the response variable,
- (b) 𝒢 specifies the set of link functions (g1,…,gp) for parameters (θ1,…,θp),
- (c) 𝒯 specifies the set of predictor terms (t1,…,tp) for predictors (η1,…,ηp) and
- (d) λ specifies the set of hyperparameters.
For a specific data set, the GAMLSS model building process consists of comparing many different competing models for which different combinations of components ℳ={𝒟,𝒢,𝒯,λ} are tried.
Inference about quantities of interest can be made either conditionally on a single selected ‘final’ model or by averaging between selected models. Conditioning on a single final model was criticized by Draper (1995) and Madigan and Raftery (1994) since it ignores model uncertainty and generally leads to an underestimation of the uncertainty about quantities of interest. Averaging between selected models can reduce this underestimation; Hjort and Claeskens (2003).
As with all scientific inferences the determination of the adequacy of any model depends on the substantive question of interest and requires subject‐specific knowledge.
6.2. Model selection, inference and diagnostics
For parametric GAMLSS models each model ℳ of the form (2) can be assessed by its fitted global deviance GD given by
$$\mathrm{GD}=-2\,l(\hat{\theta}),$$
where $l(\hat{\theta})=\sum_{i=1}^{n} l(\hat{\theta}^{i})$. Two nested parametric GAMLSS models, ℳ0 and ℳ1, with fitted global deviances GD0 and GD1 and error degrees of freedom dfe0 and dfe1 respectively may be compared by using the (generalized likelihood ratio) test statistic Λ=GD0−GD1 which has an asymptotic χ2‐distribution under ℳ0, with degrees of freedom d=dfe0−dfe1 (given that the regularity conditions are satisfied). For each model ℳ the error degrees of freedom parameter dfe is defined by $\mathrm{dfe}=n-\sum_{k=1}^{p}\mathrm{df}_{\theta_k}$, where dfθk are the degrees of freedom that are used in the predictor model for parameter θk for k=1,…,p.
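The nested-model comparison described above can be sketched numerically. The following is a minimal illustration, not part of the GAMLSS package: the function names `chi2_sf` and `lr_test` are hypothetical, the χ2 tail probability is computed from the standard closed forms for integer degrees of freedom, and the example compares models I and II of Table 2 (Section 7.3), whose model degrees of freedom (7 and 9) are assumed here from (AIC − GD)/2.

```python
import math

def chi2_sf(x, d):
    """Upper tail probability P(X > x) for a chi-squared variate, integer d >= 1."""
    if x <= 0:
        return 1.0
    if d % 2 == 0:
        # Even d: exp(-x/2) times a finite sum of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, d // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd d: two-sided normal tail plus a finite correction sum
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    total = 0.0
    for r in range(1, (d - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total

def lr_test(gd0, gd1, dfe0, dfe1):
    """Return (Lambda, d, p-value) for nested models M0 (smaller) and M1."""
    lam = gd0 - gd1          # drop in global deviance
    d = dfe0 - dfe1          # difference in error degrees of freedom
    return lam, d, chi2_sf(lam, d)

# Illustration with Table 2: model I (GD 4519.4) nested in model II (GD 4483.0).
# The model degrees of freedom 7 and 9 are an assumption inferred from the table.
n = 1383
lam, d, p_value = lr_test(4519.4, 4483.0, n - 7, n - 9)
print(f"Lambda = {lam:.1f} on {d} df, p = {p_value:.2e}")
```

The asymptotic χ2 reference distribution applies only to nested parametric models; for models with smoothing terms the GAIC comparison is the appropriate tool.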
For comparing non‐nested GAMLSSs (including models with smoothing terms), to penalize overfitting the generalized Akaike information criterion GAIC (Akaike, 1983) can be used. This is obtained by adding to the fitted global deviance a fixed penalty # for each effective degree of freedom that is used in a model, i.e. GAIC(#)=GD+#df, where df denotes the total effective degrees of freedom used in the model and GD is the fitted global deviance. The model with the smallest value of the criterion GAIC(#) is then selected. The Akaike information criterion AIC (Akaike, 1974) and the Schwarz Bayesian criterion SBC (Schwarz, 1978) are special cases of the GAIC(#) criterion corresponding to #=2 and #= log (n) respectively. The two criteria, AIC and SBC, are asymptotically justified as predicting the degree of fit in a new data set, i.e. approximations to the average predictive error. A justification for the use of SBC comes also as a crude approximation to Bayes factors; Raftery (1996, 1999). Claeskens and Hjort (2003) considered a focused information criterion in which the criterion for model selection depends on the objective of the study, in particular on the specific parameter of interest. Using GAIC(#) allows different penalties # to be tried for different modelling purposes. The sensitivity of the selected model to the choice of # can also be investigated.
For GAMLSSs with hyperparameters λ, the hyperparameters can be estimated by one of the methods that are described in Appendix A.2. Different random‐effect models (for the same fixed effects models) can be compared by using their maximized (Laplace approximated) profile marginal likelihood of λ (eliminating both fixed and random effects), which is given by equation (14) in Appendix A.2.3, in the way that Lee and Nelder (1996, 2001a, b) used their adjusted profile h‐likelihood. Different fixed effects models (for the same random‐effects models) can be compared by using their approximate maximized (Laplace approximated) marginal likelihood of β (eliminating the random effects γ), i.e.
, where
evaluated at
and lh is defined in Section 2.2, conditional on chosen hyperparameters.
To test whether a specific fixed effect predictor parameter is different from 0, a χ2‐test is used, comparing the change in global deviance Λ for parametric models (or the change in the approximate marginal deviance (eliminating the random effects) for random‐effects models) when the parameter is set to 0 with a χ2 critical value on 1 degree of freedom. Profile (marginal) likelihood for fixed effect model parameters can be used for the construction of confidence intervals. The above test and confidence intervals are conditional on any hyperparameters being fixed at selected values.
An alternative approach, which is suitable for very large data sets, is to split the data into (a) training, (b) validation and (c) test data sets and to use them for model fitting, selection and assessment respectively; Ripley (1996) and Hastie et al. (2001).
For each ℳ the (normalized randomized quantile) residuals of Dunn and Smyth (1996) are used to check the adequacy of ℳ and, in particular, the distribution component 𝒟. The (normalized randomized quantile) residuals are given by
$$\hat{r}_i=\Phi^{-1}(u_i),$$
where Φ−1 is the inverse CDF of a standard normal variate and $u_i=F(y_i\mid\hat{\theta}^{i})$ if yi is an observation from a continuous response, whereas ui is a random value from the uniform distribution on the interval $[F(y_i-1\mid\hat{\theta}^{i}),\,F(y_i\mid\hat{\theta}^{i})]$ if yi is an observation from a discrete integer response, where F(y|θ) is the CDF. For a right‐censored continuous response ui is defined as a random value from a uniform distribution on the interval $[F(y_i\mid\hat{\theta}^{i}),\,1]$. Note that, when randomization is used, several randomized sets of residuals (or a median set from them) should be studied before a decision about the adequacy of model ℳ is taken. The true residuals ri have a standard normal distribution if the model is correct.
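The three cases of the Dunn and Smyth (1996) residual (continuous, discrete integer and right‐censored responses) can be sketched as follows. This is an illustrative outline only: the helper names, the Poisson and exponential CDFs used as stand-ins for a fitted F(y|θ), and the clamping constant are assumptions made for the example and are not taken from the GAMLSS implementation.

```python
import math
import random
from statistics import NormalDist

def quantile_residual(u, eps=1e-12):
    """Map u in (0, 1) to a standard normal quantile; eps guards the end points."""
    return NormalDist().inv_cdf(min(max(u, eps), 1.0 - eps))

def residual_continuous(y, cdf):
    # u = F(y) for a continuous response
    return quantile_residual(cdf(y))

def residual_discrete(y, cdf, rng):
    # u drawn uniformly on [F(y - 1), F(y)] for an integer response
    return quantile_residual(rng.uniform(cdf(y - 1), cdf(y)))

def residual_right_censored(y, cdf, rng):
    # u drawn uniformly on [F(y), 1] when only "response > y" is known
    return quantile_residual(rng.uniform(cdf(y), 1.0))

def poisson_cdf(y, mu=3.0):
    """Stand-in fitted CDF for a count response (hypothetical fitted mean mu)."""
    if y < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, y + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)

def exponential_cdf(y, rate=1.0):
    """Stand-in fitted CDF for a continuous response (hypothetical rate)."""
    return 1.0 - math.exp(-rate * y) if y > 0 else 0.0

rng = random.Random(2005)
# Several randomized sets should be inspected, as the text recommends.
sets = [[residual_discrete(y, poisson_cdf, rng) for y in (0, 2, 3, 7)]
        for _ in range(3)]
for s in sets:
    print([round(r, 3) for r in s])
```

Repeating the draw with different seeds gives the several randomized residual sets that the text suggests examining before judging model adequacy.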
7. Examples
The following five examples are used primarily to demonstrate the power and flexibility of GAMLSSs.
7.1. Dutch girls’ body mass index data example
The variables body mass index BMI and age were recorded for 20 243 Dutch girls in a cross‐sectional study of growth and development in the Dutch population in 1980; Cole and Roede (1999). The objective here is to obtain smooth reference centile curves for BMI against age.
Figs 1(a) and 1(b) provide plots of BMI against age, separately for age ranges 0–2 years and 2–21 years respectively for clarity of presentation, indicating a positively skew (and possibly leptokurtic) distribution for BMI given age and also a non‐linear relationship between the location (and possibly also the scale, skewness and kurtosis) of BMI with age. Previous modelling of the variable BMI (e.g. Cole et al. (1998)) using the LMS method of Cole and Green (1992), has found significant kurtosis in the residuals after fitting the model, indicating that the kurtosis was not adequately modelled. It has also previously been found (e.g. Rigby and Stasinopoulos (2004a)) that a power transformation of age to explanatory variable X=age^ξ improves the model fit substantively in similar data analysis.

Body mass index data: BMI against age with fitted centile curves
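The following model, with a Box–Cox t‐distribution (BCT) for BMI, denoted by y, was therefore fitted:
$$y\sim\mathrm{BCT}(\mu,\sigma,\nu,\tau),\qquad \mu=h_1(x),\quad \log(\sigma)=h_2(x),\quad \nu=h_3(x),\quad \log(\tau)=h_4(x).\tag{8}$$
Here hk(x) are arbitrary smooth functions of x for k=1,2,3,4 as in Section 3.2.1, and $x_i=\mathrm{age}_i^{\xi}$ for i=1,2,…,n, where ξ is a non‐linear parameter in the model. Log‐link functions were used for σ and τ in expression (8) to ensure that σ>0 and τ>0.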
In the model fitting, the above model is denoted BCT{cs(x,df′μ), cs(x,df′σ), cs(x,df′ν), cs(x,df′τ)}, where df′ indicates the extra degrees of freedom on top of a linear term in x. For example, in the model for μ, the total degrees of freedom used are dfμ=df′μ+2. Hence x or cs(x,0) refers to a linear model in x.
Model selection was achieved by minimizing the generalized Akaike information criterion GAIC(#), which is discussed in Section 6.2 and Appendix A.2.1, with penalty #=2.4, over the parameters dfμ, dfσ, dfν, dfτ and ξ using the numerical optimization algorithm L‐BFGS‐B in function optim (from the R package; Ihaka and Gentleman (1996)), which is incorporated in the GAMLSS package. The algorithm converged to the values (dfμ,dfσ,dfν,dfτ,ξ)=(16.2,8.5,4.7,6.1,0.50), correct to the decimal places given, with total effective degrees of freedom equal to 36.5 (including one for the parameter ξ), global deviance GD=76 454.5 and GAIC(2.4)=76 542.1, and this was the model selected. (The choice of penalty, which was selected here to demonstrate flexible modelling of the parameters, affects particularly the fitted τ model for this data set. For example, a penalty of #=2.5 led to a model with (dfμ,dfσ,dfν,dfτ,ξ)=(16.0,8.0,4.8,1,0.52) with a constant τ model, GD=76 468.1 and GAIC(2.5)=76 545.1.)
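As a check on the reported figures, the criterion GAIC(#)=GD+#df of Section 6.2 can be evaluated by hand from the values quoted above. The arithmetic below assumes that the four quoted degrees of freedom are totals for μ, σ, ν and τ and that one further degree of freedom is counted for ξ, as the text states.

```latex
\begin{align*}
\text{penalty } 2.4:\quad
  \mathrm{df} &= 16.2 + 8.5 + 4.7 + 6.1 + 1 = 36.5, \\
  \mathrm{GAIC}(2.4) &= 76\,454.5 + 2.4 \times 36.5 = 76\,454.5 + 87.6 = 76\,542.1; \\
\text{penalty } 2.5:\quad
  \mathrm{df} &= 16.0 + 8.0 + 4.8 + 1 + 1 = 30.8, \\
  \mathrm{GAIC}(2.5) &= 76\,468.1 + 2.5 \times 30.8 = 76\,468.1 + 77.0 = 76\,545.1.
\end{align*}
```

Both results agree with the reported values of GAIC(2.4) and GAIC(2.5).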
The fitted models for μ, σ, ν and τ for the selected model are displayed in Fig. 2. The fitted ν indicates positive skewness in BMI for all ages (since $\hat{\nu}<1$), whereas the fitted τ indicates modest leptokurtosis particularly at the lower ages.
Fig. 3 displays the (normalized quantile) residuals, which were defined in Section 6.2, from the fitted model. Figs 3(a) and 3(b) plot the residuals against the fitted values of μ and against age respectively, whereas Figs 3(c) and 3(d) provide a kernel density estimate and normal QQ‐plot for them respectively. The residuals appear random, although the QQ‐plot shows a possible single outlier in the upper tail and a slightly longer extreme (0.06%) lower tail than the Box–Cox t‐distribution. Nevertheless the model provides a good fit to the data. The fitted model centile curves for BMI for centiles 100α=0.4, 2.3, 10, 25, 50, 75, 90, 97.7, 99.6 (chosen to be two‐thirds of a z‐score apart) are displayed in Figs 1(a) and 1(b) for age ranges 0–2 years and 2–21 years respectively.
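The statement that the plotted centiles are two‐thirds of a z‐score apart can be illustrated with a short calculation. This sketch simply evaluates the standard normal CDF at z=−8/3, −2, …, 8/3; the function name is hypothetical.

```python
from statistics import NormalDist

def centiles_spaced_by_z(step=2.0 / 3.0, half_count=4):
    """Percentage centiles 100*Phi(k*step) for k = -half_count, ..., half_count."""
    std_normal = NormalDist()
    return [100.0 * std_normal.cdf(k * step)
            for k in range(-half_count, half_count + 1)]

centiles = centiles_spaced_by_z()
for k, c in zip(range(-4, 5), centiles):
    print(f"z = {k * 2 / 3:+.3f}  ->  centile {c:.2f}")
```

The computed values are close to the nominal centiles listed in the text, which are quoted there in rounded form.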

Body mass index data: fitted parameters (a) μ, (b) σ, (c) ν and (d) τ against age

Body mass index data: (a) residuals against fitted values of μ, (b) residuals against age, (c) kernel density estimate and (d) QQ‐plot
7.2. Hodges's health maintenance organization data example
Here we consider a one‐factor random‐effects model for response variable health insurance premium (prind) with state as the random factor. The data were analysed in Hodges (1998).
Hodges (1998) modelled the data by using a normal conditional model for yij given γj the random effect in the mean for state j, and a normal distribution for γj, i.e. his model can be expressed by yij|μij,σ∼N(μij,σ2), μij=β1+γj, log (σ)=β2 and γj∼N(0,σ1²), independently for i=1,2,…,nj and j=1,2,…,J, where i indexes the observations within states.
Fig. 4 provides box plots of prind against state, showing the variation in the location and scale of prind between states and a positively skewed (and possibly leptokurtic) distribution of prind within states. Although Hodges (1998) used an added variable diagnostic plot to identify the need for a Box–Cox transformation of y, he did not model the data by using a transformation of y.

Health maintenance organization data: box plots of prind against state
In the discussion of Hodges (1998), Wakefield commented as follows.
‘If it were believed that there were different within‐state variances then one possibility would be to assume a hierarchy for these also.’
Hodges, in his reply, also suggested treating the ‘within‐state precisions or variances as draws from some distribution’.

independently for j=1,2,…,J and k=1,2,3,4.
Using an Akaike information criterion, i.e. GAIC(2), for hyperparameter selection, as discussed in Section 6.2 and Appendix A.2.1, led to the conclusion that the random‐effect parameters for ν and τ are not needed, i.e. σ3=σ4=0. The remaining random‐effect parameters were estimated by using the approximate marginal likelihood approach, which is described in Appendix A.2.3, giving fitted parameter values
and
with corresponding fixed effects parameter values
,
,
and
and an approximate marginal deviance of 3118.62 obtained from equation (14) in Appendix A.2.3. This was the chosen fitted model.
Since
is close to 0, the fitted conditional distribution of yij is approximately defined by
, a t‐distribution with
degrees of freedom, for i=1,2,…,nj and j=1,2,…,J.
Fig. 5 plots the sample and fitted medians (μ) of prind against state (ordered by the sample median). The fitted values of σ (which are not shown here) vary very little. The heterogeneity in the sample variances of prind between the states (in Fig. 4) seems to be primarily due to sampling variation caused by the high skewness and kurtosis in the conditional distribution of y (rather than either the variance–mean relationship or the random effect in σ).
Fig. 6 provides marginal (Laplace‐approximated) profile deviance plots, as described in Section 6.2, for each of ν and τ, for fixed hyperparameters, giving 95% intervals (−0.866,0.788) for ν and (4.6,196.9) for τ, indicating considerable uncertainty about these parameters. (The fitted model suggests a log‐transformation for y, whereas the added variable plot that was used by Hodges (1998) suggested a Box–Cox transformation parameter ν=0.67 which, although rather different, still lies within the 95% interval for ν. Furthermore the wide interval for τ suggests that a conditional distribution model for yij defined by
may provide a reasonable model. This model has
and
.)

Health maintenance organization data: sample (○) and fitted (+) medians of prind against state

Health maintenance organization data: profile approximate marginal deviances for (a) ν and (b) τ
Fig. 7(a) provides a normal QQ‐plot for the (normalized quantile) residuals, which were defined in Section 6.2, for the chosen model. Fig. 7(a) indicates an adequate model for the conditional distribution of y. The outlier case for Washington state, identified by Hodges (1998), does not appear to be an outlier in this analysis. Figs 7(b) and 7(c) provide respectively normal QQ‐plots for the fitted random effects γj1 for μ and γj2 for log (σ), for j=1,2,…,J. Fig. 7(b) indicates that the normal distribution for the random effects in the model for μ may be adequate, although there appear to be five outlier states with high prind medians, i.e. states CT, DE, MA, ME and NJ, and also possibly two outlier states with low prind medians, GU and MN. Fig. 7(c) indicates some departure from the assumption of normal random effects in the model for log (σ).

Health maintenance organization data: QQ‐plots for (a) the residuals, (b) the random effects in μ and (c) the random effects in log (σ)
7.3. The hospital stay data
The hospital stay data, 1383 observations, are from a study at the Hospital del Mar, Barcelona, during the years 1988 and 1990; see Gange et al. (1996). The response variable is the number of inappropriate days (noinap) out of the total number of days (los) that patients spent in hospital. The following variables were used as explanatory variables:
- (a) age, the age of the patient;
- (b) ward, the type of ward in the hospital (medical, surgical or other);
- (c) year, the year (1988 or 1990);
- (d) loglos, log(los/10).
Gange et al. (1996) used a logistic regression model for the number of inappropriate days, with binomial and beta–binomial errors, and found that the latter provided a better fit to the data. They modelled both the mean and the dispersion of the beta–binomial distribution as functions of explanatory variables by using the epidemiological package EGRET (Cytel Software Corporation, 2001), which allowed them to fit a parametric model using a logit link for the mean and an identity link for the dispersion φ=σ. Their final model was BB{logit(μ)=ward + year + loglos,σ=year}.
First we fit their final model, which is equivalent to model I in Table 2. Although we use a log‐link for the dispersion σ in Table 2, this does not affect model I since year is a factor. Table 2 shows GD, AIC and SBC, which were defined in Section 6.2, for model I, to be 4519.4, 4533.4 and 4570.1 respectively. Here we are interested in whether we can improve the model by using the flexibility of a GAMLSS. For the dispersion parameter model we found that the addition of ward improves the fit (see model II in Table 2 with AIC=4501.0 and SBC=4548.1) but no other term was found to be significant. Non‐linearities in the mean model for the terms loglos and age were investigated by using cubic smoothing splines in models III and IV. There is strong support for including a smoothing term for loglos as indicated by the reduction in AIC and SBC for model III compared with model II. The inclusion of a smoothing term for age is not so clear cut since, although there is some marginal support from AIC, it is clearly not supported by SBC, when comparing model III with model IV.
| Model | Link | Terms | GD | AIC | SBC |
|---|---|---|---|---|---|
| I | logit(μ) | ward + loglos + year | 4519.4 | 4533.4 | 4570.1 |
| | log (σ) | year | | | |
| II | logit(μ) | ward + loglos + year | 4483.0 | 4501.0 | 4548.1 |
| | log (σ) | year + ward | | | |
| III | logit(μ) | ward + cs(loglos,1) + year | 4459.4 | 4479.4 | 4531.8 |
| | log (σ) | year + ward | | | |
| IV | logit(μ) | ward + cs(loglos,1) + year + cs(age,1) | 4454.4 | 4478.4 | 4541.2 |
| | log (σ) | year + ward | | | |
The fitted smoothing functions for loglos and age from model IV are shown in Fig. 8. Fig. 9 displays a set of the (normalized randomized quantile) residuals (see Section 6.2) from model IV. The residuals seem to be satisfactory. Other sets of (normalized randomized quantile) residuals were very similar.
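The model comparison in Table 2 can be retraced with the criterion GAIC(#)=GD+#df from Section 6.2. The sketch below is illustrative only: the total degrees of freedom for each model (7, 9, 10 and 12) are not stated in the text and are assumed here from (AIC − GD)/2, and the function name `gaic` is hypothetical.

```python
import math

def gaic(gd, df, penalty):
    """Generalized Akaike information criterion: deviance plus penalty per df."""
    return gd + penalty * df

n = 1383  # number of observations in the hospital stay data

# (global deviance, assumed total degrees of freedom) for models I to IV
models = {
    "I": (4519.4, 7),
    "II": (4483.0, 9),
    "III": (4459.4, 10),
    "IV": (4454.4, 12),
}

aic = {name: gaic(gd, df, 2.0) for name, (gd, df) in models.items()}
sbc = {name: gaic(gd, df, math.log(n)) for name, (gd, df) in models.items()}

best_aic = min(aic, key=aic.get)
best_sbc = min(sbc, key=sbc.get)

for name in models:
    print(f"model {name}: AIC = {aic[name]:.1f}, SBC = {sbc[name]:.1f}")
print("smallest AIC:", best_aic, "| smallest SBC:", best_sbc)
```

Under these assumed degrees of freedom the lighter AIC penalty favours model IV whereas the heavier SBC penalty favours model III, which mirrors the discussion of the smoothing term for age; small differences from the tabulated SBC values arise from rounding of GD.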

Hospital stay data: fitted smoothing curves for (a) loglos and (b) age from model IV

Hospital stay data: (a) residuals against fitted values, (b) residuals against index, (c) kernel density estimate and (d) QQ‐plot
7.4. The epileptic seizure data
The epileptic seizure data, which were obtained from Thall and Vail (1990), comprise four repeated measurements of seizure counts (each over a 2‐week period preceding a clinical visit) for 59 epileptics: a total of 236 cases. Breslow and Clayton (1993) and Lee and Nelder (1996) identified casewise overdispersion in the counts which they modelled by using a random effect for cases in the predictor for the mean in a Poisson GLMM, whereas Lee and Nelder (2000) additionally considered an overdispersed Poisson GLMM (using extended quasi‐likelihood). They also identified random effects for subjects in the predictor for the mean.
Here we directly model the casewise overdispersion in the counts by using a negative binomial (type I) model and consider random effects for subjects in the predictors for both the mean and the dispersion. Specifically we assume that, conditional on the mean μij and the dispersion σij (i.e. conditional on the random effects), the seizure counts yij are independent over subjects i=1,2,…,59 and repeated measurements j=1,2,3,4 with a negative binomial (type I) distribution, yij|μij,σij∼NBI(μij,σij) where the logarithm of the mean is modelled by using explanatory terms and the logarithms of both the mean and the dispersion include a random‐effects term for subjects. (Note that the conditional variance of yij is given by $V(y_{ij}\mid\mu_{ij},\sigma_{ij})=\mu_{ij}+\sigma_{ij}\mu_{ij}^{2}$.)
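The mean–variance relationship of the negative binomial (type I) distribution can be checked numerically. The sketch below assumes the usual parameterization in which NBI(μ,σ) is a negative binomial with mean μ and variance μ+σμ²; the function names and the values μ=5 and σ=0.5 are hypothetical and chosen only for illustration.

```python
import math

def nbi_pmf(y, mu, sigma):
    """P(Y = y) for NBI(mu, sigma): mean mu, dispersion sigma > 0 (assumed form)."""
    r = 1.0 / sigma
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + y * math.log(sigma * mu / (1.0 + sigma * mu))
             - r * math.log(1.0 + sigma * mu))
    return math.exp(log_p)

def nbi_moments(mu, sigma, upper=2000):
    """Total probability, mean and variance by direct summation over 0..upper-1."""
    probs = [nbi_pmf(y, mu, sigma) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

mu, sigma = 5.0, 0.5
total, mean, var = nbi_moments(mu, sigma)
print(f"sum of pmf = {total:.6f}, mean = {mean:.4f}, variance = {var:.4f}")
print(f"mu + sigma * mu^2 = {mu + sigma * mu ** 2:.4f}")
```

As σ decreases towards 0 the variance approaches μ, so the dispersion parameter measures the casewise overdispersion relative to a Poisson model.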
The model is denoted by NBI{ log (μ)=lbase * trt+visit+lage+random(subjects), log (σ)=random(subjects)}, where, equivalently to Breslow and Clayton (1993), lbase is the logarithm of a quarter of the number of base‐line seizures, trt is a treatment factor (coded 0 for placebo and 1 for drug), visit is a covariate for the clinic visits (coded −0.3, −0.1, 0.1, 0.3 for the four visits), lage is the logarithm of the age of the subject, lbase * trt indicates an interaction term and random(subjects) indicates a random‐effect term for subjects with distribution γi1∼N(0,σ1²) and γi2∼N(0,σ2²) in the log‐mean and log‐dispersion models respectively.
The approximate marginal likelihood approach that is described in Appendix A.2.3 led to the fitted random‐effects parameters
and
with an approximate marginal deviance of 1250.84, obtained from equation (14). (Alternatively, using a generalized Akaike information criterion with penalty 3, i.e. GAIC(3), for hyperparameter selection, as discussed in Appendix A.2.1, led to
and
(corresponding to df=39.9 and df=9.99 respectively) with GAIC(3)=1255.7.) Hence it appears that there are random effects for subjects in both the log‐mean and the log‐dispersion models of the negative binomial distribution of the seizure count. The fitted parameters for log (μ) are the intercept
,
,
,
,
and
, and for log (σ) the intercept
.
Breslow and Clayton (1993) considered including in the mean model random slopes in the covariate visit for subjects; however, this was not found to improve the model. Lee and Nelder (1996, 2000) suggested that the casewise overdispersion may depend on an indicator variable for the fourth visit, denoted here by V4. In our model this is equivalent to replacing the dispersion model by log (σ)=V4. This model led to
with GAIC(3)=1268.5.
Fig. 10 provides diagnostic plots for our model. Figs 10(a) and 10(b) plot the (normalized randomized quantile) residuals, which were defined in Section 6.2, against the fitted values and the covariate visit respectively and appear random. Figs 10(c) and 10(d) provide a kernel density estimate and normal QQ‐plot for the residuals respectively and indicate some departure from the conditional negative binomial distribution for y. Figs 10(e) and 10(f) provide normal QQ‐plots for the subject random effects in the log‐mean and log‐dispersion models respectively, indicating that the normal distribution for the random effects is adequate for log (μ) but not for log (σ).

Diagnostic plots for the epileptic seizures data: (a) residuals against fitted values, (b) residuals against visit, (c) residual kernel density estimate, (d) QQ‐plot of the residuals, (e) QQ‐plot of the random effects in log (μ) and (f) QQ‐plot of the random effects in log (σ)
7.5. The river flow data
The river flow data, which were obtained from Tong (1990), comprise 1096 consecutive observations of the daily river flow r, of the river Vatnsdalsa in Iceland, measured in cubic metres per second, the daily precipitation p in millimetres and the mean daily temperature t in degrees centigrade at the meteorological station at Hveravellir in north‐west Iceland. The data span the period of 1972, 1973 and 1974 and are shown in Fig. 11. The task is to build a stochastic model to predict the river flow by using the temperature and precipitation. Tong (1990) used a heavily parameterized self‐exciting threshold autoregressive model with normal errors (conditional on current and past values of the explanatory variables p and t and past values of r). Here we investigate a variety of (conditional) distributions to model the river flow. We include the following explanatory variables, which were computed from r, p and t:

River flow data: (a) riverflow, (b) temperature and (c) precipitation against time in days
- (a) lr, log(r);
- (b) lp, log(p+1);
- (c) lp90, the logarithm of the average precipitation for the last 90 days;
- (d) tp, an indicator variable for positive t (i.e. t>0);
- (e) t7, the average temperature over the last week (i.e. 7 days);
- (f) t90, the average temperature for the last 90 days.
Lag variables for river flow at lags 1, 2 and 3 (i.e. r1, r2 and r3 respectively), and at lag 1 for log‐river‐flow (lr1), precipitation (p1) and log‐precipitation (lp1) were also used as explanatory variables. The first 90 observations and the last observation were weighted out from the analysis, leaving 1005 observations for model fitting.
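The derived explanatory variables (logarithms, lags and trailing averages) can be sketched as follows. This is an illustrative outline on toy numbers, not the authors' code: the helper names are hypothetical, and it is an assumption that a trailing average uses the preceding days only (excluding the current day), which would be consistent with the first 90 observations lacking a complete 90‐day history.

```python
import math

def lag(series, k):
    """Values shifted back by k steps; the first k entries are None (no history)."""
    return [None] * k + list(series[:len(series) - k])

def trailing_mean(series, window):
    """Mean of the `window` values preceding each position; None until available."""
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(series[i - window:i]) / window)
    return out

# Toy daily series (hypothetical values, not the Vatnsdalsa data)
r = [4.2, 4.0, 5.1, 6.3, 5.8, 5.0]      # river flow
p = [0.0, 2.0, 1.0, 0.0, 4.0, 3.0]      # precipitation
t = [-1.5, 0.5, 2.0, -0.3, 1.2, 3.1]    # temperature

lr = [math.log(v) for v in r]                 # lr = log(r)
lp = [math.log(v + 1.0) for v in p]           # lp = log(p + 1)
tp = [1 if v > 0 else 0 for v in t]           # tp = indicator of t > 0
r1, r2, r3 = lag(r, 1), lag(r, 2), lag(r, 3)  # lagged river flow
t3 = trailing_mean(t, 3)                      # short window standing in for t7, t90

# Observations available for fitting after dropping the first 90 and the last one
n_fit = 1096 - 90 - 1
print(r1, t3, n_fit)
```

With the full series, windows of 7 and 90 days would replace the short window used here for readability.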
Initially an inverse Gaussian distribution was assumed, owing to the skewed distribution of the river flow, with a constant dispersion model. After some initial search an adequate location model was found. Column I of Table 3 shows the global deviance GD and SBC for the resulting model I given by {μ=poly(r1,2)+r2+r3+(tw+p+p1 * t90) * tp−(tw+p+p1 * t90)}, for a variety of distribution families, where any scale and shape parameters (e.g. σ, ν and τ) were modelled as constants. Note that poly(r1,2) refers to a polynomial of order 2 in r1, i.e. a quadratic in r1. The conclusion, by looking at the SBC values, is that the Box–Cox t‐distribution family fits best, indicating that the distribution of the river flow is both skew and leptokurtic.
| Distribution | Model I GD | Model I SBC | Model II GD | Model II SBC | Model III GD | Model III SBC |
|---|---|---|---|---|---|---|
| Gumbel | 4903 | 4986 | 2346 | 2505 | 3503 | 3787 |
| Reverse Gumbel | 3343 | 3426 | 2060 | 2219 | 2581 | 2865 |
| Normal | 4077 | 4160 | 1982 | 2141 | 3041 | 3325 |
| Gamma | 2578 | 2661 | 1944 | 2103 | 2443 | 2726 |
| Inverse Gaussian | 2333 | 2416 | 1932 | 2091 | 2343 | 2626 |
| Logistic | 3267 | 3350 | 1872 | 2031 | 2426 | 2709 |
| Box–Cox normal | 2440 | 2529 | 1923 | 2089 | 2277 | 2568 |
| t‐family | 2361 | 2451 | 1831 | 1997 | 2081 | 2371 |
| Johnson–Su | 2354 | 2451 | 1816 | 1988 | 2031 | 2328 |
| Box–Cox t | 2096 | 2192 | 1805 | 1978 | 1950 | 2247 |
Selecting now the Box–Cox t‐distribution BCT, a search for an adequate dispersion model was made. Column II of Table 3 shows the resulting model II with μ as in model I and { log (σ)=poly(lr1,3)+lp+lp1+lp90+(t+t90) * tp} fitted with different distribution families, again using constant shape parameters. BCT again fits best and the dispersion model has dramatically improved the fit, since SBC is reduced from 2192 to 1978. Constant models were found to be adequate for both the shape parameters of BCT.
For comparison, column III of Table 3 shows the Tong (1990) model, fitted with different distribution families. Tong (1990) used a normal distribution model with a heavily parameterized threshold mean model including many lags of r,t and p and a simple threshold dispersion model (using the optimum common threshold cut‐off at r=13). This model resulted in an SBC of 3325.
where
has a t‐distribution with fitted degrees of freedom parameter
, shape parameter
and


Note that, for the BCT model, the location parameter μ is approximately the median of y. Fig. 12 displays the fitted values of μ and σ plotted against time in days. The (normalized quantile) residuals (see Section 6.2) for the final BCT model are shown in Fig. 13. Figs 13(a) and 13(b) plot their autocorrelation and partial autocorrelation functions respectively, whereas Figs 13(c) and 13(d) provide a kernel density estimate and QQ‐plot for them respectively. The residuals appear satisfactory. In particular the assumptions of (conditional) independence and a BCT distribution for the river flow observations appear to be reasonable.

River flow data: fitted values of (a) μ and (b) σ against time in days

River flow data residual plots (a) autocorrelation function ACF, (b) partial autocorrelation function, (c) kernel density estimate and (d) QQ‐plot
8. Conclusions
The GAMLSS is a very general class of models for a univariate response variable. It provides a common coherent framework for regression‐type models, uniting models that are often considered as different in the statistical literature. It is therefore highly suited to educational objectives. It allows a very wide family of distributions for the response variable to be fitted, reducing the danger of distributional misspecification. It allows all the parameters of the distribution of the dependent variable to be modelled, so that location, scale, skewness and kurtosis parameters can each be modelled explicitly if required. Different terms can be included in the predictor for each parameter, including splines and random effects, providing extra flexibility. For fixed hyperparameters the fitting algorithm of the GAMLSS model is very fast, so many alternative models can be fitted and explored before a final selection of a model or a combination of models is made. The hyperparameters can be estimated if required. The GAMLSS is implemented as a package (which is available free of charge from the authors) in the statistical environment R. The modular nature of the fitting algorithm allows additional alternative distributions and additive terms to be incorporated easily. The GAMLSS can also be used as an exploratory tool to select potential models for a subsequent fully Bayesian analysis.
Acknowledgements
The authors thank Calliope Akantziliotou for her help in the R implementation of the GAMLSS, Bob Gilchrist and Brian Francis for their comments and their encouragement during this work, Tim Cole for suggesting the body mass index data set, Jim Hodges, S. Gange and Howell Tong for providing the health maintenance organization, the hospital stay and river flow data sets respectively, the R Development Core Team for the package R (which is free of charge) and finally the four referees for comments that helped to improve the paper.
Appendices
Appendix A: Inferential framework for the generalized additive model for location, scale and shape
A.1. Posterior mode estimation of the parameters β and random effects γ
For the GAMLSS (1) we use an empirical Bayesian argument, to obtain MAP, or posterior mode, estimation (see Berger (1985)) of both the βks and the γjks assuming normal, possibly improper, priors for the γjks. We show below that this is equivalent to maximizing the penalized likelihood lp, which is given by equation (5). To show this we shall use arguments that have been developed in the statistical literature by Wahba (1978), Silverman (1985), Green (1985), Kohn and Ansley (1988), Speed (1991), Green and Silverman (1994), Verbyla et al. (1999), Hastie and Tibshirani (2000) and Fahrmeir and Lang (2001).
The components of a GAMLSS (1) are
- (a) y, the response vector of length n,
- (b) X=(X1,X2,…,Xp), design matrices,
- (c) β=(β1,β2,…,βp), linear parameters,
- (d) Z=(Z11,Z21,…,ZJ11,…,Z1p,Z2p,…,ZJpp), design matrices,
- (e) γ=(γ11,γ21,…,γJ11,…,γ1p,γ2p,…,γJpp), random effects, and
- (f) λ, hyperparameters.
(9)
(10)
. Hence, from expression (10),

, and c(y,λ) is a function of y and λ. Note that, for a GAMLSS, lp is equivalent, with respect to (β,γ), to the h‐likelihood of Lee and Nelder (1996, 2001a, b).
Hence lp is maximized over (β,γ), giving posterior mode (or MAP) estimation of (β,γ) and, for fixed hyperparameters λ, MAP estimation of β and γ is equivalent to maximizing the penalized likelihood lp that is given by equation (5).
The details of the RS and CG algorithms for maximizing the penalized likelihood lp, over both the parameters β and the random‐effects terms γ (for fixed hyperparameters λ), are given in Appendix B. The justification of the CG algorithm is given in Appendix C.
A.2. Hyperparameter estimation

The maximization of L(β,λ|y) over β and λ involves high dimensional integration so any approach to maximizing it will be computer intensive. Note that the maximum likelihood estimator for β from this approach will not in general be the same as the MAP estimator for β that was described in the previous section.
In restricted maximum likelihood (REML) estimation, effectively a non‐informative (constant) prior is assumed for β and both γ and β are integrated out of the joint density f(y,γ,β|λ) to give the marginal likelihood L(λ|y), which is maximized over λ.
In a fully Bayesian inference for the GAMLSS, the posterior distribution of (β,γ,λ) is obtained from equation (9), e.g. by using Markov chain Monte Carlo sampling; see Fahrmeir and Tutz (2001) or Fahrmeir and Lang (2001).
The above methods of estimation of the hyperparameters λ are in general highly computationally intensive: the maximum likelihood and REML methods require high dimensional integration, whereas the fully Bayes method requires Markov chain Monte Carlo sampling.
The following four methods, which do not require such computational intensity, are considered for hyperparameter estimation in GAMLSSs.
The methods are summarized in the following algorithm.
- (a) Procedure 1: estimate the hyperparameters λ by one of the methods
  - (i) minimizing a profile generalized Akaike information criterion GAIC over λ,
  - (ii) minimizing a profile generalized cross‐validation criterion over λ,
  - (iii) maximizing the approximate marginal density (or profile marginal likelihood) for λ by using a Laplace approximation or
  - (iv) approximately maximizing the marginal likelihood for λ by using an (approximate) EM algorithm.
- (b) Procedure 2: for fixed current hyperparameters λ, use the GAMLSS (RS or CG) algorithm to obtain posterior mode (MAP) estimates of (β,γ).
Procedure 2 is nested within procedure 1 and a numerical algorithm is used to estimate λ.
We now consider the methods in more detail.
A.2.1. Minimizing a profile generalized Akaike information criterion over λ
GAIC (Akaike, 1983) was considered by Hastie and Tibshirani (1990), pages 160 and 261, for hyperparameter estimation in GAMs. In GAMs a cubic smoothing spline function h(x) is used to model the dependence of a predictor on explanatory variable x. For a single smoothing spline term, since λ is related to the smoothing degrees of freedom df=tr(S) through equation (6), selection (or estimation) of λ may be achieved by minimizing GAIC(#), which is defined in Section 6.2, over λ.
When the model contains p cubic smoothing spline functions in different explanatory variables, then the corresponding p smoothing hyperparameters λ=(λ1,λ2,…,λp) can be jointly estimated by minimizing GAIC(#) over λ. However, with multiple smoothing splines the sum of the individual traces, $\sum_{j=1}^{p}\mathrm{tr}(S_j)$, is only an approximation to the full model complexity degrees of freedom.
The GAIC(#) criterion can be applied more generally to estimate hyperparameters λ in the distribution of random‐effects terms. The (model complexity) degrees of freedom df need to be obtained for models with random‐effects terms. This has been considered by Hodges and Sargent (2001). The degrees of freedom of a model with a single random‐effects term can be defined as the trace of the random‐effect (shrinkage) smoother S, i.e. df=tr(S), where S is given by equation (6). As with smoothing terms, when there are other terms in the model
is only an approximation to the full model complexity degrees of freedom. The full model complexity degrees of freedom for model (1) are given by df=tr(A⁻¹B) where A is defined in Appendix C and B is obtained from A by omitting the matrices Gjk for j=1,2,…,Jk and k=1,2,…,p.
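As a minimal sketch of this idea (not the authors' implementation), the following Python code selects λ for a single random-intercept term by minimizing GAIC(#) = global deviance + # × df over a grid, with df=tr(S) for the shrinkage smoother S=Z(ZᵀZ+λI)⁻¹Zᵀ. The normal response with known σ, the absence of other model terms, the grid and the function names are all assumptions made for illustration.

```python
import math

def shrinkage_fit(groups, y, lam):
    """Random-intercept shrinkage smoother for a fixed hyperparameter lam.

    Returns the fitted values and df = tr(S), S = Z (Z'Z + lam I)^{-1} Z'.
    """
    sums, counts = {}, {}
    for g, yi in zip(groups, y):
        sums[g] = sums.get(g, 0.0) + yi
        counts[g] = counts.get(g, 0) + 1
    # Each group effect is the group total shrunk by n_j + lam.
    gamma = {g: sums[g] / (counts[g] + lam) for g in sums}
    df = sum(n / (n + lam) for n in counts.values())
    return [gamma[g] for g in groups], df

def gaic(groups, y, lam, penalty=2.0, sigma=1.0):
    """GAIC(#) = global deviance + # * df (normal response, sigma assumed known)."""
    fitted, df = shrinkage_fit(groups, y, lam)
    deviance = sum(((yi - fi) / sigma) ** 2 for yi, fi in zip(y, fitted))
    deviance += len(y) * math.log(2 * math.pi * sigma ** 2)
    return deviance + penalty * df

def select_lambda(groups, y, penalty=2.0, grid=None):
    """Grid search for the lam that minimizes GAIC(#); the grid is illustrative."""
    if grid is None:
        grid = [10 ** (k / 10) for k in range(-30, 31)]
    return min(grid, key=lambda lam: gaic(groups, y, lam, penalty))
```

A larger penalty # can never select a smaller λ here, because df is strictly decreasing in λ; a numerical optimizer could replace the grid search.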
A.2.2. Minimizing a generalized cross‐validation over λ
The generalized cross‐validation criterion was considered by Hastie and Tibshirani (1990), pages 259–263, for hyperparameter estimation in GAMs. The criterion GAIC in Appendix A.2.1 is replaced by the generalized cross‐validation criterion, which is minimized over λ. Verbyla et al. (1999) considered the approximate equivalence of generalized cross‐validation and REML methods of estimating λ in smoothing splines models, which was considered in more detail by Wahba (1985) and Kohn et al. (1991).
A.2.3. Maximizing the approximate marginal density (or profile marginal likelihood) of λ by using a Laplace approximation
For GLMMs, Breslow and Clayton (1993) used a first‐order Laplace integral approximation to integrate out the random effects γ and to approximate the marginal likelihood, leading to estimating equations based on penalized quasi‐likelihood for the mean model parameters and pseudonormal (REML) likelihood for the dispersion components. Breslow and Lin (1995) extended this to a second‐order Laplace approximation.
Lee and Nelder (1996) took a similar approach, estimating the dispersion components by using a first‐order approximation to the Cox and Reid (1987) profile likelihood which eliminates the nuisance parameters β from the marginal likelihood, which they called an adjusted profile h‐likelihood. Lee and Nelder (2001a) extended this to a second‐order approximation.
(11)
(12)
and
(13)
and
, the MAP estimates of β and γ given each fixed λ. (Note that matrix D is a rearrangement of matrix A from Appendix C.) Estimation of λ can be achieved by maximizing approximation (12) over λ (e.g. by using a numerical maximization algorithm). Alternatively, this can be considered as a generalization of REML estimation of λ, maximizing an approximate profile log‐likelihood for λ, denoted here as l(λ), given by replacing
by the expected information
, giving
(14)
This is closely related to the adjusted profile h‐likelihood of Lee and Nelder (1996, 2001a, b).
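To illustrate the first-order Laplace approximation in the simplest setting, the sketch below integrates out a single random effect γ from a joint density exp{h(γ)} and compares the result with numerical quadrature. The Poisson response with log-link, the single N(0,1/λ) random effect, the default values y=3 and λ=1 and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def h(gamma, y=3, lam=1.0):
    """Log joint density, up to a constant: Poisson count y with log-mean gamma
    and a N(0, 1/lam) random effect gamma (illustrative model)."""
    return y * gamma - math.exp(gamma) - 0.5 * lam * gamma ** 2

def posterior_mode(y=3, lam=1.0, tol=1e-12):
    """Newton-Raphson search for the mode of h (h is concave in gamma)."""
    gamma = 0.0
    for _ in range(100):
        grad = y - math.exp(gamma) - lam * gamma
        hess = -(math.exp(gamma) + lam)
        step = grad / hess
        gamma -= step
        if abs(step) < tol:
            break
    return gamma

def laplace_integral(y=3, lam=1.0):
    """First-order Laplace approximation to the integral of exp(h) over gamma."""
    mode = posterior_mode(y, lam)
    curvature = math.exp(mode) + lam  # minus the second derivative at the mode
    return math.exp(h(mode, y, lam)) * math.sqrt(2 * math.pi / curvature)

def numeric_integral(y=3, lam=1.0, lo=-10.0, hi=10.0, n=20000):
    """Trapezoidal rule on [lo, hi], used only as a reference value."""
    width = (hi - lo) / n
    total = 0.5 * (math.exp(h(lo, y, lam)) + math.exp(h(hi, y, lam)))
    for i in range(1, n):
        total += math.exp(h(lo + i * width, y, lam))
    return total * width
```

In the appendix the same idea is applied jointly to (β,γ), with the curvature replaced by the corresponding matrix of second derivatives, so that the approximate marginal density can be maximized over λ.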
A.2.4. Approximately maximizing the marginal likelihood for λ by using an (approximate) EM algorithm
An approximate EM algorithm was used by Fahrmeir and Tutz (2001), pages 298–303, and by Diggle et al. (2002), pages 172–175, to estimate hyperparameters in GLMMs and is similarly applied here to maximize approximately over λ the marginal likelihood of λ, L(λ) (or equivalently the posterior marginal distribution of λ for a non‐informative uniform prior).
, is approximated, where the expectation is over the posterior distribution of (β,γ) given y and
, i.e.
, where
is the current estimate of λ, giving, apart from a function of y,
(15)
and
are the posterior mode and curvature (i.e. submatrix of A⁻¹) of γjk from the MAP estimation in Appendix C.
is maximized over λ by a numerical maximization algorithm (e.g. the function optim in the R package). If Gjk=Gk for j=1,2,…,Jk and k=1,2,…,p, and the Gk are unconstrained positive definite symmetric matrices (e.g. in a random‐coefficients model), then equation (15) can be maximized explicitly giving, for k=1,2,…,p,
(16)
Appendix B: The algorithms
B.1. Introduction
be the adjusted dependent variables and Wks be diagonal matrices of iterative weights, for k=1,2,…,p and s=1,2,…,p, which can have one of the forms



Let r be the outer cycle iteration index, k the parameter index, i the inner cycle iteration index, m the backfitting index and j the random‐effects (or nonparametric) term index. Also, for example, let
denote the current value of the vector γjk in the rth outer, ith inner and mth backfitting cycle iteration and let
denote the value of γjk at the convergence of the backfitting cycle for the ith inner cycle of the rth outer cycle, which is also the starting value
for the (i+1)th inner cycle of the rth outer cycle, for j=1,2,…,Jk and k=1,…,p. Note also, for example, that
means the current (i.e. most recently) updated estimate of γjk and the algorithm operates in the backfitting cycle of the ith inner cycle of the rth outer cycle.
B.2. The RS algorithm
Essentially the RS algorithm has an outer cycle which maximizes the penalized likelihood with respect to βk and γjk, for j=1,…,Jk, in the model successively for each θk in turn, for k=1,…,p. At each calculation in the algorithm the current updated values of all the quantities are used.
The RS algorithm is not a special case of the CG algorithm because in the RS algorithm the diagonal weight matrix Wkk is evaluated (i.e. updated) within the fitting of each parameter θk, whereas in the CG algorithm all weight matrices Wks for k=1,2,…,p and s=1,2,…,p are evaluated after fitting all θk for k=1,2,…,p.
The RS algorithm is as follows.
- Step 1: start by initializing fitted values
and random effects
, for j=1,…,Jk and k=1,2,…,p. Evaluate the initial linear predictors
, for k=1,2,…,p.
- Step 2: start the outer cycle r=1,2,… until convergence. For k=1,2,…,p:
- (a)
start the inner cycle i=1,2,… until convergence—
- (i)
evaluate the current
,
and
;
- (ii)
start the backfitting cycle m=1,2,… until convergence;
- (iii)
regress the current partial residuals
against design matrix Xk, using the iterative weights
to obtain the updated parameter estimates
;
- (iv)
for j=1,2,…,Jk smooth the partial residuals
, using the shrinking (smoothing) matrix Sjk given by equation (6) to obtain the updated (and current) additive predictor term
;
- (v)
end the backfitting cycle, on convergence of
and
and set
and
for j=1,2,…,Jk and otherwise update m and continue the backfitting cycle;
- (vi)
calculate the updated
and
;
- (b)
end the inner cycle on convergence of
and the additive predictor terms
and set
,
, for j=1,2,…,Jk,
and
; otherwise update i and continue the inner cycle.
- Step 3: update the value of k.
- Step 4: end the outer cycle if the change in the (penalized) likelihood is sufficiently small; otherwise update r and continue the outer cycle.
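The structure of the steps above can be sketched for a deliberately simple case: a normal response with μ=β10+β11x (identity link) and log(σ)=β20+β21x, with no additive terms, so that the backfitting cycle is not needed. This is a hedged illustration rather than the gamlss implementation; the choice of distribution, the use of expected information for the iterative weights, the starting values and all function names are assumptions.

```python
import math

def wls_line(x, z, w):
    """Weighted least squares regression of z on (1, x): (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    slope = sxz / sxx
    return zbar - slope * xbar, slope

def global_deviance(y, mu, sigma):
    """Minus twice the log-likelihood of independent normal observations."""
    return sum(math.log(2 * math.pi * s * s) + ((yi - m) / s) ** 2
               for yi, m, s in zip(y, mu, sigma))

def rs_normal(x, y, tol=1e-8, max_outer=50, max_inner=50):
    n = len(y)
    ybar = sum(y) / n
    sd0 = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)
    beta_mu = (ybar, 0.0)               # step 1: initial values
    beta_sigma = (math.log(sd0), 0.0)
    dev_old = float("inf")
    for _ in range(max_outer):          # step 2: outer cycle
        # k = 1 (mu): with an identity link the adjusted dependent variable
        # equals y and the weights are 1/sigma^2, so one WLS fit suffices.
        sigma = [math.exp(beta_sigma[0] + beta_sigma[1] * xi) for xi in x]
        beta_mu = wls_line(x, y, [1.0 / (s * s) for s in sigma])
        mu = [beta_mu[0] + beta_mu[1] * xi for xi in x]
        # k = 2 (sigma): inner cycle with weights re-evaluated at each pass.
        for _ in range(max_inner):
            eta = [beta_sigma[0] + beta_sigma[1] * xi for xi in x]
            # score with respect to eta = log(sigma); expected information is 2
            u = [((yi - m) / math.exp(e)) ** 2 - 1.0
                 for yi, m, e in zip(y, mu, eta)]
            z = [e + ui / 2.0 for e, ui in zip(eta, u)]
            new = wls_line(x, z, [2.0] * n)
            change = max(abs(new[0] - beta_sigma[0]),
                         abs(new[1] - beta_sigma[1]))
            beta_sigma = new
            if change < tol:
                break
        sigma = [math.exp(beta_sigma[0] + beta_sigma[1] * xi) for xi in x]
        dev = global_deviance(y, mu, sigma)
        if abs(dev_old - dev) < tol:    # step 4: stop on a small change
            break
        dev_old = dev
    return beta_mu, beta_sigma, dev
```

The point of the sketch is the ordering: the weights for each parameter are updated while that parameter is being fitted, which is the feature that distinguishes the RS algorithm from the CG algorithm.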
B.3. The CG algorithm
Algorithm CG, based on Cole and Green (1992), is as follows.
- Step 1: start by initializing
and
for j=1,2,…,Jk and k=1,2,…,p. Evaluate
for k=1,2,…,p.
- Step 2: start the outer cycle r=1,2,… until convergence.
- Step 3: evaluate and fix the current
,
and
for k=1,2,…,p and s=1,2,…,p. Perform a single rth step of the Newton–Raphson algorithm by
- (a)
starting the inner cycle i=1,2,… until convergence—for k=1,2,…,p,
- (i)
start the backfitting cycle m=1,2,… until convergence
and for j=1,2,…,Jk

- (ii)
end the backfitting cycle, on convergence of
and
and set
and
for j=1,2,…,Jk and otherwise update m and continue the backfitting cycle, and
- (iii)
calculate the updated
and
and then update k;
- (b)
end the inner cycle on convergence of
and the additive predictor terms
and set
,
,
and
, for j=1,2,…,Jk and k=1,2,…,p; otherwise update i and continue the inner cycle.
- Step 4: end the outer cycle if the change in the (penalized) likelihood is sufficiently small; otherwise update r and continue the outer cycle.
The matrices
and
, which are defined in Appendix C, are the projection matrices and the shrinking matrices, for the parametric and additive components of the model respectively, at the rth iteration, for j=1,2,…,Jk and k=1,2,…,p.
and
are the current working variables for fitting the parametric and the additive (random‐effects or smoothing) components of the model respectively and are defined as


for k=1,2,…,p, at the end of the inner cycle for the rth outer cycle and then evaluating
,
and
, for k=1,2,…,p and s=1,2,…,p, using the
for k=1,2,…,p. The optimum step length for a particular iteration r can be obtained by maximizing lp(α) over α.
The inner (backfitting) cycle of the algorithm can be shown to converge (for cubic smoothing splines and similar linear smoothers); see Hastie and Tibshirani (1990), chapter 5. The outer cycle is simply a Newton–Raphson algorithm. Thus, if step size optimization is performed, the outer loop will converge as well. Standard general results on the Newton–Raphson algorithm ensure convergence (Ortega and Rheinboldt, 1970). Step optimization is rarely needed in practice in our experience.
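The backfitting cycle can be illustrated by a small sketch that alternates a parametric fit with a linear shrinkage smoother until both stop changing. It assumes one parametric component (an intercept and slope in x), one random-intercept term with penalty λ, and unit iterative weights; these choices and the function name are assumptions for illustration only.

```python
def backfit(x, groups, y, lam, tol=1e-10, max_iter=1000):
    """Backfitting for y = b0 + b1*x + gamma[group], with penalty lam on gamma."""
    n = len(y)
    labels = sorted(set(groups))
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = 0.0, 0.0
    gamma = {g: 0.0 for g in labels}
    for m in range(1, max_iter + 1):
        # Regress the partial residuals y - gamma on the design matrix (1, x).
        partial = [yi - gamma[g] for yi, g in zip(y, groups)]
        pbar = sum(partial) / n
        new_b1 = sum((xi - xbar) * (p - pbar)
                     for xi, p in zip(x, partial)) / sxx
        new_b0 = pbar - new_b1 * xbar
        # Smooth the partial residuals y - b0 - b1*x with the shrinkage smoother.
        sums = {g: 0.0 for g in labels}
        counts = {g: 0 for g in labels}
        for xi, g, yi in zip(x, groups, y):
            sums[g] += yi - new_b0 - new_b1 * xi
            counts[g] += 1
        new_gamma = {g: sums[g] / (counts[g] + lam) for g in labels}
        # End the cycle when neither component changes appreciably.
        change = max([abs(new_b0 - b0), abs(new_b1 - b1)]
                     + [abs(new_gamma[g] - gamma[g]) for g in labels])
        b0, b1, gamma = new_b0, new_b1, new_gamma
        if change < tol:
            return b0, b1, gamma, m
    raise RuntimeError("backfitting did not converge")
```

With λ>0 the penalized least squares problem is strictly convex, so the alternating updates converge, although slowly when λ is small relative to the group sizes.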
Appendix C: Maximization of the penalized likelihood
In this appendix it is shown that maximization of the penalized log‐likelihood function lp that is given by equation (5) over the parameters βk and terms γjk for j=1,2,…,Jk and k=1,2,…,p leads to the algorithm that is described in Appendix B.
This is achieved by the following two steps.
- (a)
The first and second derivatives of equation (5) are obtained to give a Newton–Raphson step for maximizing equation (5) with respect to βk and γjk for j=1,2,…,Jk and k=1,2,…,p.
- (b)
Each step of the Newton–Raphson algorithm is achieved by using a backfitting procedure cycling through the parameters and through the additive terms of the k linear predictors.
C.1. Step (a)
The algorithm maximizes the penalized likelihood function lp, given by equation (5), using a Newton–Raphson algorithm. The first derivative (score function) and the second derivatives of lp with respect to βk and γjk for all j=1,2,…,Jk and k=1,2,…,p are evaluated at iteration r at the current predictors
for k=1,2,…,p.
Let
, ak=∂lp/∂αk and
for k=1,2,…,p and s=1,2,…,p, and let
, a=∂lp/∂α and A=−∂²lp/∂α∂αᵀ.



over i=1,2,…,n, for k=1,2,…,p and s=1,2,…,p (see Appendix B for alternative weight matrices).
C.2. Step (b)

(17)

is the adjusted dependent variable. (A device for obtaining updated estimate
in equation (17) is to apply weighted least squares estimation to an augmented data model given by
(18)
,
and ejk∼N(0,I). This device can be generalized to estimate αk and even α.)
(19)

A single rth Newton–Raphson step is achieved by using a backfitting procedure for each k, cycling through equation (19) and then equation (17) for j=1,2,…,Jk and cycling over k=1,2,…,p until convergence of the set of updated values
for k=1,2,…,p. The updated predictors
, first derivatives
, diagonal weighted matrices
and adjusted dependent variables
, for k=1,2,…,p and s=1,2,…,p, are then calculated and the (r+1)th Newton–Raphson step is performed, until convergence of the Newton–Raphson algorithm.
References
Discussion on the paper by Rigby and Stasinopoulos
Peter W. Lane (GlaxoSmithKline, Harlow)
I congratulate Robert Rigby and Mikis Stasinopoulos on their addition to the toolbox for analytical statistics. They have clearly been working towards the present generality of the generalized additive model for location, scale and shape for several years and have developed the supporting theory in conjunction with a software package in the public domain R system. The model includes many of the modelling extensions that have been introduced by researchers in the past few decades and provides a unifying framework for estimation and inference. Moreover, they have found other directions in which to extend it themselves, allowing for modelling of further parameters beyond the mean and variance and with a much wider class of distributions.
This is a very extensive paper, and it would take much longer than the time that is available today to get to grips with the many ideas and issues that are covered. Two particular aspects encourage me to go away to experiment with the new tool. One is the inclusion of facilities for smooth terms, which have much potential for practical use in handling relationships that must be adjusted for, without the need for a parametric model. I am particularly glad to see facilities for smoothing made available as an integrated part of a general model, unlike the approach that is taken in some statistical software. The other aspect is the provision for non‐linear hyperparameters, which I experimented with myself in a class I called generalized non‐linear models and made available in GenStat (Lane, 1996). The structure of the generalized additive model for location, scale and shape allows such parameters to be estimated by non‐linear algorithms, involving the inevitable concerns over details of the search process, without having to sacrifice the benefits of not having these concerns within the main generalized additive parts of the model.
I am surprised not to see the beta distribution included in the very extensive list of available distributions. In fact, none of the distributions that are listed there are suitable for the analysis of continuous variables observed in a restricted range. In pharmaceutical trials in several therapeutic areas, responses from patients are gathered in the form of a visual analogue scale. This requires patients to mark a point on a line in the range [0,1] to represent some aspect under study, such as their perception of pain. Some of my colleagues (Wu et al., 2003) have investigated the analysis of such data by using the beta distribution, and it would be useful to see how to fit this into the general scheme.
I am very pleased to see that facilities for model checking are also provided and feature prominently in the illustrative examples in this paper. These are invaluable in helping to understand the fit of a model, and in highlighting potential problems.
I would like to raise three concerns with the paper. The main one is with the use of maximum likelihood for fitting models with random effects. I am under the impression that such an approach in general leads to biased estimators, and that it is preferable to use restricted maximum likelihood. This strikes me as indicating that the generalized linear mixed model and hierarchical generalized linear model approaches are more appropriate for those problems that come within their scope.
My experience with general tools for complex regression models has given me a sceptical outlook when presented with a new one. All too often, I have found that models cannot be applied in practice without extensive knowledge of the underlying algorithms to cope with difficulties in start‐up or convergence. As a result, the apparent flexibility of a tool cannot actually be used, and I have to make do with a simpler model than I would like because of difficulties that I cannot overcome. I fear that the disclaimer in Section 5 about potential problems with the likelihood approach for these very general models may signal similar difficulties here. It is noticeable that three of the illustrative examples involve large numbers of observations (over 1000) and the other two, still with over 200 observations, have few parameters.
I am also concerned by the arbitrary nature of the generalized Akaike information criterion that is suggested for comparing models. The examples use three different values, 2.0, 2.4 and 3.0, for what I can only describe as a ‘fudge factor’, and they include no comment on why these values are used rather than any others. I am aware that, with large data sets, automatic methods of model selection tend to lead to the inclusion of more model terms than are needed for a reasonable explanation; we need a better approach than is offered by these information criteria.
However, I appreciate that most of my concerns can probably be levelled at any scheme for fitting a wide class of complex models. So I am happy to conclude by proposing a vote of thanks to the authors for a stimulating paper and a new modelling tool to experiment with.
Simon Wood (University of Glasgow)
I would like to start by congratulating the authors on a very interesting paper, reporting an impressive piece of work. It is good to see sophisticated approaches to the modelling of the mean being extended to other moments.

. Given smoothing parameters λj, model estimation can then proceed by direct maximization of the penalized likelihood of the model



Fig. 14 illustrates the results of applying this approach and should be compared with Fig. 2 of Rigby and Stasinopoulos. All computations were performed using R 2.0.0 (R Development Core Team, 2004). For this example, h1 was represented by using a rank 20 cubic regression spline whereas h3 and h4 were each represented by using rank 10 cubic regression splines (class cr smooth constructor functions from R library mgcv were used to set up the model matrices and penalty matrices). Given smoothing parameters, the penalized likelihood was maximized by Newton's method with step halving, backed up by steepest descent with line searching, if the Hessian of the penalized log‐likelihood was not negative definite. Constants were used as starting values for the functions, these being obtained by fitting a model in which h1–h4 were each assumed to be constant. Rapid convergence is facilitated by first conditioning on a moderate constant value for h4 and optimizing only h1–h3. The resulting h1–h3‐estimates were used as starting values in a subsequent optimization with respect to all the functions. The two‐stage optimization helps because of the flatness of the log‐likelihood with respect to changes in h4 corresponding to τ>30. This penalized likelihood maximization was performed using ‘exact’ first and second derivatives. The smoothing parameters were estimated by GAIC minimization, with #=2. The GAIC was optimized by using a quasi‐Newton method with finite differenced derivatives (R routine optim). The estimated degrees of freedom for the smooth functions were 20, 9.2, 5.9 and 7.4, which are higher than those which were obtained in Section 7.1, since I used #=2 rather than #=2.4. Fitting required around a fifth of the time of the gamlss package, and with some optimization and judicious use of compiled code a somewhat greater speed up might be expected.

Fig. 14. Equivalent figure to Fig. 2, showing estimates that were achieved by using penalized cubic regression splines to represent the smooth terms in the model (note the wide bands in (d); clearly the data provide only limited information about τ): ——, estimated functions; –––, limits of twice the standard error bands
So the direct penalized regression approach to the generalized additive model for location, scale and shape class may have the potential to offer some computational benefits, as well as making approximate inference about the uncertainty of the model components quite straightforward. Clearly, then, this is a paper which not only presents a substantial body of work but also suggests many further areas for exploration, and it is therefore a pleasure to second the vote of thanks.
The vote of thanks was passed by acclamation.
M. C. Jones (The Open University, Milton Keynes)
It is excellent to see three‐ and four‐parameter distributional families being employed for continuous response variables in the authors’ general models. My comments on this fine and important paper focus on Section 4.2.





The remainder of the distributions in Section 4.2 live on ℜ+. Three of the four employ the much overrated Box–Cox transformation. A big disadvantage, at least for the purist, is that the Box–Cox transformation requires messy truncated distributions for z with the truncation point depending on the parameters of the transformation. The authors recognize this elsewhere (Rigby and Stasinopoulos, 2004a,b). A better alternative, if one must take the transformation approach, might be to take logarithms and then to employ the wider families of distributions genuinely on ℜ, such as those above.
But there are many distributions on ℜ+ directly including, finally, the generalized gamma family. My only comment here is that this is a well‐known family with a long history before an unpublished 2000 report, e.g. Amoroso (1925), Stacy (1962) and Johnson et al. (1994), section 8.7.
John A. Nelder (Imperial College London)

with
and
, w=λ/(λ+σ2/2) and
. This has a serious bias, e.g.
when λ=∞ (i.e. w=1). Lee and Nelder (2001) showed that the use of APHL in equation (15) gives a consistent REML estimator. PL has been proposed for fitting smooth terms such as occur in generalized additive models. However, in random‐effect models the number of random effects can increase with the sample size, so the use of the appropriate APHL is important. If appropriate profiling is used the algebra for fitting dispersion is fairly complicated; I predict that for fitting kurtosis it will be enormously complicated.
Lee and Nelder use extended quasi‐likelihood for more general models, where no likelihood is available: for its good performances see Lee (2004). When the model allows exact likelihoods they use them in forming the h‐likelihood; even with binary data the h‐likelihood method often produces the least bias compared with other methods, including Markov chain Monte Carlo sampling (Noh and Lee, 2004).
Youngjo Lee (Seoul National University)
I am unsure by how much the generalized additive model for location, scale and shape class is more general than the hierarchical generalized linear model (HGLM) class of models. Recently, the latter class has been extended to allow random effects in both the mean and the dispersion (Lee and Nelder, 2004). This class enables models with various heavy‐tailed distributions to be explored, some of which may be new. Various forms of skewness can also be generated. Although this approach uses a combination of interlinked generalized linear models, it does not mean that we are restricted to the variance function, and higher cumulants, of exponential families.


where
is a random variable with the χ²‐distribution with α degrees of freedom, then marginally the ɛi follow the t‐distribution with α degrees of freedom. Alternatively we may assume that

In summary, models in this paper can generate potentially useful new models, but these will require the proper use of h‐likelihood if they are to be useful for inferences.
Mario Cortina Borja (Institute of Child Health, London)

As an example of using the GAMLSS to model circular responses, I have analysed the number of cases of sudden infant death syndrome (SIDS) in the UK by month of death between 1983 and 1998; these data appear in Mooney et al. (2003) and were corrected to 31‐day months. Though it is not easy to decide on an optimal model, one strong contender, based on the Schwarz Bayesian criterion, fits a constant mean μ (indicating a peak incidence in January) and a natural cubic spline with three effective degrees of freedom as a function of year of death for the scale parameter κ. The fitted smooth curve for this parameter (Fig. 15) may reflect the effect of the ‘back‐to‐sleep’ campaign that was implemented in the early 1990s which reduced the number of SIDS cases by 70% in the UK; it corresponds to a dampening of the seasonal effect on SIDS.

Fig. 15. Effect of year of death on the scale parameter of the von Mises distribution for the number of SIDS cases in the UK, 1983–1998
Non‐symmetric circular distributions and zero‐inflated distributions can be modelled as mixtures, and I wonder whether it would be easy to implement these in a GAMLSS.
N. T. Longford (SNTL, Leicester)
This paper competes with Lee and Nelder (1996) and their extensions, conveying the message that for any data structure and associations that we could possibly think of there are models and algorithms to fit them. But now models are introduced even for some structures that we would not have thought of …. I want to rephrase my comment on Lee and Nelder (1996) which I regard equally applicable to this paper. The new models are top of the range mathematical Ferraris, but the model selection that is used with them is like a sequence of tollbooths at which partially sighted operators inspect driver's licences and road worthiness certificates.
, each of them unbiased for the parameter of interest θ, and having sampling variance
estimated without bias by
, if model m is appropriate: not when it is selected, but when it is valid! Model selection, by whichever criterion and sequence of model comparisons, leads to the estimator

) is conventionally estimated by

, and does so not only because
is biased. The distribution of the mixture
is difficult to establish because the indicators Im are correlated with
.
A misconception underlying all attempts to find the model is that the maximum likelihood assuming the most parsimonious valid model is efficient. This is only asymptotically so. For some parameters (and finite samples), maximum likelihood under some invalid submodels of this model is more efficient because the squared bias that is incurred is smaller than the reduction of the variance. Proximity to asymptotics is not indicated well by the sample size because information about the parameters for the distributional tail behaviour is relatively modest in the complex models engaged.
Longford (2003, 2005) discusses the problem and proposes a solution.
Adrian Bowman (University of Glasgow)
I congratulate the authors on a further substantial advance in flexible modelling. The generalized linear model represented a major synthesis of regression models by allowing a wide range of types of response data and explanatory variables to be handled in a single unifying framework. The generalized additive model approach considerably extended this by allowing smooth nonparametric effects to be added to the list of available model components. The authors have gone substantially further by incorporating the rich set of tools that has been created by recent advances in mixed models and, in addition, by allowing models to describe the structure of parameters beyond the mean. The end result is an array of models of astonishing variety.
One major issue which this complexity raises is what tools can be used to navigate such an array of models? The authors rightly comment that particular applications provide contexts which can give guidance on the structure of individual components. Where the aim is one of prediction, as is the case in several of the examples of the paper, criteria such as Akaike's information criterion and the Schwarz Bayesian criterion are appropriate. However, where interest lies in more specific aspects of model components, such as the identification of whether an individual variable enters the model in a linear or nonparametric manner, or indeed may have no effect, then prediction‐based methods are less appropriate. Even with the usual form of generalized additive model, likelihood ratio test statistics do not have the usual χ² null distributions and the problem seems likely to be exacerbated in the more complex setting of a generalized additive model for location, scale and shape.
In view of this, any further guidance which the authors could provide on how to interpret the global deviance column of Table 2, or more generally on appropriate reference distributions when comparing models, would be very welcome.
The following contribution was received in writing after the meeting.
T. J. Cole (Institute of Child Health, London)
I congratulate the authors on their development of the generalized additive model for location, scale and shape (GAMLSS). Its flexible approach to the modelling of higher moments of the distribution is very powerful and works particularly well with age‐related reference ranges.
In my experience with the LMS method (Cole and Green, 1992), which is a special case of the GAMLSS, it is difficult to choose the effective degrees of freedom (EDFs) for the cubic smoothing spline curves as there is no clear criterion for goodness of fit (see Pan and Cole (2004)). In theory the authors’ generalized Akaike information criterion GAIC(#) (Section 6.2) provides such a criterion, but in practice it can be supremely sensitive to the choice of the hyperparameter #. I am glad that the authors chose to highlight this in their first example (Section 7.1). With #=2.4 the shape parameter τ was modelled as a cubic smoothing spline with 6.1 EDFs (Fig. 2), whereas with #=2.5 it was modelled as a constant. The two most well‐known cases of the GAIC are the AIC itself (where #=2) and the Schwarz Bayesian criterion (SBC) (where #=log(n)=9.9 here), so the distinction between 2.4 and 2.5 is clearly tiny on this scale. The use of the SBC in the example would have led to a much more parsimonious model than for GAIC(2.5).
The take‐home message is that, although optimal GAMLSSs are simple to fit conditional on #, the choice of # is largely subjective on the scale from 2 to log(n) and can affect the model dramatically. In my view # should reflect the sample size in some way, so I prefer the SBC to the AIC. In addition it is good practice to reduce the EDFs as far as possible (Pan and Cole, 2004), which comes to the same thing. I also wonder whether a different GAIC might be applied to the different parameters of the distribution, so that for example an extra EDF used to model the shape parameter should be penalized more heavily than an extra EDF for the mean.
The authors replied later, in writing, as follows.
We thank all the discussants for their constructive comments and reply below to the issues that were raised.
Distributions
An important advantage of the generalized additive model for location, scale and shape (GAMLSS) is that the model allows any distribution for the response variable y. In reply to Dr Borja, mixture distributions (including zero‐inflated distributions) are easily implemented in a GAMLSS. For example, a zero‐inflated negative binomial distribution (a mixture of zero with probability ν and a negative binomial NB(μ,σ) distribution with probability 1−ν) is easily implemented as a three‐parameter distribution (e.g. with log‐links for μ and σ and a logit link for ν). The beta distribution BE(μ,σ), which was suggested by Dr Lane, has now been implemented, as has an inflated beta distribution with additional point probabilities for y at 0 and 1.
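The zero‐inflated negative binomial mixture described above can be written out as a short sketch. It assumes the negative binomial is parameterized with mean μ and variance μ+σμ², which may differ from the exact form used in the software; the function names are hypothetical.

```python
import math

def nb_pmf(y, mu, sigma):
    """Negative binomial probability, assuming mean mu and variance mu + sigma*mu^2."""
    r = 1.0 / sigma
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + y * math.log(sigma * mu / (1.0 + sigma * mu))
             - r * math.log(1.0 + sigma * mu))
    return math.exp(log_p)

def zinb_pmf(y, mu, sigma, nu):
    """Mixture: zero with probability nu, NB(mu, sigma) with probability 1 - nu."""
    base = (1.0 - nu) * nb_pmf(y, mu, sigma)
    return nu + base if y == 0 else base

def zinb_from_predictors(y, eta_mu, eta_sigma, eta_nu):
    """Map three predictors to (mu, sigma, nu): log-links for mu and sigma,
    logit link for nu, as in the text."""
    mu = math.exp(eta_mu)
    sigma = math.exp(eta_sigma)
    nu = 1.0 / (1.0 + math.exp(-eta_nu))
    return zinb_pmf(y, mu, sigma, nu)
```

Each of the three predictors could then be given its own parametric or additive model, so the inflation probability ν can depend on explanatory variables in the same way as μ and σ.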
The exponential family distribution that is used in generalized linear, additive and mixed models usually has at most two parameters: a mean parameter μ and a scale parameter φ (=σ in our notation). Having only two parameters it cannot model skewness and kurtosis. The exponential family distribution has been approximated by using extended quasi‐likelihood (see McCullagh and Nelder (1989)) and used in hierarchical generalized linear models (HGLMs) by Lee and Nelder (1996, 2001). However, extended quasi‐likelihood is not a proper distribution, as discussed in Section 2.3 of the paper, and suffers from the same skewness and kurtosis restrictions as the exponential family. The range of distributions that are available in HGLMs is extended via a random‐effect term. However, the GAMLSS allows any distribution for y and is conceptually simpler because it models the distribution of y directly, rather than via a random‐effect term. The level of generality of the double HGLM will be clearer on publication of Lee and Nelder (2004).
The four‐parameter distributions on ℜ that were discussed by Professor Jones can be implemented in a GAMLSS. The Box–Cox t‐ and Box–Cox power exponential distributions in the paper are four‐parameter distributions on ℜ+ for which there are fewer direct contenders. They are easy to fit in our experience and provide generalizations of the Box–Cox normal distribution (Cole and Green, 1992), which is widely used in centile estimation, allowing the modelling of kurtosis as well as skewness. Users are also welcome to implement other distributions.
Restricted maximum likelihood
Dr Lane and Professor Nelder highlight the use of restricted maximum likelihood (REML) estimation for reducing bias in parameter estimation. In the paper, the random‐effects hyperparameters λ are estimated by REML estimation, whereas the fixed effects parameters β and random‐effects parameters γ are estimated by posterior mode estimation, conditional on the estimated λ. If the total (effective) degrees of freedom for estimating the random effects γ and the fixed effects β1 for the distribution parameter μ are substantial relative to the total degrees of freedom (i.e. the sample size), then REML estimation of the hyperparameters λ and the fixed effects (β2,β3,β4) for parameters (σ,ν,τ) respectively may be preferred. This is achieved in a GAMLSS by treating (β2,β3,β4) in the same way as λ in Appendix A.2.3 and obtaining the approximate marginal likelihood l(ζ1) for ζ1=(β2,β3,β4,λ) obtained by integrating out ζ2=(β1,γ) from the joint posterior density of ζ=(β,γ,λ), giving
, where
and
, evaluated at
, the posterior mode estimate of ζ2 given ζ1. Hence REML estimation of ζ1 is achieved by maximizing l(ζ1) over ζ1. This procedure leads to REML estimation of the scale and shape parameters and the random‐effects hyperparameters.
For example, in Hodges's data from Section 7.2 of the paper, the above procedure gives the following REML estimates (with the original estimates given in parentheses):
and
.
Bias in the estimators may be further reduced by use of a second‐order Laplace approximation to the integrated joint posterior density above; see Breslow and Lin (1995) and Lee and Nelder (2001).
Alternatively, other methods of bias reduction, e.g. bootstrapping, could be considered.
Model selection
Dr Lane and Professor Cole highlight the issue of the choice of penalty # in the generalized Akaike information criterion GAIC(#) that is used in the paper for model selection. The use of criterion GAIC(#) allows investigation of the sensitivity of the selected model to the choice of penalty #. This is well illustrated in the Dutch girls’ body mass index (BMI) data example from Section 7.1. The resulting optimal effective degrees of freedom that were selected for μ, σ, ν and τ and the estimated parameter ξ in the transformation x=age^ξ are given in Table 4 for each of the penalties #=2, 2.4, 2.5, 9.9.
| # (criterion) | df for μ | df for σ | df for ν | df for τ | ξ |
|---|---|---|---|---|---|
| 2 (AIC) | 16.9 | 8.7 | 5.0 | 9.5 | 0.51 |
| 2.4 (GAIC) | 16.2 | 8.5 | 4.7 | 6.1 | 0.50 |
| 2.5 (GAIC) | 16.0 | 8.0 | 4.8 | 1 | 0.52 |
| 9.9 (SBC) | 12.3 | 6.3 | 3.7 | 1 | 0.53 |
The apparent sensitivity of the degrees of freedom for τ to # is due to the existence of two local optima. The value #=2.4 that is used in the paper is the critical value of # above which the optimization switches from one local optimum to the other. Reducing # below 2.4, or increasing # above 2.5, changes the selected degrees of freedom smoothly. Hence there are two clearly different models for the BMI, one corresponding to #≤2.4 and the other corresponding to #≥2.5.
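The switch between two local optima at a critical penalty can be illustrated with a small sketch. It assumes that GAIC(#) is the fitted global deviance plus # times the effective degrees of freedom, as defined in Section 6.2. The two (deviance, df) pairs are hypothetical values chosen for illustration only; they are not the fitted BMI models.

```python
def gaic(deviance, df, penalty):
    """Generalized AIC: global deviance plus penalty times effective df."""
    return deviance + penalty * df


# Hypothetical (deviance, df) pairs for two local optima; NOT the BMI fits.
rough = {"deviance": 1000.0, "df": 40.0}   # more flexible fit
smooth = {"deviance": 1012.0, "df": 35.0}  # smoother fit


def preferred(penalty):
    """Name of the candidate with the smaller GAIC at this penalty."""
    g_rough = gaic(rough["deviance"], rough["df"], penalty)
    g_smooth = gaic(smooth["deviance"], smooth["df"], penalty)
    return "rough" if g_rough < g_smooth else "smooth"


# Penalty at which the two criteria are equal: the selection switches here.
critical = (smooth["deviance"] - rough["deviance"]) / (rough["df"] - smooth["df"])

choices = {k: preferred(k) for k in (2.0, 2.5, 9.9)}
```

With these illustrative numbers the flexible fit is chosen below the critical penalty and the smoother fit above it, which mirrors the behaviour described for the BMI data.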
A sensitivity analysis of the chosen model to outliers shows that the non‐constant τ‐function for #=2.4 in Fig. 2(d), and in particular the minima in τ at 0 and 4 years (with corresponding peaks in the kurtosis) are due to substantial numbers of outliers in the age ranges 0–0.5 and 3–5 years respectively. Consequently we believe that these peaks in kurtosis may be genuine, requiring physiological explanation. We therefore recommend the chosen model for #=2.4 as in the paper.
In our opinion the Schwarz Bayesian criterion (SBC) is too conservative (i.e. restrictive) in its model selection, leading to bias in the selected functions for μ,σ,ν and τ (particularly at turning‐points), whereas the AIC is too liberal, leading to rough (or erratic) selected functions. Fig. 16 gives the selected parameter functions using the AIC. Compare this with Fig. 14 of Simon Wood. The standard errors in Fig. 16 are conditional on the chosen degrees of freedom and ξ, and on the other selected parameter functions. The final selection of model(s) should be made with the expert prior knowledge of specialists in the field.

Fig. 16. BMI data: fitted parameters against age by using the AIC for model selection
Conditioning on a single selected model ignores model uncertainty and generally leads to an under‐estimation of the uncertainty about quantities of interest, as discussed in Section 6.1. This issue was also raised by Dr Longford. Clearly it is an important issue, but not the focus of the current paper.
Where the focus is on whether an explanatory variable, say x, has a significant effect (rather than on prediction), then for a parametric GAMLSS this can be tested by using the generalized likelihood ratio test statistic Λ, as discussed in Section 6.2. The inadequacy of a linear function in x can be established by testing a linear against a polynomial function in x using Λ. The statistic Λ may be used as a guide to comparing a linear with a nonparametric smooth function in x, although, as pointed out by Professor Bowman, the asymptotic χ2‐distribution no longer applies, and so a formal test is not available.
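As a minimal sketch of the parametric test, suppose that Λ is the difference in fitted global deviance between the linear model and a quadratic model in x, so that one extra parameter is tested and Λ is referred to a χ2‐distribution with 1 degree of freedom. This form of Λ is our reading of Section 6.2 and the deviances below are hypothetical.

```python
import math


def lr_test_one_df(deviance_null, deviance_alt):
    """Likelihood ratio statistic and its chi-squared(1) p-value."""
    stat = deviance_null - deviance_alt
    # For one degree of freedom, P(chi2_1 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical global deviances: linear in x (null) against quadratic in x.
stat, p_value = lr_test_one_df(520.0, 513.2)
```

The closed form for the tail probability holds only for 1 degree of freedom; a comparison involving more parameters would need a general χ2 tail function.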
Algorithm convergence
Dr Lane highlights the issue of possible convergence problems. Occasional problems with convergence may be due to one of the following reasons: using a highly inappropriate distribution for the response variable y (e.g. a symmetric distribution when y is highly skewed), using an unnecessarily complicated model (especially for σ, ν or τ), using extremely poor starting values (which is usually overcome by fitting a related model and using its fitted values as starting values for the current model) or overshooting in the Fisher scoring (or quasi‐Newton) algorithm (which is usually overcome for parametric models by using a reduced step length). Hence any convergence problems are usually easily resolved. The possibility of multiple maxima is investigated by using different starting values.
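The effect of a reduced step length on overshooting can be sketched on a toy problem. The example below is not the GAMLSS algorithm: it assumes an intercept‐only Poisson model with log‐link, a deliberately poor starting value, and simple step halving (a swapped‐in rule, not an optimized step length) whenever a full Newton step would decrease the likelihood.

```python
import math


def loglik(eta, y):
    """Poisson log-likelihood (up to a constant), log-link, intercept only."""
    return sum(yi * eta - math.exp(eta) for yi in y)


def newton_step_halving(y, eta, tol=1e-12, max_iter=100):
    """Newton-Raphson on eta, halving the step until the likelihood increases."""
    n = len(y)
    current = loglik(eta, y)
    for _ in range(max_iter):
        score = sum(y) - n * math.exp(eta)   # first derivative
        info = n * math.exp(eta)             # minus second derivative
        step = score / info
        alpha = 1.0
        # Reduce the step length while the full step overshoots.
        while loglik(eta + alpha * step, y) < current and alpha > 1e-10:
            alpha /= 2.0
        eta += alpha * step
        new = loglik(eta, y)
        if abs(new - current) < tol:
            break
        current = new
    return eta


y = [1, 2, 3, 4, 5]
eta_hat = newton_step_halving(y, eta=-5.0)  # poor start; optimum is log(3)
```

From the starting value −5 the first full Newton step is several hundred units on the log‐scale, so the halving loop is what returns the iterate to a sensible region.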
Extensions to generalized additive models for location, scale and shape
The GAMLSS has been extended to allow for non‐linear parametric terms, non‐normal random‐effects terms, correlations between random effects for different distribution parameters and the incorporation of priors for β and/or λ.
Conclusion
The GAMLSS provides a very general class of models for a univariate response variable, presented in a unified and coherent framework. The GAMLSS allows any distribution for the response variable and allows modelling of all the parameters of the distribution. The GAMLSS is highly suited to flexible data analysis and provides a framework that is suitable for educational objectives.
Appendices
Appendix A: Inferential framework for the generalized additive model for location, scale and shape
A.1. Posterior mode estimation of the parameters β and random effects γ
For the GAMLSS (1) we use an empirical Bayesian argument, to obtain MAP, or posterior mode, estimation (see Berger (1985)) of both the βks and the γjks assuming normal, possibly improper, priors for the γjks. We show below that this is equivalent to maximizing the penalized likelihood lp, which is given by equation (5). To show this we shall use arguments that have been developed in the statistical literature by Wahba (1978), Silverman (1985), Green (1985), Kohn and Ansley (1988), Speed (1991), Green and Silverman (1994), Verbyla et al. (1999), Hastie and Tibshirani (2000) and Fahrmeir and Lang (2001).
The components of a GAMLSS (1) are
- (a) y, the response vector of length n,
- (b) X=(X1,X2,…,Xp), design matrices,
- (c) β=(β1,β2,…,βp), linear parameters,
- (d) Z=(Z11,Z21,…,ZJ11,…,Z1p,Z2p,…,ZJpp), design matrices,
- (e) γ=(γ11,γ21,…,γJ11,…,γ1p,γ2p,…,γJpp), random effects, and
- (f) λ=(λ11,λ21,…,λJ11,…,λ1p,λ2p,…,λJpp), hyperparameters.
(9)
(10)
. Hence, from expression (10),

, and c(y,λ) is a function of y and λ. Note that, for a GAMLSS, lp is equivalent, with respect to (β,γ), to the h‐likelihood of Lee and Nelder (1996, 2001a, b).
Hence lp is maximized over (β,γ), giving posterior mode (or MAP) estimation of (β,γ) and, for fixed hyperparameters λ, MAP estimation of β and γ is equivalent to maximizing the penalized likelihood lp that is given by equation (5).
The details of the RS and CG algorithms for maximizing the penalized likelihood lp, over both the parameters β and the random‐effects terms γ (for fixed hyperparameters λ), are given in Appendix B. The justification of the CG algorithm is given in Appendix C.
A.2. Hyperparameter estimation

The maximization of L(β,λ|y) over β and λ involves high dimensional integration so any approach to maximizing it will be computer intensive. Note that the maximum likelihood estimator for β from this approach will not in general be the same as the MAP estimator for β that was described in the previous section.
In restricted maximum likelihood (REML) estimation, effectively a non‐informative (constant) prior is assumed for β and both γ and β are integrated out of the joint density f(y,γ,β|λ) to give the marginal likelihood L(λ|y), which is maximized over λ.
In a fully Bayesian inference for the GAMLSS, the posterior distribution of (β,γ,λ) is obtained from equation (9), e.g. by using Markov chain Monte Carlo sampling; see Fahrmeir and Tutz (2001) or Fahrmeir and Lang (2001).
The above methods of estimation of the hyperparameters λ are in general highly computationally intensive: the maximum likelihood and REML methods require high dimensional integration, whereas the fully Bayes method requires Markov chain Monte Carlo sampling.
The following four methods, which do not require such computational intensity, are considered for hyperparameter estimation in GAMLSSs.
The methods are summarized in the following algorithm.
- (a) Procedure 1: estimate the hyperparameters λ by one of the methods
  - (i) minimizing a profile generalized Akaike information criterion GAIC over λ,
  - (ii) minimizing a profile generalized cross‐validation criterion over λ,
  - (iii) maximizing the approximate marginal density (or profile marginal likelihood) for λ by using a Laplace approximation or
  - (iv) approximately maximizing the marginal likelihood for λ by using an (approximate) EM algorithm.
- (b) Procedure 2: for fixed current hyperparameters λ, use the GAMLSS (RS or CG) algorithm to obtain posterior mode (MAP) estimates of (β,γ).
Procedure 2 is nested within procedure 1 and a numerical algorithm is used to estimate λ.
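The nesting of the two procedures can be sketched for method (i) in a deliberately simple setting. The sketch assumes a normal response with unit variance, one intercept β and one random intercept per group with penalty λγ2, a grid search standing in for the numerical algorithm, and degrees of freedom approximated by 1 plus the trace of a ridge‐type shrinkage smoother, which for this model is the sum of nj/(nj+λ) over groups. All function names and the data are hypothetical.

```python
import math


def fit_for_fixed_lambda(groups, lam, tol=1e-10, max_iter=10000):
    """Procedure 2 (sketch): backfit an intercept and shrunken group effects."""
    n = sum(len(v) for v in groups.values())
    beta = 0.0
    gamma = {g: 0.0 for g in groups}
    for _ in range(max_iter):
        new_beta = sum(y - gamma[g] for g, v in groups.items() for y in v) / n
        new_gamma = {g: sum(y - new_beta for y in v) / (len(v) + lam)
                     for g, v in groups.items()}
        change = abs(new_beta - beta) + sum(
            abs(new_gamma[g] - gamma[g]) for g in groups)
        beta, gamma = new_beta, new_gamma
        if change < tol:
            break
    return beta, gamma


def gaic_for_lambda(groups, lam, penalty=2.0):
    """Profile GAIC at lam for a normal response with unit variance."""
    beta, gamma = fit_for_fixed_lambda(groups, lam)
    n = sum(len(v) for v in groups.values())
    rss = sum((y - beta - gamma[g]) ** 2 for g, v in groups.items() for y in v)
    deviance = rss + n * math.log(2.0 * math.pi)  # -2 * log-likelihood
    # Approximate df: intercept plus trace of the shrinkage smoother.
    df = 1.0 + sum(len(v) / (len(v) + lam) for v in groups.values())
    return deviance + penalty * df


def select_lambda(groups, grid, penalty=2.0):
    """Procedure 1 (sketch): minimize the profile GAIC over a grid of lam."""
    return min(grid, key=lambda lam: gaic_for_lambda(groups, lam, penalty))


# Hypothetical grouped data.
groups = {"a": [1.0, 1.2, 0.8], "b": [2.0, 2.3, 1.9, 2.2],
          "c": [0.1, -0.2, 0.3], "d": [1.5, 1.4]}
grid = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
lam_hat = select_lambda(groups, grid)
beta_hat, gamma_hat = fit_for_fixed_lambda(groups, lam_hat)
```

Each evaluation of the criterion in procedure 1 triggers a complete run of procedure 2, which is why the choice of outer numerical algorithm matters for the overall cost.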
We now consider the methods in more detail.
A.2.1. Minimizing a profile generalized Akaike information criterion over λ
GAIC (Akaike, 1983) was considered by Hastie and Tibshirani (1990), pages 160 and 261, for hyperparameter estimation in GAMs. In GAMs a cubic smoothing spline function h(x) is used to model the dependence of a predictor on explanatory variable x. For a single smoothing spline term, since λ is related to the smoothing degrees of freedom df=tr(S) through equation (6), selection (or estimation) of λ may be achieved by minimizing GAIC(#), which is defined in Section 6.2, over λ.
When the model contains p cubic smoothing spline functions in different explanatory variables, then the corresponding p smoothing hyperparameters λ=(λ1,λ2,…,λp) can be jointly estimated by minimizing GAIC(#) over λ. However, with multiple smoothing splines
is only an approximation to the full model complexity degrees of freedom.
The GAIC(#) criterion can be applied more generally to estimate hyperparameters λ in the distribution of random‐effects terms. The (model complexity) degrees of freedom df need to be obtained for models with random‐effects terms. This has been considered by Hodges and Sargent (2001). The degrees of freedom of a model with a single random‐effects term can be defined as the trace of the random‐effect (shrinkage) smoother S, i.e. df=tr(S), where S is given by equation (6). As with smoothing terms, when there are other terms in the model
is only an approximation to the full model complexity degrees of freedom. The full model complexity degrees of freedom for model (1) are given by df=tr(A−1B) where A is defined in Appendix C and B is obtained from A by omitting the matrices Gjk for j=1,2,…,Jk and k=1,2,…,p.
A.2.2. Minimizing a generalized cross‐validation over λ
The generalized cross‐validation criterion was considered by Hastie and Tibshirani (1990), pages 259–263, for hyperparameter estimation in GAMs. The criterion GAIC in Appendix A.2.1 is replaced by the generalized cross‐validation criterion, which is minimized over λ. Verbyla et al. (1999) considered the approximate equivalence of generalized cross‐validation and REML methods of estimating λ in smoothing splines models, which was considered in more detail by Wahba (1985) and Kohn et al. (1991).
A.2.3. Maximizing the approximate marginal density (or profile marginal likelihood) of λ by using a Laplace approximation
For GLMMs, Breslow and Clayton (1993) used a first‐order Laplace integral approximation to integrate out the random effects γ and to approximate the marginal likelihood, leading to estimating equations based on penalized quasi‐likelihood for the mean model parameters and pseudonormal (REML) likelihood for the dispersion components. Breslow and Lin (1995) extended this to a second‐order Laplace approximation.
Lee and Nelder (1996) took a similar approach, estimating the dispersion components by using a first‐order approximation to the Cox and Reid (1987) profile likelihood which eliminates the nuisance parameters β from the marginal likelihood, which they called an adjusted profile h‐likelihood. Lee and Nelder (2001a) extended this to a second‐order approximation.
(11)
(12)
and
(13)
and
, the MAP estimates of β and γ given each fixed λ. (Note that matrix D is a rearrangement of matrix A from Appendix C.) Estimation of λ can be achieved by maximizing approximation (12) over λ (e.g. by using a numerical maximization algorithm). Alternatively, this can be considered as a generalization of REML estimation of λ, maximizing an approximate profile log‐likelihood for λ, denoted here as l(λ), given by replacing
by the expected information
, giving
(14)
This is closely related to the adjusted profile h‐likelihood of Lee and Nelder (1996, 2001a, b).
A.2.4. Approximately maximizing the marginal likelihood for λ by using an (approximate) EM algorithm
An approximate EM algorithm was used by Fahrmeir and Tutz (2001), pages 298–303, and by Diggle et al. (2002), pages 172–175, to estimate hyperparameters in GLMMs and is similarly applied here to maximize approximately over λ the marginal likelihood of λ, L(λ) (or equivalently the posterior marginal distribution of λ for a non‐informative uniform prior).
, is approximated, where the expectation is over the posterior distribution of (β,γ) given y and
, i.e.
, where
is the current estimate of λ, giving, apart from a function of y,
(15)
and
are the posterior mode and curvature (i.e. submatrix of A−1) of γjk from the MAP estimation in Appendix C.
is maximized over λ by a numerical maximization algorithm (e.g. the function optim in the R package). If Gjk=Gk for j=1,2,…,Jk and k=1,2,…,p, and the Gk are unconstrained positive definite symmetric matrices (e.g. in a random‐coefficients model), then equation (15) can be maximized explicitly giving, for k=1,2,…,p,
(16)
Appendix B: The algorithms
B.1. Introduction
be the adjusted dependent variables and Wks be diagonal matrices of iterative weights, for k=1,2,…,p and s=1,2,…,p, which can have one of the forms



Let r be the outer cycle iteration index, k the parameter index, i the inner cycle iteration index, m the backfitting index and j the random‐effects (or nonparametric) term index. Also, for example, let
denote the current value of the vector γjk in the rth outer, ith inner and mth backfitting cycle iteration and let
denote the value of γjk at the convergence of the backfitting cycle for the ith inner cycle of the rth outer cycle, which is also the starting value
for the (i+1)th inner cycle of the rth outer cycle, for j=1,2,…,Jk and k=1,…,p. Note also, for example, that
means the current (i.e. most recently) updated estimate of γjk and the algorithm operates in the backfitting cycle of the ith inner cycle of the rth outer cycle.
B.2. The RS algorithm
Essentially the RS algorithm has an outer cycle which maximizes the penalized likelihood with respect to βk and γjk, for j=1,…,Jk, in the model successively for each θk in turn, for k=1,…,p. At each calculation in the algorithm the current updated values of all the quantities are used.
The RS algorithm is not a special case of the CG algorithm because in the RS algorithm the diagonal weight matrix Wkk is evaluated (i.e. updated) within the fitting of each parameter θk, whereas in the CG algorithm all weight matrices Wks for k=1,2,…,p and s=1,2,…,p are evaluated after fitting all θk for k=1,2,…,p.
The RS algorithm is as follows.
-
Step 1
: start—initialize fitted values
and random effects
, for j=1,…,Jk and k=1,2,…,p. Evaluate the initial linear predictors
, for k=1,2,…,p.
-
Step 2
: start the outer cycle r=1,2,… until convergence. For k=1,2,…,p:
- (a)
start the inner cycle i=1,2,… until convergence—
- (i)
evaluate the current
,
and
;
- (ii)
start the backfitting cycle m=1,2,… until convergence;
- (iii)
regress the current partial residuals
against design matrix Xk, using the iterative weights
to obtain the updated parameter estimates
;
- (iv)
for j=1,2,…,Jk smooth the partial residuals
, using the shrinking (smoothing) matrix Sjk given by equation (6) to obtain the updated (and current) additive predictor term
;
- (v)
end the backfitting cycle, on convergence of
and
and set
and
for j=1,2,…,Jk and otherwise update m and continue the backfitting cycle;
- (vi)
calculate the updated
and
;
- (b)
end the inner cycle on convergence of
and the additive predictor terms
and set
,
, for j=1,2,…,Jk,
and
; otherwise update i and continue the inner cycle.
-
Step 3: update the value of k.
-
Step 4: end the outer cycle—if the change in the (penalized) likelihood is sufficiently small; otherwise update r and continue the outer cycle.
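The structure of the RS algorithm can be sketched for the simplest non‐trivial case. The sketch assumes a normal response with p=2 parameters, μ with identity link and σ with log‐link, each linear in a single explanatory variable, no additive terms (so the backfitting cycle is trivial) and expected information for the iterative weights. Under these assumptions the working variable for μ is y with weights 1/σ2, and the working variable for log(σ) is η+{(y−μ)2/σ2−1}/2 with weights 2. The function names and simulated data are hypothetical.

```python
import math
import random


def wls_line(x, z, w):
    """Weighted least squares fit of z on (1, x); returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    slope = sxz / sxx
    return zbar - slope * xbar, slope


def rs_normal(x, y, tol=1e-8, max_outer=100, max_inner=100):
    """RS-type fit of mu = b0 + b1*x and log(sigma) = g0 + g1*x."""
    n = len(y)
    ybar = sum(y) / n
    beta = (ybar, 0.0)
    gamma = (0.5 * math.log(sum((yi - ybar) ** 2 for yi in y) / n), 0.0)
    old_dev = float("inf")
    dev = old_dev
    for _ in range(max_outer):
        # k = 1: fit mu with sigma at its current value (one step suffices
        # here because the working variable is y itself).
        sigma = [math.exp(gamma[0] + gamma[1] * xi) for xi in x]
        beta = wls_line(x, y, [1.0 / s ** 2 for s in sigma])
        mu = [beta[0] + beta[1] * xi for xi in x]
        # k = 2: inner cycle for sigma, updating the working variable each time.
        for _ in range(max_inner):
            eta = [gamma[0] + gamma[1] * xi for xi in x]
            z = [e + 0.5 * (((yi - m) / math.exp(e)) ** 2 - 1.0)
                 for e, yi, m in zip(eta, y, mu)]
            new_gamma = wls_line(x, z, [2.0] * n)
            change = abs(new_gamma[0] - gamma[0]) + abs(new_gamma[1] - gamma[1])
            gamma = new_gamma
            if change < tol:
                break
        # Outer convergence check on the global deviance (-2 * log-likelihood).
        sigma = [math.exp(gamma[0] + gamma[1] * xi) for xi in x]
        dev = sum(math.log(2.0 * math.pi) + 2.0 * math.log(s) + ((yi - m) / s) ** 2
                  for yi, m, s in zip(y, mu, sigma))
        if abs(old_dev - dev) < tol:
            break
        old_dev = dev
    return beta, gamma, dev


# Simulated data with known coefficients (1, 2) for mu and (-1, 1) for log(sigma).
random.seed(1)
n = 2000
x = [i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + math.exp(-1.0 + xi) * random.gauss(0.0, 1.0) for xi in x]
beta_hat, gamma_hat, deviance = rs_normal(x, y)
```

The weights for each parameter are recomputed inside its own fitting step, which is the feature that distinguishes the RS algorithm from the CG algorithm.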
B.3. The CG algorithm
Algorithm CG, based on Cole and Green (1992), is as follows.
-
Step 1
: start—initialize
and
for j=1,2,…,Jk and k=1,2,…,p. Evaluate
for k=1,2,…,p.
-
Step 2
: start the outer cycle r=1,2,… until convergence.
-
Step 3: evaluate and fix the current
,
and
for k=1,2,…,p and s=1,2,…,p. Perform a single rth step of the Newton–Raphson algorithm by
- (a)
starting the inner cycle i=1,2,… until convergence—for k=1,2,…,p,
- (i)
start the backfitting cycle m=1,2,… until convergence
and for j=1,2,…,Jk

- (ii)
end the backfitting cycle, on convergence of
and
and set
and
for j=1,2,…,Jk and otherwise update m and continue the backfitting cycle, and
- (iii)
calculate the updated
and
and then update k;
- (b)
end the inner cycle on convergence of
and the additive predictor terms
and set
,
,
and
, for j=1,2,…,Jk and k=1,2,…,p; otherwise update i and continue the inner cycle.
-
Step 4: end the outer cycle if the change in the (penalized) likelihood is sufficiently small; otherwise update r and continue the outer cycle.
The matrices
and
, which are defined in Appendix C, are the projection matrices and the shrinking matrices, for the parametric and additive components of the model respectively, at the rth iteration, for j=1,2,…,Jk and k=1,2,…,p.
and
are the current working variables for fitting the parametric and the additive (random‐effects or smoothing) components of the model respectively and are defined as


for k=1,2,…,p, at the end of the inner cycle for the rth outer cycle and then evaluating
,
and
, for k=1,2,…,p and s=1,2,…,p, using the
for k=1,2,…,p. The optimum step length for a particular iteration r can be obtained by maximizing lp(α) over α.
The inner (backfitting) cycle of the algorithm can be shown to converge (for cubic smoothing splines and similar linear smoothers); see Hastie and Tibshirani (1990), chapter 5. The outer cycle is simply a Newton–Raphson algorithm. Thus, if step size optimization is performed, the outer cycle will converge as well, by standard general results on the Newton–Raphson algorithm (Ortega and Rheinboldt, 1970). In our experience, step optimization is rarely needed in practice.
Appendix C: Maximization of the penalized likelihood
In this appendix it is shown that maximization of the penalized log‐likelihood function lp that is given by equation (5) over the parameters βk and terms γjk for j=1,2,…,Jk and k=1,2,…,p leads to the algorithm that is described in Appendix B.
This is achieved by the following two steps.
- (a)
The first and second derivatives of equation (5) are obtained to give a Newton–Raphson step for maximizing equation (5) with respect to βk and γjk for j=1,2,…,Jk and k=1,2,…,p.
- (b)
Each step of the Newton–Raphson algorithm is achieved by using a backfitting procedure cycling through the parameters and through the additive terms of the k linear predictors.
C.1. Step (a)
The algorithm maximizes the penalized likelihood function lp, given by equation (5), using a Newton–Raphson algorithm. The first derivative (score function) and the second derivatives of lp with respect to βk and γjk for all j=1,2,…,Jk and k=1,2,…,p are evaluated at iteration r at the current predictors
for k=1,2,…,p.
Let
, ak=∂lp/∂αk and
for k=1,2,…,p and s=1,2,…,p, and let
, a=∂lp/∂α and A=−∂2lp/∂α∂αT.



over i=1,2,…,n, for k=1,2,…,p and s=1,2,…,p (see Appendix B for alternative weight matrices).
C.2. Step (b)

(17)

is the adjusted dependent variable. (A device for obtaining updated estimate
in equation (17) is to apply weighted least squares estimation to an augmented data model given by
(18)
,
and ejk∼N(0,I). This device can be generalized to estimate αk and even α.)
(19)

A single rth Newton–Raphson step is achieved by using a backfitting procedure for each k, cycling through equation (19) and then equation (17) for j=1,2,…,Jk and cycling over k=1,2,…,p until convergence of the set of updated values
for k=1,2,…,p. The updated predictors
, first derivatives
, diagonal weight matrices
and adjusted dependent variables
, for k=1,2,…,p and s=1,2,…,p, are then calculated and the (r+1)th Newton–Raphson step is performed, until convergence of the Newton–Raphson algorithm.
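The augmented‐data device that is mentioned with equation (17) can be illustrated in the scalar case. The sketch assumes a single random effect γ with quadratic penalty λγ2 fitted to weighted partial residuals; the penalty is then represented by one pseudo‐observation with response 0, design value √λ and unit weight. This is a simplified stand‐in for model (18), not a reproduction of it, and the function names and numbers are hypothetical.

```python
import math


def shrunken_effect_direct(residuals, weights, lam):
    """Penalized weighted least squares estimate of a single random effect."""
    return sum(w * r for w, r in zip(weights, residuals)) / (sum(weights) + lam)


def shrunken_effect_augmented(residuals, weights, lam):
    """Same estimate from ordinary weighted least squares on augmented data."""
    design = [1.0] * len(residuals) + [math.sqrt(lam)]  # extra row for penalty
    response = list(residuals) + [0.0]                  # pseudo-observation 0
    aug_weights = list(weights) + [1.0]                 # unit weight
    num = sum(w * d * r for w, d, r in zip(aug_weights, design, response))
    den = sum(w * d * d for w, d in zip(aug_weights, design))
    return num / den


residuals = [0.4, -0.1, 0.7]
weights = [1.0, 2.0, 0.5]
direct = shrunken_effect_direct(residuals, weights, 3.0)
augmented = shrunken_effect_augmented(residuals, weights, 3.0)
```

The two estimates agree up to rounding, so a routine for weighted least squares can carry out the penalized update without modification.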
References in the discussion
- Qiyun Pan, Eunshin Byon, Young Myoung Ko, Henry Lam, Adaptive importance sampling for extreme quantile estimation with stochastic black box computer models, Naval Research Logistics (NRL), 10.1002/nav.21938, 67, 7, (524-547), (2020).
- Jethro Browell, Ciaran Gilbert, 2020 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), 10.1109/PMAPS47429.2020.9183441, (1), (2020).
- Vincent J Maffei, Robert W Siggins, Meng Luo, Meghan M Brashear, Donald E Mercante, Christopher M Taylor, Patricia Molina, David A Welsh, Alcohol Use Is Associated With Intestinal Dysbiosis and Dysfunctional CD8+ T-Cell Phenotypes in Persons With Human Immunodeficiency Virus, The Journal of Infectious Diseases, 10.1093/infdis/jiaa461, (2020).
- Leydson G. Dantas, Carlos A. C. dos Santos, Ricardo A. de Olinda, José I. B. de Brito, Celso A. G. Santos, Eduardo S. P. R. Martins, Gabriel de Oliveira, Nathaniel A. Brunsell, Rainfall Prediction in the State of Paraíba, Northeastern Brazil Using Generalized Additive Models, Water, 10.3390/w12092478, 12, 9, (2478), (2020).
- Ezra Gayawan, Oluwatoyin Deborah Fasusi, Dipankar Bandyopadhyay, Structured additive distributional zero augmented beta regression modeling of mortality in Nigeria, Spatial Statistics, 10.1016/j.spasta.2020.100415, (100415), (2020).
- Marco A.F. Pimentel, Oliver C. Redfern, Robert Hatch, J. Duncan Young, Lionel Tarassenko, Peter J. Watkinson, Trajectories of vital signs in patients with COVID-19, Resuscitation, 10.1016/j.resuscitation.2020.09.002, 156, (99-106), (2020).
- Kevin Burke, M. C. Jones, Angela Noufaily, A flexible parametric modelling framework for survival analysis, Journal of the Royal Statistical Society: Series C (Applied Statistics), 10.1111/rssc.12398, 69, 2, (429-457), (2020).
- Ross Stewart Sparks, Hossein Hazrati-Marangaloo, Exponentially Weighted Moving Averages of Counting Processes When the Time between Events Is Weibull Distributed, Control Charts [Working Title], 10.5772/intechopen.83306, (2020).
- Laura-Chloé Kuntz, Beta dispersion and market timing, Journal of Empirical Finance, 10.1016/j.jempfin.2020.09.003, (2020).
- Michał Narajewski, Florian Ziel, Ensemble forecasting for intraday electricity prices: Simulating trajectories, Applied Energy, 10.1016/j.apenergy.2020.115801, 279, (115801), (2020).
- Bin Xiong, Lihua Xiong, Shenglian Guo, Chong‐Yu Xu, Jun Xia, Yixuan Zhong, Han Yang, Nonstationary Frequency Analysis of Censored Data: A Case Study of the Floods in the Yangtze River From 1470 to 2017, Water Resources Research, 10.1029/2020WR027112, 56, 8, (2020).
- Patrick Michaelis, Nadja Klein, Thomas Kneib, Mixed discrete‐continuous regression—A novel approach based on weight functions, Stat, 10.1002/sta4.277, 9, 1, (2020).
- Dushan P. Kumarathunge, John E. Drake, Mark G. Tjoelker, Rosana López, Sebastian Pfautsch, Angelica Vårhammar, Belinda E. Medlyn, The temperature optima for tree seedling photosynthesis and growth depend on water inputs, Global Change Biology, 10.1111/gcb.14975, 26, 4, (2544-2560), (2020).
- Guillermo Briseño Sanchez, Maike Hohberg, Andreas Groll, Thomas Kneib, Flexible instrumental variable distributional regression, Journal of the Royal Statistical Society: Series A (Statistics in Society), 10.1111/rssa.12598, 183, 4, (1553-1574), (2020).
- Moritz N. Lang, Lisa Schlosser, Torsten Hothorn, Georg J. Mayr, Reto Stauffer, Achim Zeileis, Circular regression trees and forests with an application to probabilistic wind direction forecasting, Journal of the Royal Statistical Society: Series C (Applied Statistics), 10.1111/rssc.12437, 69, 5, (1357-1374), (2020).
- Yanqiu Zhou, Ying Li, Shunqing Xu, Jiaqiang Liao, Hongna Zhang, Jiufeng Li, Yanjun Hong, Wei Xia, Zongwei Cai, Prenatal Exposure to Benzotraizoles and Benzothiazoles in Relation to Fetal and Birth Size: A Longitudinal Study, Journal of Hazardous Materials, 10.1016/j.jhazmat.2020.122828, (122828), (2020).
- A. Rodríguez-González, A.L. May-Tec, J. Herrera-Silveira, C. Puch-Hau, M. Quintanilla-Mena, J. Villafuerte, I. Velázquez-Abunader, M.L. Aguirre-Macedo, V.M. Vidal-Martínez, Fluctuating asymmetry of sclerotized structures of Haliotrematoides spp. (Monogenea: Dactylogyridae) as bioindicators of aquatic contamination, Ecological Indicators, 10.1016/j.ecolind.2020.106548, 117, (106548), (2020).
- David Pitt, Stefan Trück, Rob van den Honert, Wan Wah Wong, Modeling risks from natural hazards with generalized additive models for location, scale and shape, Journal of Environmental Management, 10.1016/j.jenvman.2020.111075, 275, (111075), (2020).
- Yanlai Zhou, Exploring multidecadal changes in climate and reservoir storage for assessing nonstationarity in flood peaks and risks worldwide by an integrated frequency analysis approach, Water Research, 10.1016/j.watres.2020.116265, (116265), (2020).
- Allison Freeman, Bruce Desmarais, Portfolio Adjustment to Home Equity Accumulation among CRA Borrowers, Journal of Housing Research, 10.1080/10835547.2011.12092038, 20, 2, (141-160), (2020).
- Shi Li, Yi Qin, Yixiu Liu, Xiaoyu Song, Qiang Liu, Ziwen Li, Estimating the design flood under the influence of check dams by removing nonstationarity from the flood peak discharge series, Hydrology Research, 10.2166/nh.2020.050, (2020).
- Pantelis Samartsidis, Silvia Montagna, Angela R. Laird, Peter T. Fox, Timothy D. Johnson, Thomas E. Nichols, Estimating the prevalence of missing experiments in a neuroimaging meta‐analysis, Research Synthesis Methods, 10.1002/jrsm.1448, (2020).
- H. Vittal, Gabriele Villarini, Wei Zhang, Early prediction of the Indian summer monsoon rainfall by the Atlantic Meridional Mode, Climate Dynamics, 10.1007/s00382-019-05117-0, (2020).
- Leonie Weinhold, Matthias Schmid, Richard Mitchell, Kelly O. Maloney, Marvin N. Wright, Moritz Berger, A Random Forest Approach for Bounded Outcome Variables, Journal of Computational and Graphical Statistics, 10.1080/10618600.2019.1705310, (1-20), (2020).
- Julio Cezar Souza Vasconcelos, Cristian Villegas, Generalized symmetrical partial linear model, Journal of Applied Statistics, 10.1080/02664763.2020.1726301, (1-16), (2020).
- Kirsten S. de Fluiter, Inge A.L.P. van Beijsterveldt, Wesley J. Goedegebuure, Laura M. Breij, Alexander M. J. Spaans, Dennis Acton, Anita C. S. Hokken-Koelega, Longitudinal body composition assessment in healthy term-born infants until 2 years of age using ADP and DXA with vacuum cushion, European Journal of Clinical Nutrition, 10.1038/s41430-020-0578-7, (2020).
- Constanze Hamatschek, Efrah I. Yousuf, Lea Sophie Möllers, Hon Yiu So, Katherine M. Morrison, Christoph Fusch, Niels Rochow, Fat and Fat-Free Mass of Preterm and Term Infants from Birth to Six Months: A Review of Current Evidence, Nutrients, 10.3390/nu12020288, 12, 2, (288), (2020).
- Ana Gonzalez-Blanks, Jessie M. Bridgewater, Tuppett M. Yates, Statistical Approaches for Highly Skewed Data: Evaluating Relations between Maltreatment and Young Adults’ Non-Suicidal Self-injury, Journal of Clinical Child & Adolescent Psychology, 10.1080/15374416.2020.1724543, (1-15), (2020).
- Jue Lin-Ye, Manuel García-León, Vicente Gràcia, María Isabel Ortego, Piero Lionello, Dario Conte, Begoña Pérez-Gómez, Agustín Sánchez-Arcilla, Modeling of Future Extreme Storm Surges at the NW Mediterranean Coast (Spain), Water, 10.3390/w12020472, 12, 2, (472), (2020).
- Liza Toemen, Susana Santos, Arno A W Roest, Meike W Vernooij, Willem A Helbing, Romy Gaillard, Vincent W V Jaddoe, Pericardial adipose tissue, cardiac structures, and cardiovascular risk factors in school-age children, European Heart Journal - Cardiovascular Imaging, 10.1093/ehjci/jeaa031, (2020).
- Linhan Yang, Jianzhu Li, Aiqing Kang, Shuai Li, Ping Feng, The Effect of Nonstationarity in Rainfall on Urban Flooding Based on Coupling SWMM and MIKE21, Water Resources Management, 10.1007/s11269-020-02522-7, (2020).
- Annemarie van der Marel, Jane M Waterman, Marta López-Darias, Social organization in a North African ground squirrel, Journal of Mammalogy, 10.1093/jmammal/gyaa031, (2020).
- Scott D. Roloson, Sean J. Landsman, Raymond Tana, Brendan J. Hicks, Jon W. Carr, Fred Whoriskey, Michael R. van den Heuvel, Otolith microchemistry and acoustic telemetry reveal anadromy in non-native rainbow trout (Oncorhynchus mykiss) in Prince Edward Island, Canada, Canadian Journal of Fisheries and Aquatic Sciences, 10.1139/cjfas-2019-0229, (1-14), (2020).
- Josephin Hirschel, Mandy Vogel, Ronny Baber, Antje Garten, Carl Beuchel, Yvonne Dietz, Julia Dittrich, Antje Körner, Wieland Kiess, Uta Ceglarek, Relation of Whole Blood Amino Acid and Acylcarnitine Metabolome to Age, Sex, BMI, Puberty, and Metabolic Markers in Children and Adolescents, Metabolites, 10.3390/metabo10040149, 10, 4, (149), (2020).
- Georgios Papageorgiou, Benjamin C. Marshall, Bayesian Semiparametric Analysis of Multivariate Continuous Responses, With Variable Selection, Journal of Computational and Graphical Statistics, 10.1080/10618600.2020.1739534, (1-14), (2020).
- Elena Reginato, Danila Azzolina, Franco Folino, Romina Valentini, Camilla Bendinelli, Claudia Elena Gafare, Elisa Cainelli, Luca Vedovelli, Sabino Iliceto, Dario Gregori, Giulia Lorenzoni, Dietary and Lifestyle Patterns are Associated with Heart Rate Variability, Journal of Clinical Medicine, 10.3390/jcm9041121, 9, 4, (1121), (2020).
- Roy Gava, Julien M. Jaquet, Pascal Sciarini, Legislating or rubber‐stamping? Assessing parliament's influence on law‐making with text reuse, European Journal of Political Research, 10.1111/1475-6765.12395, (2020).
- Simon N. Wood, Inference and computation with generalized additive models and their extensions, TEST, 10.1007/s11749-020-00711-5, (2020).
- Nina M. Pruzinsky, Rosanna J. Milligan, Tracey T. Sutton, Pelagic Habitat Partitioning of Late-Larval and Juvenile Tunas in the Oceanic Gulf of Mexico, Frontiers in Marine Science, 10.3389/fmars.2020.00257, 7, (2020).
- Zhengguo Gu, Wilco H. M. Emons, Klaas Sijtsma, Precision and Sample Size Requirements for Regression-Based Norming Methods for Change Scores, Assessment, 10.1177/1073191120913607, (107319112091360), (2020).
- George Tzougas, Dimitris Karlis, An EM algorithm for fitting a new class of mixed exponential regression models with varying dispersion, ASTIN Bulletin, 10.1017/asb.2020.13, (1-29), (2020).
- Ana María Santana-Piñeros, Yanis Cruz-Quintana, Ana Luisa May-Tec, Geormery Mera-Loor, María Leopoldina Aguirre-Macedo, Eduardo Suárez-Morales, David González-Solís, The 2015-2016 El Niño increased infection parameters of copepods on Eastern Tropical Pacific dolphinfish populations, PLOS ONE, 10.1371/journal.pone.0232737, 15, 5, (e0232737), (2020).
- Jeremy Rohmer, Pierre Gehl, Marine Marcilhac-Fradin, Yves Guigueno, Nadia Rahni, Julien Clément, Non-stationary extreme value analysis applied to seismic fragility assessment for nuclear safety analysis, Natural Hazards and Earth System Sciences, 10.5194/nhess-20-1267-2020, 20, 5, (1267-1285), (2020).
- Aneurin Young, Edward T Andrews, James John Ashton, Freya Pearson, R Mark Beattie, Mark John Johnson, Generating longitudinal growth charts from preterm infants fed to current recommendations, Archives of Disease in Childhood - Fetal and Neonatal Edition, 10.1136/archdischild-2019-318404, (fetalneonatal-2019-318404), (2020).
- Katherine C. Hustad, Tristan Mahr, Phoebe E. M. Natzke, Paul J. Rathouz, Development of Speech Intelligibility Between 30 and 47 Months in Typically Developing Children: A Cross-Sectional Study of Growth, Journal of Speech, Language, and Hearing Research, 10.1044/2020_JSLHR-20-00008, (1-13), (2020).
- Nayara Dornela Quintino, Ester Cerdeira Sabino, José Luiz Padilha da Silva, Antonio Luiz Pinho Ribeiro, Ariela Mota Ferreira, Gabriela Lemes Davi, Claudia Di Lorenzo Oliveira, Clareci Silva Cardoso, Factors associated with quality of life in patients with Chagas disease: SaMi-Trop project, PLOS Neglected Tropical Diseases, 10.1371/journal.pntd.0008144, 14, 5, (e0008144), (2020).
- Stephan M. Funk, Belén Palomo Guerra, Amalia Bueno Zamora, Amy Ickowitz, Nicias Afoumpam Poni, Mohamadou Aminou Abdou, Yaya Hadam Sibama, René Penda, Guillermo Ros Brull, Martin Abossolo, Eva Ávila Martín, Robert Okale, Blaise Ango Ze, Ananda Moreno Carrión, Cristina García Sebastián, Cristina Ruiz de Loizaga García, Francisco López-Romero Salazar, Hissein Amazia, Idoia Álvarez Reyes, Rafaela Sánchez Expósito, John E. Fa, Understanding Growth and Malnutrition in Baka Pygmy Children, Human Ecology, 10.1007/s10745-020-00161-5, (2020).
- Lei Yan, Lihua Xiong, Qinghua Luan, Cong Jiang, Kunxia Yu, Chong-Yu Xu, On the Applicability of the Expected Waiting Time Method in Nonstationary Flood Design, Water Resources Management, 10.1007/s11269-020-02581-w, (2020).
- Almond Stöcker, Sarah Brockhaus, Sophia Anna Schaffer, Benedikt von Bronk, Madeleine Opitz, Sonja Greven, Boosting functional response models for location, scale and shape with an application to bacterial competition, Statistical Modelling, 10.1177/1471082X20917586, (1471082X2091758), (2020).
- Kyriakos Martakis, Christina Stark, Mirko Rehberg, Oliver Semler, Ibrahim Duran, Eckhard Schoenau, Reference Centiles to Monitor the 6-minute-walk Test in Ambulant Children with Cerebral Palsy and Identification of Effects after Rehabilitation Utilizing Whole-body Vibration, Developmental Neurorehabilitation, 10.1080/17518423.2020.1770891, (1-11), (2020).
- Olena Ivanova, Celso Khosa, Abhishek Bakuli, Nilesh Bhatt, Isabel Massango, Ilesh Jani, Elmar Saathoff, Michael Hoelscher, Andrea Rachow, Lung Function Testing and Prediction Equations in Adult Population from Maputo, Mozambique, International Journal of Environmental Research and Public Health, 10.3390/ijerph17124535, 17, 12, (4535), (2020).
- Hendrik van der Wurp, Andreas Groll, Thomas Kneib, Giampiero Marra, Rosalba Radice, Generalised joint regression for count data: a penalty extension for competitive settings, Statistics and Computing, 10.1007/s11222-020-09953-7, (2020).
- Manuel Oviedo-de La Fuente, Celestino Ordóñez, Javier Roca-Pardiñas, Functional Location-Scale Model to Forecast Bivariate Pollution Episodes, Mathematics, 10.3390/math8060941, 8, 6, (941), (2020).
- Tim Richter-Heitmann, Benjamin Hofner, Franz-Sebastian Krah, Johannes Sikorski, Pia K. Wüst, Boyke Bunk, Sixing Huang, Kathleen M. Regan, Doreen Berner, Runa S. Boeddinghaus, Sven Marhan, Daniel Prati, Ellen Kandeler, Jörg Overmann, Michael W. Friedrich, Stochastic Dispersal Rather Than Deterministic Selection Explains the Spatio-Temporal Distribution of Soil Bacteria in a Temperate Grassland, Frontiers in Microbiology, 10.3389/fmicb.2020.01391, 11, (2020).