Volume 56, Issue 5

Analysis of longitudinal data with drop‐out: objectives, assumptions and a proposal

Peter Diggle

Lancaster University, UK, and Johns Hopkins University School of Public Health, Baltimore, USA

First published: 27 September 2007
Address for correspondence: Daniel Farewell, Department of Epidemiology, Statistics and Public Health, Centre for Health Sciences Research, Cardiff University, Neuadd Meirionnydd, Heath Park, Cardiff, CF14 4YS, UK.
E‐mail: farewelld@cf.ac.uk

Abstract

Summary. The problem of analysing longitudinal data that are complicated by possibly informative drop‐out has received considerable attention in the statistical literature. Most researchers have concentrated on either methodology or application, but we begin this paper by arguing that more attention could be given to study objectives and to the relevant targets for inference. Next we summarize a variety of approaches that have been suggested for dealing with drop‐out. A long‐standing concern in this subject area is that all methods require untestable assumptions. We discuss circumstances in which we are willing to make such assumptions and we propose a new and computationally efficient modelling and analysis procedure for these situations. We assume a dynamic linear model for the expected increments of a constructed variable, under which subject‐specific random effects follow a martingale process in the absence of drop‐out. Informal diagnostic procedures to assess the tenability of the assumption are proposed. The paper is completed by simulations and a comparison of our method and several alternatives in the analysis of data from a trial into the treatment of schizophrenia, in which approximately 50% of recruited subjects dropped out before the final scheduled measurement time.

1. Introduction

Our concern in this paper is with longitudinal studies in which a real‐valued response Y is to be measured at a prespecified set of time points, and the target for inference is some version of the expectation of Y. Studies of this kind will typically include covariates X, which may be time constant or time varying. Frequently, the interpretation of the data is complicated by drop‐outs: subjects who are lost to follow‐up before completion of their intended sequence of measurements. The literature on the analysis of longitudinal data with drop‐outs is extensive: important early references include Laird (1988), Wu and Carroll (1988) and Little (1995), for which the Web of Science lists approximately 200, 170 and 300 citations respectively, up to the end of 2006.

A useful classification of drop‐out mechanisms is the hierarchy that was introduced by Rubin (1976) in the wider context of missing data. Drop‐out is missing completely at random (MCAR) if the probability that a subject drops out at any stage depends neither on their observed responses nor on the responses that would have been observed if they had not dropped out. Drop‐out is missing at random (MAR) if the probability of drop‐out may depend on observed responses but, given the observed responses, is conditionally independent of unobserved responses. Drop‐out is missing not at random (MNAR) if it is not MAR. Note that we interpret MCAR, MAR and MNAR only as properties of the joint distribution of random variables representing a sequence of responses Y and drop‐out indicators R; Little (1995) developed a finer classification by considering also whether drop‐out does or does not depend on covariates X. From the point of view of inference, the importance of Rubin's classification is that, in a specific sense that we discuss later in the paper, likelihood‐based inference for Y is valid under MAR, whereas other methods for inference, such as the original form of generalized estimating equations (Liang and Zeger, 1986), require MCAR for their validity. Note also that, if the distributional models for the responses Y and drop‐out indicators R include parameters in common, likelihood‐based inference under MAR is potentially inefficient; for this reason, the combination of MAR and separate parameterization is sometimes called ignorable, and either MNAR or MAR with parameters in common is sometimes called non‐ignorable or informative. The potential for confusion through different interpretations of these terms is discussed in a chain of correspondence by Ridout (1991), Shih (1992), Diggle (1993) and Heitjan (1994).

Our reasons for revisiting this topic are threefold. Firstly, we argue that in the presence of drop‐outs the inferential objective is often defined only vaguely. Though there are other possibilities, the most common target is the mean response, which we also adopt. However, many possible expectations are associated with Y: in Section 2 we contend that, in different applications, the target may be one of several unconditional or conditional expectations. We also argue that in all applications careful thought needs to be given to the purpose of the study and the analysis, with recognition that drop‐out leads to missing data but should not be considered solely as an indicator of missingness. The common notation Y=(Yobs,Ymiss) blurs this distinction. The complexity of some of the models and methods that are now available in the statistics literature may obscure the focus of a study and its precise objective under drop‐out. For this reason, we use as a vehicle for discussion the very simple setting of a longitudinal study with only two potential follow‐up times and one drop‐out mechanism. A second but connected issue is that the assumptions underlying some widely used methods of analysis are subtle; Section 3 provides a discussion of these assumptions and an overview of the development of some of the important methodology. We discuss what can and cannot be achieved in practice, again by using the two‐time‐point scenario for clarity. Our third purpose in this paper is to offer in Section 4 an approach that is based on dynamic linear models for the expected increments of the longitudinal process. The assumptions on which we base our models are easily stated and doubly weak: weak with respect to both longitudinal and drop‐out processes. None‐the‐less, all methods for dealing with missing data require, to some extent, untestable assumptions, and ours is no exception. However, we are willing to make such assumptions in the following circumstances. Firstly, the targets for inference are parameters of a hypothetical drop‐out‐free world that describes what would have happened if the drop‐out subjects had in fact continued. Secondly, any unexplained variability between subjects exhibits a certain stability before drop‐out. Thirdly, such stability is maintained beyond each drop‐out time by the diminishing subset of continuing subjects.

The first point is discussed in Section 2 and the ‘stability’ requirement of the next two points is defined formally in Section 4 as a martingale random‐effects structure. Section 4 also presents graphical diagnostics and an informal test procedure for critical assessment of this property. Our methods are quite general but for discussion purposes we return to the two‐time‐point scenario in Section 5, before demonstrating the methods through simulations in Section 6. Section 7 describes a comparative analysis of data from a trial into the treatment of schizophrenia. The paper closes with brief discussion in Section 8. Appendix A describes an implementation of our proposal in the S language.

Our topic can be regarded as a special case of a wider class of problems concerning the joint modelling of a longitudinal sequence of measured responses and times to events. Longitudinal data with drop‐out can formally be considered as joint modelling in which the time to event is the drop‐out time as, for example, in Henderson et al. (2000). In Section 7, we reanalyse the data from their clinical example to emphasize this commonality and to illustrate our new approach. For recent reviews of joint modelling, see Hogan et al. (2004) or Tsiatis and Davidian (2004).

Under our new approach, estimators are available in closed form and are easily interpretable. Further, estimation is computationally undemanding, as processing essentially involves a least squares fit of a linear model at each observation time. This is in contrast with many existing approaches to drop‐out prone data where, in our experience, the computational load of model fitting can be a genuine obstacle to practical implementation when the data have a complex structure and there is a need to explore a variety of candidate models.

2. Inferential objectives in the presence of drop‐out

As indicated in Section 1, we consider in this section a study involving a quantitative response variable Y, which can potentially be measured at two time points t=1,2 but will not be measured at t=2 for subjects who drop out of the study. We ignore covariate effects and focus on estimation of μt=E(Yt), though similar arguments apply to the full distributions of the response variables. We emphasize that this simple setting is used only to illustrate underlying concepts without unnecessary notational complication. The general thrust of the argument applies equally to more elaborate settings.

At time 1 the response is observed for all subjects, but at time 2 the response may be missing owing to drop‐out. Leaving aside for the moment the scientific purpose of the study and concentrating on statistical aspects, it is tempting to begin with the model
Y_t = \mu_t + Z_t, \qquad t = 1, 2.     (1)

The parameter μ1 is the population mean at time 1. Writing down model (1) invites a similar interpretation for μ2. In fact, the apparently straightforward adoption of model (1) brings with it some interesting but usually unstated or ignored issues.

For the moment we ignore context and consider four abstract random variables, which we shall call Y1,Y2a,Y2b and R, the last of which is binary. Our primary interest is in the expectations of the Y‐variables, and we write
Y_1 = \mu_1 + Z_1, \qquad Y_{2a} = \mu_{2a} + Z_{2a}, \qquad Y_{2b} = \mu_{2b} + Z_{2b}, \qquad P(R = 0 \mid \mathcal{S}) = \pi(\mathcal{S}).     (2)

In expression (2), E(Z1)=E(Z2a)=E(Z2b)=0, 𝒮 denotes a set of conditioning variables and we allow π(·) to depend arbitrarily on 𝒮. We make no assumption of independence between Z1,Z2a and Z2b, and for the unconditional case 𝒮=Ø we write π=π(Ø)=P(R=0). By construction, the parameters μ1, μ2a and μ2b are the marginal expectations of Y1, Y2a and Y2b respectively.

In the context of longitudinal data with drop‐outs, subjects with R=1 are the completers, who are denoted group 𝒞. For each completer, Y1, Y2a and R are observed and have the obvious interpretations as the responses at times 1 and 2 together with an indicator of response, whereas Y2b is an unobserved counterfactual, representing the value of the response that would have been observed if the subject had in fact dropped out.

The drop‐outs, group 𝒟, are those subjects who have R=0. These subjects experience the event of dropping out of the study, which in different contexts may mean discontinuation of treatment, cessation of measurement or both. If drop‐out refers only to the discontinuation of treatment, then Y2b is the observed response at time 2, and Y2a the counterfactual that would have been observed if the subject had continued treatment. This situation, where drop‐out does not lead to cessation of measurement, is one which we discuss no further. Throughout the remainder of the paper, we are concerned with the case when R=0 does correspond to cessation of measurement, and consequently neither Y2a nor Y2b is observed for any subject in group 𝒟. In this case, Y2b is the extant, but unobserved, longitudinal response at time 2 and Y2a is the counterfactual that would have been observed if the subject in question had not dropped out.

In this framework we make explicit the possibility that the act of dropping out can influence the response, rather than simply lead to data being missing. In other words, we separate the consequence of dropping out from the observation of that consequence. At least conceptually, the events ‘avoiding drop‐out’ and ‘observing Y2a’ are considered to be distinct.

The above is reminiscent of the usual framework for causal inference, as described for instance by Rubin (1991, 2004), in which R would be a binary treatment assignment or other intervention indicator. However, there are three important differences. The most obvious is that with drop‐out we never observe Y2b, whereas in causal inference it would be observed for each subject in group 𝒟. The second difference is that, assuming no initial selection effect, in the longitudinal setting we observe Y1 for all subjects, and this can be exploited in inference through assumed or estimated relationships between responses before and after drop‐out. The third difference is that we assume R to be intrinsic to the subject rather than an assigned quantity such as treatment, and between‐subject independence is sufficient for us to avoid the need to discuss assignment mechanisms.

In particular applications we need to consider the scientific objective of the study and consequent target for inference. At time t=1 we can easily estimate μ1=E(Y1) by standard techniques. Our focus will be the target for estimation at time t=2, which we assume can be expressed as some property of a random variable Y2, typically E(Y2). We discuss this within the specific setting of model (2).

2.1. Objective 1: realized second response

The first possible target for inference that we discuss is the realized, non‐counterfactual, second response
Y_2 = R\,Y_{2a} + (1 - R)\,Y_{2b},     (3)
which is unobserved for subjects in group 𝒟. Further progress will therefore depend on the strong and untestable assumption that Y2a=Y2b. This assumption seems to be implicit in most published work and may be reasonable in circumstances where drop‐out is deemed to have no material effect on the measurement other than causing it to be missing. Applied uncritically, however, this can result in misleading inference about Y2. For example, drop‐out might be because of death, in which case Y2b could be assigned an arbitrary value such as 0 and the definition of Y2 above is, for practical purposes, meaningless.

In contrast, the data that we analyse in Section 7 come from a longitudinal randomized clinical trial of drug treatments for schizophrenia, in which drop‐out implies discontinuation of the assigned drug and the response could have been (but in fact was not) measured after drop‐out. In this setting, Y2 as defined at expression (3) is readily interpretable as the intention‐to‐treat response.

2.2. Objective 2: conditional second response

A second possible target for inference is the response at time t=2 conditional on not dropping out, or equivalently
[Y_2] = [Y_{2a} \mid R = 1].

Only complete cases, group 𝒞, contribute to inference, which is therefore always conditional on R=1. This is perfectly proper if the objective is to study the response within the subpopulation of subjects who do not drop out.

In the schizophrenia example, some subjects were removed from the study because their condition did not improve. Objective 2 would therefore be appropriate in this context if interest were confined to the subset of subjects who had not yet been removed from the study owing to inadequate response to treatment.

2.3. Objective 3: hypothetical second response

Our third potential target for inference, again unobserved for group 𝒟 subjects, is
Y_2 = Y_{2a},
which is appropriate if scientific interest lies in the (possibly hypothetical) response distribution of a drop‐out‐free population. We note that this is analogous to the usual estimand in event history analysis, with drop‐out equivalent to censoring. The assumption Y2a=Y2b makes objectives 1 and 3 equivalent.

The essential difference between the interpretations of Y2 under objectives 2 and 3 is between the marginal and conditional distributions of the response at time 2. This can be substantial, as would be the case if, for example, drop‐out occurs if and only if Z2a<0. This might seem an extreme example, but it could never be identified from the observed data.

It is important that the objectives be clearly stated and understood at the outset of a study, especially for regulatory purposes. There are similarities with distinguishing intention‐to‐treat and per‐protocol analyses (Sommer and Zeger, 1991; Angrist et al., 1996; Little and Yau, 1996; Frangakis and Rubin, 1999) and with causal inference in the presence of missing data or non‐compliance quite generally (Robins, 1998; Peng et al., 2004; Robins and Rotnitzky, 2004). The hypothetical second response Y2a will be our inferential target for the analysis that we present in Section 7 for the schizophrenia data. We argue that in this setting, where drop‐out need not be related to an adverse event, clinical interest genuinely lies in the hypothetical response that patients would have produced if they had not dropped out. This is likely to be of greater value than the realized or conditional second responses, since treatment performance is of more concern than subject profiles. We emphasize, however, that this need not always be so, and that in some circumstances a combination of objectives may be appropriate. For example, Dufoil et al. (2004) and Kurland and Heagerty (2005) separately discussed applications in which there are two causes of drop‐out: death and possibly informative loss to follow‐up. In these applications the appropriate target for inference is the response distribution in the hypothetical absence of loss to follow‐up but conditional on not dying, thus combining objectives 2 and 3. In other applications it is quite possible that a combination of all three objectives may be appropriate.

3. Approaches to the analysis of longitudinal data with drop‐out

We now illustrate in the context of model (2) some of the variety of approaches that have been proposed for the analysis of longitudinal data with drop‐out. We do not attempt a complete review (see Hogan and Laird (1997a,b), Little (1998), Hogan et al. (2004), Tsiatis and Davidian (2004) or Davidian et al. (2005)) but hope to give a flavour of the broad classes of methods and their underlying assumptions.

3.1. Complete case

Complete‐case analysis is probably the simplest approach to dealing with drop‐outs, as we simply ignore all non‐completers. As discussed earlier, this is appropriate for objective 2, or in more formal language when our interest lies in the conditional distribution [Y1,Y2a|R=1]. The relevant estimator within model (2) is
\hat{\mu}^{\mathrm{CC}}_{2} = \bar{Y}^{\mathcal{C}}_{2a} = \Big(\sum_i R_i\Big)^{-1} \sum_i R_i\, Y_{2a,i},
which estimates
E(Y_{2a} \mid R = 1) = \mu_{2a} + E(Z_{2a} \mid R = 1).

3.2. Pattern–mixture

A complete‐case analysis forms one component of a pattern–mixture approach (Little, 1993), in which we formulate a separate submodel for each of [Y1|R=0] and [Y1,Y2a|R=1], perhaps with shared parameters. From this, we can obtain valid inference for the marginal [Y1] by averaging, but again only conditional inference for [Y2a|R=1], as with complete‐case analysis. The pattern–mixture approach is intuitively appealing from the perspective of retrospective data analysis, in which context it is natural to compare response distributions in subgroups that are defined by different drop‐out times. From a modelling perspective it is also natural if we regard the distribution of R as being determined by latent characteristics of the individual subjects. In its most general form, the pattern–mixture approach is less natural if we regard drop‐out as a consequence of a subject's response history, because it allows conditioning on the future. However, Kenward et al. (2003) discussed the construction of pattern–mixture specifications that avoid dependence on future responses.

3.3. Imputation methods

Imputation methods implicitly focus on objective 3, sometimes adding the assumption that Y2a=Y2b, in which case objectives 1 and 3 are equivalent.

3.3.1. Last observation carried forward

The last observation carried forward (LOCF) method imputes Y2a by Y1 for each subject in group 𝒟. Writing \hat{\pi} for the observed proportion of drop‐outs, the implied estimator for the mean response at time 2 is \hat{\mu}^{\mathrm{LOCF}}_{2} = \hat{\pi}\,\bar{Y}^{\mathcal{D}}_{1} + (1-\hat{\pi})\,\bar{Y}^{\mathcal{C}}_{2a}, where \bar{Y}^{\mathcal{D}}_{1} is the mean at time 1 for group 𝒟. The estimator is consistent for
\pi\{\mu_1 + E(Z_1 \mid R = 0)\} + (1 - \pi)\{\mu_{2a} + E(Z_{2a} \mid R = 1)\}
and hence is not obviously useful. The LOCF method is temptingly simple and is widely used in pharmaceutical trials, but it has attracted justifiable criticism (Molenberghs et al., 2004).

3.3.2. Last residual carried forward

A variant of the LOCF method would be to carry forward a suitably defined residual. Suppose, for example, that we define the imputed value of Y2a for each subject in group 𝒟 to be
\bar{Y}^{\mathcal{C}}_{2a} + (Y_1 - \bar{Y}_1).
The implicit estimator is then
\hat{\mu}^{\mathrm{LRCF}}_{2} = \bar{Y}^{\mathcal{C}}_{2a} + \hat{\pi}\,(\bar{Y}^{\mathcal{D}}_{1} - \bar{Y}_1),     (4)
which is consistent for μ2a+E(Z2a|R=1)−(1−π) E(Z1|R=1). Typically, if completers were high responders at time 1, then we might expect the same to apply at time 2, and vice versa: the conditional expectations E(Z1|R=1) and E(Z2a|R=1) would then have the same sign. The expectation of \hat{\mu}^{\mathrm{LRCF}}_{2} will then be closer to μ2a than the expectation of \bar{Y}^{\mathcal{C}}_{2a}, which is a desirable shift from the complete‐case estimand if μ2a is the target for inference.

For these reasons the last residual carried forward method must be preferable to the LOCF approach as a means of overcoming potentially informative drop‐out, but in our opinion it does not provide an adequate solution to the problem. We describe it here principally to highlight two important points. Firstly, the unspoken question underlying the estimator (4) is ‘how unusual were the completers at time 1?’. If they were unusual, then we presume that this may also have been true at time 2, and consequently adjust the observed time 2 average accordingly. Secondly, this adjustment is downweighted by a factor 1−\hat{\pi}. We observe, anticipating results in Section 4, that in our hypothetical drop‐out‐free universe π=0, suggesting the estimator \bar{Y}^{\mathcal{C}}_{2a} - (\bar{Y}^{\mathcal{C}}_{1} - \bar{Y}_1) as another candidate.
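To make the contrast between these simple approaches concrete, the following sketch (in the S/R language used for Appendix A) computes the complete-case, LOCF and last residual carried forward estimates, together with the candidate estimator just mentioned, on simulated two-time-point data of our own construction; all variable names and parameter values are illustrative assumptions, not the trial data analysed later.

```r
## Simulated two-time-point data in which completers (R = 1) tend to be high
## responders at time 1; the drop-out-free mean at time 2 is 1 by construction.
set.seed(1)
n  <- 10000
Y1 <- rnorm(n)
Y2 <- 1 + Y1 + rnorm(n)                  # drop-out-free second response Y2a
R  <- rbinom(n, 1, plogis(0.5 + Y1))     # probability of completing increases with Y1
Y2[R == 0] <- NA                         # Y2a is unobserved for group D

pi.hat <- mean(R == 0)                   # observed drop-out proportion
cc     <- mean(Y2[R == 1])                                   # complete case
locf   <- pi.hat * mean(Y1[R == 0]) + (1 - pi.hat) * cc      # LOCF
lrcf   <- cc + pi.hat * (mean(Y1[R == 0]) - mean(Y1))        # estimator (4)
cand   <- cc - (mean(Y1[R == 1]) - mean(Y1))                 # candidate with pi = 0
round(c(cc = cc, locf = locf, lrcf = lrcf, candidate = cand), 3)
```

In this particular construction the complete-case mean overstates μ2a=1 and the LOCF estimate understates it, the last residual carried forward estimate lies closer to the target, and the candidate estimator is unbiased because drop-out depends on the response only through Y1.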

3.3.3. Multiple imputation

One of several possible criticisms of both the LOCF and the last residual carried forward methods is that, at best, they ignore random variation by imputing fixed values. Hot deck imputation addresses this by sampling post‐drop‐out values from a distribution; in principle, this could be done either by sampling from an empirical distribution, such as that of the observed values from other subjects who did not drop out but had similar values of available explanatory variables, or by simulating from a distributional model. Multiple‐imputation methods (Rubin, 1987) take this process one step further, by replicating the imputation procedure to enable estimation of, and if necessary adjustment for, the component of variation that is induced by the imputation procedure.

3.4. Missing at random: parametric modelling

Any assumed parametric form for the joint distribution [Y1,Y2a,R] cannot be validated empirically, because we can check only the marginal [Y1] and conditional [Y1,Y2a|R=1] distributions. The assumption that drop‐out is MAR is useful because it allows one part of the joint distribution to remain unspecified. Under this assumption, the probability of drop‐out does not depend on the outcome at time 2 given the value at time 1, whence π(Y1,Y2a,Y2b) simplifies to π(Y1). In general this assumption is untestable, but if we combine it with a parametric model for [Y1,Y2a] we obtain the beguiling result that likelihood inference is possible without any need to model π(Y1). The likelihood contribution in the 𝒞 group is
\{1 - \pi(Y_1)\}\,[Y_1, Y_{2a}],
whereas in the 𝒟 group it is just π(Y1)[Y1]. The combined likelihood is thus L=LR|YLY, where
L_{R|Y} = \prod_{\mathcal{C}} \{1 - \pi(Y_1)\} \prod_{\mathcal{D}} \pi(Y_1) \qquad \text{and} \qquad L_Y = \prod_{\mathcal{C}} [Y_1, Y_{2a}] \prod_{\mathcal{D}} [Y_1].

The factorization [Y,R]=[R|Y][Y] is usually called a selection model (e.g. Michiels et al. (1999)), although we prefer the term selection factorization, to contrast with the pattern–mixture factorization [Y,R]=[Y|R][R], and to emphasize the distinction between how we choose to model the data and how we subsequently conduct data analysis.

As an illustration, suppose that (Z1,Z2a)′ is distributed as N(0,σ2V), with
V = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.     (5)
Then the maximum likelihood estimator of μ2a under MAR drop‐out is
\hat{\mu}_{2a} = \bar{Y}^{\mathcal{C}}_{2a} + \hat{\rho}\,(\bar{Y}_1 - \bar{Y}^{\mathcal{C}}_{1}),     (6)
which again adjusts the observed time 2 sample mean according to how unusual the fully observed group were at time 1, with shrinkage. Once more we call attention to this estimator, and note an interpretation of the estimator \bar{Y}^{\mathcal{C}}_{2a} + \bar{Y}_1 - \bar{Y}^{\mathcal{C}}_{1} as being appropriate when within‐subject variability is small (ρ→1).
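Estimator (6) amounts to regressing the time 2 response on the time 1 response among the completers and reading off the fitted value at the overall time 1 mean. A minimal sketch, using the unrestricted sample regression slope in place of the shrinkage factor of equation (6), and assuming vectors Y1, Y2 (with NA after drop-out) and R (1 for completers) such as those generated in the sketch of Section 3.3.2:

```r
## MAR-type adjustment: fit the completer regression of Y2 on Y1, then shift the
## completer mean of Y2 according to how unusual the completers were at time 1.
fit     <- lm(Y2 ~ Y1, subset = R == 1)
mu2.mar <- mean(Y2[R == 1]) + coef(fit)["Y1"] * (mean(Y1) - mean(Y1[R == 1]))
## the same value is returned by predict(fit, newdata = data.frame(Y1 = mean(Y1)))
```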

Parametric modelling under the combined assumption of MAR drop‐out and separate parameterization has the obvious attraction that a potentially awkward problem can be ignored and likelihood‐based inference using standard software is straightforward. A practical concern with this approach is that the ignorability assumption is untestable without additional assumptions. A more philosophical concern arises if, as is usually so, the data derive from discrete time observation of an underlying continuous time process. In these circumstances, it is difficult to imagine any mechanism, other than administrative censoring, under which drop‐out at time t could depend on the observed response at time t−1 but not additionally on the unobserved response trajectory between t−1 and t.

3.5. Missing at random: unbiased estimating equations

If interest is confined to estimating μ2a, or more generally covariate effects on the mean, then an alternative approach, which is still within the framework of MAR drop‐out, is to model π(Y1) but to leave [Y1,Y2a] unspecified.

Under MAR drop‐out we can estimate the probability of drop‐out consistently from the observed data: we need only R and Y1 for each subject, both of which are always available. This leads to an estimate \hat{\pi}(Y_1) of the drop‐out probability, often via a logistic model. The marginal mean of Y2a can now be estimated consistently by using a weighted average of the observed Y2a, where the weights are the inverse probabilities of observation (Horvitz and Thompson, 1952; Robins et al., 1995):
\hat{\mu}_{2a} = \left\{\sum_i \frac{R_i}{1 - \hat{\pi}(Y_{1i})}\right\}^{-1} \sum_i \frac{R_i\, Y_{2a,i}}{1 - \hat{\pi}(Y_{1i})}.     (7)

Use of equation (7) requires 1−\hat{\pi}(Y_1) to be strictly positive for all subjects, and it encounters difficulties in practice if this probability can be close to 0. This will not often be a material restriction within the current simplified setting, but it can be problematic in more complex study designs with high probabilities of drop‐out in some subgroups of subjects.
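A minimal sketch of this weighting scheme, on simulated data of our own in which drop-out depends only on Y1 (so that MAR holds) and with a logistic model for the observation probability; all names are illustrative.

```r
## Inverse probability weighting under MAR: model P(R = 1 | Y1) by logistic
## regression and weight each observed Y2 by the inverse of its fitted probability.
set.seed(2)
n  <- 2000
Y1 <- rnorm(n)
Y2 <- 0.5 + 0.7 * Y1 + rnorm(n)
R  <- rbinom(n, 1, plogis(1 - 0.8 * Y1))      # drop-out depends on Y1 only (MAR)
Y2[R == 0] <- NA

obs.prob <- fitted(glm(R ~ Y1, family = binomial))   # estimate of 1 - pi(Y1)
w        <- R / obs.prob                              # inverse probability weights
mu2.ipw  <- sum(w * Y2, na.rm = TRUE) / sum(w)        # weighted average, as in (7)
```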

3.6. Missing not at random: Diggle–Kenward model

Diggle and Kenward (1994) discussed a parametric approach to the problem of analysing longitudinal data with drop‐outs, based on a selection factorization. In the special case of model (2), the Diggle and Kenward model reduces to (Z1,Z2)′∼N(0,σ2V) with V as in equation (5), and
\mathrm{logit}\{\pi(Y_1, Y_2)\} = \alpha + \gamma_1 Y_1 + \gamma_0 Y_2,     (8)
with the tacit assumption that Y2=Y2a=Y2b. Drop‐out is MAR if γ0=0 and MCAR if γ0=γ1=0. The model therefore maps directly onto Rubin's hierarchy, and in particular missingness at random is a parametrically testable special case of a missingness not at random model. Although the likelihood does not separate in the same way as under parametric missingness at random, likelihood inference is still possible by replacing π with its conditional expectation, which is derived from the conditional distribution of Y2 given Y1. The price that is paid for this facility is that correct inference now depends on two untestable modelling assumptions, the normal distribution model for (Y1,Y2) and the logistic model for drop‐out (Kenward, 1998). There is no closed form for the estimator of μ2a.

3.7. Missing not at random: random effects

Under the Diggle and Kenward model the probability of drop‐out is directly determined by the responses Y1 and Y2, again assuming that Y2a=Y2b. If measurement error contributes substantially to the distribution of Y, a random‐effects model may be more appealing. In this approach, the usual modelling assumption is that Y and R are conditionally independent given shared, or more generally dependent, random effects. See, for example, Wu and Carroll (1988), Little (1995), Berzuini and Larizza (1996), Wulfsohn and Tsiatis (1997), Henderson et al. (2000) and Xu and Zeger (2001). A simple model for our simple example is
Y_1 = \mu_1 + U + \varepsilon_1, \qquad Y_{2a} = \mu_{2a} + U + \varepsilon_2, \qquad \mathrm{logit}\{\pi(U)\} = \alpha + \gamma\,U,
with independence between U, ɛ1 and ɛ2. Models of this type are in general MNAR models, because random effects are always unobserved and typically influence the distribution of Y at all time points. It follows that the conditional distribution of the random effects, and hence the probability of drop‐out given Y, depends on the values of Y at all time points, and in particular on values that would have been observed if the subject had not dropped out.

For maximum likelihood estimation for the simple model above, the shared effect U can be treated as missing data and methods such as the EM or Markov chain Monte Carlo algorithms used, or the marginal likelihood can be obtained by numerical integration over U, and the resulting likelihood maximized directly. Implementation is computationally intensive, even for this simple example, and there is again no closed form for \hat{\mu}_{2a}.

Models of this kind are conceptually attractive, and parameters are identifiable without any further assumptions. But, as with the Diggle–Kenward model, the associated inferences rely on distributional assumptions which are generally untestable. Furthermore, in our experience the computational demands can try the patience of the statistician.

3.8. Missing not at random: unbiased estimating equations

A random‐effects approach to joint modelling brings yet more untestable assumptions and we can never be sure that our model is correct for the unobserved data, although careful diagnostics can rule out models that do not even fit the observed data (Dobson and Henderson, 2003). Rotnitzky et al. (1998), in a follow‐up to Robins et al. (1995), argued strongly for a more robust approach, on the assumption that the targets for inference involve only mean parameters. They again left the joint distribution of responses unspecified but now modelled the drop‐out probability as a function of both Y1 and Y2a, e.g. by the logistic model (8). As applied within the simple framework of model (2), the most straightforward version of the procedure of Rotnitzky et al. (1998) is two stage: first, estimate the drop‐out parameters from an unbiased estimating equation; second, plug drop‐out probability estimates into another estimating equation.

For example, the drop‐out parameters α, γ0 and γ1 in equation (8) might be estimated by solving
\sum_i \varphi(Y_{1i}) \left\{\frac{R_i}{1 - \pi(Y_{1i}, Y_{2a,i})} - 1\right\} = 0,     (9)
where φ(Y1) is a user‐defined vector‐valued function of Y1. As there are three unknowns in our example, φ(Y1) needs to be three dimensional, such as φ(Y1)=(1, Y1, Y1^2)′. Since we need only π(Y1,Y2a) in the fully observed group, all components of equation (9) are available, and for estimation there is no need for assumptions about Y2b. Assumptions would, however, be needed for estimands to be interpretable. Rewriting equation (9) as
\sum_i \varphi(Y_{1i}) \left[\frac{I(R_i = 1)}{1 - \pi(Y_{1i}, Y_{2a,i})} - I(R_i = 1) - I(R_i = 0)\right] = 0,
it is easy to see that the equation is unbiased by taking conditional expectations of the indicator functions given (Y1,Y2a).

At the second stage, the newly obtained estimated drop‐out probabilities are plugged into an inverse‐probability‐weighted estimating equation to give

\hat{\mu}_{2a} = \left\{\sum_i \frac{R_i}{1 - \hat{\pi}(Y_{1i}, Y_{2a,i})}\right\}^{-1} \sum_i \frac{R_i\, Y_{2a,i}}{1 - \hat{\pi}(Y_{1i}, Y_{2a,i})}.

Rotnitzky et al. (1998) indicated that efficiency can be improved by augmenting the estimating equation for μ2a by a version of equation (9) (with a different φ) and simultaneously solving both equations for all parameters. Fixed weight functions may also be introduced as usual. They also argued that estimation of the informative drop‐out parameter γ0 will be at best difficult and that the validity of the drop‐out model cannot be checked if γ0≠0. Their suggestion is that γ0 be treated as a known constant but then varied over a range of plausible values to assess sensitivity of inferences for other parameters to the assumed value of γ0.
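The two-stage recipe with the informative parameter held fixed can be sketched as follows. The drop-out model, the simulated data, the choice φ(Y1)=(1, Y1)′ (two components suffice once γ0 is fixed) and the crude solution of the estimating equation by minimizing its squared norm are all our own illustrative assumptions.

```r
## Two-stage estimation of mu_2a with the informative drop-out parameter gamma0 fixed.
set.seed(3)
n  <- 2000
Y1 <- rnorm(n, 10, 2)
Y2 <- 5 + 0.8 * Y1 + rnorm(n, 0, 1.5)
p.drop <- plogis(-6 + 0.25 * Y1 + 0.15 * Y2)      # MNAR drop-out probability
R  <- rbinom(n, 1, 1 - p.drop)                    # R = 1 for completers
Y2[R == 0] <- NA

gamma0 <- 0.15                   # held fixed; vary over a plausible grid for sensitivity

## Stage 1: estimate (alpha, gamma1) from the unbiased estimating equation (9),
## taking phi(Y1) = (1, Y1)'; drop-outs contribute exactly -phi(Y1).
ee <- function(theta) {
  pd <- plogis(theta[1] + theta[2] * Y1 + gamma0 * Y2)
  w  <- ifelse(R == 1, 1 / (1 - pd) - 1, -1)
  sum(c(sum(w), sum(w * Y1))^2)
}
theta <- optim(c(0, 0), ee)$par

## Stage 2: plug the estimated observation probabilities into an IPW mean.
pd.hat   <- plogis(theta[1] + theta[2] * Y1 + gamma0 * Y2)
w.obs    <- ifelse(R == 1, 1 / (1 - pd.hat), 0)
mu2a.hat <- sum(w.obs * Y2, na.rm = TRUE) / sum(w.obs)
```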

Carpenter et al. (2006) compared inverse probability weighting (IPW) methods with multiple imputation. In particular, they considered a doubly robust version of IPW, which was introduced by Scharfstein et al. (1999) in their rejoinder to the discussion and which gives consistent estimation for the marginal mean of Y2a provided that at most one of the models for R or for Y2a is misspecified. Their results show that doubly robust IPW outperforms the simpler version of IPW when the model for R is misspecified, and outperforms multiple imputation when the model for Y2a is misspecified.

3.9. Sensitivity analysis

Rotnitzky et al. (1998) are not the only researchers to suggest sensitivity analysis in this context. Other contributions include Copas and Li (1997), Scharfstein et al. (1999, 2003), Kenward (1998), Rotnitzky et al. (2001), Verbeke et al. (2001), Troxel et al. (2004), Copas and Eguchi (2005) and Ma et al. (2005).

Sensitivity analysis with respect to a parameter that is difficult to estimate is clearly a sensible strategy and works best when the sensitivity parameter is readily interpretable in the sense that a subject‐matter expert can set bounds on its reasonable range; see, for example, Scharfstein et al. (2003). In that case, if the substantively important inferences show no essential change within the reasonable range, all is well. Otherwise, there is some residual ambiguity of interpretation.

Most parametric approaches can also be implemented within a Bayesian paradigm. An alternative to a sensitivity analysis is then a Bayesian analysis with a suitably informative prior for γ0.

3.10. Conclusions

Existing approaches to the analysis of longitudinal data subject to drop‐out may, if only implicitly, be addressing different scientific or inferential objectives. In part this may be because methods and terminology that are designed for general multivariate problems with missing data do not explicitly acknowledge the evolution over time of longitudinal data. In the next section we offer an alternative, which we believe is better suited to the longitudinal set‐up and which borrows heavily from event history methodology. We consider processes evolving in time and propose a martingale random‐effects model for the longitudinal responses, combined with a drop‐out mechanism that is allowed to depend on both observed and unobserved history, but not on the future. The martingale assumption formalizes the idea that adjusting for missing data is a defensible strategy provided that subjects’ longitudinal response trajectories exhibit stability over time. Our drop‐out model is formally equivalent to the independent censoring assumption that is common in event history analysis; see, for example, Andersen et al. (1992). We do not claim that the model proposed is universally appropriate nor suggest that it be adopted uncritically in any application. We do, however, offer some informal diagnostic procedures that can be used to assess the validity of our assumptions.

4. Proposal

4.1. Model specification

4.1.1. Longitudinal model

We suppose that τ measurements are planned on each of n independent subjects. The measurements are to be balanced, i.e. the intended observation times are identical for each subject, and without loss of generality we label these times 1,…,τ. For the time being, let us suppose that all n subjects do indeed provide τ measurements. In the notation of Section 2, Ya is therefore observed for every subject at every observation time, and Yb is counterfactual in every case.

We presume that covariates are also available before each of the τ observation times. These we label Xa, noting that in theory there are also counterfactual covariates Xb: the values of covariates if a subject had dropped out. We understand Xa to be an n×p matrix process, which is constant if only base‐line covariates are to be used, but potentially time varying and possibly even dependent on the history of a subject or subjects. Note that we shall write Xa(t) for the particular values at time t, but that by Xa without an argument we mean the entire process, and we shall follow this same convention for other processes.

At each observation time t we acknowledge that the underlying hypothetical response may be measured with zero‐mean error ɛa(t). We assume that this process is independent of all others and has the property that ɛa(s) and ɛa(t) are independent unless s=t. We make no further assumptions about this error process, and in particular we do not insist that its variance is constant over time.

We denote the history of the hypothetical response processes Ya, the potentially counterfactual covariates Xa and the measurement error process ɛa, up to and including time t, by
\mathcal{G}_t = \{X_a(s),\, Y_a(s),\, \varepsilon_a(s) : s \le t\}.
We are not particularly interested in how the covariates Xa(t) are obtained, but for estimation we shall require that they become known at some point before time t: possibly this is at time t−1, or at time 0 for base‐line covariates. It is useful to formalize this requirement by way of the history
\mathcal{G}_{t-} = \mathcal{G}_{t-1} \cup \{X_a(t)\},
which can be thought of as all information pertaining to Xa, Ya and ɛa that is available strictly before time t. Since 𝒢t− contains information about exogenous covariates and measured responses, functions of either or both may be included in the matrix Xa, allowing considerable flexibility in the specification of a model.

We argue that the expected increments in Ya are a natural choice for statistical modelling. Asking ‘What happened next?’ allows us to condition on available information such as the current values of covariates and responses. Later, it will also be useful to condition on the presence or absence of subjects.

For convenience, we set Xai(0)=Yai(0)=ɛai(0)=0 for all i, adopting the notation of continuous time processes to avoid complicated subscripts. It is possible to specify a mean model for the hypothetical response vector Ya=(Ya1,…,Yan)′ in terms of the discrete time local characteristics
E\{\Delta Y_a(t) \mid \mathcal{G}_{t-}\} = E\{Y_a(t) - Y_a(t-1) \mid \mathcal{G}_{t-}\}
of the process (Aalen, 1987). The local characteristics capture the extent to which the vector process Ya is expected to change before the next observations are recorded. Local characteristics are a generalization of the intensity of a counting process. It is often possible to specify the local characteristics in terms of linear models, and in this paper we consider models of the form
E\{\Delta Y_a(t) \mid \mathcal{G}_{t-}\} = X_a(t)\,\beta(t) - \varepsilon_a(t-1)     (10)
for t=1,…,τ. Setting aside for one moment the issue of measurement error, we have a linear (also referred to as additive) model Xa(t) β(t) for the expected increment E{ΔYa(t) | 𝒢t−}. Linear models on the increments of a process were proposed in the counting process literature by Aalen (1978), and more recently by Fosen et al. (2006b) for a wider class of stochastic processes. Since a different model is specified at each time, linear models on increments can be quite general and may incorporate random intercepts, random slopes and other, more complicated, structures. We denote by β the deterministic p‐vector of regression functions representing the effects on the local characteristics of the covariates Xa. Recall once again that β represents the hypothetical effects of covariates, assuming that drop‐out does not occur. Since β is an unspecified function of time, equation (10) can be thought of as a kind of varying‐coefficient model (Hastie and Tibshirani, 1993). This type of approach for longitudinal data has been taken by others: see for example Lin and Ying (2001, 2003) or Martinussen and Scheike (2000) and Martinussen and Scheike (2006), chapter 11. The crucial distinction between their work and ours is that it is the increments, not the measured responses, that are the subject of our linear model. We then accommodate measurement error by noting that, before time t, no information is available about ɛa(t), so the expected change in measurement error is simply −ɛa(t−1), which is known through 𝒢t−.
Incremental models correspond, on the cumulative scale, to models where the residuals form a kind of random walk, which can be thought of as additional random effects. To see this, the notion of a transform from the theory of discrete stochastic processes is required. Defining the cumulative regression functions B(t) by B(t) = β(1)+…+β(t), with B(0)=0, the transform of B by Xa, denoted Xa·B, is given by
(X_a \cdot B)(t) = \sum_{s=1}^{t} X_a(s)\,\{B(s) - B(s-1)\} = \sum_{s=1}^{t} X_a(s)\,\beta(s)
and forms part of the compensator, or predictable component, of Ya. Note that Xa·B differs from the ordinary matrix product XaB and is the discrete time analogue of a stochastic integral. The transform thus captures the cumulative consequences of covariates Xa and their effects β, both of which may vary over time.

The residual process is Ma = Ya − Xa·B − ɛa. This process has a property that makes it a kind of random walk: it takes zero‐mean steps from a current value to a future value. More formally, for s ≤ t we have that E{Ma(t) | 𝒢s} = Ma(s), and the process is thus a martingale. Model (10) may therefore be appropriate when, having accounted for fixed effects and measurement error, the random effects can be modelled as a martingale.

Although their conditional mean properties may seem restrictive, martingales represent, from the modeller's perspective, a wide range of processes. Neither continuity nor distributional symmetry is required of Ma, and for our purposes its variance need only be constrained to be finite. Further, the variance of the martingale increments may change over time. Serial correlation in the Ma‐process induces the same in the Ya‐process, which is often a desirable property in models for longitudinal data.

The linear increments model is, on the cumulative scale, a random‐effects model for Ya of the form
Y_a(t) = (X_a \cdot B)(t) + M_a(t) + \varepsilon_a(t).

The sample vector of martingale random effects is free to be, among other things, heteroscedastic, where the variance of a martingale may change over time and between subjects, and completely non‐parametric, since the distribution of a martingale need not be specified by a finite dimensional parameter. We reiterate, however, that martingale residuals impose a condition on the mean of their distribution given their past. This single condition, of unbiased estimation of the future by the past, is sufficiently strong to be easily dismissed in many application areas—though we note that this can often be overcome by suitable adjustment of the linear model. It seems to us that in many applications an underlying martingale structure seems credible, at least as a first approximation. We reiterate that the linear model may be adapted to include summaries of previous longitudinal responses if appropriate. Including dynamic covariates, e.g. summaries of the subject trajectories to date, may sometimes render the martingale hypothesis more tenable, although the interpretation of the resulting model is problematic if observed trajectories are measured with appreciable error.

We have shown that models for the hypothetical response Ya can be defined in terms of linear models on its increments, and that such models are quite general. At no extra cost, these comprise subject‐specific, martingale random effects. We do not discuss in detail the full generality of this approach; instead, we now turn to the problem of drop‐out.

4.1.2. Drop‐out model

Unfortunately, not all the hypothetical longitudinal responses Ya are observed. Rather, subject i gives rise to 1 ≤ Ti ≤ τ measurements, i.e. we observe Yai(1),…,Yai(Ti). Although both the hypothetical responses Yai(Ti+1),…,Yai(τ) and the realized responses Ybi(Ti+1),…,Ybi(τ) go unobserved, we restrict our assumptions to the former.

We can also consider drop‐out as a dynamic process. Let Ri denote an indicator process that is associated with subject i, with Ri(t)=1 if subject i is still under observation at time t, and Ri(t)=0 otherwise. We let ℛt be the history of these indicator processes up to time t. We do not distinguish between competing types of drop‐out, for instance between administrative censoring, treatment failure or death, because we need not do so to make inferences regarding the hypothetical responses Ya.

Like the covariate processes, we assume that the drop‐out processes are predictable, in the sense that Ri(t) is known strictly before time t. More formally, we shall denote by ℛt− the information that is available about drop‐out before time t, and assume that Ri(t) ∈ ℛt−. Although in this instance it follows that ℛt− = ℛt, it is useful to distinguish notationally between information that is available at these different points in time. We think of Ri as a process in continuous time, but in practice we are only interested in its values at discrete time points. Predictability is a sensible philosophical assumption, disallowing the possibility that drop‐out can be determined by some future, unrealized, event. Note that this does not preclude the possibility that future events might depend on past drop‐out.

The second important requirement that we impose on the processes Ri is that of independent censoring. This terminology, though standard in event history analysis, suggests more restrictions than are in fact implied. We give the formal definition and then discuss its implications for drop‐out in longitudinal studies. Recall that ℛt− is the history of the drop‐out process before time t. Censoring (or drop‐out) is said to be independent of the hypothetical response processes Ya if, and only if,
E\{\Delta Y_a(t) \mid \mathcal{G}_{t-} \cup \mathcal{R}_{t-}\} = E\{\Delta Y_a(t) \mid \mathcal{G}_{t-}\} \qquad \text{for all } t
(Andersen et al. (1993), page 139). Independent censorship says that the local characteristics of Ya are unchanged by additional information about who has been censored already, or by knowledge of who will, or will not, be observed at the next point in time. Fundamentally, this assumption ensures that the observed increments remain representative of the original sample of subjects, if drop‐out had not occurred. This requirement is similar in spirit to the sequential version of MAR drop‐out (Hogan et al. (2004), after Robins et al. (1995)), which states that
P\{R(t) = 0 \mid R(t-1) = 1,\, Y_a\} = P\{R(t) = 0 \mid R(t-1) = 1,\, Y_a(1), \ldots, Y_a(t-1)\}.

We emphasize that independent censoring is a weaker assumption than sequential MAR drop‐out, since the former conditions on the complete past, and not just the observed past, and so allows drop‐out to depend directly on latent processes. Moreover, it is a statement about conditional means, whereas the assumption of sequential missingness at random concerns conditional distributions.

Having laid out our assumptions concerning the drop‐out process, we make a few comments on what has not been assumed. We have not specified any model, parametric or otherwise, for the drop‐out process. Consequently, the drop‐out process may depend on any aspect of the longitudinal processes, e.g. group means, subject‐specific time trends or within‐subject instability. The only requirement is that this dependence is not on the future behaviour of Ya. Though often plausible, this is usually untestable.

4.1.3. Combined model

As we have already discussed, our target for inference will be the hypothetical effects of covariates supposing, contrary to fact, that subjects did not drop out of observation. More explicitly, we seek to make inference about β in the local characteristics model,
E\{\Delta Y_a(t) \mid \mathcal{G}_{t-}\} = X_a(t)\,\beta(t) - \varepsilon_a(t-1),
for the hypothetical response Ya, drawing on the Ti observed covariates Xai(1),…,Xai(Ti) and responses Yai(1),…,Yai(Ti) for every i.
Recall that Ri is an indicator process, 1 if subject i is still under observation. We shall write
R(t) = \mathrm{diag}\{R_1(t), \ldots, R_n(t)\}
for the diagonal matrix with the Ri(t) along the diagonal. We claim that the processes R, X=RXa and Y=R·Ya are all fully observed. Clearly, R is observed; RXa (the ordinary matrix product of these processes) is observed since, whenever Xa is unobserved, R=0. Recall that R·Ya is the transform of Ya by R, and is defined by
(R \cdot Y_a)(t) = \sum_{s=1}^{t} R(s)\,\{Y_a(s) - Y_a(s-1)\}.
So R·Ya is the process Ya whose individual elements are stopped, i.e. held constant, after the time Ti of their last observations. Hence this process, also, is observable. We denote the history of the observed data X, Y and R as
\mathcal{F}_t = \{X(s),\, Y(s),\, R(s) : s \le t\}
and define ℱt− = ℱt−1 ∪ {X(t), R(t)}. The following model is induced for the observed longitudinal responses Y:
E\{\Delta Y(t) \mid \mathcal{F}_{t-}\} = X(t)\,\beta(t) - R(t)\,\varepsilon(t-1),     (11)
where ɛ=R·ɛa. This equality may be derived directly from the linear model for the local characteristics of Ya, the fact that R is predictable and the independent censoring assumption. The key point is that the same parameters β appear in the local characteristics of both Y and Ya, and hence are estimable from observed data. These parameters represent the effects of covariates on the expected change in hypothetical longitudinal response at a given time and so will often have scientific relevance. In Section 4.2 we demonstrate how to estimate these parameters.

4.2. Model fitting

4.2.1. Estimation

To estimate β=(β1,…,βp)′ we seek a matrix‐valued process X− having the property that X−X=I. However, owing to drop‐out such a process does not always exist. Let 𝒯={t: det{X′(t)X(t)}≠0}, the set of times t at which the matrix X′(t)X(t) is invertible. This 𝒯 is a random set over which estimation may be reasonably undertaken, often an interval whose upper end point is reached only when very few subjects remain under observation. On 𝒯 the matrix {X′(t)X(t)}−1X′(t) exists, making the process X− given by
X^{-}(t) = \{X'(t)\,X(t)\}^{-1} X'(t)
well defined. So on 𝒯 our estimate
\hat{\beta}(t) = X^{-}(t)\,\{Y(t) - Y(t-1)\}
of β(t) is just the ordinary least squares (OLS) estimate of this parameter, based on all available increments. Outside 𝒯 we simply have \hat{\beta}(t)=0. This leads to the estimator \hat{B} of B that is given by
\hat{B}(t) = \sum_{s \le t} 1_{\mathcal{T}}(s)\, X^{-}(s)\,\{Y(s) - Y(s-1)\}.     (12)

Thus we set \hat{B} = X−·Y, the transform of Y by X−. So defined, \hat{B} is an estimator of B on 𝒯; specifically, it estimates B𝒯 = 1𝒯·B, and there may be some small bias in estimating B. Estimation of B𝒯 is reasonable in the present context of varying sample sizes and covariates, and is, in fact, all that can be expected of a non‐parametric technique. Without parametric interpolation, there may be time points about which the data can say nothing.

This estimator is again due to Aalen (1989) in the setting of event history analysis, and to Fosen et al. (2006b) for more general continuous time processes. It is straightforward to show that \hat{\beta}(t) is unbiased for 1𝒯(t) β(t):
E\{\hat{\beta}(t)\} = E\big[1_{\mathcal{T}}(t)\, X^{-}(t)\, E\{\Delta Y(t) \mid \mathcal{F}_{t-}\}\big] = E\big[1_{\mathcal{T}}(t)\, X^{-}(t)\,\{X(t)\,\beta(t) - R(t)\,\varepsilon(t-1)\}\big] = E\{1_{\mathcal{T}}(t)\,\beta(t)\}.

Therefore, \hat{B} is unbiased for B𝒯. What we have done is to mimic Aalen's unbiased estimator, and to show that measurement error does not affect this unbiasedness.

The estimator \hat{B} is essentially a moment‐based estimator of B. It sums the least squares estimates of β based on the observed increments. Crucially, nowhere do we require Y and R to be independent. We rely on an assumption that hypothetical random effects are martingales, and if this assumption breaks down then so does unbiasedness. Each surviving subject is thought to have a mean 0 step in their random effects; non‐zero expected increments in the random effects cannot be distinguished from a change in population mean.
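In practice the estimator reduces to a sequence of least squares fits. The sketch below is our own illustration and not the Appendix A implementation; it assumes monotone drop-out, an n×τ response matrix y with NA entries after drop-out and an n×p matrix x of baseline covariates (time-varying covariates would enter analogously), and it omits any correction for measurement error.

```r
## Linear increments estimator (12): per-time OLS on observed increments, cumulated.
li.fit <- function(y, x) {
  n <- nrow(y); tau <- ncol(y); p <- ncol(x)
  Bhat <- matrix(0, tau, p)
  prev <- rep(0, n)                            # convention Y(0) = 0
  for (t in 1:tau) {
    atrisk <- !is.na(y[, t]) & !is.na(prev)    # subjects contributing an increment at t
    xt <- x[atrisk, , drop = FALSE]
    dy <- y[atrisk, t] - prev[atrisk]          # observed increments
    beta <- rep(0, p)                          # beta-hat(t) = 0 outside the set T
    if (sum(atrisk) >= p && qr(xt)$rank == p)
      beta <- drop(solve(crossprod(xt), crossprod(xt, dy)))   # OLS estimate of beta(t)
    Bhat[t, ] <- (if (t == 1) rep(0, p) else Bhat[t - 1, ]) + beta
    prev <- y[, t]
  }
  Bhat                                         # row t estimates B(t)
}
## Example: an intercept-only design, li.fit(y, matrix(1, nrow(y), 1)), estimates the
## drop-out-free mean response profile.
```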

4.2.2. Inference

Inference is discussed in Farewell (2006). Estimators of the finite sample and asymptotic variances of \hat{B} are not so readily derived as in the corresponding theory of event history analysis. Counting processes behave locally like Poisson processes (Andersen et al., 1992), having equal mean and variance, but this result does not hold in generality. Moreover, error ɛa in the measurement of the hypothetical variable leads to negatively correlated increments in \hat{B} and results in a complex pattern of variability. However, computing time occupied by parameter estimation is negligible, so we recommend the use of the bootstrap for inference about B. Farewell (2006) provides a result that \hat{B} is √n consistent for B with a Gaussian limiting distribution. He also gives an approximation that, in the absence of measurement error, justifies a simple calculation using OLS regression, as outlined in Appendix A. In the application to follow, we use the bootstrap distribution for \hat{B}.
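A subject-level bootstrap is then immediate; the sketch below reuses the illustrative li.fit() function and the matrices y and x assumed in the previous sketch.

```r
## Nonparametric bootstrap over subjects for the linear increments estimate.
boot.B <- replicate(500, {
  idx <- sample(nrow(y), replace = TRUE)             # resample whole subjects
  li.fit(y[idx, , drop = FALSE], x[idx, , drop = FALSE])
})
se.B <- apply(boot.B, c(1, 2), sd)                   # bootstrap standard errors for Bhat(t)
```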

4.3. Diagnostics

Most diagnostic tools are based in some way on the estimated residuals from a fitted model. In the current setting the residuals are Z=M+ɛ and may be estimated by
\hat{Z}(t) = \sum_{s \le t} \{I - H(s)\}\,\{Y_{\mathcal{T}}(s) - Y_{\mathcal{T}}(s-1)\},
where H=XX− is the hat matrix of OLS and Y𝒯=1𝒯·Y. Standard residual plots, e.g. of \hat{Z} against fitted values or covariates, should reveal systematic misspecifications of the model for the mean response but need not show the usual random scatter since we do not assume homogeneity of variances, either between or within subjects.

One simple diagnostic that is tailored to the martingale assumption is a scatterplot of increments in the residuals, \Delta\hat{Z}(t), against the lagged increments \Delta\hat{Z}(t-1). In the absence of measurement error, a plot of this kind should show no relationship. Substantial measurement error would induce a negative association, in which case the fit would be improved by including \Delta\hat{Z}(t-1) as a covariate at time t.

We also propose two new diagnostic tools, as follows. The first is a graphical check of the martingale structure of the random effects and exploits the fact that, for t>1,
\mathrm{cov}\{Z(1), Z(t)\} = V\{M(1)\}.     (13)

This result is easily proved, since martingales have uncorrelated increments and the errors ɛ are mutually independent. The point about equation (13) is that the empirical version of the left‐hand side can be evaluated at each measurement time, whereas the expression on the right‐hand side shows that the corresponding theoretical quantity is constant over time. Hence, a plot of the empirical version of cov{Z(1), Z(t)} against t has diagnostic value, with departures from a straight line with zero slope indicating unsuitability of model (11).

Clearly, similar plots can be derived based on the observation that
\mathrm{cov}\{Z(s), Z(t)\} = V\{M(s)\}
for all 1 ≤ s < t, where the above diagnostic corresponds to choosing s=1. What is less clear is how much additional information is provided by such plots, since the plots are closely related.
We supplement this covariance diagnostic plot with an informal test statistic. Writing \hat{C}(t) for the empirical version of cov{Z(1), Z(t)} and \hat{C}(\tau) for the final value that is assumed by this process, we have in particular that
E\{\hat{C}(\tau)\} = E\{\hat{C}(2)\}.
Therefore E\{\hat{C}(\tau) - \hat{C}(2)\} = 0, and for large n the approximation
\frac{\hat{C}(\tau) - \hat{C}(2)}{\big[V\{\hat{C}(\tau) - \hat{C}(2)\}\big]^{1/2}} \sim N(0, 1)     (14)
holds. Large absolute values of this statistic constitute evidence against the martingale hypothesis. In practice, we use the bootstrap variance in place of its theoretical equivalent in the denominator.
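Both diagnostics can be computed directly from the estimated residual process. The sketch below assumes an n×τ matrix zhat holding each subject's estimated residual process, held constant after drop-out as in the stopped processes above; zhat, the number of bootstrap replicates and all other names are illustrative assumptions.

```r
## Covariance diagnostic for the martingale assumption, and the informal statistic (14).
tau   <- ncol(zhat)
cov1t <- sapply(2:tau, function(t) cov(zhat[, 1], zhat[, t]))  # empirical cov{Z(1), Z(t)}
plot(2:tau, cov1t, type = "b", xlab = "t",
     ylab = "cov{Z(1), Z(t)}")                 # roughly constant under the martingale model

## final covariance minus that at t = 2, standardised by a bootstrap standard deviation
stat <- function(z) cov(z[, 1], z[, ncol(z)]) - cov(z[, 1], z[, 2])
boot <- replicate(500, stat(zhat[sample(nrow(zhat), replace = TRUE), , drop = FALSE]))
stat(zhat) / sd(boot)                          # compare with N(0, 1)
```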

4.4. Summarizing remarks

In summary, our model is
E\{\Delta Y_a(t) \mid \mathcal{G}_{t-}\} = X_a(t)\,\beta(t) - \varepsilon_a(t-1)
for t=1,…,τ. The observed data are R, X=RXa and Y=R·Ya. We assume that
E\{\Delta Y_a(t) \mid \mathcal{G}_{t-} \cup \mathcal{R}_{t-}\} = E\{\Delta Y_a(t) \mid \mathcal{G}_{t-}\},
and our estimator for B is
\hat{B}(t) = \sum_{s \le t} 1_{\mathcal{T}}(s)\,\{X'(s)\,X(s)\}^{-1} X'(s)\,\{Y(s) - Y(s-1)\}.

Appendix A illustrates how this can be implemented by using standard statistical software.

5. Simple example revisited

For further discussion we return to the simple two‐time‐point example that was used in Sections 2 and 3. Mixing the notation of the previous sections, our hypothetical longitudinal model can formally be expressed as
E\{Y_a(1) \mid \mathcal{G}_{1-}\} = \mu_1, \qquad E\{Y_a(2) - Y_a(1) \mid \mathcal{G}_{2-}\} = \mu_{2a} - \mu_1 - \varepsilon_a(1),
and the independent censoring assumption asserts that
E\{Y_a(2) - Y_a(1) \mid \mathcal{G}_{2-} \cup \mathcal{R}_{2-}\} = E\{Y_a(2) - Y_a(1) \mid \mathcal{G}_{2-}\}.
Written using more traditional modelling notation, these assumptions are satisfied if
Y_1 = \mu_1 + M_1 + \varepsilon_1,     (15)
Y_{2a} = \mu_{2a} + M_{2a} + \varepsilon_2,     (16)
E(M_1) = E(M_{2a}) = E(\varepsilon_1) = E(\varepsilon_2) = 0     (17)
and
E(M_{2a} - M_1 \mid Y_1, R = 1) = 0.     (18)
Under assumptions (15)–(18), our least squares estimator (12) is given by
\hat{\mu}_{2a} = \bar{Y}_1 + \bar{Y}^{\mathcal{C}}_{2a} - \bar{Y}^{\mathcal{C}}_{1}     (19)
and is unbiased for μ2a.
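In this two-time-point case the estimator can be computed in a single line; a sketch assuming vectors Y1, Y2 (NA after drop-out) and R (1 for completers) as before:

```r
## Estimator (19): the overall time 1 mean plus the average observed increment
## among the completers.
mu2a.hat <- mean(Y1) + mean(Y2[R == 1] - Y1[R == 1])
```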

Consider now the assumptions that lead to the unbiasedness of inline image. Equation (15) is unremarkable; equation (16) is for the possibly counterfactual drop‐out‐free response Y2a, as we have argued for objective 3. The zero‐mean assumptions in condition (17) are needed to give μ1 and μ2a interpretations as drop‐out‐free population means, which are the parameters of interest. Note, though, that we do not require M1 and M2a to be independent. Equation (18) provides our key assumption, that the subject‐specific random effects have zero‐mean increments, conditional on that subject's observed history. It is this assumption that we test with our diagnostic in Section 4.3. An untestable consequence of equation (18), taken together with condition (17), is that the subject‐specific random effects also have zero‐mean increments conditional on dropping out.

Equations (15)–(18) completely specify the model and it is perhaps worth restating what has not been assumed. There are no distributional statements about either the random effects or the measurement errors, and there is no assumption of identical distributions across subjects. There are no statements whatsoever about Y2b, what happens after drop‐out. Importantly, we have not made any further assumptions on the drop‐out probability π(·). This does not mean that π(·) is entirely unrestricted: condition (18) holds if, and only if,
E\{\Delta\,(1 - \pi(M_1, \Delta)) \mid M_1\} = 0,     (20)
where Δ=M2a−M1. Examples that satisfy the above condition include a random‐intercept model in which Δ=0, with any π(·); an independent censoring drop‐out model in which π(M1,Δ)=π(M1), with any Δ for which E(Δ|M1)=0; and any π(M1,Δ) that is an even function of Δ, taken together with any zero‐mean, symmetric distribution [Δ|M1].

None of these examples are missingness at random models, since in every case π(Y1,Y2a)≠π(Y1). Notwithstanding this comment, in the first two examples we have drop‐out probability depending only on the most recent random effect M1. In this sense our assumptions are similar to sequential missingness at random (Hogan et al., 2004), with the additional assumption of martingale random effects. Nevertheless, and as the third example illustrates, it is possible to construct a variety of models for which π(M1,Δ)≠π(M1) yet condition (20) remains true.

6. Simulations

We demonstrate the use of the covariance diagnostics in two simulation studies. Pitting a martingale random‐effects process against a popular non‐martingale alternative, we report the estimated power and type I error rates of the informal test (14) and illustrate the suggested covariance plots.

6.1. Scenario 1

The first simulation scenario mimics the schizophrenia example that is to be considered in Section 7, though with just one treatment group and so no covariates. Measurements are scheduled at weeks (w1,…,w6)=(0,1,2,4,6,8).

Let U0,U1,U2,… be independent zero‐mean Gaussian n‐vectors, which we use to construct two random‐effects processes. Put Sa(0)=Ma(0)=0, and for non‐negative t define
image

Then Sa is a random intercept and slope process, of the kind that was described by Laird and Ware (1982), whereas Ma is a martingale. We take inline image and inline image and choose the variances of the further values to ensure that V{Sa(t)}=V{Ma(t)}. This set‐up allows us to compare these two types of random‐effects process with, as far as is possible, all else being equal.

The responses are now defined as
image
with inline image, and independence between time points. The probabilities of drop‐out between times t and t+1 are logistic, with linear predictors αt+γSa(t) and αt+γMa(t) for YS and YM respectively.
For each of n=125, 250, 500, 1000 we generated 1000 simulated data sets from this model. We used μ1=…=μ6=0 and chose the other parameter values to correspond roughly to the schizophrenia data: inline image, inline image, inline image and
image

This led to about 50% drop‐out in each model, spread over time points 2–5, with only about 1% of subjects dropping out after just one observation. Each data set was analysed by using our linear increments (LI) approach, by an IPW estimating equation approach and by fitting a multivariate normal distribution with unstructured within‐subject covariance matrix (method UMN). Both the IPW and the UMN methods included the incorrect assumption that drop‐out is MAR. For IPW we used the response at time t−1 as a covariate in a logistic model for drop‐out at time t. No drop‐out model is needed for UMN under MAR.
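For concreteness, the following R sketch generates one data set of the martingale type under a set-up of this kind; the innovation standard deviations, αt and γ are illustrative values of our own, since the text above specifies the design only up to the quantities shown inline.

    # Sketch of one simulated data set of the martingale type for scenario 1.
    # Innovation standard deviations, alpha and gamma are illustrative values,
    # not the paper's exact choices.
    set.seed(1)
    n      <- 500
    weeks  <- c(0, 1, 2, 4, 6, 8)
    tau    <- length(weeks)
    M      <- matrix(0, n, tau)                   # martingale random effects
    M[, 1] <- rnorm(n, sd = 15)
    for (t in 2:tau) M[, t] <- M[, t - 1] + rnorm(n, sd = 8)
    Y      <- M + matrix(rnorm(n * tau, sd = 5), n, tau)   # add independent noise
    alpha  <- rep(-3, tau - 1)                    # baseline log odds of drop-out
    gamma  <- 0.05
    obs    <- matrix(TRUE, n, tau)                # TRUE while a subject is still in the study
    for (t in 1:(tau - 1)) {
      p_drop  <- plogis(alpha[t] + gamma * M[, t])
      dropped <- obs[, t] & (runif(n) < p_drop)
      obs[dropped, (t + 1):tau] <- FALSE          # monotone drop-out
    }
    colMeans(ifelse(obs, Y, NA), na.rm = TRUE)    # observed means drift downwards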

Table 1 summarizes results at n=500. There is severe downward bias in the observed mean values (OLS) for each of inline image and inline image, and this is only partly corrected by the misspecified IPW and UMN methods. The LI fit to YM shows no bias, as expected, and confidence interval coverage is good. The observed mean bias is reduced but not removed when our method is used on YS, unsurprisingly given that the model is then also misspecified. Usually such misspecification would be detected by the diagnostics. For example, box plots of the residual covariances (Fig. 1) suggest good diagnostic power for distinguishing the models, and this is confirmed by the performance of the test statistic (14), for the variance of which we used 100 bootstrap samples for each data set (Table 2).

Table 1. Estimated mean responses and standard errors SE for scenario 1 using observed data without correction for drop‐out (OLS), with IPW or a multivariate normal model with unstructured covariance matrix (UMN), both of which falsely assume that drop‐out is MAR, and under the LI method†

                             Results for the following values of w:
        Method               0       1       2       4        6        8
  YM    OLS    Mean       0.00   −0.30   −2.75   −4.34   −10.61   −19.41
               SE         0.77    0.78    0.77    0.91     1.32     1.89
        IPW    Mean      −0.03   −0.03   −1.12   −2.25    −6.10   −13.17
               SE         0.78    0.81    0.84    1.12     2.04     2.83
        UMN    Mean      −0.02   −0.02   −0.53   −1.80    −6.00   −12.90
               SE         0.77    0.77    0.85    0.91     1.44     1.83
        LI     Mean       0.00   −0.02   −0.02    0.01     0.05     0.02
               SE         0.77    0.78    0.89    0.97     1.55     2.05
               Cov (%)    96.4    94.1    95.2    94.3     94.8     94.6
  YS    OLS    Mean      −0.01    0.26   −2.90   −5.08   −12.95   −22.38
               SE         0.79    0.82    0.83    1.06     1.11     1.34
        IPW    Mean       0.01   −0.17   −1.25   −2.84    −8.06   −15.67
               SE         0.79    0.82    0.97    1.16     1.68     1.83
        UMN    Mean       0.01   −0.15   −0.75   −2.38    −7.12   −13.45
               SE         0.79    0.82    0.89    1.12     1.16     1.39
        LI     Mean      −0.01    0.02   −0.16   −0.98    −3.61    −7.81
               SE         0.79    0.82    0.93    1.20     1.18     1.44
               Cov (%)    94.8    95.7    94.1    85.9     19.8      0.1

†The coverage Cov of nominal 95% confidence intervals under LI is also included. The sample size was n=500, and results were averaged over 1000 simulations.
[Fig. 1. Box plots of the residual covariances based on 1000 simulations under scenario 1 at sample size n=500: (a) true martingale structure YM; (b) Laird–Ware random intercept and slope structure YS]

Table 2. Estimated size and power of the diagnostic test, based on simulation results

                              Results for the following values of n:
  Scenario                     125      250      500      1000
  1          Power           0.307    0.530    0.766     0.980
             Type I error    0.056    0.056    0.053     0.059
  2          Power           0.147    0.241    0.390     0.686
             Type I error    0.056    0.059    0.045     0.052

6.2. Scenario 2

For the next simulation we introduce covariates and change the drop‐out model. As well as an intercept term we include a time constant Bernoulli(0.5) covariate and a time varying covariate, independently distributed as inline image at each time point. In the notation of Section 4, the corresponding cumulative regression functions are taken to be
image
We also add measurement error ɛ, arising from a t‐distribution on ν degrees of freedom and scaled by a factor σ, i.e. inline image. The final measurement times T1,…,Tn are determined by the relationship
image
so that 1 ≤ Ti ≤ 7 for each i.
We defined
image

The parameters were taken to be inline image. This gave approximately 25% drop‐out, roughly evenly spread over times 2–6. Again, 100 bootstrap samples were drawn to compute variances for the test statistic (14).
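The covariate and measurement-error part of this set-up can be sketched in R as follows; ν, σ and the standard normal choice for the time varying covariate are placeholders for the quantities that appear only as inline expressions above.

    # Sketch of the scenario 2 covariates and scaled-t measurement error.
    # nu, sigma and the N(0, 1) time varying covariate are illustrative values.
    set.seed(2)
    n     <- 500
    tau   <- 7
    nu    <- 5
    sigma <- 2
    x2  <- rbinom(n, 1, 0.5)                            # time constant Bernoulli(0.5) covariate
    x3  <- matrix(rnorm(n * tau), n, tau)               # independent time varying covariate
    eps <- sigma * matrix(rt(n * tau, df = nu), n, tau) # t-distributed measurement error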

Mean estimates of B for sample size n=500 using both YM and YS are shown in Fig. 2, together with the true values and ±2 empirical standard errors around the YM‐estimates. Bootstrap standard errors matched the empirical values closely. Standard errors derived from asymptotic results, which avoid the need to bootstrap but at the expense of assuming negligible measurement error, were slightly conservative, overestimating typically by about 5%. As expected there was no evidence of bias for our increment‐based estimates of B based on YM. Estimates from the misspecified model for YS were also good for B2 and B3; in fact so close that the lines in the plots are hardly distinguishable. There was, however, bias for the intercept B1. Identification of the random‐effect structure through residual covariances was more difficult than for scenario 1, causing some loss of power for the test statistic (Table 2).

[Fig. 2. Summary of estimates of B for scenario 2, at sample size n=500: mean dynamic estimates from YM (——) and YS (– – –) together with true values (·······)]

7. Analysing data from a longitudinal trial

We now describe an application of the methods of Section 4 to data from the schizophrenia clinical trial that was introduced earlier. The trial compared three treatments: a placebo, a standard therapy and an experimental therapy. The response of interest, PANSS, is an integer ranging from 30 to 210, where high values indicate more severe symptoms. A patient with schizophrenia entering a clinical trial may typically expect to score around 90.

Of the 518 participants, 249 did not complete the trial, among whom 66 dropped out for reasons that were unrelated to their underlying condition. The remaining 183 represent potentially informative drop‐out, though we emphasize that our new approach does not need to distinguish these from the non‐informative drop‐outs. We mention them only because we shall refer to other procedures that draw such a distinction.

The goal of the study was to compare the three treatments with respect to their ability to improve (reduce) the mean PANSS‐score. The patients were observed at base‐line (t=1) and thereafter at weeks 1, 2, 4, 6 and 8 (t=2,3,4,5,6) of the study. The only covariates used here are treatment groups. The dotted curves in Fig. 3 show for reference the observed mean response at each time in each treatment group, calculated in each case from subjects who have not yet dropped out. Hence, the plotted means estimate conditional expectations of the PANSS‐score (objective 2), which are not necessarily the appropriate targets for inference.

[Fig. 3. Estimated PANSS mean values under OLS (·······) and our dynamic linear approach (——): the topmost curves correspond to the placebo group, the middle curves to the standard treatment group and the lowest curves to the experimental treatment group]

Fig. 3 displays the pronounced differences between the OLS estimates and their dynamic linear counterparts. The OLS estimates invite the counter‐intuitive conclusion that, irrespective of treatment type, patients’ PANSS‐scores decrease (improve) over time. By contrast, our increment‐based estimator suggests that this is a feature of informative drop‐out, and that patients on the placebo do not improve over time; in fact, there is even a suggestion that their PANSS‐scores increase slightly. The levelling out of treatment effects over time that is seen under our new approach is also unsurprising.

In Fig. 4 and Table 3 we compare the dynamic linear fits with those obtained under four other approaches. Fig. 4 shows the estimated means for each treatment group, whereas Table 3 gives, for the standard treatment, the estimated mean change in response between the beginning and end of the study, together with the effect of placebo or of the experimental treatment on this quantity. The other approaches are as follows:

[Fig. 4. Estimated PANSS mean values for (from top to bottom pairs of curves, in every case) the placebo, standard and experimental groups (‐ ‐ ‐ ‐ ‐, estimates generated under methods (a)–(d) in the text; —–, estimates under the dynamic linear approach): (a) method UMN; (b) Dobson and Henderson's (2003) method; (c) IPW method; (d) method DYN]

Table 3. Effect of treatment on change in mean response (week 8 minus week 0) under the LI approach (12), OLS with an independence assumption and methods (a)–(d) described in the text†

                  Results for the following methods:
  Treatment      LI        OLS      (a) UMN    (b) Dobson and      (c) IPW    (d) DYN
                                               Henderson (2003)
  S            −5.10    −19.12       −9.90         −5.34             −6.22      −8.29
              (3.49)    (3.43)      (3.06)        (2.94)            (7.72)     (3.21)
  P – S        13.04      6.01       11.01         13.66             12.37      12.42
              (5.32)    (5.01)      (4.49)        (5.29)            (8.82)     (4.82)
  E – S        −7.07     −1.43       −4.89         −5.97             −8.18      −5.40
              (3.80)    (3.86)      (3.38)        (3.37)            (7.83)     (3.73)

†‘S’ represents the standard treatment, ‘P’ placebo and ‘E’ the experimental treatment. Standard errors are in parentheses.
  • (a) maximum likelihood estimation under a multivariate normal model with unstructured covariance matrix (method UMN); this approach assumes that drop‐out is MAR;

  • (b) a quadratic random‐effects joint longitudinal and event time informative drop‐out model that was fitted by Dobson and Henderson (2003) using EM estimation, as suggested by Wulfsohn and Tsiatis (1997) (Dobson and Henderson compared four random‐effects structures and concluded that, between these, the model that is used here with random intercept, slope and quadratic terms ‘is strongly preferred by likelihood criteria, even after penalizing for complexity’);

  • (c) an IPW estimating approach as described by Robins et al. (1995), with a logistic MAR drop‐out model;

  • (d) a second martingale fit (DYN) in which residuals at time t are included as covariates for the increments between t and t+1, along the lines of the dynamic covariate approaches for event history analyses as described by Aalen et al. (2004) and Fosen et al. (2006a).

There are broad similarities between our increment‐based estimates and each of approaches (a)–(d), but some differences are worth noting. Method (a) gives a smaller adjustment to the observed means than the others, whereas method (c) adjusts almost as much as our linear increment fits; both of these are missingness at random models. Method (b) assumes a Gaussian response but method (c) makes no modelling assumptions for the responses, a gain that is obtained at the expense of an increase in standard errors. Method (d) leads to estimates that are comparable with the fit that is obtained by using only exogenous covariates, albeit slightly closer to the observed means. Method (b), the quadratic random‐effects model, gives estimates that are close to those obtained by using our new approach. Method (b) took several days of computing time to fit, whereas estimates for the other models can be obtained quickly, for our linear increment models in particular. The availability of the closed form estimator (12) meant that the 1000 bootstrap simulations that were needed to compute the standard errors were completed in under 10 s on an unremarkable laptop computer. In Appendix A, we demonstrate briefly one way in which our dynamic linear models may be implemented by using standard software.
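The bootstrap referred to here resamples subjects and recomputes the closed form estimator; a generic R sketch is given below, in which estimator() and the long-format data frame dat (with subject identifier i) stand in for the LI estimator and the trial data and are not the authors' code.

    # Sketch: subject-level bootstrap standard errors for a closed form estimator.
    # estimator() and dat (long format, id column i) are placeholders for the
    # LI estimator and the trial data.
    boot_se <- function(dat, estimator, B = 1000) {
      ids  <- unique(dat$i)
      ests <- replicate(B, {
        rows <- unlist(lapply(sample(ids, replace = TRUE),
                              function(id) which(dat$i == id)))
        estimator(dat[rows, ])
      })
      if (is.matrix(ests)) apply(ests, 1, sd) else sd(ests)  # one SE per component
    }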

It is interesting to recall that, in approach (b), Dobson and Henderson (2003) modelled the drop‐out process explicitly and distinguished censoring due to inadequate response from other censoring events; neither is necessary under our proposed approach. Given the similarities between our dynamic linear results and those of method (b), the Dobson and Henderson assumption that these other events are uninformative about PANSS seems to be justified.

The diagnostics proposed may be illustrated by using these data. Having computed inline image, it is straightforward to extract inline image. Fig. 5 shows inline image against inline image at each time point and provides some evidence that our original model is misspecified. Fig. 5(a) for week 1 clearly indicates a weak negative association, which is consistent with measurement error in the response. The effect is less marked in later weeks. As discussed in Section 4.3, this suggests considering inclusion of inline image as an additional covariate in the model for increments at time t, which is approach (d) above. Fig. 4(d) shows that the fitted mean response profiles are not materially affected by the misspecification that is indicated by Fig. 5.

[Fig. 5. PANSS data: residual increments inline image plotted against inline image: (a) week 1; (b) week 2; (c) week 4; (d) week 6; (e) week 8]

Box plots illustrating the bootstrap distribution of the diagnostic inline image are shown in Fig. 6. The plot includes results for t=1 to exhibit the magnitude of the independent noise terms. Since the covariance is expected to be constant only for t>1, for diagnostic purposes the first box plot may be safely ignored. On the basis of the remaining box plots, derived from 1000 bootstrap samples, there is evidence of a downward trend in the diagnostic. However, this is mild, and the informal test statistic (again based on 1000 bootstrap samples) is −1.61, corresponding to a p‐value of about 0.1. Together, the diagnostics suggest that departures from the model are sufficiently small to be of little concern.

[Fig. 6. PANSS data: box plots of inline image from 1000 bootstrap samples; for a correctly specified model the mean values for t>1 should be equal]

8. Discussion

Many approaches to the analysis of longitudinal data with drop‐out begin with the idea of vectors of complete data Y, observed data Yobs and missingness indicators R. We have argued that this set‐up can be too simple, as it does not recognize that drop‐out can be an event that occurs in the lives of the subjects under study and that can affect future responses. Distributions after drop‐out may be different from those that would have occurred in the absence of that event, an extreme example being when drop‐out is due to death. Another might be when drop‐out is equivalent to discontinuing a treatment. Thus there is no well‐defined complete‐data vector Y and we are led into the world of counterfactuals, as described for the two‐time‐point example of Section 2, and the need for careful thought about objectives and targets for inference. An exception is when inference is conditional on drop‐out time (objective 2) and hence based only on observed data. Otherwise, untestable assumptions of one form or another are required for inference. In this paper we consider interest to lie in the drop‐out‐free response Ya and make the two key assumptions of independent censoring and martingale random effects.

In our view, the analysis of longitudinal data, particularly when subject to missingness, should always take into account the time ordering of the underlying longitudinal processes. Often, the drop‐out decision is made between measurement times, and we acknowledge this by insisting that the drop‐out process be predictable, while allowing it to depend arbitrarily on the past. Subsequent events could be affected by the drop‐out decision, and in this sense drop‐out could be informative about future longitudinal responses. We reiterate that we do not require all future values to be independent of the drop‐out decision: the realized response is free to depend on this decision. Nor is the required independence unconditional: our assumption is that, given everything that has been observed, drop‐out status gives no new information about the mean of the next hypothetical response. This is a weaker and, to us, more logical assumption than the standard MAR form. Ultimately, however, both the missingness at random and the independent censoring assumptions share the same purpose: to enable inference by making assumptions about the drop‐out process. MAR enables inference using the observed data likelihood, whereas independent censoring enables inference using the observed local characteristics.

What is therefore important is that all relevant information in ℱt should be included in the model for the next expected increment. For example, Fig. 5 suggested inclusion of the previously observed residual as a covariate for current increments. A similar approach might be used to simplify variance estimation, or if there are subject‐specific trends, as in a random‐slope model. Aalen et al. (2004) advocated an equivalent approach in dynamic linear modelling of recurrent event data. We note also the argument in Fosen et al. (2006a) that use of residuals inline image rather than Y helps to preserve the interpretation of exogenous covariate effects.

Modelling the local characteristics acknowledges the time ordering in longitudinal data analysis, naturally accounting for within‐subject correlation and possibly history‐dependent drop‐out. These features can all be accommodated through linear models on the observed increments of the response process. At no great loss of understanding, the applied statistician could think of our procedure as ‘doing least squares on the observed response increments, then accumulating’, to draw inference about the longitudinal features that a population would have exhibited, assuming that no‐one had dropped out.

Thus far, we have assumed a balanced study design, by which we mean a common set of intended measurement times for all subjects. A natural extension is to unbalanced study designs. It would also be of interest to consider more complicated random‐effects models for the increments of a longitudinal process, potentially gaining efficiency but requiring additional parametric assumptions. We have not so far explored this option; nor the important but challenging possibility of developing sensitivity procedures for our approach.

Acknowledgements

The authors are grateful for the detailed comments and helpful advice of all referees for the paper. Peter Diggle is supported by an Engineering and Physical Sciences Research Council Senior Fellowship. Daniel Farewell's research was carried out during his Medical Research Council funded studentship at Lancaster University. Robin Henderson is grateful for valuable discussions with Ørnulf Borgan and Niels Keiding at the Centre for Advanced Study, Oslo.

    Appendix

    Appendix A: Fitting dynamic linear models by using standard software

    Least squares equations can be solved, and hence our proposed models fitted, in virtually all software for statistical computing. We note, reflecting our own computing preferences, that this is particularly straightforward by using the lmList command from the nlme package (Pinheiro and Bates, 2000) in R or S‐PLUS. For example, to fit the dynamic linear models of Section 4 to the schizophrenia data, we constructed a data frame schizophrenia, having columns i (a unique identifier), time (running from 1 to Ti for each i), treat (a factor indicating the treatment regime) and PANSS. This last column stores the change in PANSS that is associated with the given subject and time point, i.e. it contains ΔYi(1),…,ΔYi(Ti) for every i. Then
    image
    returns an object containing a list of estimates inline image of β(t) for each t ∈ 𝒯, which may be extracted by way of the coef method. The cumulative sum of these estimates
    image
    yields inline image. Additionally, estimated standard errors
    image
    can be extracted from the fitted model if measurement error is thought to be negligible. These estimates (squared) may be summed
    image
    to yield an estimate of inline image without the need for bootstrapping.
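    Since the commands themselves are not reproduced above, the following R sketch gives our reconstruction of the workflow that is described; the data frame schizophrenia is as defined in the text, but the exact calls (including the formula passed to lmList) are our own guesses rather than the authors' original code, and they assume that the time groups are returned in chronological order.

        # Reconstruction of the lmList workflow sketched above; the formula and
        # extraction steps are our guesses, not the authors' original commands.
        library(nlme)
        # schizophrenia: columns i, time, treat, PANSS (the increments), as described
        fit   <- lmList(PANSS ~ treat | time, data = schizophrenia)
        betas <- as.matrix(coef(fit))        # estimates of beta(t), one row per time point
        Bhat  <- apply(betas, 2, cumsum)     # cumulative regression estimates B(t)
        # Standard errors from the separate least squares fits, assuming
        # negligible measurement error; rows assumed to be in time order
        ses   <- t(sapply(fit, function(m) coef(summary(m))[, "Std. Error"]))
        Bse   <- sqrt(apply(ses^2, 2, cumsum))  # standard errors of the cumulative estimates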

    Discussion on the paper by Diggle, Farewell and Henderson

    Joseph W. Hogan (Brown University, Providence)

    Diggle, Farewell and Henderson deserve congratulations for a wide‐ranging and thought provoking paper on a common but still somewhat vexing problem. Among the many contributions that are made in this paper, three deserve attention.

    • (a) The authors directly confront the question of defining the target of inference, using potential outcomes to formalize definitions. The importance of carefully defining the estimand cannot be overstated; drop‐out occurs for many reasons, and the consequent missing data cannot necessarily be assumed to arise from a common distribution. In some cases, as with death, ‘missing data’ do not exist.

    • (b) The authors make use of stochastic process machinery to formulate a semiparametric shared parameter model that is identified solely through moment restrictions. This is a welcome contribution. It is natural to view the full data as a two‐dimensional stochastic process {Y(t),R(t): t ≥ 0}; associated models and inferential methods are highly appropriate and lend important insights (see also Lin and Ying (2001) and Tsiatis and Davidian (2004)). Shared parameter models tend to rely heavily on untestable distributional assumptions for random effects (e.g. normality). The model that is given by equations (15) and (16), where M(t) is the ‘random effect’, requires only moment assumptions (17) and (18) for identification.

    • (c) Diggle and his colleagues contribute a comprehensive comparative analysis of real data, using six different methods, allowing readers to consider carefully the underlying assumptions of each method and their effect on inferring the full data distribution.

    My comments relate to the first and third of these.

    Defining the target of inference

    The authors define the target of inference by using potential outcomes (counterfactuals). In Section 2, it is argued that Y2 may be altered by the act of dropping out; hence Y2=R·Y2a+(1−R)·Y2b, where the realized response at time 2 is Y2a if the participant remains in the study, and Y2b if she drops out. The full data are (Y1,Y2a,Y2b,R). This framework enables articulation and criticism of the modelling objective. It also invites a comparison with the more familiar application of potential outcomes to causal inference.

    In that context, the full data for each individual are (Y0,Y1,T), where the outcome is Y1 if a treatment is received, Y0 if not, and T ∈ {0,1} indicates actual receipt of treatment. Inference about causal effects such as θ=E(Y1−Y0) is a missing data problem because only T and YT=T·Y1+(1−T)·Y0 are observed; the remaining potential outcome Y1−T is missing. Causal parameters are identified by placing untestable constraints on the joint distribution of (Y0,Y1,T) and possibly confounders or instrumental variables; see Angrist et al. (1996) and discussants for examples. Similarly, the use of potential outcomes to define the full data by Diggle and his colleagues implicitly requires the analyst to specify or constrain the joint distribution of (Y1,Y2a,Y2b,R). The linear increments method confines attention to (Y1,Y2a,R); however, I could not ascertain whether assumptions about Y2b are required in general, or whether the linear increments method can be used to infer aspects of Y2b.

    On a more conceptual note, one can plausibly argue that potential outcomes are well defined when viewed as inherent characteristics of each individual, e.g. response if treatment is taken, and response if not taken (but see Dawid (2000) and discussion for a range of viewpoints). In this paper, it is not clear whether the potential outcomes can be viewed as inherent characteristics, or whether they are metaphysical: can we easily conceptualize Y2b for a person who does not drop out, or Y2a for a person who does?

    A role for sensitivity analyses?

    The authors briefly mention sensitivity analyses, but I believe that the issue warrants a closer look. Any full data model fit to incomplete data extrapolates the missing data under some set of untestable assumptions. The extrapolation is explicit for some models, and less so for others. Although mixture models were not used to analyse the trial data, they are well suited to assessing sensitivity to assumptions about the missing data mechanism. The mixture model factorization is fω(yobs,ymis,r)=fθ(yobs,ymis|r) fφ(r), where ω=(θ,φ) is the parameter indexing the full data distribution. In many cases, mixture models admit

    • (a) a partition (θI,θNI) of θ into its identified and non‐identified elements and
    • (b) a closed form factorization of the mixture components f(yobs,ymis|r) into an unidentified extrapolation model fθI,θNI(ymis|yobs,r) and an identified observed data model fθI(yobs|r), i.e.
      image(21)

    The implications of equation (21) are that

    • (a) the fit of the model to observables can be checked and
    • (b) assumptions about the unobservables that are encoded via fixed values or prior distributions for θNI will not affect the fit to the observables.

    To illustrate, consider the simple case where Y=(Y1,Y2)T, Y2 may be missing and R is the missing data indicator. The target of inference is E(Y2). Using notation from equation (19), R=𝒞 if Y2 is observed and R=𝒟 if not. A mixture model based only on moment conditions can be specified as
    image(22)
    for r ∈ {𝒞,𝒟}. Clearly inline image, and inline image. Contrary to the assertion in Section 3.2, this model—and all pattern–mixture models—parameterizes the distribution of the full data (Y1,Y2,R), and not just that of observables. Constraints are then imposed to identify E(Y2|Y1,R=𝒟). Indeed, the transparency of pattern–mixture models with respect to model identification can be seen as a virtue in missing data settings (Little, 1995). From expression (22),
    image(23)
    We can reparameterize in terms of ‘sensitivity parameters’ τ=(τ0,τ1)T, such that inline image and inline image; here θI remains the same but θNI=τ. It follows that
    image
    Because the data provide no information about τ, the analyst must supply either specific values or a plausible range for τ to identify or bound E(Y2). (By adopting a Bayesian perspective it is possible to represent assumptions—and uncertainty about them—by using a prior distribution (Rubin, 1977). In the foregoing, missingness at random (MAR) corresponds to a point mass prior (and posterior) at (τ0,τ1)=(0,0). Then a ‘sensitivity analysis’—i.e. repeating the analysis along a continuum of τ—is a summary of conditional posteriors at an infinite number of point mass priors.) Clearly the missing data mechanism is MAR if and only if τ=0. From equation (23), a consistent estimate of E(Y2) under MAR is
    image

    More generally, inline image provides estimates of or bounds on E(Y2) that are consistent with the observed data. Derivatives of inline image with respect to τ measure sensitivity to departures from MAR. With more measurement times, the dimension of τ obviously increases, but simplifications can be made while preserving the structure in model (21) and maintaining lack of identifiability (e.g. assuming that τ is constant over time, or assuming that departures from MAR are confined to first‐order serial dependence parameters). These principles can also be applied when drop‐out is continuous (Hogan et al., 2004).

    If inline image and τ0=τ1=0, then inline image coincides with the linear increments estimator that is given by equation (19). In general it seems clear that the linear increments model admits missingness not at random mechanisms, but without a decomposition like model (21) it is difficult to understand specifically how it departs from MAR, how missing data are extrapolated from observed data and whether parameterizations for sensitivity analysis can be easily developed.
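    A small numerical sketch in R may make the two-time-point pattern–mixture computation concrete; the simulated data, the identified completer regression and the grid of sensitivity parameters are all illustrative choices of ours.

        # Sketch: pattern-mixture estimate of E(Y2) indexed by sensitivity
        # parameters tau = (tau0, tau1); data and tau grid are illustrative.
        set.seed(3)
        n   <- 2000
        Y1  <- rnorm(n, 5, 2)
        Y2  <- 2 + 0.8 * Y1 + rnorm(n)
        R   <- rbinom(n, 1, plogis(1 - 0.3 * Y1))       # MAR drop-out, for illustration
        dat <- data.frame(Y1, Y2, R)
        EY2_tau <- function(tau0, tau1) {
          comp <- lm(Y2 ~ Y1, data = dat[dat$R == 1, ]) # identified completer model
          extr <- (coef(comp)[1] + tau0) +
                  (coef(comp)[2] + tau1) * dat$Y1[dat$R == 0]
          mean(c(dat$Y2[dat$R == 1], extr))             # mix completers and extrapolations
        }
        sapply(c(-1, 0, 1), function(t0) EY2_tau(t0, 0))  # tau = (0, 0) corresponds to MAR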

    Summary

    In my own experience, articulating model assumptions (and their limitations) to collaborators who generate the data and decision makers who interpret the analyses is important. The authors clearly share this view and have gone to great lengths to clarify assumptions and objectives. They also have considerable experience and depth of knowledge in both theory and application of models for missing data, and I look forward to reading their insights about the issues that are raised in the discussion. It is an honour and a pleasure to propose a vote of thanks.

    James Carpenter (London School of Hygiene and Tropical Medicine)

    The first part of the paper rightly highlights the principle of separating assumptions from analysis. Assumptions come first and should be as accessible as possible; statistical methods should be principled and give valid results under the assumptions. Sensitivity analysis is important.

    My concern is that the proposal conflates assumptions and analysis. We can assume a particular missing data mechanism, model as suggested by the authors and end up with an estimate that is valid under quite a different mechanism.

    Suppose that we have data from one treatment arm and no covariates, just base‐line (fully observed) and follow‐up (partially observed). C denotes the group of patients with both measurements. The authors’ basic proposal for estimating the follow‐up mean is
    image(24)

    When will this give valid estimates?
    • (a) always when Y2 is missing completely at random (MCAR);
    • (b) not when Y2 is missing at random (MAR);
    • (c) not when Y2 is not missing at random (NMAR), unless the increments Y2i−Y1i are MCAR.

    As we do not know whether Y2 is MAR or NMAR (the diagnostics cannot help us), the authors suggest that we consider adjusting for residuals (see their data analysis). Thus we use the complete data and regress Y2−Y1 on the residual at time 1, which is equivalent to regressing on Y1. If this gives intercept and slope estimates (inline image), the estimated mean at time 2 is
    image(25)

    However, this estimator is only valid if Y2 is MAR (i.e. MCAR given Y1).

    We use a statistical test to choose between expressions (24) and (25). This means that if Y2 is NMAR sometimes we shall use expression (24) when we should not, and the estimate of the mean will be biased if data are NMAR, and vice versa if Y2 is MAR. Simple simulation studies with random intercept and missingness at random mechanisms show that the bias is non‐trivial.
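    One version of such a simulation is sketched below in R: a shared random intercept, drop-out at the second time point that depends only on Y1 (so that Y2 is MAR), and a comparison of the increment estimator (24) with a regression-based estimator in the spirit of (25); since the displayed formulae are not reproduced above, the exact form of (25) used here is our rendering, and all numerical values are illustrative.

        # Sketch: random-intercept data with MAR drop-out at time 2, comparing
        # the increment estimator (24) with a regression estimator in the
        # spirit of (25). Parameter values and the form of (25) are our choices.
        set.seed(4)
        n  <- 1e5
        b  <- rnorm(n, sd = 2)                 # shared random intercept
        Y1 <- b + rnorm(n)
        Y2 <- b + rnorm(n)                     # true E(Y2) = 0
        R  <- rbinom(n, 1, plogis(1 - Y1))     # drop-out depends only on observed Y1 (MAR)
        est24 <- mean(Y1) + mean(Y2[R == 1] - Y1[R == 1])
        fit   <- lm(I(Y2 - Y1) ~ Y1, subset = R == 1)
        est25 <- mean(Y1) + coef(fit)[1] + coef(fit)[2] * mean(Y1)
        c(increment = est24, regression = unname(est25))  # (24) is biased here, (25) is not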

    Thus the missing data mechanism that is needed for valid estimates changes as covariates are introduced or removed from the increment model, in a way that it does not when modelling responses (if assuming data MAR we always use expression (25); the estimator is unbiased whether β=0 or not). So we cannot make statements like ‘Assuming that the data are MAR we perform a valid analysis…’. Instead, the assumption about the missing data mechanism is conflated with the model, yielding biased estimators under the assumptions of data MAR and NMAR. This problem is compounded when we have observations at more than two time points. ‘Summing up’ our increment estimators gives an estimator whose components are valid under a range of conflicting missing data mechanisms. This conflicts with our opening principle.

    A possible generalization emerges by noticing that the proposal relies on a linear transformation of the data into a portion that is fully observed and a portion that is MCAR. In the simple example, this is
    image
    Now there are many possible transformations; for example, with data at three time points and the first two fully observed,
    image(26)

    Using these ‘increments’ we now obtain valid estimates for a much broader class of drop‐out mechanisms (e.g. random intercepts and slopes). One can argue, then, that equation (26) is preferable.

    Diagnostics proposed include a scatterplot of inline image versus inline image. However, this will often show a strong slope due to regression to the mean, which is easily confirmed by simulating a random‐intercept model with intercept variance four times the error variance. Thus it is of limited use.
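    The regression-to-the-mean effect can indeed be reproduced directly; the R sketch below uses a random-intercept model with intercept variance four times the error variance, as described, and recovers the induced negative slope of roughly −1/5 (the remaining numerical choices are ours).

        # Sketch: regression to the mean in the residual diagnostic under a
        # random-intercept model with intercept variance 4 and error variance 1.
        set.seed(5)
        n  <- 1e5
        b  <- rnorm(n, sd = 2)             # random intercept, variance 4
        Y1 <- b + rnorm(n, sd = 1)         # error variance 1
        Y2 <- b + rnorm(n, sd = 1)
        r1 <- Y1 - mean(Y1)                # time 1 residual (no covariates)
        coef(lm(I(Y2 - Y1) ~ r1))["r1"]    # about -0.2, i.e. -error var / total var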

    In conclusion I have enjoyed this stimulating paper, but I am left with some nagging doubts. The proposal conflates the analysis model and untestable assumptions about the missing data mechanism in a way that violates the principles that are advocated in the paper and risks confusing all except the most wary: results are generally biased if data are MAR and if data are NMAR, even if the increments have mean 0.

    By contrast, we argue that a data MAR analysis (where [Ymiss|Yobs,R]=[Ymiss|Yobs]) is the natural starting‐point for a per‐protocol analysis (Carpenter and Kenward, 2007). We then look at how robust our conclusions are to departures, either by multiple imputation (chapter 6 of Carpenter and Kenward (2007)), using post‐imputation weighting (Carpenter et al., 2007) or prior information (White et al., 2007).

    Additionally, the residual diagnostic is unreliable and the method cannot naturally handle interim missing data. The method can be seen as one of a class of methods (see equation (26)). However, the underlying missingness mechanisms that these assume are generally implausible relative to final increments or concomitant processes.

    Despite filling my allotted space with criticisms, clearly much is praiseworthy and it gives me great pleasure to second the vote of thanks.

    The vote of thanks was passed by acclamation.

    Hans C. van Houwelingen (Leiden University Medical Center)

    I compliment the authors for an inspiring paper. My attention was immediately drawn by the analysis and graphical presentation of the PANSS example. I would have loved to reanalyse the data, but unfortunately they are not available. Therefore, I created my own data set that approximately mimics the example. I created a sample of size n=100 000 with measurements at t=0,1,2,3,4, no covariates and autoregressive data with σ=1, ρ=0.8 and drift μ=0.1t. After the first observation (t=0), values above 1.5 cannot be observed and cause drop‐out, resulting in cumulative drop‐out rates of 8%, 13%, 18% and 23% at t=1,2,3,4 respectively. My data violate the assumptions of the paper because the process is not a martingale and the drop‐out depends on the future. I wondered whether the violations could be visualized at the mean level and not only through the covariance structure.
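    An R sketch of a data set of this form is given below; we read 'autoregressive data with σ=1, ρ=0.8' as a stationary AR(1) process around the drift 0.1t, a reading under which the cumulative drop-out rates come out close to the 8%, 13%, 18% and 23% quoted.

        # Sketch of data mimicking this construction: stationary AR(1) residuals
        # (sd 1, lag-1 correlation 0.8) around drift 0.1 * t; drop-out occurs when
        # a value after t = 0 exceeds 1.5. The AR(1) reading is our interpretation.
        set.seed(6)
        n   <- 1e5
        tt  <- 0:4
        rho <- 0.8
        e   <- matrix(0, n, length(tt))
        e[, 1] <- rnorm(n)
        for (j in 2:length(tt)) e[, j] <- rho * e[, j - 1] + rnorm(n, sd = sqrt(1 - rho^2))
        Y   <- sweep(e, 2, 0.1 * tt, "+")                 # add the drift
        obs <- t(apply(Y, 1, function(y) cumprod(c(TRUE, y[-1] <= 1.5)) == 1))
        round(100 * colMeans(!obs))                       # cumulative drop-out rates (%)
        colMeans(ifelse(obs, Y, NA), na.rm = TRUE)        # observed means among 'survivors'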

    I fitted the very simple martingale model with time varying mean increment. The observed means among the ‘survivors’ are −0.0009, −0.0613, −0.0302, 0.0136 and 0.0617; the observed mean increments are 0.0694, 0.0879, 0.1018 and 0.1096 and the cumulative increments are 0.0694, 0.1574, 0.2592 and 0.3688. Fig. 7 shows the reconstruction of the full data means as presented in the paper.

    [Fig. 7. Reconstruction]

    This is a counterfactual picture. I would like to see the link with the observed data. A more informative picture could be Fig. 8. It shows the mean values of the history of all individuals present at t=1,2,3,4. The graph shows the ‘last’ increments in each trajectory as used in the martingale model, but also the increments for those who are ‘alive’ at later measurements. It unmasks the increase over time as well as Fig. 7 does. It has the flavour of pattern–mixture, but not quite. The martingale model would imply that these lines are parallel. Close inspection shows that they are not perfectly parallel. This could be used for model checking.

    [Fig. 8. Retrospection]

    The graph can also be used to read off predictions one, two, three and four periods ahead. I think that that could be useful for clinical purposes and, again, for model checking. This leads to a ‘predictive’ goodness‐of‐fit check comparing ‘empirical’ mean increments over larger intervals with the ‘model’ value (Fig. 9). This again shows some model violation.

    [Fig. 9. Predictive increments]

    The predictive interpretation of the martingale model and the related goodness‐of‐fit test could be interesting issues for further research.

    Inês Sousa (Lancaster University)

    This paper is an important contribution to the literature on the analysis of longitudinal data with drop‐out. The authors have presented a useful discussion of inferential objectives when individuals drop out of the study, distinguishing between unobserved and counterfactual measurements. The most widely used methods are reviewed and categorized according to their different inferential objectives. The model that is proposed brings ideas and methods from survival data analysis into the literature on longitudinal data analysis, when a clear and specified objective is defined. The novelty of the proposed model lies in modelling increments conditional on all of the observed history 𝒢t. Therefore, it is possible to simplify E{Ya(t−1)|𝒢t}=Ya(t−1), and hence to subtract the error ɛa(t−1) in equation (10) of the paper. Moreover, modelling the distribution conditional on the history makes the realistic assumption that the present is not determined by anything that is unrealized in the future.

    The main difference between a martingale random effect and a stationary random effect, which is commonly used in mixed effects models for longitudinal data, is the expected value of the longitudinal response conditional on the data being missing. For the former model, the expected value is the last observed measurement for that individual, whereas for the latter the same expected value is the population average.

    I shall then present the results that are obtained for the same data set when a model with stationary random effects is fitted. This is a transformed Gaussian model (Diggle et al., 2007) which assumes a multivariate Gaussian distribution for the vector of longitudinal measurements Y and the logarithm transformation of drop‐out time D,
    image(27)
    where μ=(μY,μD) and the variance matrix Σ is partitioned as
    image(28)

    The fitted model is presented in Fig. 10, for the same mean model as in the paper. The results in Fig. 3 of the paper are comparable with the estimated unconditional mean here in Fig. 10, as this is what we would have observed if no individuals had dropped out. Although confidence intervals are not given in Fig. 3 for the model proposed, it seems possible to obtain these with standard software.

    [Fig. 10. Observed and fitted mean response profiles (for each treatment are presented observed means (•), fitted unconditional (——, E[Yj]) and conditional means (·−·−·−, E[Yj|D > log(tj)]) and approximate 95% pointwise confidence limits for the fitted unconditional means (– – –)): (a) standard treatment; (b) placebo; (c) experimental treatment]

    Vanessa Didelez (University College London)

    I congratulate the authors for this stimulating paper.

    They suggest that we consider the (possibly counterfactual) response Y2a if the subject had not dropped out. Which real world quantity does this correspond to? In my view, Y2a is only well defined if a specific intervention to prevent drop‐out can be conceived of; this is also emphasized by the main protagonists of counterfactuals; see Rubin (1978) or Robins et al. (2004). Only then can assumptions about Y2a, such as independent censoring, be justified. Moreover, depending on how we ‘force’ subjects not to drop out, there might be different versions of Y2a satisfying different assumptions.

    An alternative formal framework has been suggested by Dawid (2002) (see also Pearl (1993) and Lauritzen (2001)): let FR be an intervention indicator, where FR=Ø means that no intervention takes place and drop‐out happens ‘naturally’, and FR=1 means that drop‐out is prevented by a well‐specified mechanism. Then P(R=1|Y1;FR=1)=1, i.e. FR=1 ‘cuts off’ any influence from other variables on drop‐out (illustrated in Figs 11 and 12), whereas P(R=1|Y1;FR=Ø) is the observational distribution of drop‐out. Objective 1 now corresponds to [Y2|FR=Ø], objective 2 to [Y2|R=1;FR=Ø] and objective 3 to [Y2|FR=1].

    [Fig. 11. Intervention graphs (a) when no intervention takes place and (b) when drop‐out is prevented by an intervention]

    [Fig. 12. Intervention graphs with an unobserved adverse event A (a) when no intervention takes place and (b) when drop‐out is prevented by an intervention]

    Assumptions about drop‐out will often be in terms of conditional independence (though independent censoring uses expectation) and may be formalized with graphical models. In Fig. 11, for example, drop‐out implies discontinuation of treatment (more generally T could stand for anything relevant happening after drop‐out), whereas preventing drop‐out, FR=1, ensures that subjects continue treatment as planned. In Fig. 11 we have that in general P(Y2|FR=1)≠P(Y2|FR=Ø) (or Y2a≠Y2b); however, as can be read off the graph, inline image and hence P(Y2|Y1;FR=1)=P(Y2|Y1,R=1;FR=Ø), so that, once we condition on the ‘past’ Y1, identification from observed data is still possible. In Fig. 12, drop‐out as well as discontinuation of treatment are caused by an adverse event. Intervening to prevent drop‐out will then not prevent the adverse event and treatment might still be discontinued. Here P(Y2|FR=1)=P(Y2|FR=Ø) (or Y2a=Y2b) and especially P(Y2|Y1;FR=1)≠P(Y2|Y1,R=1;FR=Ø) owing to the ‘confounder’ A.

    Given the analogy to causal reasoning, it is not surprising that independent censoring, which is required for identification of objective 3, is very similar to ‘no‐confounder’ assumptions in causal theories. For instance, the sequential version of the conditional independence in Fig. 11 is called ‘stability’ by Dawid and Didelez (2005) and corresponds to ‘sequential randomization’ (Robins, 1986).

    Axel Gandy (Imperial College London)

    I congratulate the authors on their interesting paper. I would like to discuss the relationship of the diagnostic that is suggested by their formula (13) to transforms of the residual inline image. On the basis of these transforms, diagnostics that are sensitive to specific alternatives can be constructed.

    Checking that the left‐hand side of formula (13) is constant for t ≥ 2 is equivalent to checking that for t ≥ 3

    image
    Assuming that the covariates contain an intercept term we have
    image
    for all s. Hence, the left‐hand side of this equation can be estimated by
    image

    where K(1)=K(2)=(0,…,0) and inline image for s ≥ 3. Thus the diagnostic is based on a transform of inline image.

    Suppose that we have no measurement error, i.e. ɛa(t)=0. Then the process inline image is a martingale. Hence, inline image is a zero‐mean martingale, not only for this specific K, but for any predictable process K. A diagnostic can be based on how far K·Z deviates from 0. By choosing K suitably, we can make it sensitive against specific alternatives, similarly to the approach that was described in Gandy and Jensen (2005) for event history analysis.

    Can a similar construction be made if measurement error is present? In this case inline image need not be a martingale and thus inline image need not be a mean 0 martingale. However, for a slightly smaller class of processes K we can show that inline image for all t. Essentially, besides being predictable, K also needs to satisfy K(t)⊥Δɛ(t). Suppose that, for all t, K(t) is measurable with respect to the σ‐algebra that is generated by
    image

    One can show that inline image is a mean 0 martingale. Furthermore, E[{K·(IHɛrJ}(t)]=0 because K(s)⊥Δɛa(s) for all s. Hence, inline image. Thus directed model diagnostics can be based on a suitable choice of K also in the presence of measurement error.

    N. T. Longford (SNTL, Reading)

    The authors’ emphasis on the objectives of a study is to be commended, although a clear formulation of the objectives of any study is important, even if there is no drop‐out and no longitudinal dimension. It is particularly important for studies that are expensive (ethically, financially or with regard to other resources) and have to be carefully designed. The objectives should be a key consideration in their design stages. In many secondary analyses, the objectives are constructed for the sake of instruction or illustration, often because the original purpose of the study has not been recorded, or may never have been formulated in detail.

    The potential outcomes framework in Section 2 is more complete than is usually encountered in the literature, but it could easily be extended. In addition to Ya and Yb, I would consider Yc, the outcome that would be attained in an ‘operational’ (prescription) scenario, not a clinical trial. I believe that context matters, and therefore Yc≠Yb. But the real objective is inference about Yc.

    The rigour that is associated with the study objectives can be ratcheted up indefinitely. For example, in the schizophrenia trial, inference is sought about the population of sufferers. Their good representation in the study cannot be arranged by a sampling design because a sampling frame cannot be constructed. (Informed consent is another obstacle.) In any case, inference is desired about the sufferers in the future, and this population is not realized yet, because some of its members have yet to contract the condition, and the condition is not temporally stable. Further, the interest may be not in all sufferers, but only in those who would in the future be prescribed the treatment. In brief, there are several layers of uncertainty beyond those which are taken into account in the analysis, or those that could reasonably have been.

    The authors propose an efficient estimator, but one might be interested in the efficient pairing of a design and an estimator. Intending to apply their estimator, how does one go about designing a longitudinal study in which drop‐out is expected? Do the established principles suffice? How is the uncertainty about the martingale condition taken into account?

    The conditional independence involved in the missingness at random mechanism is retained by non‐linear transformations. In contrast, the identity of conditional expectations that are involved in martingales is not invariant with respect to non‐linear transformations. So, the scale on which the outcomes are analysed matters. Is this a problem?

    D. R. Cox (Nuffield College, Oxford)

    It is a pleasure to congratulate the authors on their contribution to a challenging topic.

    It is known (Rotnitzky et al., 2000) that, for some simple models of informative non‐response, not only are assumptions involved that are not directly testable, as indeed seems inevitable, but also, even if those assumptions are satisfied, the estimates have very poor properties near a null hypothesis where in fact non‐response is uninformative. This is because the score vector may be singular. Is broadly similar behaviour possible in the authors’ model?

    The authors’ discussion of objectives is clearly central and raises the design requirement, wherever feasible, to clarify for each individual the reason for missingness, especially since types of missingness may vary systematically between treatment arms.

    The use of a random‐walk‐type error model is elegant but does have rather strong implications if used over extended time periods.

    The following contributions were received in writing after the meeting.

    Odd O. Aalen (University of Oslo)

    This interesting paper gives a fresh view on the analysis of longitudinal data. Such data are very common and are often handled with complex methods that practical researchers may find difficult to carry out. A major issue is that of missing data, which are very common, in fact hardly avoidable in many fields. The following general comments can be made.

    • (a) The use of martingale modelling gives a very flexible tool. This means that drop‐out may depend on the past in possibly complex ways, and even allows dependence between individuals. In fact, martingale assumptions may replace the classical assumptions of independence, just as in the counting process approach to event history analysis.

    • (b) The paper is also a contribution to removing the artificial distinction between longitudinal data analysis and event history analysis. The connection is made by the parallel between increments in longitudinal data and events in counting processes. In fact, events are a special case of increments, and the realization that methods for event histories may be applied to longitudinal data is of great interest.

    • (c) In fact, we often see a mixture of event data and longitudinal data, for instance when covariates or markers are measured repeatedly over time in survival studies. The typical approach is to view the events as the primary focus, and to include the time‐dependent covariates in, say, a Cox model. A better approach would be to consider event processes and the covariate process in parallel, and to treat them on the same level. This also gives a much better understanding of time‐dependent covariates, as demonstrated in Fosen et al. (2006b). The concept of dynamic path analysis that was introduced there, with the attendant graphical analysis, should be equally useful in the setting of the present paper.

    • (d) Missing data are often handled by means of inverse probability weighting, which is part of a more general approach to causal modelling where one constructs pseudopopulations that would have been observed in the absence of missing data. Although counterfactual approaches are useful, the construction of pseudopopulations by adjusting for imbalance in other variables or processes may limit the understanding of the process that actually occurs. The approach that is advocated in the present paper, in contrast, is dynamic. An essence of the dynamic view is to analyse the data as they present themselves without construction of hypothetical populations (Aalen et al., 2004; Fosen et al., 2006a, b).

    Daniel Commenges (Université Bordeaux 2)

    The authors propose an approach which focuses on modelling the change rather than the absolute value of the process of interest and this, together with the estimating method, provides a useful tool which has similarities with the proposal of Fosen et al. (2006b).

    Among several issues that are raised by this stimulating paper, I shall focus on that of the continuous or discrete nature of time in statistical models. The model that is presented by the authors is clearly in discrete time, except for the indicator process which is ‘thought’ of in continuous time. I would like to advocate a more realistic point of view, which has long been classical in automatic control theory but is still uncommon in the biostatistical literature. It consists in separating the ‘model for the system’ and the ‘model for the observation’. Most often the system lives in continuous time, whereas observations may, for some events, be in continuous time but are most often in discrete time. The response indicator process Ri(t) in the so‐called time coarsening model for processes observation scheme (Commenges and Gégout‐Petit, 2005; Commenges et al., 2007) can represent continuous or discrete time observations. For instance, if observations are made at t1,t2,…,tm, then Ri(t)=0 for all t except t1,t2,…,tm, where it takes the value 1. Generally both the process of interest and the response indicator process are multidimensional, so some components may be observed in continuous time whereas others are observed in discrete time. In Commenges and Gégout‐Petit (2005) ignorability conditions are given. An example is that of a medical doctor who decides the date of the next observation of CD4 cell counts and human immunodeficiency virus load on the basis of the observations of these two markers at the current visit. If the CD4 cell counts and viral loads are modelled, this observation scheme is ignorable. An application of this point of view is given in Ganiayre et al. (2007), where a cognitive process living in continuous time is observed in discrete time by both a psychometric test and the diagnosis of dementia.

    Richard Cook and Jerry Lawless (University of Waterloo)

    It is difficult to give convincing general prescriptions for modelling and analysis when the physical processes behind longitudinal data and the nuances of cessation of treatment, drop‐out or termination processes vary so widely across applications. Counterfactuals seem to us unappealing in this setting, though many statisticians would disagree; see the discussion of Dawid (2000). We prefer notation that distinguishes actual outcomes. Excluding the possibility of termination, let X2=1 if the individual remains on treatment and X2=0 otherwise, and R2=1 if their response is observed at the second assessment and R2=0 otherwise. The variable X2 indicates whether the treatment was received and R2 indicates whether the response is observed. We then have E(Y2|R2=1,X2=1), E(Y2|R2=1,X2=0), E(Y2|R2=0,X2=1) and E(Y2|R2=0,X2=0); of course, if R2=0, then we may not know whether X2=0 or X2=1, and Y2 is unobserved. If ‘clinical interest genuinely lies in the hypothetical response that patients would have produced if they had not dropped out’, then presumably we are interested in Y2|X2=1, R2=1, i.e. we must draw comparisons with ‘similar’ subjects who received the treatment and did not drop out.

    We have some other comments.

    • (a) Conditional models are most convenient for describing the effects of inclusion in a study, the evolution of responses over time and drop‐out and treatment processes. See Raudenbush (2001) for some interesting discussion in a specific setting.

    • (b) Regarding comparisons based on marginal process features, Cook and Lawless (1997, 2002) considered examples involving recurrent and terminating events.

    • (c) Information about the observation and drop‐out processes should be collected including, for example, factors that are related to the times that an individual is seen, or why she leaves the study. The drop‐out process and any terminating processes should also be examined and analysed as functions of the treatment and previous process history. This is a necessary step for methods involving inverse probability weights for estimating marginal features, but it also provides insight into the interpretation of marginal features. In some settings it may be worth recruiting fewer subjects to leave funds for tracing drop‐outs.

    • (d) As the authors mention, when observation times aj (j=1,2,…) are widely spaced there is a strong likelihood that some losses to follow‐up at aj are not conditionally independent of the process history over [aj−1,aj), given the history up to aj−1. Point (c) is crucial in formulating plausible models as a basis for sensitivity analysis.

    David J. Hand (Imperial College London)

    I was delighted to see this paper. In particular, I was especially pleased to see the first half of the paper, drawing attention to the several research questions which one may wish to address in the context of drop‐outs in longitudinal data. The reason why I was so pleased is that the paper represents a reinforcement of, and detailed exploration of a special case of, a general point that I made in a previous paper (Hand, 1994) that was presented to the Society. I there pointed out that much statistical research fails to pay sufficiently close attention to the real aims of the research, and so risks drawing inappropriate, irrelevant and incorrect conclusions. The authors of the present paper say

    ‘In all applications careful thought needs to be given to the purpose of the study and the analysis’.

    In my paper, I said

    ‘Too much current statistical work takes a superficial view of the client's research question …without considering in depth whether the questions being answered are in fact those which should be asked’.

    I presented a series of detailed examples of this—and, indeed, mentioned drop‐outs in longitudinal data in passing. I hope that the present paper will serve to draw people's attention, not only to the need for precise formulation of the research question in the context of longitudinal data, but also to such problems elsewhere: parallel issues exist in many other situations.

    Haiqun Lin (Yale University School of Public Health, New Haven)

    I compliment Diggle, Farewell and Henderson on this outstanding and enlightening work. It is my great pleasure and honour to contribute to the discussion.

    This paper introduces a novel and elegant discrete time local incremental linear model to target inference in a counterfactual drop‐out‐free world in a balanced longitudinal study. The residual process of the response is a discrete time martingale that can be regarded as a set of distribution‐free, subject‐specific, time varying random effects with heteroscedastic variances.

    Strikingly, the incremental approach does not need to specify a drop‐out model but allows the drop‐out to depend on a past latent process that is related to the response such as a previous martingale residual. This can be regarded as a type of assumption of data missing not at random that is weaker than data missing at random or sequentially missing at random.

    A remarkable result from the method proposed is that the parameters for the counterfactual outcome can be obtained by ordinary least squares from the observed data with drop‐out. The inverse weighting method of Robins et al. (1995) needs to specify an additional drop‐out model under data sequentially missing at random, whereas the likelihood method accommodates data missing at random but requires a distributional assumption for the response and assumes identical counterfactual outcomes whether a subject dropped out or continued in the study.

    In the model proposed, the dependence of the expected increment E{ΔYa(t) | 𝒢t} on the past history of the response in 𝒢t is only through Xa(t) β(t), and therefore the martingale properties of the residual process rely critically on the choice of covariates in X, which can include exogenous variables, measured responses and dynamic covariates, and on correctly specifying the covariates’ functional forms. Nevertheless, the diagnostic procedures proposed may help in decisions regarding X. The model is not designed for interpreting a covariate effect on the response itself. An increment is much less consistent in its direction, especially when the response is relatively stable over a period of time, in which case β(t) may be forced to have opposite signs for the same covariate at different times t. However, this is a minor concern if the response trajectory, rather than the covariate effect, is of major interest.

    For discrete longitudinal responses, where the method may not be readily applied, the inverse probability method assuming data sequentially missing at random will be highly desirable if population‐averaged inference in a distribution‐free setting is preferred. With data missing not at random, the joint model would be of great value if a result similar to that obtained by Hsieh et al. (2006) could be established.

    Mary Lunn (University of Oxford)

    The paper proposes a model which carries over some features of the additive model from counting processes with censoring to longitudinal data with drop‐out.

    The two key assumptions are that

    • (a)

      the random effects are martingales and

    • (b)

      drop‐out does not affect the expected mean differences so

      E{ΔY(t) | ℱ(t−), R(t)=1} = E{ΔY(t) | ℱ(t−)}.

    These are of similar import to the assumptions in Andersen et al. (1993). One key point is that it is not assumed that the martingale random effects have constant variability. This can clearly be seen in the first simulation (Fig. 1) where the correctly specified model has increased length in the box plots as time increases. This is less prevalent in the incorrectly specified model using the Laird–Ware random effect as might be expected, although again as expected the median certainly decreases since observations with high positive random effect are more likely to drop out.

    Andersen et al. (1993), page 565, commented that unweighted least squares does not take into account the differing variability in the martingale process, and they went on to show that, to achieve some form of optimality, weighted least squares would be preferable; they deduced the form of the weighted estimators. One suspects that the same will be true in this model for longitudinal data.

    One final incidental comment is that it is not immediately clear why the same drop‐out process was used for both models in the second scenario, but not in the first. This may be a misreading of what was intended here.

    Torben Martinussen (University of Copenhagen)

    I congratulate the authors on this interesting contribution on longitudinal data with missing data. They take the same approach as has been applied successfully to right‐censored survival data, i.e. they formulate the model with parameters of interest in the situation where no censoring (drop‐out) occurs and then impose a condition (independent censoring) under which the parameters of the original model can be estimated consistently. The challenge is to consider whether or not this condition is reasonable in specific applications. It is claimed that certain missingness‐not‐at‐random situations are covered, but how does this tie in with the requirement that the drop‐out process be predictable? An alternative representation of the condition, in the simple example with two measurement times, is
    ∫ y2 f(y2 | Y1, ɛ1, R) dy2 = ∫ y2 f(y2 | Y1, ɛ1) dy2
    where f(·) is used to denote a density function. This condition is clearly fulfilled if f(R|Y1,ɛ1,Y2a) is independent of Y2a. Looking at the above condition more closely, it is seen to be equivalent to
    E(Y2aR | Y1, ɛ1) = E(Y2a | Y1, ɛ1) E(R | Y1, ɛ1),

    corresponding to Y2a and R being uncorrelated given (Y1,ɛ1).
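
    To see why the two statements are equivalent (a short verification, assuming only that R is binary and that the relevant conditional moments exist), note that

    cov(Y2a, R | Y1, ɛ1) = E(Y2aR | Y1, ɛ1) − E(Y2a | Y1, ɛ1) E(R | Y1, ɛ1) = E(R | Y1, ɛ1) {E(Y2a | Y1, ɛ1, R=1) − E(Y2a | Y1, ɛ1)},

    so the conditional covariance vanishes exactly when conditioning further on R leaves the conditional mean of Y2a unchanged, which is the independent censoring condition above.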

    The model for the longitudinal response corresponds to the Aalen additive hazards model (Aalen, 1980) for survival data. Results are formulated for the estimated cumulated coefficients B̂(t), which are claimed to be √n‐consistent (Farewell, 2006). For survival data, the Aalen additive hazards model is indeed appealing as it easily accommodates inference for the time‐dependent regression coefficients; see McKeague and Sasieni (1994) and Martinussen and Scheike (2006). This is carried out on the basis of the cumulated coefficients as they can be estimated at the usual √n‐rate, which is not so for the regression coefficients. It is not so obvious, however, that the same approach for longitudinal data with drop‐outs is likewise appealing as interaction with time is easily modelled and estimated by using traditional methods, and the estimators for the regression coefficients in this setting converge at the usual √n‐rate. Looking at increments, the cumulatives arise naturally, but is anything else gained in this situation by aiming at the cumulated regression coefficients?

    Concerning goodness‐of‐fit tools, it should also be possible to apply the techniques that were described by Lin et al. (1993, 2002) in this setting. For example, to check whether the functional form of a specific covariate (the pth, say) is correctly specified, we may consider the cumulated martingale residual process
    Gp(z) = Σt∈𝒯 Kz(t)T {ΔY(t) − X(t) β̂(t)}
    where Kz(t)=(I{X1p(t)⩽z},...,I{Xnp(t)⩽z})T. The resampling technique of Lin et al. (1993, 2002) can be applied using the independent and identically distributed data decomposition of Gp(z):
    image
    where
    image
    with
    image
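
    In practice a simplified version of this diagnostic is easy to compute. The following R sketch is an illustration of the idea rather than code from the paper: it takes increment residuals from separate least squares fits at each time point and cumulates them over the values of the covariate being checked, with a crude normal‐multiplier resampling in the spirit of Lin et al. (1993). The long‐format column names id, time, x and dy are assumptions of the sketch, and it ignores the correction for the estimation of β(t) that a full implementation would include.

    # Sketch: cumulated increment-residual diagnostic for the functional form of
    # a covariate x, with normal-multiplier resampling (after Lin et al., 1993).
    # Assumed long format: one row per observed increment, columns id, time, x, dy.
    cumres_diagnostic <- function(dat, n_resample = 200) {
      # residuals from a separate least squares fit of the increments at each time
      dat$res <- ave(seq_len(nrow(dat)), dat$time,
                     FUN = function(i) residuals(lm(dy ~ x, data = dat[i, ])))
      zs <- sort(unique(dat$x))
      G_obs <- sapply(zs, function(z) sum(dat$res[dat$x <= z]))
      ids <- unique(dat$id)
      G_star <- replicate(n_resample, {
        g <- rnorm(length(ids))            # one N(0, 1) multiplier per subject
        names(g) <- ids
        r_star <- dat$res * g[as.character(dat$id)]
        sapply(zs, function(z) sum(r_star[dat$x <= z]))
      })
      # supremum-type comparison of the observed process with its resampled copies
      list(z = zs, G = G_obs,
           p_value = mean(apply(abs(G_star), 2, max) >= max(abs(G_obs))))
    }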

    Geert Molenberghs (Universiteit Hasselt, Diepenbeek) and Geert Verbeke (Catholic University of Leuven)

    The paper is thought provoking and of interest since, although much has been written about non‐ignorable missingness (Verbeke and Molenberghs, 2000; Molenberghs and Verbeke, 2005; Molenberghs and Kenward, 2007), the authors succeed in presenting a fresh take on the problem, not only through their original taxonomy in terms of counterfactuals, but also through a novel modelling framework based on increments, martingale theory and stochastic processes.

    The authors take a parametric view. Although this agrees with our own inclinations, there is also an increasing volume of semiparametric research, synthesized in Tsiatis (2006). A fine comparison between the semiparametric, doubly robust framework and conventional methods is provided in Davidian et al. (2005). Although a focus on but one framework is common, owing to research interests and the strong but unfortunate dividing lines between the Rubin and Robins camps, every contribution furthering understanding of the relative merits is welcome. This includes connections with causal inference, counterfactuals and instrumental variables. Similarly, it is important to discuss the sensitivities of the various techniques that have been proposed, especially when fully parametric, the implications thereof and (in)formal ways to assess and address such sensitivities. Although these points have been touched on by the authors through illuminating literature reviews, showing the authors’ thorough familiarity with the topic, we look forward, on the one hand, to further integration between the parametric and semiparametric schools from the viewpoint of the methods proposed and, on the other, to suggestions for sensitivity analysis tools.

    Likewise, further study of the implications of the proposal for the pattern–mixture and shared‐parameter frameworks, both of which are currently touched on only lightly, will enhance understanding; this has considerable potential for practical use.

    The authors acknowledge that their work thus far has been confined to the balanced case, meaning that all subjects are measured at a common predetermined set of measurement occasions, and they state that the extension to the unbalanced case is natural. We certainly agree that such extensions would be important and practically useful given the huge volume of observational and other non‐balanced studies. Thanks to the martingale take on the problem, this statement seems warranted and we would be very interested in learning more. Arguably, such extensions would have to distinguish between cases where measurement times are merely unbalanced by design and cases where, in addition, the measurement times are genuinely random and potentially contain information about the process of scientific interest.

    Christian B. Pipper and Thomas H. Scheike (University of Copenhagen)

    We enjoyed reading this very stimulating paper.

    Additive time varying models are very useful for obtaining a covariate‐dependent description of time dynamics and have been studied in both the regression and the hazard setting. The authors consider discrete time models, which are conceptually much simpler than continuous time models for longitudinal data, in particular when increments are of interest. We find it very relevant to model increments, exactly as was done in Martinussen and Scheike (2000) in a continuous time setting. In growth trials, for example, the velocity of growth is often the natural quantity of interest.

    The authors consider a discrete time setting and make a model for the response variable of the form
    ΔYi(t) = Xi(t) β(t) + ΔMi(t),

    with additional independent measurement error. The observed data are then modelled by introducing at‐risk indicators Ri(t) that are 1 if the subject is under risk and 0 otherwise. We discuss the model specification and the two key assumptions therein:

    • (a)

      the martingale assumptions on the error term;

    • (b)

      the censoring is predictable (see Scheike and Pipper, below).

    The authors suggest that we partition the random variation into random effects and measurement error. This seems ambiguous as long as a more specific error term model is not specified. It is not clear to us what is really gained by this, in particular since the variance estimator of the model is only practically operational in the case without measurement error.

    One problem with the martingale specification of the mean model is that, even though the martingale assumption looks very innocent, we do in fact need to specify correctly how the mean (of the increments) depends on the entire history of past observations and covariates. This can be very difficult, in particular if time varying covariates are present, and it is also difficult to bring in information about past observations of the responses. It is therefore also of interest to look at more marginal models, in which one specifies a mean model conditioning only on, for example, the current covariates, such that
    E{Yi(t) | Xi(t)} = Xi(t) γ(t),
    as suggested by Pepe and Couper (1997).

    Thomas H. Scheike and Christian B. Pipper (University of Copenhagen)

    First we discuss the assumption that the censoring process is predictable. Formally it is assumed that R(t+1) is measurable with respect to ℛt. This implies that there are functions ft so that R(t+1)=ft(R(1),…,R(t)) and by recursion we see that there are deterministic functions gt so that R(t)=gt{R(1)}. Thus the point of drop‐out is known to us at the beginning of the study. In an independent and identically distributed data setting all subjects are thus censored at the same time. This assumption is very restrictive. For the PANSS study this assumption is barely satisfied.

    The assumption of predictable censoring can easily be relaxed and what is needed to recover the original parameters of the responses is something like
    E{Ri(t) ΔYi(t) | ℱ(t−)} = E{Ri(t) | ℱ(t−)} E{ΔYi(t) | ℱ(t−)},  (29)
    as in Martinussen and Scheike (2006), page 398. Then the observed responses will have mean
    E{Ri(t) ΔYi(t) | ℱ(t−)} = E{Ri(t) | ℱ(t−)} Xi(t) β(t)
    and then the model can be recovered by dividing with an estimate of E{Ri(t)|ℱ(t−)}. This is the inverse probability weighting technique that is used in the example.
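
    To make the weighting concrete, the following R sketch estimates E{Ri(t)|ℱ(t−)} with a logistic regression on the previous response and a covariate, and then fits weighted least squares to the observed increments at each time point. It is an illustration under assumptions of our own, not the analysis reported in the paper; the long‐format column names id, time, r, dy, x and lag_y, and the logistic form of the drop‐out model, are assumptions of the sketch.

    # Sketch of inverse probability weighting for the increment regressions.
    # Assumed long format: one row per subject and scheduled time up to (and
    # including) the first missed visit, with r = 1 if the increment dy is
    # observed, x a covariate and lag_y the previous observed response.
    ipw_increments <- function(dat) {
      # estimated probability of remaining under observation given the past
      drop_fit <- glm(r ~ lag_y + x, family = binomial, data = dat)
      dat$w <- 1 / fitted(drop_fit)
      # weighted least squares for the observed increments at each time point
      obs <- dat[dat$r == 1, ]
      lapply(split(obs, obs$time),
             function(d) coef(lm(dy ~ x, data = d, weights = w)))
    }
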
    The model will give a description of the time dynamic effects of the covariates. It may be of interest to simplify the model to the semiparametric model
    E{ΔYi(t) | ℱ(t−)} = Xi(t) β(t) + Zi(t) γ

    where the design is partitioned into two parts. In the discrete time setting this is a standard model.

    The increment model specifies that the mean of the response given the history is of the form
    E{Y(t) | ℱ(t−)} = Y(t−1) + X(t) β(t),

    in contrast with a standard regression model that models the mean by X(t) β(t). When the covariates are constants, like treatment groups, the first mean equals X(t) B(t) with B(t)=Σs⩽t β(s). The two models are thus equivalent in this situation.
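
    To spell this out (a short calculation under the stated assumption that the covariates are constant in time, X(t) ≡ X), taking expectations in the first display gives E{Y(t)} = E{Y(t−1)} + X β(t); iterating this recursion shows that the marginal mean at time t differs from the mean at the start of follow‐up by the cumulative term X B(t), with the sum taken over the intervening time points, which is precisely the mean structure of the standard regression model.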

    I. L. Solis‐Trapala (Lancaster University)

    This stimulating paper begins with a succinct account of existing approaches to the analysis of longitudinal data with drop‐out, encouraging the reader to consider carefully the objectives of the study at hand in their own modelling strategies.

    Firstly, I would like to reflect on the target of inference which motivates the authors’ proposal. Although they state that their target of inference is the mean response, they propose to model the expected increments of the longitudinal process. Indeed, it seems from the context that they are interested in measuring mean contrasts between groups of participants who are assigned to different treatments, rather than mean contrasts within subjects.

    This distinction is briefly highlighted in the discussion section, where it is argued that inclusion of residuals in the mean specification rather than previous responses preserves the interpretation of the effects of the exogenous covariates. For example, in the case of the schizophrenia study, measurement of a direct effect of treatment appears to be of primary scientific interest.

    Secondly, the way that the measurement error is included in the linear model is not entirely clear to me. Intuitively, I would associate a measurement error, rather than a lagged error, with the response increments.

    Thirdly, the authors acknowledge a limitation of their model, namely that it is based on untestable assumptions. This limitation is not specific to their approach, but is well known from other models dealing with missing data. Assuming a martingale for the random effect is, in my opinion, an elegant way of formalizing the key assumption of stability. This reflects that the unobserved increments (due to drop‐out) are assumed to follow a process that is similar to that observed in the past.

    Jeremy M. G. Taylor (University of Michigan, Ann Arbor)

    I agree with the authors that for some scientific applications involving longitudinal data it makes sense for the targets of inference to be parameters of a hypothetical drop‐out‐free world, whereas in other applications this may not make sense. A difficult question is whether we can consider a hypothetical drop‐out‐free world, when drop‐out is due to death. In cancer research a frequently used experiment is one in which tumours grow in laboratory mice, and the response variable is the size of the tumour at 12 months say. Such experiments may include both planned early sacrifices and sacrifices to prevent suffering in animals in which the tumour has grown large. I would be interested in hearing the authors’ view on whether it is still reasonable to consider a hypothetical drop‐out‐free world in this setting when evaluating the mean tumour size at 12 months.

    The Diggle, Farewell and Henderson longitudinal model raises the issue of ‘what is a statistical model?’. One view, which is implicit in their specification, is that a model should be a plausible approximation to the mechanism that gave rise to the observations. Under this viewpoint, it should be a principle that the observations and drop‐out at a certain time cannot depend on the future. But if a model is simply viewed as a way to describe data, using a small number of parameters, then this principle seems less pertinent.

    In longitudinal models a distinction is made between subject‐specific, population‐average and transition models. The increments model of the authors has the flavour of a transition model. In continuous time, modelling increments in the response generalizes to modelling slopes. This then bears some resemblance to some of our previous work (Taylor et al., 1994). We assumed that the expected slope at time t evolved according to an Ornstein–Uhlenbeck process. This leads to a model for the measured response of the form Yit=X(t)β+ai+Wi(t)+eit, where ai is a subject‐specific random intercept, eit is independent measurement error and Wi(t) is an integrated Ornstein–Uhlenbeck process.

    The good efficiency properties of the approach of Diggle and his colleagues compared with fully parametric joint modelling were interesting, but somewhat surprising to me. The very poor efficiency of the inverse probability weighting approach was also striking. Have the authors found similar efficiency comparison results in other applications and in simulations?

    D. Zeng and D. Y. Lin (University of North Carolina, Chapel Hill)

    We congratulate the authors on a clever and intriguing piece of work. The time‐specific conditional mean models avoid the ambiguity of counterfactual response after drop‐out, which can be an issue in joint modelling. Joint models, however, are useful for prediction and amenable to efficient estimation. We pose two questions.

    • (a)

      Since the model is conditional on the response history, is β(t) in equation (10) the most relevant quantity?

    • (b)

      Are there concrete examples in which the drop‐out process satisfies the assumptions of Section 4.1.2 but violates missingness at random?

    We offer a simple approach to inference. For i=1,…,n and t=1,…,τ, let Yi(t) be the response of the ith subject at time t and Xi(t) be the corresponding p×1 vector of covariates. The estimator β̂(t) solves the equation
    Σi Ri(t) Xi(t) {ΔYi(t) − Xi(t)T β(t)} = 0,
    where ΔYi(t)=Yi(t)−Yi(t−1) and Ri(t) indicates whether subject i remains under observation at time t. Since the estimation function is a sum of independent zero‐mean random vectors, standard asymptotic arguments entail that β̂(t) is asymptotically (multivariate) normal and that the covariance matrix between β̂(s) and β̂(t) can be estimated by the sandwich estimator
    {Σi Ri(s) Xi(s) Xi(s)T}⁻¹ [Σi Ri(s) Ri(t) {ΔYi(s) − Xi(s)T β̂(s)} {ΔYi(t) − Xi(t)T β̂(t)} Xi(s) Xi(t)T] {Σi Ri(t) Xi(t) Xi(t)T}⁻¹.  (30)

    We then estimate the covariance matrix of B̂(t), t ∈ 𝒯, by (I,…,I) V̂t (I,…,I)T, where V̂t is the pt×pt sandwich covariance matrix estimator for {β̂(1)T,…,β̂(t)T}T based on expression (30) and I is the p×p identity matrix. Thus, we can make inference about B(t) by using standard procedures for normal statistics. Since it is a very simple function of data, the sandwich estimator should provide accurate variance estimation in finite samples. It is not necessary to use the bootstrap, although the above arguments imply that the bootstrap is valid.
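
    As a minimal illustration of this strategy, the following R sketch computes the per‐time‐point least squares estimates, the cumulative coefficients and a sandwich variance for B(t) built from per‐subject influence contributions. It is our own illustration rather than a definitive implementation; the long‐format data layout (columns id, time, dy and a single covariate x) and the restriction to an intercept plus one covariate are assumptions of the sketch.

    # Sketch: per-time-point least squares for increments, cumulative coefficients
    # and a sandwich variance for B(t) via per-subject influence contributions.
    # Assumed long format: only observed increments, columns id, time, dy, x.
    increments_sandwich <- function(dat) {
      times <- sort(unique(dat$time))
      ids <- unique(dat$id)
      p <- 2                                    # intercept and a single covariate x
      beta <- matrix(NA, length(times), p)
      infl <- array(0, dim = c(length(ids), length(times), p),
                    dimnames = list(as.character(ids), NULL, NULL))
      for (k in seq_along(times)) {
        d <- dat[dat$time == times[k], ]
        X <- cbind(1, d$x)
        A <- crossprod(X)                       # sum of X_i X_i^T over subjects seen at this time
        b <- solve(A, crossprod(X, d$dy))
        beta[k, ] <- b
        e <- as.vector(d$dy - X %*% b)          # increment residuals
        infl[as.character(d$id), k, ] <- t(solve(A, t(X * e)))
      }
      B <- apply(beta, 2, cumsum)               # cumulative coefficients B(t)
      # sandwich variance of B(t): outer products of the cumulated influence terms
      varB <- lapply(seq_along(times), function(k) {
        ck <- apply(infl[, 1:k, , drop = FALSE], c(1, 3), sum)
        crossprod(ck)
      })
      list(times = times, B = B, varB = varB)
    }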

    The authors replied later, in writing, as follows.

    We thank all the discussants for their helpful and constructive comments and apologize if we have overlooked any of these in our reply. We have grouped our response under three headings: objectives, general modelling issues, including sensitivity and diagnostic checking, and issues that are specific to our proposed model class and its possible extensions.

    Objectives

    We agree with Hand that careful consideration of objectives is always important, and not at all specific to longitudinal studies. We suspect that all statisticians would agree, but that not enough statistics degree syllabuses give this topic the attention that it deserves.

    Hogan, Cook and Lawless, Molenberghs and Verbeke, and Didelez all comment on the link between our discussion of potential outcomes and the wider topic of causal inference. Hogan asks whether our potential, but unrealized, outcomes are inherent characteristics of the subjects to whom they belong, or purely metaphysical. A partial answer is that this depends on the context. For the data that are analysed in Section 7 of our paper, and borrowing from Didelez's comments, we can easily conceive of an intervention, albeit an unethical one, that would prevent drop‐out. Perhaps a better example is long‐term follow‐up of dialysis patients, with drop‐out corresponding to transplantation. Kidney function in the absence of transplant is definitely a legitimate target for inference. In cases of this kind, our discussion simply makes explicit what is often glossed over—that any analysis treating drop‐out as ignorable is, nevertheless, making untestable assumptions about things that, by definition, cannot be observed. In some other contexts, most obviously when drop‐out equates to natural death, any inference about a hypothetical drop‐out‐free population is of dubious practical relevance. Nevertheless, our view is that this need not preclude including a potentially infinite sequence of measurements as part of a joint model for measurements and time of death. In answer to Taylor's question concerning animal experimentation, planned sacrifices are missing completely at random (MCAR), whereas sacrifices in response to an observed large tumour size are missing at random (MAR). Hence, in conventional terms both kinds of drop‐out are ignorable. However, simply to conduct a standard likelihood‐based analysis of the non‐missing data would be too glib, not because there is anything wrong with modelling a hypothetical drop‐out‐free process in this setting—on the contrary, this is the natural process that operates in the absence of any intervention by the experimenter—but because the implied target for inference is not necessarily the most sensible interpretation of what precisely is meant by ‘the mean tumour size at 12 months’.

    Longford's Yc could be construed as a mixture of Ya and Yb, with the mixture proportion referring to the rate of compliance in an operational setting; however, we suspect that he is making the stronger point that what happens in a controlled trial setting may or may not be a reliable guide to what happens in clinical practice. This is a fair point, but not specific to the topic of our paper.

    We completely agree with the point that was made by Cox, and by Cook and Lawless, that recording the reason for drop‐out should always be included in the study protocol. We also agree, as suggested by Didelez, that the reason for drop‐out may affect the assumed model for Y2a. In answer to Longford's question about longitudinal design, we would suggest that discussion of the likely drop‐out rate and how this might be minimized should feature strongly. By far the best way to deal with drop‐outs is to avoid them. However, since this is not an achievable goal, we would suggest the recording of any collateral information that, by its inclusion as an explanatory or otherwise classifying variable, might render censoring independent, or nearly so. Put another way, the non‐ignorability of drop‐out can arise in part through a failure to record explanatory variables that are associated both with the measurement process and with the drop‐out process, in the same way that random effects in regression models can be thought of as representing unmeasured subject‐specific explanatory variables.

    Hogan asks whether we can use our dynamic linear increment model to make inferences about Y2b. The short answer is no. Our primary purpose in introducing Y2b is to acknowledge explicitly that it is different from Y2a. We can, however, easily imagine situations in which drop‐out does not imply loss to follow‐up and Y2b is an observable quantity.

    General modelling, sensitivity and diagnostic checking

    We agree with Carpenter's statement that statistical methods should be principled and give valid results under stated assumptions. We are confused, however, by his subsequent remarks on conflation of assumptions and analysis and our proposal leading to estimates that are valid under a mechanism that is different from that assumed. If our assumptions are correct we shall obtain the right answers. Otherwise we may not. We claim nothing more, and we claim nothing less.

    Carpenter uses the two‐time‐point example to raise, we think, three issues: first, possible bias of the estimators (his equations (24) and (25)); second, which of the estimators to use; third, how assumptions about the missing data mechanism change with covariate selection. His discussion throughout is in terms of the familiar MCAR–MAR–NMAR drop‐out terminology instead of the censoring interpretation of our paper. Two of his technical claims are not correct in general: that his equation (24) is not valid when Y2 is MAR, and that his equation (25) is only valid if Y2 is MAR (his italics). Using the notation in Section 5 of our paper, a counter‐example to the former is when var(ɛ1)=0 in the model that is defined by our equations (15)–(18). A counter‐example to the latter occurs for the non‐MAR data model

    image

    for which our residual‐adjusted estimator is unbiased. These are minor corrections but perhaps illustrate the danger of attempting to coerce one framework into another. We do not claim that our estimators are valid beyond our assumptions and, contrary to Carpenter's statement, we do not violate the principles that are advocated in the paper. If it is indeed true that our estimators are biased under missingness at random and missingness not at random (which contradicts the italicized statement above) then the issue is one of robustness and not of validity.

    Nor do we conflate assumptions with analysis. Our fundamental drop‐out assumption is that of independent censoring: given the observed and unobserved past there is no further information in knowing that drop‐out did or did not occur at time t. This is quite separate from the modelling issue, which requires us to assume that all relevant aspects of the past are included in our linear model for increments. This no more conflates assumptions with analysis than, for example, first assuming a fundamental MAR mechanism, then taking a parametric model for the longitudinal responses (or a logistic model for drop‐out, if using inverse probability weighting) and choosing the terms to include in that model. The underlying mechanism may indeed be MAR, but if the response distribution is incorrectly specified or the chosen model does not include the correct terms, then the results may be biased. We are not aware of any approach in which Carpenter's statement ‘Assuming the data are MAR we perform a valid analysis…’ can be made without other assumptions.

    Carpenter concludes that our assumed missingness mechanism is implausible, which is a strong statement given that our assumption of independent censoring underpins the majority of event history methodology. We do not claim that independent censoring is likely to be true in all applications. But nor do we accept that it is less likely to be true than many other drop‐out mechanisms. For instance, and in answer to Zeng and Lin's second question, in the random‐intercept model that is mentioned in the text following our equation (20) it seems to us highly plausible to assume that drop‐out probability is determined by the random effect M1 rather than the values of Y1 and Y2. It is worth noting here the contributions by Commenges and Taylor, as they both draw an extremely useful distinction between modelling the system and modelling the observations. The random effect M1 attempts in a very simplistic way to capture the underlying and unobserved health of the individual and is in the spirit of modelling the system, whereas a standard MAR–MNAR model based on Y1 and Y2 is closer to modelling the observations.

    Returning to the discussion of the two‐time‐point example that is given by Carpenter, we agree that in general the estimators that are defined by his equations (24) and (25) cannot both be simultaneously unbiased and that if we use our first diagnostic for guidance then there will be occasions when the wrong choice is made. How often this will happen is unknown. Carpenter comments briefly on simulation results that indicate non‐trivial bias. Without further details, we cannot offer a specific reply but, as we have indicated above, unless the missingness at random mechanism was included in our independent censoring class we would not claim unbiasedness for either of his equations (24) or (25). Clearly, further work on robustness would be helpful; the same is true for diagnostics, which are an area we believe to be underdeveloped in drop‐out modelling. We welcome therefore the constructive suggestions of Gandy, Martinussen and van Houwelingen. Gandy's formulation of the second diagnostic as a transform means that we can choose K to provide sensitivity to specified alternatives. It would be useful also to investigate properties of diagnostics under those alternatives, which means careful thought about what alternatives or misspecification may be of interest. As mentioned above, choice of covariates is important, as is functional form for covariates. We therefore appreciate the easy‐to‐apply resampling suggestion of Martinussen, which we intend to pursue. We shall pursue also the suggestions of van Houwelingen, which show his characteristic good sense. Plotting trajectories of observed means is valuable for exploratory purposes, and the predictive plot should have diagnostic value. We are not so sure about the retrospective plot, because the martingale structure is not time reversible. For example, suppose that the martingale follows a random walk, Mt=Σs⩽tZs, with Zt=±1, but that those with Zt−1=−1 always drop out at time t. Then, for subjects who are observed at t the increment between t−1 and t has mean 0, whereas the increment between t−2 and t−1 has mean 1, since all subjects in this group have Zt−1=1. So the pattern depends on t and the profiles will not be parallel for different drop‐out groups. If the drop‐out mechanism is simple and stationary then there should be similarities between the profiles, but this stationarity is not required under our proposal: in the present example, for instance, we could have positive steps causing drop‐out at some t and negative steps causing drop‐out at other t.
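
    The example can be checked with a few lines of simulation; the following R sketch (with sample size and number of time points chosen arbitrarily for illustration) reproduces the asymmetry between the current and the preceding increment among subjects still under observation.

    # Sketch: random walk martingale Mt = Z1 + ... + Zt with Zt = +/- 1, where a
    # step Z(t-1) = -1 forces drop-out at time t. Among subjects still observed
    # at time t the previous increment has mean 1, but the current one has mean 0.
    set.seed(1)
    n <- 10000; tmax <- 6
    Z <- matrix(sample(c(-1, 1), n * tmax, replace = TRUE), n, tmax)
    # first time at which a step of -1 triggers drop-out (Inf if it never does)
    drop_time <- apply(Z, 1, function(z) {
      j <- which(z == -1)
      if (length(j)) min(j) + 1 else Inf
    })
    for (t in 3:tmax) {
      obs <- drop_time > t                      # still observed at time t
      cat(sprintf("t = %d: mean increment (t-2, t-1) = %.2f, (t-1, t) = %.2f\n",
                  t, mean(Z[obs, t - 1]), mean(Z[obs, t])))
    }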

    No matter how careful the diagnostics, in drop‐out modelling there is invariably the need for untestable assumptions of one sort or another and we agree with the suggestions of Hogan, Molenberghs and Verbeke, and Carpenter that sensitivity methods should be developed for our proposal. How best to do this is not immediately obvious given that we do not model the drop‐out process, so there is no single parameter to vary to introduce degrees of dependent censoring. Hogan's proposal to consider models for means is consistent with our moment‐based approach and provides an excellent foundation for further work. It also shows the attraction of thinking in terms of selection and mixture factorizations rather than models. Both can be used within the same analysis, for their own good purposes and in an entirely consistent way. In particular, we find Hogan's specific comments about pattern–mixture factorizations compelling from a diagnostic checking perspective, though less so from a modelling perspective because the identifying restrictions that are used in practice often seem unnatural.

    The dynamic linear increment model and possible extensions

    Aalen notes that the distinction between longitudinal data analysis and event history analysis is somewhat artificial, unified as they are by the central notion of time. We agree, and we feel that there is much to be gained from cross‐pollination between the two fields. We have unashamedly imported event history analysis methodology to tackle longitudinal data with drop‐out. Sousa's contribution is an example of the opposite case, where multivariate normal distribution machinery, which is used routinely in longitudinal data analysis, is extended to incorporate an event time also. Both Aalen and Commenges comment on the similarities between our work and that of Fosen et al. (2006b). Their work evolved contemporaneously with ours, and there does seem to be considerable potential for combining the two methodologies. It is often true that both a quantitative response and a sequence of ‘failure’ times are of scientific interest, in which case a joint model for events and the longitudinal response is needed.

    Although Aalen favours the dynamic approach that we advocate, other discussants (Cook and Lawless, Pipper and Scheike, and Zeng and Lin) wonder whether a marginal model might be preferable. We note that the parameters of our dynamic model have a marginal interpretation in the case where only exogenous covariates are used: if 𝒳 denotes the history of these covariates, then E{ΔY(t)|𝒳t}=X(t) β(t). This interpretation is lost when dynamic covariates are used. However, direct effects of treatment, which, as Solis‐Trapala observes, are often of scientific interest, can still be recovered along the lines of Fosen et al. (2006b). Martinussen wonders why we aim at cumulative regression coefficients. We agree with Lin (and the body of event history analysis literature) that typically the cumulative coefficients B(t) are more stable and easier to interpret than the incremental coefficients β(t).

    As Pipper and Scheike, and Lin correctly point out, special care is needed in the choice of covariates (dynamic or otherwise). Further, Longford highlights the fact that martingale residuals are not preserved under non‐linear transformations. We do indeed have a rich set of covariates, including dynamic covariates, but choosing between them, and between transformations of the response, still amounts to model building in the usual fashion. In some cases this may suggest simpler forms for the effects of covariates, such as the semiparametric models that are described by Pipper and Scheike, and by Molenberghs and Verbeke. The computationally undemanding nature of our dynamic modelling approach makes it easier in practice to pay more attention to model building, and to model criticism.

    We must refute the claim by Scheike and Pipper that predictable censoring implies that the time of drop‐out is known at the start of the study. Although we do assume that R(t+1) becomes known at some point strictly before time t+1, we do not assume that this must be at time t. More formally, we do not insist that ℛt=ℛt−1, and note that not just drop‐out, but anything that happens by time t−1, can influence drop‐out at time t.

    Cox wonders whether our estimates may behave poorly if, in fact, drop‐out is (nearly) uninformative. Though we do make untestable assumptions about the drop‐out process, a further advantage to not modelling it explicitly is that we avoid the problems that he mentions when a joint model collapses to the singular case of separate analyses.

    Zeng and Lin provide an elegant approach to inference. They give a closed form estimator for the variance of B̂(t), which would also combine naturally with Lunn's suggestion of a weighted least squares estimate of B(t). The fact that both these extensions would require only minimal effort to implement in standard statistical software is particularly pleasing to us, and we wonder whether the analogy with estimating equations and weighted least squares could be further exploited to extend the increments approach to non‐continuous and unbalanced data. Molenberghs and Verbeke point out that such extensions must recognize that unbalanced data further subdivide according to whether or not the observation times are genuinely stochastic, the former case being the more problematic since observation times may themselves be informative; see, for example, Lin et al. (2004).

    Pipper and Scheike, and Solis‐Trapala comment that the inclusion of measurement error in our model is ambiguous and unclear. Though Sousa supplies the intuition behind the lagged error appearing in our equation (10), there is a sense in which the error term is ambiguous. Essentially, its only purpose is to show that it can safely be ignored; the slightly awkward treatment is required because uncorrelated error is not a martingale. We hope that, in future, a less cumbersome treatment of the error term may be devised.

    Carpenter suggests that even more general non‐parametric random effects may be fitted by considering models for second‐order differences of the form
    image

    It is certainly true that the random effects could now be any process whose differences are a martingale (e.g. a random slope). However, unlike when introducing first‐order differences, there is no corresponding gain in simplicity of treatment of drop‐outs; in fact, treating drop‐out becomes slightly more difficult. To make sense of such a model we would need to define Y(T+k)=Y(T)+k{Y(T)−Y(T−1)} so that post‐drop‐out ‘increments’ remain zero. Further, examining higher order increments can actually conceal structure in the data. As pointed out by Harrison (1973) in a classical time series setting, higher order differencing can be a very blunt instrument.

    Taylor observes that modelling increments in discrete time is analogous to modelling slopes in continuous time. He cites a model where random effects on a slope form an Ornstein–Uhlenbeck process, leading to an integrated Ornstein–Uhlenbeck process on the responses. The more general point is that, by modelling response increments with a certain residual structure, we gain a smoother residual structure on the responses, which is often an appealing feature for longitudinal data. As Cox notes, it is imperative that we give serious consideration to the form of the random effects, especially when used over extended periods. That strong implications are associated with a martingale structure is no less true of alternatives such as the random intercept and slope.

    Lunn's concern that she has misread our intentions in the simulation study is unfounded: the inconsistency in the drop‐out mechanism arises simply from having different authors responsible for the two scenarios! However, we believe that the increases in length in the box plots are due mainly to drop‐outs, and not to increased variability in the random effects (as she supposes). We share Taylor's surprise at both the efficiency of our approach and the inefficiency of inverse probability weighting, but we do not yet have sufficient experience with these methods to suggest why, or even whether, this is generally the case.

    Appendix

    Appendix A: Fitting dynamic linear models by using standard software

    Least squares equations can be solved, and hence our proposed models fitted, in virtually all software for statistical computing. We note, reflecting our own computing preferences, that this is particularly straightforward by using the lmList command from the nlme package (Pinheiro and Bates, 2000) in R or S‐PLUS. For example, to fit the dynamic linear models of Section 4 to the schizophrenia data, we constructed a data frame schizophrenia, having columns i (a unique identifier), time (running from 1 to Ti for each i), treat (a factor indicating the treatment regime) and PANSS. This last column stores the change in PANSS that is associated with the given subject and time point, i.e. it contains ΔYi(1),…,ΔYi(Ti) for every i. Then
    fit <- lmList(PANSS ~ treat | time, data = schizophrenia)
    returns an object containing a list of estimates β̂(t) of β(t) for each t ∈ 𝒯, which may be extracted by way of the coef method. The cumulative sum of these estimates
    Bhat <- apply(coef(fit), 2, cumsum)
    yields B̂(t). Additionally, estimated standard errors
    se <- t(sapply(fit, function(m) coef(summary(m))[, "Std. Error"]))
    can be extracted from the fitted model if measurement error is thought to be negligible. These estimates (squared) may be summed
    apply(se^2, 2, cumsum)
    to yield an estimate of the variance of B̂(t) without the need for bootstrapping.
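
    For completeness, the following sketch shows one way in which the increment data frame might be constructed before calling lmList; the synthetic raw data and the raw column names id, week, treat and panss are illustrative assumptions rather than the variables used in our analysis.

    # Sketch: build the increment data frame from raw long-format scores and fit
    # the per-time-point regressions with lmList from the nlme package.
    library(nlme)
    # illustrative synthetic raw data: one row per subject and visit
    set.seed(1)
    raw <- expand.grid(id = 1:20, week = 0:5)
    raw$treat <- factor(ifelse(raw$id %% 2 == 0, "active", "placebo"))
    raw$panss <- 90 - 2 * raw$week + rnorm(nrow(raw), sd = 5)
    # increments within subject, NA at the first visit
    raw <- raw[order(raw$id, raw$week), ]
    raw$dpanss <- ave(raw$panss, raw$id, FUN = function(y) c(NA, diff(y)))
    schizophrenia <- na.omit(data.frame(i = raw$id, time = raw$week,
                                        treat = raw$treat, PANSS = raw$dpanss))
    fit <- lmList(PANSS ~ treat | time, data = schizophrenia)
    est <- coef(fit)[order(as.numeric(rownames(coef(fit)))), ]
    Bhat <- apply(est, 2, cumsum)             # cumulative coefficients B(t)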
