Analysis of longitudinal data with drop‐out: objectives, assumptions and a proposal
Abstract
Summary. The problem of analysing longitudinal data that are complicated by possibly informative drop‐out has received considerable attention in the statistical literature. Most researchers have concentrated on either methodology or application, but we begin this paper by arguing that more attention could be given to study objectives and to the relevant targets for inference. Next we summarize a variety of approaches that have been suggested for dealing with drop‐out. A long‐standing concern in this subject area is that all methods require untestable assumptions. We discuss circumstances in which we are willing to make such assumptions and we propose a new and computationally efficient modelling and analysis procedure for these situations. We assume a dynamic linear model for the expected increments of a constructed variable, under which subject‐specific random effects follow a martingale process in the absence of drop‐out. Informal diagnostic procedures to assess the tenability of the assumption are proposed. The paper is completed by simulations and a comparison of our method and several alternatives in the analysis of data from a trial into the treatment of schizophrenia, in which approximately 50% of recruited subjects dropped out before the final scheduled measurement time.
1. Introduction
Our concern in this paper is with longitudinal studies in which a real‐valued response Y is to be measured at a prespecified set of time points, and the target for inference is some version of the expectation of Y. Studies of this kind will typically include covariates X, which may be time constant or time varying. Frequently, the interpretation of the data is complicated by drop‐outs: subjects who are lost to follow‐up before completion of their intended sequence of measurements. The literature on the analysis of longitudinal data with drop‐outs is extensive: important early references include Laird (1988), Wu and Carroll (1988) and Little (1995), for which the Web of Science lists approximately 200, 170 and 300 citations respectively, up to the end of 2006.
A useful classification of drop‐out mechanisms is the hierarchy that was introduced by Rubin (1976) in the wider context of missing data. Drop‐out is missing completely at random (MCAR) if the probability that a subject drops out at any stage depends neither on their observed responses nor on the responses that would have been observed if they had not dropped out. Drop‐out is missing at random (MAR) if the probability of drop‐out may depend on observed responses but, given the observed responses, is conditionally independent of unobserved responses. Drop‐out is missing not at random (MNAR) if it is not MAR. Note that we interpret MCAR, MAR and MNAR only as properties of the joint distribution of random variables representing a sequence of responses Y and drop‐out indicators R; Little (1995) developed a finer classification by considering also whether drop‐out does or does not depend on covariates X. From the point of view of inference, the importance of Rubin's classification is that, in a specific sense that we discuss later in the paper, likelihood‐based inference for Y is valid under MAR, whereas other methods for inference, such as the original form of generalized estimating equations (Liang and Zeger, 1986), require MCAR for their validity. Note also that, if the distributional models for the responses Y and drop‐out indicators R include parameters in common, likelihood‐based inference under MAR is potentially inefficient; for this reason, the combination of MAR and separate parameterization is sometimes called ignorable, and either MNAR or MAR with parameters in common is sometimes called non‐ignorable or informative. The potential for confusion through different interpretations of these terms is discussed in a chain of correspondence by Ridout (1991), Shih (1992), Diggle (1993) and Heitjan (1994).
Our reasons for revisiting this topic are threefold. Firstly, we argue that in the presence of drop‐outs the inferential objective is often defined only vaguely. Though there are other possibilities, the most common target is the mean response, which we also adopt. However, many possible expectations are associated with Y: in Section 2 we contend that, in different applications, the target may be one of several unconditional or conditional expectations. We also argue that in all applications careful thought needs to be given to the purpose of the study and the analysis, with recognition that drop‐out leads to missing data but should not be considered solely as an indicator of missingness. The common notation Y=(Yobs,Ymiss) blurs this distinction. The complexity of some of the models and methods that are now available in the statistics literature may obscure the focus of a study and its precise objective under drop‐out. For this reason, we use as a vehicle for discussion the very simple setting of a longitudinal study with only two potential follow‐up times and one drop‐out mechanism. A second but connected issue is that the assumptions underlying some widely used methods of analysis are subtle; Section 3 provides a discussion of these assumptions and an overview of the development of some of the important methodology. We discuss what can and cannot be achieved in practice, again by using the two‐time‐point scenario for clarity. Our third purpose in this paper is to offer in Section 4 an approach that is based on dynamic linear models for the expected increments of the longitudinal process. The assumptions on which we base our models are easily stated and doubly weak: weak with respect to both longitudinal and drop‐out processes. None‐the‐less, all methods for dealing with missing data require, to some extent, untestable assumptions, and ours is no exception. However, we are willing to make such assumptions in the following circumstances. Firstly, the targets for inference are parameters of a hypothetical drop‐out‐free world that describes what would have happened if the drop‐out subjects had in fact continued. Secondly, any unexplained variability between subjects exhibits a certain stability before drop‐out. Thirdly, such stability is maintained beyond each drop‐out time by the diminishing subset of continuing subjects.
The first point is discussed in Section 2 and the ‘stability’ requirement of the next two points is defined formally in Section 4 as a martingale random‐effects structure. Section 4 also presents graphical diagnostics and an informal test procedure for critical assessment of this property. Our methods are quite general but for discussion purposes we return to the two‐time‐point scenario in Section 5, before demonstrating the methods through simulations in Section 6. Section 7 describes a comparative analysis of data from a trial into the treatment of schizophrenia. The paper closes with brief discussion in Section 8. Appendix A describes an implementation of our proposal in the S language.
Our topic can be regarded as a special case of a wider class of problems concerning the joint modelling of a longitudinal sequence of measured responses and times to events. Longitudinal data with drop‐out can formally be considered as joint modelling in which the time to event is the drop‐out time as, for example, in Henderson et al. (2000). In Section 7, we reanalyse the data from their clinical example to emphasize this commonality and to illustrate our new approach. For recent reviews of joint modelling, see Hogan et al. (2004) or Tsiatis and Davidian (2004).
Under our new approach, estimators are available in closed form and are easily interpretable. Further, estimation is computationally undemanding, as processing essentially involves a least squares fit of a linear model at each observation time. This is in contrast with many existing approaches to drop‐out prone data where, in our experience, the computational load of model fitting can be a genuine obstacle to practical implementation when the data have a complex structure and there is a need to explore a variety of candidate models.
2. Inferential objectives in the presence of drop‐out
As indicated in Section 1, we consider in this section a study involving a quantitative response variable Y, which can potentially be measured at two time points t=1,2 but will not be measured at t=2 for subjects who drop out of the study. We ignore covariate effects and focus on estimation of μt=E(Yt), though similar arguments apply to the full distributions of the response variables. We emphasize that this simple setting is used only to illustrate underlying concepts without unnecessary notational complication. The general thrust of the argument applies equally to more elaborate settings.
Consider the model
Yt = μt + Zt, t = 1, 2, (1)
where E(Zt) = 0. The parameter μ1 is the population mean at time 1. Writing down model (1) invites a similar interpretation for μ2. In fact, the apparently straightforward adoption of model (1) brings with it some interesting but usually unstated or ignored issues.
To acknowledge that drop‐out may itself affect the response at time 2, we extend model (1) to
Y1 = μ1 + Z1, Y2a = μ2a + Z2a, Y2b = μ2b + Z2b, P(R = 0 | 𝒮) = π(𝒮). (2)
In expression (2), E(Z1)=E(Z2a)=E(Z2b)=0, 𝒮 denotes a set of conditioning variables and we allow π(·) to depend arbitrarily on 𝒮. We make no assumption of independence between Z1,Z2a and Z2b, and for the unconditional case 𝒮=Ø we write π=π(Ø)=P(R=0). By construction, the parameters μ1, μ2a and μ2b are the marginal expectations of Y1, Y2a and Y2b respectively.
In the context of longitudinal data with drop‐outs, subjects with R=1 are the completers, who are denoted group 𝒞. For each completer, Y1, Y2a and R are observed and have the obvious interpretations as the responses at times 1 and 2 together with an indicator of response, whereas Y2b is an unobserved counterfactual, representing the value of the response that would have been observed if the subject had in fact dropped out.
The drop‐outs, group 𝒟, are those subjects who have R=0. These subjects experience the event of dropping out of the study, which in different contexts may mean discontinuation of treatment, cessation of measurement or both. If drop‐out refers only to the discontinuation of treatment, then Y2b is the observed response at time 2, and Y2a the counterfactual that would have been observed if the subject had continued treatment. This situation, where drop‐out does not lead to cessation of measurement, is one which we discuss no further. Throughout the remainder of the paper, we are concerned with the case when R=0 does correspond to cessation of measurement, and consequently neither Y2a nor Y2b is observed for any subject in group 𝒟. In this case, Y2b is the extant, but unobserved, longitudinal response at time 2 and Y2a is the counterfactual that would have been observed if the subject in question had not dropped out.
In this framework we make explicit the possibility that the act of dropping out can influence the response, rather than simply lead to data being missing. In other words, we separate the consequence of dropping out from the observation of that consequence. At least conceptually, the events ‘avoiding drop‐out’ and ‘observing Y2a’ are considered to be distinct.
The above is reminiscent of the usual framework for causal inference, as described for instance by Rubin (1991, 2004), in which R would be a binary treatment assignment or other intervention indicator. However, there are three important differences. The most obvious is that with drop‐out we never observe Y2b, whereas in causal inference it would be observed for each subject in group 𝒟. The second difference is that, assuming no initial selection effect, in the longitudinal setting we observe Y1 for all subjects, and this can be exploited in inference through assumed or estimated relationships between responses before and after drop‐out. The third difference is that we assume R to be intrinsic to the subject rather than an assigned quantity such as treatment, and between‐subject independence is sufficient for us to avoid the need to discuss assignment mechanisms.
In particular applications we need to consider the scientific objective of the study and consequent target for inference. At time t=1 we can easily estimate μ1=E(Y1) by standard techniques. Our focus will be the target for estimation at time t=2, which we assume can be expressed as some property of a random variable Y2, typically E(Y2). We discuss this within the specific setting of model (2).
2.1. Objective 1: realized second response
Under objective 1, the relevant random variable is the realized second response
Y2 = R Y2a + (1 − R) Y2b. (3)
In contrast, the data that we analyse in Section 7 come from a longitudinal randomized clinical trial of drug treatments for schizophrenia, in which drop‐out implies discontinuation of the assigned drug and the response could have been (but in fact was not) measured after drop‐out. In this setting, Y2 as defined at expression (3) is readily interpretable as the intention‐to‐treat response.
2.2. Objective 2: conditional second response

Only complete cases, group 𝒞, contribute to inference, which is therefore always conditional on R=1. This is perfectly proper if the objective is to study the response within the subpopulation of subjects who do not drop out.
In the schizophrenia example, some subjects were removed from the study because their condition did not improve. Objective 2 would therefore be appropriate in this context if interest were confined to the subset of subjects who had not yet been removed from the study owing to inadequate response to treatment.
2.3. Objective 3: hypothetical second response

The essential difference between the interpretations of Y2 under objectives 2 and 3 is between the marginal and conditional distributions of the response at time 2. This can be substantial, as would be the case if, for example, drop‐out occurs if and only if Z2a<0. This might seem an extreme example, but it could never be identified from the observed data.
It is important that the objectives be clearly stated and understood at the outset of a study, especially for regulatory purposes. There are similarities with distinguishing intention‐to‐treat and per‐protocol analyses (Sommer and Zeger, 1991; Angrist et al., 1996; Little and Yau, 1996; Frangakis and Rubin, 1999) and with causal inference in the presence of missing data or non‐compliance quite generally (Robins, 1998; Peng et al., 2004; Robins and Rotnitzky, 2004). The hypothetical second response Y2a will be our inferential target for the analysis that we present in Section 7 for the schizophrenia data. We argue that in this setting, where drop‐out need not be related to an adverse event, clinical interest genuinely lies in the hypothetical response that patients would have produced if they had not dropped out. This is likely to be of greater value than the realized or conditional second responses, since treatment performance is of more concern than subject profiles. We emphasize, however, that this need not always be so, and that in some circumstances a combination of objectives may be appropriate. For example, Dufouil et al. (2004) and Kurland and Heagerty (2005) separately discussed applications in which there are two causes of drop‐out: death and possibly informative loss to follow‐up. In these applications the appropriate target for inference is the response distribution in the hypothetical absence of loss to follow‐up but conditional on not dying, thus combining objectives 2 and 3. In other applications it is quite possible that a combination of all three objectives may be appropriate.
3. Approaches to the analysis of longitudinal data with drop‐out
We now illustrate in the context of model (2) some of the variety of approaches that have been proposed for the analysis of longitudinal data with drop‐out. We do not attempt a complete review (see Hogan and Laird (1997a,b), Little (1998), Hogan et al. (2004), Tsiatis and Davidian (2004) or Davidian et al. (2005)) but hope to give a flavour of the broad classes of methods and their underlying assumptions.
3.1. Complete case


3.2. Pattern–mixture
A complete‐case analysis forms one component of a pattern–mixture approach (Little, 1993), in which we formulate a separate submodel for each of [Y1|R=0] and [Y1,Y2a|R=1], perhaps with shared parameters. From this, we can obtain valid inference for the marginal [Y1] by averaging, but again only conditional inference for [Y2a|R=1], as with complete‐case analysis. The pattern–mixture approach is intuitively appealing from the perspective of retrospective data analysis, in which context it is natural to compare response distributions in subgroups that are defined by different drop‐out times. From a modelling perspective it is also natural if we regard the distribution of R as being determined by latent characteristics of the individual subjects. In its most general form, the pattern–mixture approach is less natural if we regard drop‐out as a consequence of a subject's response history, because it allows conditioning on the future. However, Kenward et al. (2003) discussed the construction of pattern–mixture specifications that avoid dependence on future responses.
3.3. Imputation methods
Imputation methods implicitly focus on objective 3, sometimes adding the assumption that Y2a=Y2b, in which case objectives 1 and 3 are equivalent.
3.3.1. Last observation carried forward
Under last observation carried forward (LOCF), the missing response Y2a of each subject in group 𝒟 is replaced by that subject's observed response Y1 at time 1. With π̂ denoting the observed proportion of drop‐outs and Ȳ2a𝒞 the mean response at time 2 in group 𝒞, the implied estimator for the mean response at time 2 is (1−π̂)Ȳ2a𝒞 + π̂Ȳ1𝒟, where Ȳ1𝒟 is the mean at time 1 for group 𝒟. The estimator is consistent for (1−π)E(Y2a|R=1) + πE(Y1|R=0).
3.3.2. Last residual carried forward

(4)
will be closer to μ2a than the expectation of
, which is a desirable shift from the complete‐case estimand if μ2a is the target for inference.
For these reasons the last residual carried forward method must be preferable to the LOCF approach as a means of overcoming potentially informative drop‐out, but in our opinion it does not provide an adequate solution to the problem. We describe it here principally to highlight two important points. Firstly, the unspoken question underlying the estimator (4) is ‘how unusual were the completers at time 1?’. If they were unusual, then we presume that this may also have been true at time 2, and consequently adjust the observed time 2 average accordingly. Secondly, this adjustment is downweighted by a factor
. We observe, anticipating results in Section 4, that in our hypothetical drop‐out‐free universe π=0, suggesting the estimator
as another candidate.
3.3.3. Multiple imputation
One of several possible criticisms of both the LOCF and the last residual carried forward methods is that, at best, they ignore random variation by imputing fixed values. Hot deck imputation addresses this by sampling post‐drop‐out values from a distribution; in principle, this could be done either by sampling from an empirical distribution, such as that of the observed values from other subjects who did not drop out but had similar values of available explanatory variables, or by simulating from a distributional model. Multiple‐imputation methods (Rubin, 1987) take this process one step further, by replicating the imputation procedure to enable estimation of, and if necessary adjustment for, the component of variation that is induced by the imputation procedure.
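For illustration, a minimal R sketch of this idea in the two‐time‐point setting of model (2) might look as follows; the data frame dat with columns y1, y2 and r is hypothetical, and a normal imputation model with fixed residual standard deviation is assumed for simplicity.

```r
## Minimal multiple-imputation sketch for the two-time-point setting:
## dat has columns y1 (always observed), y2 (NA for drop-outs) and r (1 = completer).
mi_mean_y2 <- function(dat, m = 20) {
  fit <- lm(y2 ~ y1, data = subset(dat, r == 1))   # imputation model from completers
  est <- wvar <- numeric(m)
  for (k in 1:m) {
    ## draw regression coefficients approximately from their sampling distribution
    beta  <- MASS::mvrnorm(1, coef(fit), vcov(fit))
    sigma <- summary(fit)$sigma                    # kept fixed, for simplicity
    y2imp <- dat$y2
    miss  <- dat$r == 0
    y2imp[miss] <- beta[1] + beta[2] * dat$y1[miss] + rnorm(sum(miss), 0, sigma)
    est[k]  <- mean(y2imp)
    wvar[k] <- var(y2imp) / nrow(dat)
  }
  ## combine by Rubin's rules: within- plus between-imputation variance
  c(estimate = mean(est), std.error = sqrt(mean(wvar) + (1 + 1/m) * var(est)))
}
```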
3.4. Missing at random: parametric modelling


The factorization [Y,R]=[R|Y][Y] is usually called a selection model (e.g. Michiels et al. (1999)), although we prefer the term selection factorization, to contrast with the pattern–mixture factorization [Y,R]=[Y|R][R], and to emphasize the distinction between how we choose to model the data and how we subsequently conduct data analysis.
(5)
(6)
as being appropriate when within‐subject variability is small (ρ→1).
Parametric modelling under the combined assumption of MAR drop‐out and separate parameterization has the obvious attraction that a potentially awkward problem can be ignored and likelihood‐based inference using standard software is straightforward. A practical concern with this approach is that the ignorability assumption is untestable without additional assumptions. A more philosophical concern arises if, as is usually so, the data derive from discrete time observation of an underlying continuous time process. In these circumstances, it is difficult to imagine any mechanism, other than administrative censoring, under which drop‐out at time t could depend on the observed response at time t−1 but not additionally on the unobserved response trajectory between t−1 and t.
3.5. Missing at random: unbiased estimating equations
If interest is confined to estimating μ2a, or more generally covariate effects on the mean, then an alternative approach, which is still within the framework of MAR drop‐out, is to model π(Y1) but to leave [Y1,Y2a] unspecified.
An estimate of the drop‐out probability is constructed, often via a logistic model. The marginal mean of Y2a can now be estimated consistently by using a weighted average of the observed Y2a, where the weights are the inverse probabilities of observation (Horvitz and Thompson, 1952; Robins et al., 1995):
μ̂2a = n⁻¹ Σi Ri Y2a,i/{1 − π̂(Y1i)}. (7)
Use of equation (7) requires the probability of observation 1−π(Y1) to be strictly positive for all subjects, and it encounters difficulties in practice if this probability can be close to 0. This will not often be a material restriction within the current simplified setting, but it can be problematic in more complex study designs with high probabilities of drop‐out in some subgroups of subjects.
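In this simple setting the computation is elementary; the following R sketch (hypothetical data frame dat with columns y1, y2 and completion indicator r, and an assumed logistic model for the probability of remaining in the study) evaluates the weighted average in equation (7).

```r
## Inverse-probability-weighted estimate of E(Y2a) when drop-out depends on y1 only:
## dat has columns y1, y2 (NA when r == 0) and r (1 = completer, 0 = drop-out).
ipw_mean_y2 <- function(dat) {
  pfit  <- glm(r ~ y1, family = binomial, data = dat)   # model for P(R = 1 | Y1)
  p_obs <- fitted(pfit)                                  # estimated 1 - pi(Y1)
  stopifnot(all(p_obs > 0))                              # weights must remain finite
  sum(dat$r * dat$y2 / p_obs, na.rm = TRUE) / nrow(dat)
}
```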
3.6. Missing not at random: Diggle–Kenward model
(8)
3.7. Missing not at random: random effects

For maximum likelihood estimation for the simple model above, the shared effect U can be treated as missing data and methods such as the EM or Markov chain Monte Carlo algorithms used, or the marginal likelihood can be obtained by numerical integration over U, and the resulting likelihood maximized directly. Implementation is computationally intensive, even for this simple example, and there is again no closed form for the resulting estimator of μ2a.
Models of this kind are conceptually attractive, and parameters are identifiable without any further assumptions. But, as with the Diggle–Kenward model, the associated inferences rely on distributional assumptions which are generally untestable. Furthermore, in our experience the computational demands can try the patience of the statistician.
3.8. Missing not at random: unbiased estimating equations
A random‐effects approach to joint modelling brings yet more untestable assumptions and we can never be sure that our model is correct for the unobserved data, although careful diagnostics can rule out models that do not even fit the observed data (Dobson and Henderson, 2003). Rotnitzky et al. (1998), in a follow‐up to Robins et al. (1995), argued strongly for a more robust approach, on the assumption that the targets for inference involve only mean parameters. They again left the joint distribution of responses unspecified but now modelled the drop‐out probability as a function of both Y1 and Y2a, e.g. by the logistic model (8). As applied within the simple framework of model (2), the most straightforward version of the procedure of Rotnitzky et al. (1998) is two stage: first, estimate the drop‐out parameters from an unbiased estimating equation; second, plug drop‐out probability estimates into another estimating equation.
(9)
. Since we need only π(Y1,Y2a) in the fully observed group, all components of equation (9) are available, and for estimation there is no need for assumptions about Y2b. Assumptions would, however, be needed for estimands to be interpretable. Rewriting equation (9) as

At the second stage, the newly obtained estimated drop‐out probabilities are plugged into an inverse‐probability‐weighted estimating equation to give

Rotnitzky et al. (1998) indicated that efficiency can be improved by augmenting the estimating equation for μ2a by a version of equation (9) (with a different φ) and simultaneously solving both equations for all parameters. Fixed weight functions may also be introduced as usual. They also argued that estimation of the informative drop‐out parameter γ0 will be at best difficult and that the validity of the drop‐out model cannot be checked if γ0≠0. Their suggestion is that γ0 be treated as a known constant but then varied over a range of plausible values to assess sensitivity of inferences for other parameters to the assumed value of γ0.
Carpenter et al. (2006) compared inverse probability weighting (IPW) methods with multiple imputation. In particular, they considered a doubly robust version of IPW, which was introduced by Scharfstein et al. (1999) in their rejoinder to the discussion and which gives consistent estimation for the marginal mean of Y2a provided that at most one of the models for R or for Y2a is misspecified. Their results show that doubly robust IPW outperforms the simpler version of IPW when the model for R is misspecified, and outperforms multiple imputation when the model for Y2a is misspecified.
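To fix ideas, a minimal augmented inverse‐probability‐weighted estimate can be written in a few lines of R; this is an illustration of the general doubly robust construction rather than the specific estimator of Scharfstein et al. (1999), and the variable names follow the hypothetical two‐time‐point data frame used above.

```r
## Doubly robust (augmented IPW) estimate of E(Y2a): consistent if either the
## drop-out model or the outcome regression working model is correctly specified.
dr_mean_y2 <- function(dat) {
  p_obs <- fitted(glm(r ~ y1, family = binomial, data = dat))  # working model for R
  ofit  <- lm(y2 ~ y1, data = subset(dat, r == 1))             # working model for Y2a
  mhat  <- predict(ofit, newdata = dat)                        # predicted E(Y2a | Y1)
  y2    <- ifelse(dat$r == 1, dat$y2, 0)                       # value irrelevant when r = 0
  mean(dat$r * y2 / p_obs - (dat$r - p_obs) / p_obs * mhat)
}
```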
3.9. Sensitivity analysis
Rotnitzky et al. (1998) are not the only researchers to suggest sensitivity analysis in this context. Other contributions include Copas and Li (1997), Scharfstein et al. (1999, 2003), Kenward (1998), Rotnitzky et al. (2001), Verbeke et al. (2001), Troxel et al. (2004), Copas and Eguchi (2005) and Ma et al. (2005).
Sensitivity analysis with respect to a parameter that is difficult to estimate is clearly a sensible strategy and works best when the sensitivity parameter is readily interpretable in the sense that a subject‐matter expert can set bounds on its reasonable range; see, for example, Scharfstein et al. (2003). In that case, if the substantively important inferences show no essential change within the reasonable range, all is well. Otherwise, there is some residual ambiguity of interpretation.
Most parametric approaches can also be implemented within a Bayesian paradigm. An alternative to a sensitivity analysis is then a Bayesian analysis with a suitably informative prior for γ0.
3.10. Conclusions
Existing approaches to the analysis of longitudinal data subject to drop‐out may, if only implicitly, be addressing different scientific or inferential objectives. In part this may be because methods and terminology that are designed for general multivariate problems with missing data do not explicitly acknowledge the evolution over time of longitudinal data. In the next section we offer an alternative, which we believe is better suited to the longitudinal set‐up and which borrows heavily from event history methodology. We consider processes evolving in time and propose a martingale random‐effects model for the longitudinal responses, combined with a drop‐out mechanism that is allowed to depend on both observed and unobserved history, but not on the future. The martingale assumption formalizes the idea that adjusting for missing data is a defensible strategy provided that subjects’ longitudinal response trajectories exhibit stability over time. Our drop‐out model is formally equivalent to the independent censoring assumption that is common in event history analysis; see, for example, Andersen et al. (1992). We do not claim that the model proposed is universally appropriate nor suggest that it be adopted uncritically in any application. We do, however, offer some informal diagnostic procedures that can be used to assess the validity of our assumptions.
4. Proposal
4.1. Model specification
4.1.1. Longitudinal model
We suppose that τ measurements are planned on each of n independent subjects. The measurements are to be balanced, i.e. the intended observation times are identical for each subject, and without loss of generality we label these times 1,…,τ. For the time being, let us suppose that all n subjects do indeed provide τ measurements. In the notation of Section 2, Ya is therefore observed for every subject at every observation time, and Yb is counterfactual in every case.
We presume that covariates are also available before each of the τ observation times. These we label Xa, noting that in theory there are also counterfactual covariates Xb: the values of covariates if a subject had dropped out. We understand Xa to be an n×p matrix process, which is constant if only base‐line covariates are to be used, but potentially time varying and possibly even dependent on the history of a subject or subjects. Note that we shall write Xa(t) for the particular values at time t, but that by Xa without an argument we mean the entire process, and we shall follow this same convention for other processes.
At each observation time t we acknowledge that the underlying hypothetical response may be measured with zero‐mean error ɛa(t). We assume that this process is independent of all others and has the property that ɛa(s) and ɛa(t) are independent unless s=t. We make no further assumptions about this error process, and in particular we do not insist that its variance is constant over time.


We argue that the expected increments in Ya are a natural choice for statistical modelling. Asking ‘What happened next?’ allows us to condition on available information such as the current values of covariates and responses. Later, it will also be useful to condition on the presence or absence of subjects.

(10)
With B(0)=0, the transform of B by Xa, denoted Xa·B, is given by (Xa·B)(t) = Σs≤t Xa(s){B(s)−B(s−1)}.
The residual process is Ma=Ya−Xa·B−ɛa. This process has a property that makes it a kind of random walk: it takes zero‐mean steps from a current value to a future value. More formally, for s ≤ t we have that E{Ma(t)|𝒢s}=Ma(s), where 𝒢s denotes the history of the processes up to time s, and the process is thus a martingale. Model (10) may therefore be appropriate when, having accounted for fixed effects and measurement error, the random effects can be modelled as a martingale.
Although their conditional mean properties may seem restrictive, martingales represent, from the modeller's perspective, a wide range of processes. Neither continuity nor distributional symmetry is required of Ma, and for our purposes its variance need only be constrained to be finite. Further, the variance of the martingale increments may change over time. Serial correlation in the Ma‐process induces the same in the Ya‐process, which is often a desirable property in models for longitudinal data.

The sample vector of martingale random effects is free to be, among other things, heteroscedastic, in that the variance of a martingale may change over time and between subjects, and completely non‐parametric, since the distribution of a martingale need not be specified by a finite dimensional parameter. We reiterate, however, that martingale residuals impose a condition on the mean of their distribution given their past. This single condition, of unbiased estimation of the future by the past, is sufficiently strong to be easily dismissed in many application areas, though it can often be rescued by suitable adjustment of the linear model. It seems to us that in many applications an underlying martingale structure is credible, at least as a first approximation. In particular, the linear model may be adapted to include summaries of previous longitudinal responses if appropriate. Including dynamic covariates, e.g. summaries of the subject trajectories to date, may sometimes render the martingale hypothesis more tenable, although the interpretation of the resulting model is problematic if observed trajectories are measured with appreciable error.
We have shown that models for the hypothetical response Ya can be defined in terms of linear models on its increments, and that such models are quite general. At no extra cost, these comprise subject‐specific, martingale random effects. We do not discuss in detail the full generality of this approach; instead, we now turn to the problem of drop‐out.
4.1.2. Drop‐out model
Unfortunately, not all the hypothetical longitudinal responses Ya are observed. Rather, subject i gives rise to 1 ≤ Ti ≤ τ measurements, i.e. we observe Yai(1),…,Yai(Ti). Although both the hypothetical responses Yai(Ti+1),…,Yai(τ) and the realized responses Ybi(Ti+1),…,Ybi(τ) go unobserved, we restrict our assumptions to the former.
We can also consider drop‐out as a dynamic process. Let Ri denote an indicator process that is associated with subject i, with Ri(t)=1 if subject i is still under observation at time t, and Ri(t)=0 otherwise. We let ℛt be the history of these indicator processes up to time t. We do not distinguish between competing types of drop‐out, for instance between administrative censoring, treatment failure or death, because we need not do so to make inferences regarding the hypothetical responses Ya.
Like the covariate processes, we assume that the drop‐out processes are predictable, in the sense that Ri(t) is known strictly before time t. More formally, we shall denote by ℛt− the information that is available about drop‐out before time t, and assume that Ri(t) ∈ ℛt−. Although in this instance it follows that ℛt−=ℛt, it is useful to distinguish notationally between information that is available at these different points in time. We think of Ri as a process in continuous time, but in practice we are only interested in its values at discrete time points. Predictability is a sensible philosophical assumption, disallowing the possibility that drop‐out can be determined by some future, unrealized, event. Note that this does not preclude the possibility that future events might depend on past drop‐out.


We emphasize that independent censoring is a weaker assumption than sequential MAR drop‐out, since the former conditions on the complete past, and not just the observed past, and so allows drop‐out to depend directly on latent processes. Moreover, it is a statement about conditional means, whereas the assumption of sequential missingness at random concerns conditional distributions.
Having laid out our assumptions concerning the drop‐out process, we make a few comments on what has not been assumed. We have not specified any model, parametric or otherwise, for the drop‐out process. Consequently, the drop‐out process may depend on any aspect of the longitudinal processes, e.g. group means, subject‐specific time trends or within‐subject instability. The only requirement is that this dependence is not on the future behaviour of Ya. Though often plausible, this is usually untestable.
4.1.3. Combined model




(11)
4.2. Model fitting
4.2.1. Estimation


This leads to the estimator B̂ of B that is given by
(12)
Thus we set B̂ = X⁻·Y, the transform of Y by X⁻. So defined, B̂ is an estimator of B on 𝒯; specifically, it estimates B𝒯=1𝒯·B, and there may be some small bias in estimating B. Estimation of B𝒯 is reasonable in the present context of varying sample sizes and covariates, and is, in fact, all that can be expected of a non‐parametric technique. Without parametric interpolation, there may be time points about which the data can say nothing. The estimated increment β̂(t) is unbiased for 1𝒯(t) β(t):

Therefore, B̂ is unbiased for B𝒯. What we have done is to mimic Aalen's unbiased estimator, and to show that measurement error does not affect this unbiasedness.
The estimator B̂ is essentially a moment‐based estimator of B. It sums the least squares estimates of β based on the observed increments. Crucially, nowhere do we require Y and R to be independent. We rely on an assumption that hypothetical random effects are martingales, and if this assumption breaks down then so does unbiasedness. Each surviving subject is thought to have a mean 0 step in their random effects; non‐zero expected increments in the random effects cannot be distinguished from a change in population mean.
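To make the computation explicit, a minimal R sketch of this ‘least squares on the observed increments, then accumulate’ recipe follows. The data structures are hypothetical: Y is an n × τ response matrix with NA entries after drop‐out, X is a list of per‐time design matrices, and the time 1 response is treated as an increment from a baseline of zero.

```r
## Linear increments estimator: least squares on observed increments, then accumulate.
## Y: n x tau response matrix, NA after drop-out (monotone missingness assumed).
## X: list of length tau; X[[t]] is the n x p design matrix for the increment at time t.
estimate_B <- function(Y, X) {
  tau <- ncol(Y)
  p   <- ncol(X[[1]])
  beta_hat <- matrix(NA, tau, p)
  Y0 <- cbind(0, Y)                                # treat time 1 as an increment from zero
  for (t in 1:tau) {
    at_risk <- !is.na(Y[, t])                      # subjects still under observation at t
    dY <- Y0[at_risk, t + 1] - Y0[at_risk, t]      # observed increments
    Xt <- X[[t]][at_risk, , drop = FALSE]
    beta_hat[t, ] <- coef(lm(dY ~ Xt - 1))         # least squares estimate of beta(t)
  }
  apply(beta_hat, 2, cumsum)                       # cumulative regression function B-hat
}
```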
4.2.2. Inference
Inference is discussed in Farewell (2006). Estimators of the finite sample and asymptotic variances of B̂ are not so readily derived as in the corresponding theory of event history analysis. Counting processes behave locally like Poisson processes (Andersen et al., 1992), having equal mean and variance, but this result does not hold in generality. Moreover, error ɛa in the measurement of the hypothetical variable leads to negatively correlated increments in B̂ and results in a complex pattern of variability. However, computing time occupied by parameter estimation is negligible, so we recommend the use of the bootstrap for inference about B. Farewell (2006) provides a result that B̂ is √n consistent for B with a Gaussian limiting distribution. He also gives an approximation that, in the absence of measurement error, justifies a simple calculation using OLS regression, as outlined in Appendix A. In the application to follow, we use the bootstrap distribution for B̂.
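Because each fit is computationally trivial, bootstrap standard errors are easily obtained by resampling subjects; the sketch below reuses the estimate_B function from the previous sketch and is intended only to indicate the shape of the calculation.

```r
## Nonparametric bootstrap over subjects for the cumulative regression estimates.
bootstrap_B_se <- function(Y, X, nboot = 1000) {
  n <- nrow(Y)
  reps <- replicate(nboot, {
    idx <- sample(n, replace = TRUE)                      # resample whole subjects
    estimate_B(Y[idx, , drop = FALSE],
               lapply(X, function(x) x[idx, , drop = FALSE]))
  })
  apply(reps, c(1, 2), sd)                                # bootstrap standard errors
}
```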
4.3. Diagnostics

Plots of the estimated residuals against fitted values or covariates should reveal systematic misspecifications of the model for the mean response but need not show the usual random scatter, since we do not assume homogeneity of variances, either between or within subjects.
One simple diagnostic that is tailored to the martingale assumption is a scatterplot of the increments in the residuals against the residuals at the previous time point. In the absence of measurement error, a plot of this kind should show no relationship. Substantial measurement error would induce a negative association, in which case the fit would be improved by including the previous residual as a covariate at time t.
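These diagnostics are easily computed; a sketch using the hypothetical Y, X and estimate_B structures of the earlier sketches is given below, where the residuals estimate the martingale plus measurement error at each time and are NA after drop‐out.

```r
## Diagnostic: residual increments against the residuals at the previous time point.
## Residuals are Y minus the accumulated fitted increments, and are NA after drop-out.
residual_matrix <- function(Y, X, B_hat) {
  tau <- ncol(Y)
  beta_hat <- rbind(B_hat[1, ], diff(B_hat))                      # recover per-time increments
  fitted   <- sapply(1:tau, function(t) X[[t]] %*% beta_hat[t, ]) # n x tau fitted increments
  Y - t(apply(fitted, 1, cumsum))                                 # residual matrix
}

plot_increment_diagnostic <- function(res, t) {
  plot(res[, t - 1], res[, t] - res[, t - 1],
       xlab = "previous residual", ylab = "residual increment",
       main = paste("time", t))
  ## a marked negative trend suggests appreciable measurement error
}
```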
cov{Ma(t)+ɛa(t), Ma(1)+ɛa(1)} = V{Ma(1)}, t>1. (13)
This result is easily proved, since martingales have uncorrelated increments and the errors ɛ are mutually independent. The point about equation (13) is that the empirical version of the left‐hand side can be evaluated at each measurement time, whereas the expression on the right‐hand side shows that the corresponding theoretical quantity is constant over time. Hence, a plot of these empirical covariances against t has diagnostic value, with departures from a straight line with zero slope indicating unsuitability of model (11).

for the final value that is assumed by the process
, we have in particular that

, and for large n the approximation
(14)
4.4. Summarizing remarks



Appendix A illustrates how this can be implemented by using standard statistical software.
5. Simple example revisited


(15)
(16)
(17)
(18)
(19)
Consider now the assumptions that lead to the unbiasedness of this estimator. Equation (15) is unremarkable; equation (16) is for the possibly counterfactual drop‐out‐free response Y2a, as we have argued for objective 3. The zero‐mean assumptions in condition (17) are needed to give μ1 and μ2a interpretations as drop‐out‐free population means, which are the parameters of interest. Note, though, that we do not require M1 and M2a to be independent. Equation (18) provides our key assumption, that the subject‐specific random effects have zero‐mean increments, conditional on that subject's observed history. It is this assumption that we test with our diagnostic in Section 4.3. An untestable consequence of equation (18), taken together with condition (17), is that the subject‐specific random effects also have zero‐mean increments conditional on dropping out.
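In this no‐covariate, two‐time‐point case the linear increments estimate of μ2a reduces to the overall time 1 mean plus the completers' average observed increment; a two‐line R sketch, using the same hypothetical data frame as in the sketches of Section 3, is

```r
## Linear increments estimates for the two-time-point example without covariates:
## dat has columns y1, y2 (NA for drop-outs) and r (1 = completer).
mu1_hat  <- mean(dat$y1)
mu2a_hat <- mu1_hat + mean(dat$y2[dat$r == 1] - dat$y1[dat$r == 1])
```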
(20)
None of these examples are missingness at random models, since in every case π(Y1,Y2a)≠π(Y1). Notwithstanding this comment, in the first two examples we have drop‐out probability depending only on the most recent random effect M1. In this sense our assumptions are similar to sequential missingness at random (Hogan et al., 2004), with the additional assumption of martingale random effects. Nevertheless, and as the third example illustrates, it is possible to construct a variety of models for which π(M1,Δ)≠π(M1) yet condition (20) remains true.
6. Simulations
We demonstrate the use of the covariance diagnostics in two simulation studies. Pitting a martingale random‐effects process against a popular non‐martingale alternative, we report the estimated power and type I error rates of the informal test (14) and illustrate the suggested covariance plots.
6.1. Scenario 1
The first simulation scenario mimics the schizophrenia example that is to be considered in Section 7, though with just one treatment group and so no covariates. Measurements are scheduled at weeks (w1,…,w6)=(0,1,2,4,6,8).

Then Sa is a random intercept and slope process, of the kind that was described by Laird and Ware (1982), whereas Ma is a martingale. We take
and
and choose the variances of the further values to ensure that V{Sa(t)}=V{Ma(t)}. This set‐up allows us to compare these two types of random‐effects process with, as far as is possible, all else being equal.

, and independence between time points. The probabilities of drop‐out between times t and t+1 are logistic with linear predictors αt+γtSa(t) and αt+γtMa(t) for YS and YM respectively.
,
,
and

This led to about 50% drop‐out in each model, spread over time points 2–5, with only about 1% of subjects dropping out after just one observation. Each data set was analysed by using our linear increments (LI) approach, by an IPW estimating equation approach and by fitting a multivariate normal distribution with unstructured within‐subject covariance matrix (method UMN). Both the IPW and the UMN methods included the incorrect assumption that drop‐out is MAR. For IPW we used the response at time t−1 as a covariate in a logistic model for drop‐out at time t. No drop‐out model is needed for UMN under MAR.
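For readers who wish to experiment, a compressed sketch of a scenario‐1‐style data generator is given below; the variance and drop‐out parameters shown are illustrative placeholders rather than the values used in our simulations, but the two key ingredients, a martingale random effect formed from cumulative zero‐mean increments and a drop‐out probability that depends on the current (unobserved) random effect, are as described above.

```r
## Illustrative scenario-1-style generator: martingale random effects with
## informative drop-out (parameter values are placeholders, not those of the paper).
simulate_mnar <- function(n, tau = 6, alpha = -2.5, gamma = 0.05) {
  M <- t(apply(matrix(rnorm(n * tau, sd = 5), n, tau), 1, cumsum))  # martingale M_a
  Y <- M + matrix(rnorm(n * tau, sd = 2), n, tau)                   # add measurement error
  for (i in 1:n) {
    for (t in 1:(tau - 1)) {
      ## drop-out between t and t+1 depends on the latent M[i, t], so it is not MAR
      if (runif(1) < plogis(alpha + gamma * M[i, t])) {
        Y[i, (t + 1):tau] <- NA
        break
      }
    }
  }
  Y
}
```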
Table 1 summarizes results at n=500. There was severe downward bias in the observed mean values (OLS) for each of YM and YS, and this is only partly corrected by the misspecified IPW or UMN methods. The LI fit to YM shows no bias, as expected, and confidence interval coverage is good. The observed mean bias was improved but not removed when our method is used on YS, unsurprisingly given that the model is then also misspecified. Usually such misspecification would be detected by the diagnostics. For example, box plots of the residual covariances (Fig. 1) suggest good diagnostic power for distinguishing the models and this is confirmed by the performance of the test statistic (14), for the variance of which we used 100 bootstrap samples for each data set (Table 2).
| | Method | | w = 0 | w = 1 | w = 2 | w = 4 | w = 6 | w = 8 |
|---|---|---|---|---|---|---|---|---|
| YM | OLS | Mean | 0.00 | −0.30 | −2.75 | −4.34 | −10.61 | −19.41 |
| | | SE | 0.77 | 0.78 | 0.77 | 0.91 | 1.32 | 1.89 |
| | IPW | Mean | −0.03 | −0.03 | −1.12 | −2.25 | −6.10 | −13.17 |
| | | SE | 0.78 | 0.81 | 0.84 | 1.12 | 2.04 | 2.83 |
| | UMN | Mean | −0.02 | −0.02 | −0.53 | −1.80 | −6.00 | −12.90 |
| | | SE | 0.77 | 0.77 | 0.85 | 0.91 | 1.44 | 1.83 |
| | LI | Mean | 0.00 | −0.02 | −0.02 | 0.01 | 0.05 | 0.02 |
| | | SE | 0.77 | 0.78 | 0.89 | 0.97 | 1.55 | 2.05 |
| | | Cov (%) | 96.4 | 94.1 | 95.2 | 94.3 | 94.8 | 94.6 |
| YS | OLS | Mean | −0.01 | 0.26 | −2.90 | −5.08 | −12.95 | −22.38 |
| | | SE | 0.79 | 0.82 | 0.83 | 1.06 | 1.11 | 1.34 |
| | IPW | Mean | 0.01 | −0.17 | −1.25 | −2.84 | −8.06 | −15.67 |
| | | SE | 0.79 | 0.82 | 0.97 | 1.16 | 1.68 | 1.83 |
| | UMN | Mean | 0.01 | −0.15 | −0.75 | −2.38 | −7.12 | −13.45 |
| | | SE | 0.79 | 0.82 | 0.89 | 1.12 | 1.16 | 1.39 |
| | LI | Mean | −0.01 | 0.02 | −0.16 | −0.98 | −3.61 | −7.81 |
| | | SE | 0.79 | 0.82 | 0.93 | 1.20 | 1.18 | 1.44 |
| | | Cov (%) | 94.8 | 95.7 | 94.1 | 85.9 | 19.8 | 0.1 |
- †The coverage Cov of nominal 95% confidence intervals under LI is also included. The sample size was n=500, and results were averaged over 1000 simulations.

Fig. 1. Box plots of the residual covariances based on 1000 simulations under scenario 1 at sample size n=500: (a) true martingale structure YM; (b) Laird–Ware random intercept and slope structure YS
| Scenario | | n = 125 | n = 250 | n = 500 | n = 1000 |
|---|---|---|---|---|---|
| 1 | Power | 0.307 | 0.530 | 0.766 | 0.980 |
| | Type I error | 0.056 | 0.056 | 0.053 | 0.059 |
| 2 | Power | 0.147 | 0.241 | 0.390 | 0.686 |
| | Type I error | 0.056 | 0.059 | 0.045 | 0.052 |
6.2. Scenario 2
at each time point. In the notation of Section 4, the corresponding cumulative regression functions are taken to be

. The final measurement times T1,…,Tn are determined by the relationship


The parameters were taken to be
This gave approximately 25% drop‐out, roughly evenly spread over times 2–6. Again 100 bootstrap samples were drawn to compute variances for the test statistic (14).
Mean estimates of B for sample size n=500 using both YM and YS are shown in Fig. 2, together with the true values and ±2 empirical standard errors around the YM‐estimates. Bootstrap standard errors matched the empirical values closely. Standard errors derived from asymptotic results, which avoid the need to bootstrap but at the expense of assuming negligible measurement error, were slightly conservative, overestimating typically by about 5%. As expected there was no evidence of bias for our increment‐based estimates of B based on YM. Estimates from the misspecified model for YS were also good for B2 and B3; in fact so close that the lines in the plots are hardly distinguishable. There was, however, bias for the intercept B1. Identification of the random‐effect structure through residual covariances was more difficult than for scenario 1, causing some loss of power for the test statistic (Table 2).

Fig. 2. Summary of estimates of B for scenario 2, at sample size n=500: mean dynamic estimates from YM (——) and YS (– – –) together with true values (·······)
7. Analysing data from a longitudinal trial
We now describe an application of the methods of Section 4 to data from the schizophrenia clinical trial that was introduced earlier. The trial compared three treatments: a placebo, a standard therapy and an experimental therapy. The response of interest, PANSS, is an integer ranging from 30 to 210, where high values indicate more severe symptoms. A patient with schizophrenia entering a clinical trial may typically expect to score around 90.
Of the 518 participants, 249 did not complete the trial, among whom 66 dropped out for reasons that were unrelated to their underlying condition. The remaining 183 represent potentially informative drop‐out, though we emphasize that our new approach does not need to distinguish these from the non‐informative drop‐outs. We mention them only because we shall refer to other procedures that draw such a distinction.
The goal of the study was to compare the three treatments with respect to their ability to improve (reduce) the mean PANSS‐score. The patients were observed at base‐line (t=1) and thereafter at weeks 1, 2, 4, 6 and 8 (t=2,3,4,5,6) of the study. The only covariates used here are treatment groups. The dotted curves in Fig. 3 show for reference the observed mean response at each time in each treatment group, calculated in each case from subjects who have not yet dropped out. Hence, the plotted means estimate conditional expectations of the PANSS‐score (objective 2), which are not necessarily the appropriate targets for inference.

Fig. 3. Estimated PANSS mean values under OLS (·······) and our dynamic linear approach (——): the topmost curves correspond to the placebo group, the middle curves to the standard treatment group and the lowest curves to the experimental treatment group
Fig. 3 displays the pronounced differences between the OLS estimates and their dynamic linear counterparts. The OLS estimates invite the counter‐intuitive conclusion that, irrespective of treatment type, patients’ PANSS‐scores decrease (improve) over time. By contrast, our increment‐based estimator suggests that this is a feature of informative drop‐out, and that patients on the placebo do not improve over time; in fact, there is even a suggestion that their PANSS‐scores increase slightly. The levelling out of treatment effects over time that is seen under our new approach is also unsurprising.
In Fig. 4 and Table 3 we compare the dynamic linear fits with those which were obtained under four other approaches. Fig. 4 shows the estimated means for each treatment group whereas Table 3 gives for standard treatment the estimated mean change in response between the beginning and end of the study, together with the effect of placebo or experimental treatment on this quantity. The other approaches are as follows:

Fig. 4. Estimated PANSS mean values for (from top to bottom pairs of curves, in every case) the placebo, standard and experimental groups (‐ ‐ ‐ ‐ ‐, estimates generated under methods (a)–(d) in the text; —–, estimates under the dynamic linear approach): (a) method UMN; (b) Dobson and Henderson's (2003) method; (c) IPW method; (d) method DYN
| Treatment | LI | OLS | (a) UMN | (b) Dobson and Henderson (2003) | (c) IPW | (d) DYN |
|---|---|---|---|---|---|---|
| S | −5.10 | −19.12 | −9.90 | −5.34 | −6.22 | −8.29 |
| | (3.49) | (3.43) | (3.06) | (2.94) | (7.72) | (3.21) |
| P − S | 13.04 | 6.01 | 11.01 | 13.66 | 12.37 | 12.42 |
| | (5.32) | (5.01) | (4.49) | (5.29) | (8.82) | (4.82) |
| E − S | −7.07 | −1.43 | −4.89 | −5.97 | −8.18 | −5.40 |
| | (3.80) | (3.86) | (3.38) | (3.37) | (7.83) | (3.73) |
- †‘S’ represents the standard treatment, ‘P’ placebo and ‘E’ the experimental treatment. Standard errors are in parentheses.
- (a) maximum likelihood estimation under a multivariate normal model with unstructured covariance matrix (method UMN) (this approach assumes that drop‐out is MAR);
- (b) a quadratic random‐effects joint longitudinal and event time informative drop‐out model that was fitted by Dobson and Henderson (2003) using EM estimation, as suggested by Wulfsohn and Tsiatis (1997) (Dobson and Henderson compared four random‐effects structures and concluded that, between these, the model that is used here with random intercept, slope and quadratic terms ‘is strongly preferred by likelihood criteria, even after penalizing for complexity’);
- (c) an IPW estimating approach as described by Robins et al. (1995), with a logistic drop‐out MAR model;
- (d) a second martingale fit (DYN) in which residuals at time t are included as covariates for the increments between t and t+1, along the lines of the dynamic covariate approaches for event history analyses as described by Aalen et al. (2004) and Fosen et al. (2006a).
There are broad similarities between our increment‐based estimates and any of approaches (a)–(d) but some differences are worth noting. Method (a) gives a smaller adjustment to the observed means than the others, whereas method (c) adjusts almost as much as our linear increment fits. Both of these are missingness at random models. Method (b) assumes a Gaussian response but method (c) has no modelling assumptions for the responses, a gain that is obtained at the expense of an increase in standard errors. Method (d) leads to estimates that are comparable with the fit that is obtained by using only exogenous covariates, albeit slightly closer to the observed means. Method (b), the quadratic random‐effects model, gives estimates that are close to those obtained by using our new approach. Method (b) took several days of computing time to fit, whereas estimates for other models can be obtained quickly, our linear increment models in particular. The availability of a closed form estimator (12) meant that the 1000 bootstrap simulations that were needed to compute the standard errors were completed in under 10 s on an unremarkable laptop computer. In Appendix A, we demonstrate briefly one way in which our dynamic linear models may be implemented by using standard software.
It is interesting to recall that, in approach (b), Dobson and Henderson (2003) modelled the drop‐out process explicitly and distinguished censoring due to inadequate response from other censoring events; neither is necessary under our proposed approach. Given the similarities between our dynamic linear results and those of method (b), the Dobson and Henderson assumption that these other events are uninformative about PANSS seems to be justified.
The diagnostics proposed may be illustrated by using these data. Having computed B̂, it is straightforward to extract the estimated residuals. Fig. 5 shows the residual increments against the previous residuals at each time point and provides some evidence that our original model is misspecified. Fig. 5(a) for week 1 clearly indicates a weak negative association, which is consistent with measurement error in the response. The effect is less marked in later weeks. As discussed in Section 4.3, this suggests considering inclusion of the previous residual as an additional covariate in the model for increments at time t, which is approach (d) above. Fig. 4(d) shows that the fitted mean response profiles are not materially affected by the misspecification that is indicated by Fig. 5.

Fig. 5. PANSS data: residual increments plotted against the previous residuals: (a) week 1; (b) week 2; (c) week 4; (d) week 6; (e) week 8
Box plots illustrating the bootstrap distribution of the covariance diagnostic are shown in Fig. 6. The plot includes results for t=1 to exhibit the magnitude of the independent noise terms. Since the covariance is expected to be constant only for t>1, for diagnostic purposes the first box plot may be safely ignored. On the basis of the remaining box plots, derived from 1000 bootstrap samples, there is evidence of a downward trend in the diagnostic. However, this is mild, and the informal test statistic (again based on 1000 bootstrap samples) is −1.61, corresponding to a p‐value of about 0.1. Together, the diagnostics suggest that departures from the model are sufficiently small to be of little concern.

Fig. 6. PANSS data: box plots of the covariance diagnostic from 1000 bootstrap samples; for a correctly specified model the mean values for t>1 should be equal
8. Discussion
Many approaches to the analysis of longitudinal data with drop‐out begin with the idea of vectors of complete data Y, observed data Yobs and missingness indicators R. We have argued that this set‐up can be too simple, as it does not recognize that drop‐out can be an event that occurs in the lives of the subjects under study and that can affect future responses. Distributions after drop‐out may be different from those that would have occurred in the absence of that event, an extreme example being when drop‐out is due to death. Another might be when drop‐out is equivalent to discontinuing a treatment. Thus there is no well‐defined complete‐data vector Y and we are led into the world of counterfactuals, as described for the two‐time‐point example of Section 2, and the need for careful thought about objectives and targets for inference. An exception is when inference is conditional on drop‐out time (objective 2) and hence based only on observed data. Otherwise, untestable assumptions of one form or another are required for inference. In this paper we consider interest to lie in the drop‐out‐free response Ya and make the two key assumptions of independent censoring and martingale random effects.
In our view, the analysis of longitudinal data, particularly when subject to missingness, should always take into account the time ordering of the underlying longitudinal processes. Often, the drop‐out decision is made between measurement times, and we acknowledge this by insisting that the drop‐out process be predictable, while allowing it to depend arbitrarily on the past. Subsequent events could be affected by the drop‐out decision, and in this sense drop‐out could be informative about future longitudinal responses. We reiterate that we do not require all future values to be independent of the drop‐out decision: the realized response is free to depend on this decision. Nor is the required independence unconditional: our assumption is that, given everything that has been observed, drop‐out status gives no new information about the mean of the next hypothetical response. This is a weaker and, to us, more logical assumption than the standard MAR form. Ultimately, however, both the missingness at random and the independent censoring assumptions share the same purpose: to enable inference by making assumptions about the drop‐out process. MAR enables inference using the observed data likelihood, whereas independent censoring enables inference using the observed local characteristics.
What is therefore important is that all relevant information in ℱt should be included in the model for the next expected increment. For example, Fig. 5 suggested inclusion of the previously observed residual as a covariate for current increments. A similar approach might be used to simplify variance estimation, or if there are subject‐specific trends, as in a random‐slope model. Aalen et al. (2004) advocated an equivalent approach in dynamic linear modelling of recurrent event data. We note also the argument in Fosen et al. (2006a) that use of residuals rather than Y helps to preserve the interpretation of exogenous covariate effects.
Modelling the local characteristics acknowledges the time ordering in longitudinal data analysis, naturally accounting for within‐subject correlation and possibly history‐dependent drop‐out. These features can all be accommodated through linear models on the observed increments of the response process. At no great loss of understanding, the applied statistician could think of our procedure as ‘doing least squares on the observed response increments, then accumulating’, to draw inference about the longitudinal features that a population would have exhibited, assuming that no‐one had dropped out.
Thus far, we have assumed a balanced study design, by which we mean a common set of intended measurement times for all subjects. A natural extension is to unbalanced study designs. It would also be of interest to consider more complicated random‐effects models for the increments of a longitudinal process, potentially gaining efficiency but requiring additional parametric assumptions. We have not so far explored this option; nor the important but challenging possibility of developing sensitivity procedures for our approach.
Acknowledgements
The authors are grateful for the detailed comments and helpful advice of all referees for the paper. Peter Diggle is supported by an Engineering and Physical Sciences Research Council Senior Fellowship. Daniel Farewell's research was carried out during his Medical Research Council funded studentship at Lancaster University. Robin Henderson is grateful for valuable discussions with Ørnulf Borgan and Niels Keiding at the Centre for Advanced Study, Oslo.
Appendix
Appendix A: Fitting dynamic linear models by using standard software

Standard least squares software can be used to fit the increment model separately at each measurement time, giving an estimate of β(t) for each t ∈ 𝒯, which may be extracted by way of the coef method. The cumulative sum of these estimates then provides the estimated cumulative coefficient B̂(t). Additionally, estimated standard errors for B̂(t) can be obtained from the fitted models without the need for bootstrapping.
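A minimal R sketch of this strategy (an illustration only, not the authors' code), assuming a long-format data frame dat with hypothetical columns id, time, y and a single covariate x:

fit.increments <- function(dat, times) {
  beta <- se <- numeric(length(times) - 1)
  for (k in 2:length(times)) {
    prev <- dat[dat$time == times[k - 1], c("id", "y")]
    curr <- dat[dat$time == times[k], c("id", "y", "x")]
    both <- merge(prev, curr, by = "id", suffixes = c(".prev", ""))  # subjects not yet dropped out
    both$dy <- both$y - both$y.prev                                  # observed increments
    fit <- lm(dy ~ x, data = both)                                   # cross-sectional least squares
    beta[k - 1] <- coef(fit)["x"]
    se[k - 1] <- sqrt(vcov(fit)["x", "x"])
  }
  data.frame(time = times[-1],
             B    = cumsum(beta),              # estimated cumulative coefficient B-hat(t)
             se.B = sqrt(cumsum(se^2)))        # rough approximation: ignores correlation across times
}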
References
Discussion on the paper by Diggle, Farewell and Henderson
Joseph W. Hogan (Brown University, Providence)
Diggle, Farewell and Henderson deserve congratulations for a wide‐ranging and thought provoking paper on a common but still somewhat vexing problem. Among the many contributions that are made in this paper, three deserve attention.
- (a)
The authors directly confront the question of defining the target of inference, using potential outcomes to formalize definitions. The importance of carefully defining the estimand cannot be overstated; drop‐out occurs for many reasons, and the consequent missing data cannot necessarily be assumed to arise from a common distribution. In some cases, as with death, ‘missing data’ do not exist.
- (b)
The authors make use of stochastic process machinery to formulate a semiparametric shared parameter model that is identified solely through moment restrictions. This is a welcome contribution. It is natural to view the full data as a two‐dimensional stochastic process {Y(t), R(t): t ≥ 0}; associated models and inferential methods are highly appropriate and lend important insights (see also Lin and Ying (2001) and Tsiatis and Davidian (2004)). Shared parameter models tend to rely heavily on untestable distributional assumptions for random effects (e.g. normality). The model that is given by equations (15) and (16), where M(t) is the ‘random effect’, requires only moment assumptions (17) and (18) for identification.
- (c)
Diggle and his colleagues contribute a comprehensive comparative analysis of real data, using six different methods, allowing readers to consider carefully the underlying assumptions of each method and their effect on inferring the full data distribution.
My comments relate to the first and third of these.
Defining the target of inference
The authors define the target of inference by using potential outcomes (counterfactuals). In Section 2, it is argued that Y2 may be altered by the act of dropping out; hence Y2 = R Y2a + (1−R) Y2b, where the realized response at time 2 is Y2a if the participant remains in the study, and Y2b if she drops out. The full data are (Y1, Y2a, Y2b, R). This framework enables articulation and criticism of the modelling objective. It also invites a comparison with the more familiar application of potential outcomes to causal inference.
In that context, the full data for each individual are (Y0,Y1,T), where the outcome is Y1 if a treatment is received, Y0 if not, and T ∈ {0,1} indicates actual receipt of treatment. Inference about causal effects such as θ=E(Y1−Y0) is a missing data problem because only T and YT=TY1+(1−T)Y0 are observed; Y1−T is missing. Causal parameters are identified by placing untestable constraints on the joint distribution of (Y0,Y1,T) and possibly confounders or instrumental variables; see Angrist et al. (1996) and discussants for examples. Similarly, the use of potential outcomes to define the full data by Diggle and his colleagues implicitly requires the analyst to specify or constrain the joint distribution of (Y1,Y2a,Y2b,R). The linear increments method confines attention to (Y1,Y2a,R); however, I could not ascertain whether assumptions about Y2b are required in general, or whether the linear increments method can be used to infer aspects of Y2b.
On a more conceptual note, one can plausibly argue that potential outcomes are well defined when viewed as inherent characteristics of each individual, e.g. response if treatment is taken, and response if not taken (but see Dawid (2000) and discussion for a range of viewpoints). In this paper, it is not clear whether the potential outcomes can be viewed as inherent characteristics, or whether they are metaphysical: can we easily conceptualize Y2b for a person who does not drop out, or Y2a for a person who does?
A role for sensitivity analyses?
The authors briefly mention sensitivity analyses, but I believe that the issue warrants a closer look. Any full data model fit to incomplete data extrapolates the missing data under some set of untestable assumptions. The extrapolation is explicit for some models, and less so for others. Although mixture models were not used to analyse the trial data, they are well suited to assessing sensitivity to assumptions about the missing data mechanism. The mixture model factorization is fω(yobs, ymis, r) = fθ(yobs, ymis | r) fφ(r), where ω = (θ, φ) is the parameter indexing the full data distribution. In many cases, mixture models admit
- (a)
a partition (θI,θNI) of θ into its identified and non‐identified elements and
- (b)
a closed form factorization of the mixture components f(yobs,ymis|r) into an unidentified extrapolation model fθI,θNI(ymis|yobs,r) and an identified observed data model fθI(yobs|r), i.e.
fθ(yobs, ymis | r) = fθI,θNI(ymis | yobs, r) fθI(yobs | r).    (21)
The implications of equation (21) are that
- (a)
the fit of the model to observables can be checked and
- (b)
assumptions about the unobservables that are encoded via fixed values or prior distributions for θNI will not affect the fit to the observables.
Consider, for the two‐time‐point example, a pattern–mixture specification of the full data distribution, expression (22). Contrary to the assertion in Section 3.2, this model—and all pattern–mixture models—parameterizes the distribution of the full data (Y1, Y2, R), and not just that of observables. Constraints are then imposed to identify E(Y2 | Y1, R = 0). Indeed, the transparency of pattern–mixture models with respect to model identification can be seen as a virtue in missing data settings (Little, 1995). From expression (22), the implied mean of Y2 follows as expression (23); here θI remains the same but θNI = τ. More generally, varying τ provides estimates of, or bounds on, E(Y2) that are consistent with the observed data, and derivatives of the implied estimate with respect to τ measure sensitivity to departures from MAR. With more measurement times, the dimension of τ obviously increases, but simplifications can be made while preserving the structure in model (21) and maintaining lack of identifiability (e.g. assuming that τ is constant over time, or assuming that departures from MAR are confined to first‐order serial dependence parameters). These principles can also be applied when drop‐out is continuous (Hogan et al., 2004).
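For concreteness, one specification of this general kind for the two‐time‐point case (an illustrative parameterization; it need not coincide with the expression (22) referred to above) is

Y1 | R = r ∼ N(μr, σr²),   E(Y2 | Y1, R = 1) = α + β Y1,   E(Y2 | Y1, R = 0) = (α + τ0) + (β + τ1) Y1,

so that θI = (μ0, μ1, σ0², σ1², α, β) is identified from the observed data, θNI = (τ0, τ1) is not, and τ0 = τ1 = 0 imposes the missing at random restriction on the conditional mean.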
With an appropriate choice for the identified components and with τ0 = τ1 = 0, the resulting estimator coincides with the linear increments estimator that is given by equation (19). In general it seems clear that the linear increments model admits missingness not at random mechanisms, but without a decomposition like model (21) it is difficult to understand specifically how it departs from MAR, how missing data are extrapolated from observed data and whether parameterizations for sensitivity analysis can be easily developed.
Summary
In my own experience, articulating model assumptions (and their limitations) to collaborators who generate the data and decision makers who interpret the analyses is important. The authors clearly share this view and have gone to great lengths to clarify assumptions and objectives. They also have considerable experience and depth of knowledge in both theory and application of models for missing data, and I look forward to reading their insights about the issues that are raised in the discussion. It is an honour and a pleasure to propose a vote of thanks.
James Carpenter (London School of Hygiene and Tropical Medicine)
The first part of the paper rightly highlights the principle of separating assumptions from analysis. Assumptions come first and should be as accessible as possible; statistical methods should be principled and give valid results under the assumptions. Sensitivity analysis is important.
My concern is that the proposal conflates assumptions and analysis. We can assume a particular missing data mechanism, model as suggested by the authors and end up with an estimate that is valid under quite a different mechanism.
Consider first the estimator of the mean at time 2 that is based directly on the observed increments, expression (24). When will this give valid estimates?
- (a)
always when Y2 is missing completely at random (MCAR);
- (b)
not when Y2 is missing at random (MAR);
- (c)
not when Y2 is not missing at random (NMAR), unless the increments Y2i − Y1i are MCAR.
With the previous response included as a covariate in the increment model, the estimated mean at time 2 is given by expression (25). However, this estimator is only valid if Y2 is MAR (i.e. MCAR given Y1).
We use a statistical test to choose between expressions (24) and (25). This means that if Y2 is NMAR sometimes we shall use expression (24) when we should not, and the estimate of the mean will be biased if data are NMAR, and vice versa if Y2 is MAR. Simple simulation studies with random intercept and missingness at random mechanisms show that the bias is non‐trivial.
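The following R sketch illustrates the kind of comparison described (the parameter values are illustrative, and expressions (24) and (25) are taken here to be the unadjusted observed-increment estimator and the completer regression estimator respectively; this interpretation is an assumption, not a reproduction of the original expressions):

set.seed(1)
n  <- 1e5
b  <- rnorm(n, sd = 2)                 # random intercepts
y1 <- b + rnorm(n)
y2 <- b + rnorm(n)
p.drop <- plogis(-1 + y1)              # drop-out probability depends only on observed y1 (MAR)
r  <- rbinom(n, 1, 1 - p.drop)         # r = 1 means y2 is observed
mu.increment  <- mean(y1) + mean((y2 - y1)[r == 1])        # "(24)": biased here
fit <- lm(y2 ~ y1, subset = r == 1)                        # completer regression
mu.regression <- mean(predict(fit, newdata = data.frame(y1 = y1)))  # "(25)": consistent under MAR
round(c(truth = 0, increment = mu.increment, regression = mu.regression), 3)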
Thus the missing data mechanism that is needed for valid estimates changes as covariates are introduced or removed from the increment model, in a way that it does not when modelling responses (if assuming data MAR we always use expression (25); the estimator is unbiased whether β=0 or not). So we cannot make statements like ‘Assuming that the data are MAR we perform a valid analysis…’. Instead, the assumption about the missing data mechanism is conflated with the model, yielding biased estimators under the assumptions of data MAR and NMAR. This problem is compounded when we have observations at more than two time points. ‘Summing up’ our increment estimators gives an estimator whose components are valid under a range of conflicting missing data mechanisms. This conflicts with our opening principle.

Alternative definitions of the increments are possible, expression (26). Using these ‘increments’ we now obtain valid estimates for a much broader class of drop‐out mechanisms (e.g. random intercepts and slopes). One can argue, then, that equation (26) is preferable.
The diagnostics proposed include a scatterplot of the current increments against the previous residuals. However, this will often show a strong slope due to regression to the mean, which is easily confirmed by simulating a random‐intercept model with intercept variance four times the error variance. Thus it is of limited use.
In conclusion I have enjoyed this stimulating paper, but I am left with some nagging doubts. The proposal conflates the analysis model and untestable assumptions about the missing data mechanism in a way that violates the principles that are advocated in the paper and risks confusing all except the most wary: results are generally biased if data are MAR and if data are NMAR, even if the increments have mean 0.
By contrast, we argue that a data MAR analysis (where [Ymiss|Yobs,R]=[Ymiss|Yobs]) is the natural starting‐point for a per‐protocol analysis (Carpenter and Kenward, 2007). We then look at how robust our conclusions are to departures, either by multiple imputation (chapter 6 of Carpenter and Kenward (2007)), using post‐imputation weighting (Carpenter et al., 2007) or prior information (White et al., 2007).
Additionally, the residual diagnostic is unreliable and the method cannot naturally handle interim missing data. The method can be seen as one of a class of methods (see equation (26)). However, the underlying missingness mechanisms that these assume are generally implausible relative to final increments or concomitant processes.
Despite filling my allotted space with criticisms, clearly much is praiseworthy and it gives me great pleasure to second the vote of thanks.
The vote of thanks was passed by acclamation.
Hans C. van Houwelingen (Leiden University Medical Center)
I compliment the authors for an inspiring paper. My attention was immediately drawn by the analysis and graphical presentation of the PANSS example. I would have loved to reanalyse the data, but unfortunately they are not available. Therefore, I created my own data set that approximately mimics the example. I created a sample of size n=100 000 with measurements at t=0, 1, 2, 3, 4, no covariates and autoregressive data with σ=1, ρ=0.8 and drift μ=0.1t. After the first observation (t=0), values above 1.5 cannot be observed and cause drop‐out, resulting in cumulative drop‐out rates of 8%, 13%, 18% and 23% at t=1, 2, 3, 4 respectively. My data violate the assumption of the paper because the process is not a martingale and the drop‐out depends on the future. I wondered whether the violations could be visualized at the mean level and not only through the covariance structure.
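A data-generating mechanism of this kind can be sketched in a few lines of R (the AR(1) parameterization and the exact drop-out rule below are illustrative assumptions, not necessarily the construction that was used):

set.seed(1)
n <- 100000; times <- 0:4; rho <- 0.8; sigma <- 1
mu <- 0.1 * times                                   # drift
y <- matrix(NA_real_, n, length(times))
y[, 1] <- rnorm(n, mu[1], sigma)
for (j in 2:length(times))
  y[, j] <- mu[j] + rho * (y[, j - 1] - mu[j - 1]) +
            rnorm(n, 0, sigma * sqrt(1 - rho^2))    # stationary AR(1) errors
obs <- y                                            # apply the drop-out rule
for (j in 2:length(times)) {
  out <- is.na(obs[, j - 1]) | obs[, j] > 1.5       # values above 1.5 cause drop-out
  obs[out, j:length(times)] <- NA
}
1 - colMeans(!is.na(obs))                           # approximate cumulative drop-out rates
colMeans(obs, na.rm = TRUE)                         # observed means among 'survivors'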
I fitted the very simple martingale model with time varying mean increment. The observed means among the ‘survivors’ are −0.0009, −0.0613, −0.0302, 0.0136 and 0.0617; the observed mean increments are 0.0694, 0.0879, 0.1018 and 0.1096 and the cumulative increments are 0.0694, 0.1574, 0.2592 and 0.3688. Fig. 7 shows the reconstruction of the full data means as presented in the paper.

Fig. 7. Reconstruction
This is a counterfactual picture. I would like to see the link with the observed data. A more informative picture could be Fig. 8. It shows the mean values of the history of all individuals present at t=1,2,3,4. The graph shows the ‘last’ increments in each trajectory as used in the martingale model, but also the increments for those who are ‘alive’ at later measurements. It unmasks the increase over time as well as Fig. 7. It has the flavour of pattern–mixture, but not quite. The martingale model would imply that these lines are parallel. Close inspection shows that they are not perfectly parallel. This could be used for model checking.

Fig. 8. Retrospection
The graph can also be used to read off predictions one, two, three and four periods ahead. I think that that could be useful for clinical purposes and, again, for model checking. This leads to a ‘predictive’ goodness‐of‐fit check comparing ‘empirical’ mean increments over larger intervals with the ‘model’ value (Fig. 9). This again shows some model violation.

Fig. 9. Predictive increments
The predictive interpretation of the martingale model and the related goodness‐of‐fit test could be interesting issues for further research.
Inês Sousa (Lancaster University)
This paper is an important contribution to the literature on the analysis of longitudinal data with drop‐out. The authors have presented a useful discussion of inferential objectives when individuals drop out of the study, distinguishing between unobserved and counterfactual measurements. The most widely used methods are reviewed and categorized according to their different inferential objectives. The model that is proposed brings ideas and methods from survival data analysis into the literature of longitudinal data analysis, with a clear and specified objective defined. The novelty of the model proposed lies in modelling increments conditional on all observed history 𝒢t−. Therefore, it is possible to simplify E{Ya(t−1) | 𝒢t−} = Ya(t−1), hence the subtraction of the error ɛa(t−1) in equation (10) of the paper. Moreover, modelling the distribution conditional on the history makes the realistic assumption that the present is not determined by what is as yet unrealized in the future.
The main difference between a martingale random effect and a stationary random effect, which is commonly used in mixed effects models for longitudinal data, lies in the expected value of the longitudinal response conditional on the data being missing. Under the former model, this expected value is the last observed measurement for that individual, whereas under the latter it is the population average.
As an illustration, a joint model for the longitudinal response and the drop‐out time can be fitted to the schizophrenia trial data (equations (27) and (28)). The fitted model is presented in Fig. 10, for the same mean model as in the paper. The results in Fig. 3 of the paper are comparable with the estimated unconditional mean here in Fig. 10, as this is what we would have observed if no individuals had dropped out. Although confidence intervals are not given in Fig. 3 for the model proposed, these seem to be possible to obtain with standard software.

Fig. 10. Observed and fitted mean response profiles (for each treatment are shown observed means (•), fitted unconditional means (——, E[Yj]), conditional means (·−·−·−, E[Yj | D > log(tj)]) and approximate 95% pointwise confidence limits for the fitted unconditional means (– – –)): (a) standard treatment; (b) placebo; (c) experimental treatment
Vanessa Didelez (University College London)
I congratulate the authors for this stimulating paper.
They suggest that we consider the (possibly counterfactual) response Y2a if the subject had not dropped out. Which real‐world quantity does this correspond to? In my view, Y2a is only well defined if a specific intervention to prevent drop‐out can be conceived of—this is also emphasized by the main protagonists of counterfactuals; see Rubin (1978) or Robins et al. (2004). Only then can assumptions about Y2a, such as independent censoring, be justified. Moreover, depending on how we ‘force’ subjects not to drop out, there might be different versions of Y2a satisfying different assumptions.
An alternative formal framework has been suggested by Dawid (2002) (see also Pearl (1993) and Lauritzen (2001)): let FR be an intervention indicator, where FR=Ø means that no intervention takes place and drop‐out happens ‘naturally’, and FR=1 means that drop‐out is prevented by a well‐specified mechanism. Then P(R=1|Y1; FR=1)=1, i.e. FR=1 ‘cuts off’ any influence from other variables on drop‐out (illustrated in Figs 11 and 12), whereas P(R=1|Y1; FR=Ø) is the observational distribution of drop‐out. Objective 1 now corresponds to [Y2|FR=Ø], objective 2 to [Y2|R=1; FR=Ø] and objective 3 to [Y2|FR=1].

Fig. 11. Intervention graphs (a) when no intervention takes place and (b) when drop‐out is prevented by an intervention

Fig. 12. Intervention graphs with an unobserved adverse event A (a) when no intervention takes place and (b) when drop‐out is prevented by an intervention
Assumptions about drop‐out will often be in terms of conditional independence (though independent censoring uses expectation) and may be formalized with graphical models. In Fig. 11, for example, drop‐out implies discontinuation of treatment (more generally T could stand for anything relevant happening after drop‐out), whereas preventing drop‐out, FR=1, ensures that subjects continue treatment as planned. In Fig. 11 we have that in general P(Y2|FR=1)≠P(Y2|FR=Ø) (or Y2a≠Y2b); however, as can be read off the graph, Y2 is conditionally independent of FR given (Y1, R), and hence P(Y2|Y1; FR=1)=P(Y2|Y1, R=1; FR=Ø), so that, once we condition on the ‘past’ Y1, identification from observed data is still possible. In Fig. 12, drop‐out as well as discontinuation of treatment are caused by an adverse event. Intervening to prevent drop‐out will then not prevent the adverse event and treatment might still be discontinued. Here P(Y2|FR=1)=P(Y2|FR=Ø) (or Y2a=Y2b) and especially P(Y2|Y1; FR=1)≠P(Y2|Y1, R=1; FR=Ø) owing to the ‘confounder’ A.
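Spelt out, the identification that this conditional independence delivers is a standardization over the past (an elaboration of the step above, not an additional assumption):

P(Y2 | FR=1) = Σy1 P(Y2 | Y1=y1; FR=1) P(Y1=y1) = Σy1 P(Y2 | Y1=y1, R=1; FR=Ø) P(Y1=y1),

because Y1 precedes drop‐out and is therefore unaffected by the intervention.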
Given the analogy to causal reasoning it is not surprising that independent censoring, which is required for identification of objective 3, is very similar to ‘no‐confounder’ assumptions in causal theories. For instance, the sequential version of the conditional independence in Fig. 11 is called ‘stability’ by Dawid and Didelez (2005) and corresponds to ‘sequential randomization’ (Robins, 1986).
Axel Gandy (Imperial College London)
I congratulate the authors on their interesting paper. I would like to discuss the relationship of the diagnostic that is suggested by their formula (13) to transforms of the residual process. On the basis of these transforms, diagnostics that are sensitive to specific alternatives can be constructed.
Checking that the left‐hand side of formula (13) is constant for t ≥ 2 is equivalent to checking, for t ≥ 3, a condition that can be written as a transform K·Z of the residual process Z, where K(1) = K(2) = (0, …, 0) and K(s), s ≥ 3, is a specific predictable weight. Thus the diagnostic is based on a transform of Z.
Suppose that we have no measurement error, i.e. ɛa(t)=0. Then the residual process Z is a martingale. Hence the transform K·Z is a zero‐mean martingale, not only for this specific K, but for any predictable process K. A diagnostic can be based on how far K·Z deviates from 0. By choosing K suitably, we can make it sensitive against specific alternatives, similarly to the approach that was described in Gandy and Jensen (2005) for event history analysis.
In the presence of measurement error, Z need not be a martingale and thus K·Z need not be a mean 0 martingale. However, for a slightly smaller class of processes K we can show that E{(K·Z)(t)} = 0 for all t. Essentially, besides being predictable, K also needs to satisfy K(t)⊥Δɛ(t). Suppose that, for all t, K(t) is measurable with respect to a σ‐algebra under which this orthogonality holds. One can show that the martingale part of the transform K·Z is a mean 0 martingale. Furthermore, E[{K·(I−H)·ɛa}(t)] = 0 because K(s)⊥Δɛa(s) for all s. Hence E{(K·Z)(t)} = 0. Thus directed model diagnostics can be based on a suitable choice of K also in the presence of measurement error.
N. T. Longford (SNTL, Reading)
The authors’ emphasis on the objectives of a study is to be commended, although a clear formulation of the objectives of any study is important, even if there is no drop‐out and no longitudinal dimension. It is particularly important for studies that are expensive (ethically, financially or with regard to other resources) and have to be carefully designed. The objectives should be a key consideration in their design stages. In many secondary analyses, the objectives are constructed for the sake of instruction or illustration, often because the original purpose of the study has not been recorded, or may never have been formulated in detail.
The potential outcomes framework in Section 2 is more complete than is usually encountered in the literature, but it could easily be extended. In addition to Ya and Yb, I would consider Yc, the outcome that would be attained in an ‘operational’ (prescription) scenario, not a clinical trial. I believe that context matters, and therefore Yc≠Yb. But the real objective is inference about Yc.
The rigour that is associated with the study objectives can be ratcheted up indefinitely. For example, in the schizophrenia trial, inference is sought about the population of sufferers. Their good representation in the study cannot be arranged by a sampling design because a sampling frame cannot be constructed. (Informed consent is another obstacle.) In any case, inference is desired about the sufferers in the future, and this population is not realized yet, because some of its members have yet to contract the condition, and the condition is not temporally stable. Further, the interest may be not in all sufferers, but only in those who would in the future be prescribed the treatment. In brief, there are several layers of uncertainty beyond those which are taken into account in the analysis, or those that could reasonably have been.
The authors propose an efficient estimator, but one might be interested in the efficient pairing of a design and an estimator. Intending to apply their estimator, how does one go about designing a longitudinal study in which drop‐out is expected? Do the established principles suffice? How is the uncertainty about the martingale condition taken into account?
The conditional independence involved in the missingness at random mechanism is retained by non‐linear transformations. In contrast, the identity of conditional expectations that are involved in martingales is not invariant with respect to non‐linear transformations. So, the scale on which the outcomes are analysed matters. Is this a problem?
D. R. Cox (Nuffield College, Oxford)
It is a pleasure to congratulate the authors on their contribution to a challenging topic.
It is known (Rotnitzky et al., 2000) that, for some simple models of informative non‐response, not only are assumptions involved that are not directly testable, as indeed seems inevitable, but also that, even if those assumptions are satisfied, the estimates have very poor properties near a null hypothesis where in fact non‐response is uninformative. This is because the score vector may be singular. Is broadly similar behaviour possible in the authors’ model?
The authors’ discussion of objectives is clearly central and raises the design requirement, wherever feasible, to clarify for each individual the reason for missingness, especially since types of missingness may vary systematically between treatment arms.
The use of a random‐walk‐type error model is elegant but does have rather strong implications if used over extended time periods.
The following contributions were received in writing after the meeting.
Odd O. Aalen (University of Oslo)
This interesting paper gives a fresh view on the analysis of longitudinal data. Such data are very common and are often handled with complex methods that practical researchers may find difficult to carry out. A major issue is that of missing data, which are very common, in fact hardly avoidable in many fields. The following general comments can be made.
- (a)
The use of martingale modelling gives a very flexible tool. This means that drop‐out may depend on the past in possibly complex ways, and even allows dependence between individuals. In fact, martingale assumptions may replace the classical assumptions of independence, just like in the counting process approach to event history analysis.
- (b)
The paper is also a contribution to removing the artificial distinction between longitudinal data analysis and event history analysis. The connection is made by the parallel between increments in longitudinal data and events in counting processes. In fact, events are a special case of increments, and the realization that methods for event histories may be applied to longitudinal data is of great interest.
- (c)
In fact, we often see a mixture of event data and longitudinal data, for instance when covariates or markers are measured repeatedly over time in survival studies. The typical approach is to view the events as the primary focus, and to include the time‐dependent covariates in, say, a Cox model. A better approach would be to consider event processes and the covariate process in parallel, and to treat them on the same level. This gives also a much better understanding of time‐dependent covariates, as demonstrated in Fosen et al. (2006b). The concept of dynamic path analysis that was introduced there, with the attendant graphical analysis, should be equally useful in the setting of the present paper.
- (d)
Missing data are often handled by means of inverse probability weighting, which is part of a more general approach to causal modelling where one constructs pseudopopulations that would have been observed in the absence of missing data. Although counterfactual approaches are useful, the construction of pseudopopulations by adjusting for imbalance in other variables or processes may limit the understanding of the process that actually occurs. The approach that is advocated in the present paper, in contrast, is dynamic. An essence of the dynamic view is to analyse the data as they present themselves without construction of hypothetical populations (Aalen et al., 2004; Fosen et al., 2006a, b).
Daniel Commenges (Université Bordeaux 2)
The authors propose an approach which focuses on modelling the change rather than the absolute value of the process of interest and this, together with the estimating method, provides a useful tool which has similarities with the proposal of Fosen et al. (2006b).
Among several issues that are raised by this stimulating paper, I shall focus on that of the continuous or discrete nature of time in statistical models. The model that is presented by the authors is clearly in discrete time, except for the indicator process which is ‘thought’ of in continuous time. I would like to advocate a more realistic point of view which has long been classical in automatic control theory but is still uncommon in the biostatistical literature. It consists of separating the ‘model for the system’ and the ‘model for the observation’. Most often the system lives in continuous time whereas observations may, for some events, be in continuous time but are most often in discrete time. The response indicator process Ri(t) in the so‐called time coarsening model for processes observation scheme (Commenges and Gégout‐Petit, 2005; Commenges et al., 2007) can represent continuous or discrete time observations. For instance, if observations are made at t1, t2, …, tm, then Ri(t)=0 for all t except at t1, t2, …, tm, where it takes the value 1. Generally both the process of interest and the response indicator process are multidimensional, so some components may be observed in continuous time whereas others are observed in discrete time. In Commenges and Gégout‐Petit (2005) ignorability conditions are given. An example is that of a medical doctor who decides the date of the next observation of CD4 cell counts and human immunodeficiency virus load on the basis of the observations of these two markers at the current visit. If the CD4 cell counts and viral loads are modelled, this observation scheme is ignorable. An application of this point of view is given in Ganiayre et al. (2007) where a cognitive process living in continuous time is observed in discrete time by both a psychometric test and the diagnosis of dementia.
Richard Cook and Jerry Lawless (University of Waterloo)
It is difficult to give convincing general prescriptions for modelling and analysis when the physical processes behind longitudinal data and the nuances of cessation of treatment, drop‐out or termination processes vary so widely across applications. Counterfactuals seem to us unappealing in this setting, though many statisticians would disagree; see the discussion of Dawid (2000). We prefer notation that distinguishes actual outcomes. Excluding the possibility of termination, let X2=1 if the individual remains on treatment and X2=0 otherwise, and R2=1 if their response is observed at the second assessment and R2=0 otherwise. The variable X2 indicates whether the treatment was received and R2 indicates whether the response is observed. We then have E(Y2|R2=1,X2=1), E(Y2|R2=1,X2=0), E(Y2|R2=0,X2=1) and E(Y2|R2=0,X2=0); of course, if R2=0, then we may not know whether X2=0 or X2=1, and Y2 is unobserved. If ‘clinical interest genuinely lies in the hypothetical response that patients would have produced if they had not dropped out’, then presumably we are interested in Y2|X2=1, R2=1, i.e. we must draw comparisons with ‘similar’ subjects who received the treatment and did not drop out.
We have some other comments.
- (a)
Conditional models are most convenient for describing the effects of inclusion in a study, the evolution of responses over time and drop‐out and treatment processes. See Raudenbush (2001) for some interesting discussion in a specific setting.
- (b)
Regarding comparisons based on marginal process features, Cook and Lawless (1997, 2002) considered examples involving recurrent and terminating events.
- (c)
Information about the observation and drop‐out processes should be collected including, for example, factors that are related to the times that an individual is seen, or why she leaves the study. The drop‐out process and any terminating processes should also be examined and analysed as functions of the treatment and previous process history. This is a necessary step for methods involving inverse probability weights for estimating marginal features, but it also provides insight into the interpretation of marginal features. In some settings it may be worth recruiting fewer subjects to leave funds for tracing drop‐outs.
- (d)
As the authors mention, when observation times aj (j=1, 2, …) are widely spaced there is a strong likelihood that some losses to follow‐up at aj are not conditionally independent of the process history over [aj−1, aj), given the history up to aj−1. Point (c) is crucial in formulating plausible models as a basis for sensitivity analysis.
David J. Hand (Imperial College London)
I was delighted to see this paper. In particular, I was especially pleased to see the first half of the paper, drawing attention to the several research questions which one may wish to address in the context of drop‐outs in longitudinal data. The reason why I was so pleased is that the paper represents a reinforcement of, and detailed exploration of a special case of, a general point that I made in a previous paper (Hand, 1994) that was presented to the Society. I there pointed out that much statistical research fails to pay sufficiently close attention to the real aims of the research, and so risks drawing inappropriate, irrelevant and incorrect conclusions. The authors of the present paper say
‘In all applications careful thought needs to be given to the purpose of the study and the analysis’.
In my paper, I said
‘Too much current statistical work takes a superficial view of the client's research question …without considering in depth whether the questions being answered are in fact those which should be asked’.
I presented a series of detailed examples of this—and, indeed, mentioned drop‐outs in longitudinal data in passing. I hope that the present paper will serve to draw people's attention, not only to the need for precise formulation of the research question in the context of longitudinal data, but also to such problems elsewhere: parallel issues exist in many other situations.
Haiqun Lin (Yale University School of Public Health, New Haven)
I compliment Diggle, Farewell and Henderson on this outstanding and enlightening work. It is my great pleasure and honour to contribute to the discussion.
This paper introduces a novel and elegant discrete time local incremental linear model to target inference in a counterfactual drop‐out‐free world under a balanced longitudinal study design. The residual process of the response is a discrete time martingale that can be regarded as a set of distribution‐free, subject‐specific, time varying random effects with heteroscedastic variances.
Strikingly, the incremental approach does not need to specify a drop‐out model but allows the drop‐out to depend on a past latent process that is related to the response such as a previous martingale residual. This can be regarded as a type of assumption of data missing not at random that is weaker than data missing at random or sequentially missing at random.
A remarkable result from the method proposed is that the parameters for the counterfactual outcome can be obtained by ordinary least squares from the observed data with drop‐out. The inverse weighting method of Robins et al. (1995) must specify an additional drop‐out model under data sequentially missing at random, whereas the likelihood method accommodates data missing at random but requires a distributional assumption for the response and identical counterfactual outcomes whether a subject had dropped out or remained in the study.
In the model proposed, the dependence of the expected increment E{ΔYa(t)} on the past history of the response in 𝒢t− is only through Xa(t) β(t), and therefore the martingale properties of the residual process rely critically on the choice of covariates in X, which can include exogenous variables, measured responses and dynamic covariates, and on correctly specifying the covariates’ functional forms. Nevertheless, the diagnostic procedures proposed may help in decisions regarding X. The model is not designed for interpreting a covariate effect on the response itself. An increment is much less consistent in its direction, especially when the response is relatively stable over a period of time, in which case β(t) may be forced to have opposite signs for the same covariate at different times t. However, this is a minor concern if the response trajectory is of major interest rather than the covariate effect.
For discrete longitudinal responses, to which the method may not be readily applied, the inverse probability method assuming data sequentially missing at random will be highly desirable if population‐average inference in a distribution‐free setting is preferred. With data missing not at random, the joint model would be of great value if a result similar to that obtained by Hsieh et al. (2006) can be established.
Mary Lunn (University of Oxford)
The paper proposes a model which carries over some features of the additive model from counting processes with censoring to longitudinal data with drop‐out.
The two key assumptions are that
- (a)
the random effects are martingales and
- (b)
drop‐out does not affect the expected mean differences, so that the expected increment, given the past, is the same whether or not the subject drops out.
These are of similar import to the assumptions in Andersen et al. (1993). One key point is that it is not assumed that the martingale random effects have constant variability. This can clearly be seen in the first simulation (Fig. 1), where the box plots for the correctly specified model lengthen as time increases. This is less prevalent in the incorrectly specified model using the Laird–Ware random effect, as might be expected, although, again as expected, the median certainly decreases since observations with a high positive random effect are more likely to drop out.
Andersen et al. (1993), page 565, commented that unweighted least squares do not take into account the differing variability in the martingale process and they went on to show that to achieve some kind of optimality a weighted least squares would be preferable. They deduced the form of the weighted estimators. One suspects that this will also be the case in this model for longitudinal data.
One final incidental comment is that it is not immediately clear why the same drop‐out process was used for both models in the second scenario, but not in the first. This may be a misreading of what was intended here.
Torben Martinussen (University of Copenhagen)


In the two‐time‐point example the key censoring assumption can be expressed as a zero conditional covariance, corresponding to Y2a and R being uncorrelated given (Y1, ɛ1).
The model for the longitudinal response corresponds to the Aalen additive hazards model (Aalen, 1980) for survival data. Results are formulated for the estimated cumulated coefficients B̂(t), which are claimed to be √n consistent (Farewell, 2006). For survival data, the Aalen additive hazards model is indeed appealing as it easily accommodates inference for the time‐dependent regression coefficients; see McKeague and Sasieni (1994) and Martinussen and Scheike (2006). This is carried out on the basis of the cumulated coefficients, as they can be estimated at the usual √n‐rate, which is not so for the regression coefficients. It is not so obvious, however, that the same approach is likewise appealing for longitudinal data with drop‐outs, as interactions with time are easily modelled and estimated by using traditional methods, and the estimators for the regression coefficients in this setting converge at the usual √n‐rate. Looking at increments, the cumulatives arise naturally, but is anything else gained in this situation by aiming at the cumulated regression coefficients?




Geert Molenberghs (Universiteit Hasselt, Diepenbeek) and Geert Verbeke (Catholic University of Leuven)
The paper is thought provoking and of interest since, although much has been written about non‐ignorable missingness (Verbeke and Molenberghs, 2000; Molenberghs and Verbeke, 2005; Molenberghs and Kenward, 2007), the authors succeed in presenting a fresh take on the problem, not only through their original taxonomy in terms of counterfactuals, but also by using a novel modelling framework using increments, martingale theory and stochastic processes.
The authors take a parametric view. Although this agrees with our own inclinations, there also is an increasing volume of semiparametric research, synthesized in Tsiatis (2006). A fine comparison between the semiparametric, doubly robust framework and conventional methods is provided in Davidian et al. (2005). Although a focus on but one framework is common, owing to research interests and the strong but unfortunate dividing lines between the Rubin and Robins camps, every contribution furthering understanding of the relative merits is welcome. This includes connections with causal inference, counterfactuals and instrumental variables. Similarly, it is important to discuss the sensitivities of the various techniques that have been proposed, especially when fully parametric, the implications thereof and (in)formal ways to assess and address such sensitivities. Although these points have been touched on by the authors through illuminating literature reviews, showing the authors’ thorough familiarity with the topic, we are looking forward to, on the one hand, further integration between the parametric and semiparametric schools from the viewpoint of the methods proposed and, on the other hand, to suggestions for sensitivity analysis tools.
Likewise, further studying the implications of the proposal on the pattern–mixture and shared parameter frameworks, both of which are now touched on only lightly, will enhance understanding; this has a large potential for practical use.
The authors acknowledge that their work thus far has been confined to the balanced case, meaning that all subjects are measured at a common predetermined set of measurement occasions and that the extension to the unbalanced case is natural. We certainly agree that such extensions would be important and practically useful given the huge volume of observational and other non‐balanced studies. Thanks to the martingale take on the problem, this statement seems warranted and we would be very interested in learning more. Arguably, such extensions would have to distinguish between cases where measurement times are merely unbalanced by design, or where, in addition, the measurement times are genuinely random and potentially contain information about the process of scientific interest.
Christian B. Pipper and Thomas H. Scheike (University of Copenhagen)
We enjoyed reading this very stimulating paper.
Additive time varying models are very useful to obtain a covariate‐dependent description of time dynamics and have been studied in both the regression and the hazard setting. The authors consider discrete time models that are conceptually much simpler than continuous time longitudinal data, in particular when increments are of interest. We find it very relevant to model increments exactly as was done in Martinussen and Scheike (2000) in a continuous time setting. In growth trials for example the velocity of growth is often the natural quantity of interest.

The authors specify a linear model for the response increments, with a subject‐specific martingale random effect and with additional independent measurement error. The observed data are then modelled by introducing at‐risk indicators Ri(t) that are 1 if the subject is at risk and 0 otherwise. We discuss the model specification and the two key assumptions therein:
- (a)
the martingale assumptions on the error term;
- (b)
the censoring is predictable (see Scheike and Pipper, below).
The authors suggest that we partition the random variation into random effects and measurement error. This seems ambiguous as long as a more specific error term model is not specified. It is not clear to us what is really gained by this, in particular since the variance estimator of the model is only practically operational in the case without measurement error.

Thomas H. Scheike and Christian B. Pipper (University of Copenhagen)
First we discuss the assumption that the censoring process is predictable. Formally it is assumed that R(t+1) is measurable with respect to ℛt. This implies that there are functions ft so that R(t+1)=ft(R(1),…,R(t)) and by recursion we see that there are deterministic functions gt so that R(t)=gt{R(1)}. Thus the point of drop‐out is known to us at the beginning of the study. In an independent and identically distributed data setting all subjects are thus censored at the same time. This assumption is very restrictive. For the PANSS study this assumption is barely satisfied.
One may also consider a semiparametric version of the model, expression (29), where the design is partitioned into two parts. In the discrete time setting this is a standard model.

The increments model thus specifies the mean response at time t through Σs≤t X(s) β(s), in contrast with a standard regression model that models the mean by X(t) β(t). When the covariates are constants, like treatment groups, the first mean equals X(t) B(t) with B(t) = Σs≤t β(s). The two models are thus equivalent in this situation.
I. L. Solis‐Trapala (Lancaster University)
This stimulating paper begins with a succinct account of existing approaches to the analysis of longitudinal data with drop‐out, encouraging the reader to consider carefully the objectives of the study at hand in their own modelling strategies.
Firstly, I would like to reflect on the target of inference which motivates the authors’ proposal. Although they state that their target of inference is the mean response, they propose to model the expected increments of the longitudinal process. Indeed, it seems from the context that they are interested in measuring mean contrasts between groups of participants who are assigned to different treatments, rather than mean contrasts within subjects.
This distinction is briefly highlighted in the discussion section, where it is argued that inclusion of residuals in the mean specification rather than previous responses preserves the interpretation of the effects of the exogenous covariates. For example, in the case of the schizophrenia study, measurement of a direct effect of treatment appears to be of primary scientific interest.
Secondly, the way that the measurement error is included in the linear model is not entirely clear to me. Intuitively, I would associate a measurement error, rather than a lagged error, with the response increments.
Thirdly, the authors acknowledge a limitation of their model, namely that it is based on untestable assumptions. This limitation is not specific to their approach, but is well known from other models dealing with missing data. Assuming a martingale for the random effect is, in my opinion, an elegant way of formalizing the key assumption of stability. This reflects that the unobserved increments (due to drop‐out) are assumed to follow a process that is similar to that observed in the past.
Jeremy M. G. Taylor (University of Michigan, Ann Arbor)
I agree with the authors that for some scientific applications involving longitudinal data it makes sense for the targets of inference to be parameters of a hypothetical drop‐out‐free world, whereas in other applications this may not make sense. A difficult question is whether we can consider a hypothetical drop‐out‐free world, when drop‐out is due to death. In cancer research a frequently used experiment is one in which tumours grow in laboratory mice, and the response variable is the size of the tumour at 12 months say. Such experiments may include both planned early sacrifices and sacrifices to prevent suffering in animals in which the tumour has grown large. I would be interested in hearing the authors’ view on whether it is still reasonable to consider a hypothetical drop‐out‐free world in this setting when evaluating the mean tumour size at 12 months.
The Diggle, Farewell and Henderson longitudinal model raises the issue of ‘what is a statistical model?’. One view, which is implicit in their specification, is that a model should be a plausible approximation to the mechanism that gave rise to the observations. Under this viewpoint, it should be a principle that the observations and drop‐out at a certain time cannot depend on the future. But if a model is simply viewed as a way to describe data, using a small number of parameters, then this principle seems less pertinent.
In longitudinal models a distinction is made between subject‐specific, population‐average and transition models. The increments model of the authors has the flavour of a transition model. In continuous time, modelling increments in the response generalizes to that of modelling slopes. This then bears some resemblance to some of our previous work (Taylor et al., 1994). We assumed that the expected slope at time t evolved according to an Ornstein–Uhlenbeck process. This leads to a model for the measured response of the form Yit = X(t)β + ai + Wi(t) + eit, where ai is a subject‐specific random intercept, eit is measurement error and Wi(t) is an integrated Ornstein–Uhlenbeck process.
The good efficiency properties of the approach of Diggle and his colleagues compared with fully parametric joint modelling was interesting, but somewhat surprising to me. The very poor efficiency of the inverse probability weighting approach was also striking. Have the authors found similar efficiency comparison results in other applications and in simulations?
D. Zeng and D. Y. Lin (University of North Carolina, Chapel Hill)
We congratulate the authors on a clever and intriguing piece of work. The time‐specific conditional mean models avoid the ambiguity of counterfactual response after drop‐out, which can be an issue in joint modelling. Joint models, however, are useful for prediction and amenable to efficient estimation. We pose two questions.
- (a)
Since the model is conditional on the response history, is β(t) in equation (10) the most relevant quantity?
- (b)
Are there concrete examples in which the drop‐out process satisfies the assumptions of Section 4.1.2 but violates missingness at random?
At each time point the least squares estimator β̂(t) solves a linear estimating equation in the observed data, so that the stacked vector of estimates is asymptotically (multivariate) normal and the covariance matrix between β̂(s) and β̂(t) can be estimated by the sandwich estimator, expression (30). We then estimate the covariance matrix of B̂ on 𝒯 by A V̂ Aᵀ, where V̂ is the pt×pt sandwich covariance matrix estimator for the stacked β̂'s based on expression (30) and A is the block lower‐triangular matrix whose non‐zero blocks are copies of I, the p×p identity matrix, so that A maps the stacked β̂'s to their cumulative sums. Thus, we can make inference about B(t) by using standard procedures for normal statistics. Since it is a very simple function of data, the sandwich estimator should provide accurate variance estimation in finite samples. It is not necessary to use the bootstrap, although the above arguments imply that the bootstrap is valid.
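A minimal R sketch of the accumulation step described above (function and argument names are hypothetical; V denotes the joint pt × pt sandwich covariance matrix of the stacked cross-sectional estimates):

cumulative.covariance <- function(beta.list, V) {
  p <- length(beta.list[[1]]); k <- length(beta.list)
  L <- lower.tri(diag(k), diag = TRUE) * 1     # k x k lower-triangular matrix of 1s
  A <- kronecker(L, diag(p))                   # block matrix of p x p identities
  B <- A %*% unlist(beta.list)                 # stacked B-hat(t) = sum over s <= t of beta-hat(s)
  list(B = matrix(B, nrow = p),                # column t holds B-hat(t)
       cov.B = A %*% V %*% t(A))               # covariance of the stacked B-hat
}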
The authors replied later, in writing, as follows.
We thank all the discussants for their helpful and constructive comments and apologize if we have overlooked any of these in our reply. We have grouped our response under three headings: objectives, general modelling issues, including sensitivity and diagnostic checking, and issues that are specific to our proposed model class and its possible extensions.
Objectives
We agree with Hand that careful consideration of objectives is always important, and not at all specific to longitudinal studies. We suspect that all statisticians would agree, but that not enough statistics degree syllabuses give this topic the attention that it deserves.
Hogan, Cook and Lawless, Molenberghs and Verbeke, and Didelez all comment on the link between our discussion of potential outcomes and the wider topic of causal inference. Hogan asks whether our potential, but unrealized, outcomes are inherent characteristics of the subjects to whom they belong, or purely metaphysical. A partial answer is that this depends on the context. For the data that are analysed in Section 7 of our paper, and borrowing from Didelez's comments, we can easily conceive of an intervention, albeit an unethical one, that would prevent drop‐out. Perhaps a better example is long‐term follow‐up of dialysis patients, with drop‐out corresponding to transplantation. Kidney function in the absence of transplant is definitely a legitimate target for inference. In cases of this kind, our discussion simply makes explicit what is often glossed over—that any analysis treating drop‐out as ignorable is, nevertheless, making untestable assumptions about things that, by definition, cannot be observed. In some other contexts, most obviously when drop‐out equates to natural death, any inference about a hypothetical drop‐out‐free population is of dubious practical relevance. Nevertheless, our view is that this need not preclude including a potentially infinite sequence of measurements as part of a joint model for measurements and time of death. In answer to Taylor's question concerning animal experimentation, planned sacrifices are missing completely at random (MCAR), whereas sacrifices in response to an observed large tumour size are missing at random (MAR). Hence, in conventional terms both kinds of drop‐out are ignorable. However, simply to conduct a standard likelihood‐based analysis of the non‐missing data would be too glib, not because there is anything wrong with modelling a hypothetical drop‐out‐free process in this setting—on the contrary, this is the natural process that operates in the absence of any intervention by the experimenter—but because the implied target for inference is not necessarily the most sensible interpretation of what precisely is meant by ‘the mean tumour size at 12 months’.
Longford's Yc could be construed as a mixture of Ya and Yb, with the mixture proportion referring to the rate of compliance in an operational setting; however, we suspect that he is making the stronger point that what happens in a controlled trial setting may or may not be a reliable guide to what happens in clinical practice. This is a fair point, but not specific to the topic of our paper.
We completely agree with the point that was made by Cox, and by Cook and Lawless, that recording the reason for drop‐out should always be included in the study protocol. We also agree, as suggested by Didelez, that the reason for drop‐out may affect the assumed model for Y2a. In answer to Longford's question about longitudinal design, we would suggest that discussion of the likely drop‐out rate and how this might be minimized should feature strongly. By far the best way to deal with drop‐outs is to avoid them. However, since this is not an achievable goal, we would suggest the recording of any collateral information that, by its inclusion as an explanatory or otherwise classifying variable, might render censoring independent, or nearly so. Put another way, the non‐ignorability of drop‐out can arise in part through a failure to record explanatory variables that are associated both with the measurement process and with the drop‐out process, in the same way that random effects in regression models can be thought of as representing unmeasured subject‐specific explanatory variables.
Hogan asks whether we can use our dynamic linear increment model to make inferences about Y2b. The short answer is no. Our primary purpose in introducing Y2b is to acknowledge explicitly that it is different from Y2a. We can, however, easily imagine situations in which drop‐out does not imply loss to follow‐up and Y2b is an observable quantity.
General modelling, sensitivity and diagnostic checking
We agree with Carpenter's statement that statistical methods should be principled and give valid results under stated assumptions. We are confused, however, by his subsequent remarks on conflation of assumptions and analysis and our proposal leading to estimates that are valid under a mechanism that is different from that assumed. If our assumptions are correct we shall obtain the right answers. Otherwise we may not. We claim nothing more, and we claim nothing less.
Carpenter uses the two‐time‐point example to raise, we think, three issues: first, possible bias of the estimators (his equations (24) and (25)); second, which of the estimators to use; third, how assumptions about the missing data mechanism change with covariate selection. His discussion throughout is in terms of the familiar MCAR–MAR–NMAR drop‐out terminology instead of the censoring interpretation of our paper. Two of his technical claims are not correct in general: that his equation (24) is not valid when Y2 is MAR, and that his equation (25) is only valid if Y2 is MAR (his italics). Using the notation in Section 5 of our paper, a counter‐example to the former is when var(ɛ1)=0 in the model that is defined by our equations (15)–(18). A counter‐example to the latter is a non‐MAR data model for which our residual‐adjusted estimator is unbiased. These are minor corrections but perhaps illustrate the danger of attempting to coerce one framework into another. We do not claim that our estimators are valid beyond our assumptions and, contrary to Carpenter's statement, we do not violate the principles that are advocated in the paper. If it is indeed true that our estimators are biased under missingness at random and missingness not at random (which contradicts the italicized statement above) then the issue is one of robustness and not validity.
Nor do we conflate assumptions with analysis. Our fundamental drop‐out assumption is that of independent censoring: given the observed and unobserved past there is no further information in knowing that drop‐out did or did not occur at time t. This is quite separate from the modelling issue, which requires us to assume that all relevant aspects of the past are included in our linear model for increments. This no more conflates assumptions with analysis than, for example, first assuming a fundamental MAR mechanism, then taking a parametric model for the longitudinal responses (or a logistic model for drop‐out, if using inverse probability weighting) and choosing the terms to include in that model. The underlying mechanism may indeed be MAR, but if the response distribution is incorrectly specified or the chosen model does not include the correct terms, then the results may be biased. We are not aware of any approach in which Carpenter's statement ‘Assuming the data are MAR we perform a valid analysis…’ can be made without other assumptions.
Carpenter concludes that our assumed missingness mechanism is implausible, which is a strong statement given that our assumption of independent censoring underpins the majority of event history methodology. We do not claim that independent censoring is likely to be true in all applications. But nor do we accept that it is less likely to be true than many other drop‐out mechanisms. For instance, and in answer to Zeng and Lin's second question, in the random‐intercept model that is mentioned in the text following our equation (20) it seems to us highly plausible to assume that drop‐out probability is determined by the random effect M1 rather than the values of Y1 and Y2. It is worth noting here the contributions by Commenges and Taylor, as they both draw an extremely useful distinction between modelling the system and modelling the observations. The random effect M1 attempts in a very simplistic way to capture the underlying and unobserved health of the individual and is in the spirit of modelling the system, whereas a standard MAR–MNAR model based on Y1 and Y2 is closer to modelling the observations.
Returning to the discussion of the two‐time‐point example that is given by Carpenter, we agree that in general the estimators that are defined by his equations (24) and (25) cannot both be simultaneously unbiased and that, if we use our first diagnostic for guidance, there will be occasions when the wrong choice is made. How often this will happen is unknown. Carpenter comments briefly on simulation results that indicate non‐trivial bias. Without further details, we cannot offer a specific reply but, as we have indicated above, unless the missingness at random mechanism was included in our independent censoring class we would not claim unbiasedness for either of his equations (24) or (25). Clearly, further work on robustness would be helpful; the same is true for diagnostics, which are an area we believe to be underdeveloped in drop‐out modelling. We welcome therefore the constructive suggestions of Gandy, Martinussen and van Houwelingen. Gandy's formulation of the second diagnostic as a transform means that we can choose K to provide sensitivity to specified alternatives. It would be useful also to investigate properties of diagnostics under those alternatives, which means careful thought about what alternatives or misspecification may be of interest. As mentioned above, the choice of covariates is important, as is the functional form for covariates. We therefore appreciate the easy‐to‐apply resampling suggestion of Martinussen, which we intend to pursue. We shall pursue also the suggestions of van Houwelingen, which show his characteristic good sense. Plotting trajectories of observed means is valuable for exploratory purposes, and the predictive plot should have diagnostic value. We are not so sure about the retrospective plot, because the martingale structure is not time reversible. For example, suppose that the martingale follows a random walk, Mt = Σs≤t Zs, with each Zt = ±1, but subjects with Zt−1 = −1 always drop out at time t. Then, for subjects who are observed at t, the increment between t−1 and t has mean 0, whereas the increment between t−2 and t−1 has mean 1, since all subjects in this group have Zt−1 = 1. So the pattern depends on t and the profiles will not be parallel for different drop‐out groups. If the drop‐out mechanism is simple and stationary then there should be similarities between the profiles, but this stationarity is not required under our proposal: in the present example, for instance, we could have positive steps causing drop‐out at some t and negative steps causing drop‐out at other t.
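To see the asymmetry numerically, the following is a small simulation of our own (hypothetical code, not taken from the paper), in which Zt = ±1 with equal probability and subjects whose step at time t−1 was −1 drop out at time t.

## Small simulation of the random-walk example: each Zt is +/-1 with equal
## probability; subjects whose step at time t-1 was -1 drop out at time t.
set.seed(1)
n <- 100000
Z <- matrix(sample(c(-1, 1), n * 3, replace = TRUE), n, 3)   # steps at t-2, t-1, t
observed_at_t <- Z[, 2] == 1          # survivors at time t have Z(t-1) = +1
mean(Z[observed_at_t, 3])             # increment between t-1 and t: approximately 0
mean(Z[observed_at_t, 2])             # increment between t-2 and t-1: exactly 1

The two observed-data means differ even though every individual increment has mean 0, which is the lack of time reversibility referred to above.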
No matter how careful the diagnostics, in drop‐out modelling there is invariably the need for untestable assumptions of one sort or another, and we agree with the suggestions of Hogan, Molenberghs and Verbeke, and Carpenter that sensitivity methods should be developed for our proposal. How best to do this is not immediately obvious, given that we do not model the drop‐out process and so there is no single parameter to vary to introduce degrees of dependent censoring. Hogan's proposal to consider models for means is consistent with our moment‐based approach and provides an excellent foundation for further work. It also shows the attraction of thinking in terms of selection and mixture factorizations rather than models. Both can be used within the same analysis, for their own good purposes and in an entirely consistent way. In particular, we find Hogan's specific comments about pattern–mixture factorizations compelling from a diagnostic checking perspective, though less so from a modelling perspective, because the identifying restrictions that are used in practice often seem unnatural.
The dynamic linear increment model and possible extensions
Aalen notes that the distinction between longitudinal data analysis and event history analysis is somewhat artificial, unified as they are by the central notion of time. We agree, and we feel that there is much to be gained from cross‐pollination between the two fields. We have unashamedly imported event history analysis methodology to tackle longitudinal data with drop‐out. Sousa's contribution is an example of the opposite direction, in which multivariate normal machinery that is used routinely in longitudinal data analysis is extended to incorporate an event time as well. Both Aalen and Commenges comment on the similarities between our work and that of Fosen et al. (2006b). Their work evolved contemporaneously with ours, and there does seem to be considerable potential for combining the two methodologies. It is often true that both a quantitative response and a sequence of ‘failure’ times are of scientific interest, in which case a joint model for events and the longitudinal response is needed.
Although Aalen favours the dynamic approach that we advocate, other discussants (Cook and Lawless, Pipper and Scheike, and Zeng and Lin) wonder whether a marginal model might be preferable. We note that the parameters of our dynamic model have a marginal interpretation in the case where only exogenous covariates are used: if 𝒳 denotes the history of these covariates, then E{ΔY(t) | 𝒳t−} = X(t) β(t). This interpretation is lost when dynamic covariates are used. However, direct effects of treatment, which, as Solis‐Trapala observes, are often of scientific interest, can still be recovered along the lines of Fosen et al. (2006b). Martinussen wonders why we aim at cumulative regression coefficients. We agree with Lin (and the body of event history analysis literature) that typically the cumulative coefficients B(t) are more stable and easier to interpret than the incremental coefficients β(t).
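To spell out the marginal reading in a displayed form (our own restatement, not from the paper, under the exogeneity assumption above and supposing, for illustration, that X(t) = (1, x) with x a time‐constant binary treatment indicator and β(t) non‐random), iterated expectations give
\[
E\{Y(t)\mid x\} - E\{Y(0)\mid x\}
  \;=\; \sum_{s\le t} E\{\Delta Y(s)\mid x\}
  \;=\; B_0(t) + x\,B_1(t),
\qquad B_j(t) = \sum_{s\le t}\beta_j(s),
\]
so that the cumulative treatment coefficient B_1(t) is directly the marginal treatment contrast in the expected change from baseline at time t.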
As Pipper and Scheike, and Lin correctly point out, special care is needed in the choice of covariates (dynamic or otherwise). Further, Longford highlights the fact that martingale residuals are not preserved under non‐linear transformations. We do indeed have a rich set of covariates, including dynamic covariates, but choosing between them, and between transformations of the response, still amounts to model building in the usual fashion. In some cases this may suggest simpler forms for the effects of covariates, such as the semiparametric models that are described by Pipper and Scheike, and by Molenberghs and Verbeke. The computationally undemanding nature of our dynamic modelling approach makes it easier in practice to pay more attention to model building, and to model criticism.
We must refute the claim by Scheike and Pipper that predictable censoring implies that the time of drop‐out is known at the start of the study. Although we do assume that R(t+1) becomes known at some point strictly before time t+1, we do not assume that this must be at time t. More formally, writing ℛt− for the drop‐out history available just before time t, we do not insist that ℛt− = ℛt−1, and we note that not just drop‐out, but anything that happens by time t−1, can influence drop‐out at time t.
Cox wonders whether our estimates may behave poorly if, in fact, drop‐out is (nearly) uninformative. Though we do make untestable assumptions about the drop‐out process, a further advantage of not modelling it explicitly is that we avoid the problems that he mentions when a joint model collapses to the singular case of separate analyses.
Zeng and Lin provide an elegant approach to inference. They give a closed form estimator for the variance of the cumulative coefficient estimator B̂(t), which would also combine naturally with Lunn's suggestion of a weighted least squares estimate of B(t). The fact that both these extensions would require only minimal effort to implement in standard statistical software is particularly pleasing to us, and we wonder whether the analogy with estimating equations and weighted least squares could be further exploited to extend the increments approach to non‐continuous and unbalanced data. Molenberghs and Verbeke point out that such extensions must recognize that unbalanced data subdivide further according to whether or not the observation times are genuinely stochastic, the former case being the more problematic since the observation times may themselves be informative; see, for example, Lin et al. (2004).
Pipper and Scheike, and Solis‐Trapala comment that the inclusion of measurement error in our model is unclear. Though Sousa supplies the intuition behind the lagged error appearing in our equation (10), there is a sense in which the error term is ambiguous. Essentially, the error term is included only so that we can show that it may safely be ignored; the slightly awkward treatment is required because uncorrelated error is not a martingale. We hope that, in future, a less cumbersome treatment of the error term may be devised.

It is certainly true that the random effects could now be any process whose differences are a martingale (e.g. a random slope). However, unlike when introducing first‐order differences, there is no corresponding gain in simplicity of treatment of drop‐outs; in fact, treating drop‐out becomes slightly more difficult. To make sense of such a model we would need to define Y(T+k)=Y(T)+k{Y(T)−Y(T−1)} so that post‐drop‐out ‘increments’ remain zero. Further, examining higher order increments can actually conceal structure in the data. As pointed out by Harrison (1973) in a classical time series setting, higher order differencing can be a very blunt instrument.
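For completeness (our own elementary check, not part of the paper), with the extrapolation Y(T+k) = Y(T) + k{Y(T) − Y(T−1)} the first differences after drop‐out at time T are constant and the second differences vanish:
\[
\Delta Y(T+k) = Y(T+k) - Y(T+k-1) = Y(T) - Y(T-1),
\qquad
\Delta^2 Y(T+k) = 0,
\qquad k \ge 1,
\]
which is the sense in which post‐drop‐out second‐order ‘increments’ remain zero.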
Taylor observes that modelling increments in discrete time is analogous to modelling slopes in continuous time. He cites a model where random effects on a slope form an Ornstein–Uhlenbeck process, leading to an integrated Ornstein–Uhlenbeck process on the responses. The more general point is that, by modelling response increments with a certain residual structure, we gain a smoother residual structure on the responses, which is often an appealing feature for longitudinal data. As Cox notes, it is imperative that we give serious consideration to the form of the random effects, especially when they are used over extended periods. Strong implications are associated with a martingale structure, but this is no less true of alternatives such as the random intercept and slope.
Lunn's concern that she has misread our intentions in the simulation study is unfounded: the inconsistency in the drop‐out mechanism arises simply from having different authors responsible for the two scenarios! However, we believe that the increases in length in the box plots are due mainly to drop‐outs, and not to increased variability in the random effects (as she supposes). We share Taylor's surprise at both the efficiency of our approach and the inefficiency of inverse probability weighting, but we do not yet have sufficient experience with these methods to suggest why, or even whether, this is generally the case.
Appendix
Appendix A: Fitting dynamic linear models by using standard software

The dynamic linear model may be fitted by using standard software: at each observation time t, a least squares regression of the observed increments on the covariates yields an estimate β̂(t) of β(t) for each t ∈ 𝒯, which may be extracted by way of the coef method. The cumulative sum of these estimates gives the estimated cumulative coefficients B̂(t). Additionally, estimated standard errors are available without the need for bootstrapping.
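As a concrete illustration, the following is a minimal R sketch of our own of this recipe (the data layout, the column names id, time, y and x, and the function fit_increments are hypothetical, not taken from the paper). The standard errors shown are formed by summing per‐time sampling variances, a rough working device that rests on the additional assumption that the per‐time estimates are approximately uncorrelated; it is a stand‐in for, not a reproduction of, a closed‐form variance such as that suggested by Zeng and Lin.

## Minimal R sketch (hypothetical data layout): 'dat' has one row per subject and
## time, with columns id, time, y and a covariate x; y is NA after drop-out.
## Observation times are assumed to be unit spaced.
fit_increments <- function(dat, times) {
  beta <- var_beta <- NULL
  for (t in times[-1]) {
    prev <- dat[dat$time == t - 1, c("id", "y")]
    curr <- dat[dat$time == t, c("id", "y", "x")]
    d <- merge(prev, curr, by = "id", suffixes = c(".prev", ""))
    d <- d[!is.na(d$y) & !is.na(d$y.prev), ]        # subjects still observed at time t
    fit <- lm(I(y - y.prev) ~ x, data = d)          # least squares fit of increments on covariates
    beta <- rbind(beta, coef(fit))                  # beta-hat(t), extracted via coef
    var_beta <- rbind(var_beta, diag(vcov(fit)))    # per-time sampling variances
  }
  list(B  = apply(beta, 2, cumsum),                 # cumulative coefficients B-hat(t)
       se = sqrt(apply(var_beta, 2, cumsum)))       # working SEs: variances summed over time
}

A call such as fit_increments(dat, 0:5) then returns the estimated cumulative coefficients together with their working standard errors.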
References in the discussion