Information‐anchored sensitivity analysis: theory and application

Summary
Analysis of longitudinal randomized clinical trials is frequently complicated because patients deviate from the protocol. Where such deviations are relevant for the estimand, we are typically required to make an untestable assumption about post-deviation behaviour to perform our primary analysis and to estimate the treatment effect. In such settings, it is now widely recognized that we should follow this with sensitivity analyses to explore the robustness of our inferences to alternative assumptions about post-deviation behaviour. Although there has been much work on how to conduct such sensitivity analyses, little attention has been given to the appropriate loss of information due to missing data within sensitivity analysis. We argue that more attention needs to be given to this issue, showing that it is quite possible for sensitivity analysis to either decrease or increase the information about the treatment effect. To address this critical issue, we introduce the concept of information-anchored sensitivity analysis. By this we mean sensitivity analyses in which the proportion of information about the treatment estimate lost because of missing data is the same as in the primary analysis. We argue that this forms a transparent, practical starting point for interpretation of sensitivity analysis. We then derive results showing that, for longitudinal continuous data, a broad class of controlled and reference-based sensitivity analyses performed by multiple imputation are information-anchored. We illustrate the theory with simulations and an analysis of a peer review trial and then discuss our work in the context of other recent work in this area. Our results give a theoretical basis for the use of controlled multiple-imputation procedures for sensitivity analysis.


arXiv:1805.05795v1 [stat.ME] 15 May 2018

Introduction
The statistical analysis of longitudinal randomised clinical trials is frequently complicated because patients deviate from the trial protocol. Such deviations are increasingly referred to as inter-current events. For example, patients might withdraw from trial treatment, switch treatment, receive additional rescue therapy or simply become lost to follow-up. Post-deviation, such patients' data (if available) will often no longer be directly relevant for the primary estimand. Consequently, such post-deviation data are often set as missing; any observed post-deviation data can then inform the missing data assumptions. Nevertheless, however the analysis is approached, unverifiable assumptions about aspects of the statistical distribution of the post-deviation data must be made.
Recognising this, recent regulatory guidelines from the European Medicines Agency Committee for Medicinal Products for Human Use (2010) and a United States Food and Drug Administration mandated panel report by the National Research Council (2010) emphasise the importance of conducting sensitivity analyses. Further, the recent publication of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) E9 (R1) addendum on estimands and sensitivity analysis in clinical trials (2017) raises important issues about how such sensitivity analyses should be approached. It highlights how in any trial setting it is important first to define the estimand of interest. This will inform what data are missing and how such missing data should be handled in the primary analysis. Sensitivity analysis, which targets the same estimand, should subsequently be undertaken to address the robustness of inferences to the underlying assumptions, including those made for the missing data.
We propose splitting sensitivity analyses for missing data into two broad classes. In both classes, one or more alternative sets of assumptions (or scenarios) are postulated and the sensitivity of the conclusions to these alternative scenarios is to be assessed. In our first class, the primary analysis model is retained in the sensitivity analysis. This enables the exclusive assessment of the impact of alternative missing data assumptions on the primary outcome of interest. For example, for our sensitivity analysis we may impute missing data under a missing not at random (MNAR) assumption, and fit the primary analysis model to these imputed data. When performed by Multiple Imputation (MI), class-1 sensitivity analyses are therefore uncongenial, in the sense described by Meng (1994) and Xie and Meng (2017). Conversely, in the second class, for each set of sensitivity assumptions an appropriate analysis model is identified and fitted. Hence, each such analysis model is consistent with its assumptions, which is why the analysis models generally change as we move from scenario to scenario.
In the first class of sensitivity analyses, the assumptions of the primary analysis model may be inconsistent to some degree with the data generating mechanism postulated by the sensitivity analysis assumption. Nevertheless, a strong advantage of such sensitivity analysis is the avoidance of full modelling under various, potentially very complex, missing data assumptions. However, when performing class-1 sensitivity analyses, the properties of an estimator under the primary analysis may change as we move to the sensitivity analysis. In particular, we will see that a sensible variance estimator for the primary analysis may behave in an unexpected way under certain sensitivity analysis scenarios, for example decreasing as the proportion of missing values increases. In regulatory work, particularly in class-1 sensitivity analyses, it is therefore important to appreciate fully the quantity and nature of any additional statistical information about the treatment estimate that may arise in the sensitivity analysis, relative to the primary analysis.
This superficially abstract point can be readily illustrated. Suppose a study intends to take measurements on $n$ patients, $Y_1, \ldots, Y_n$, from a population with known variance $\sigma^2$, and the estimator is the mean. If no data are missing, then the statistical information about the mean is $n/\sigma^2$. Now suppose that $n_d$ observations are missing. We will perform a class-1 sensitivity analysis, so that the estimator is the mean for both our primary and sensitivity analysis. Our primary analysis will assume data are missing completely at random, and the sensitivity analysis will assume that the missing values are from patients with the same mean, but a different variance, $\sigma^2_m$. Under our primary analysis assumption, we can obtain valid inference by calculating the mean of the $n - n_d$ observed values, or by using multiple imputation for the missing values. In both cases the information about the mean is the same: $(n - n_d)/\sigma^2$.
Under our class-1 sensitivity analysis, we multiply impute the missing data under our assumption, and again our estimator is the mean. Now, however, the statistical information will be approximately $n^2/\{(n - n_d)\sigma^2 + n_d\sigma^2_m\}$, so the information about the mean from the sensitivity analysis depends on $\sigma^2_m$. Since $\sigma^2_m$ is not estimable, this information is under the control of the analyst. This is illustrated by Figure 1, which shows how the information about the mean varies with $\sigma^2_m$ when $n = 100$, $n_d = 20$ and $\sigma^2 = 1$. When $\sigma^2_m < \sigma^2$, the information about the mean in the sensitivity analysis is greater than from the intended 100 observations; when $1 \le \sigma^2_m < 2.25$, the information is no more than in the intended 100 observations but greater than in the $n - n_d = 80$ observations we were able to obtain; and when $\sigma^2_m > 2.25$, the information is less than in the 80 observations we were able to obtain.
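The arithmetic of this example can be checked with a few lines of code. The following is a minimal sketch in Python that evaluates the approximate information formula quoted above; the function name `information_about_mean` is ours and purely illustrative.

```python
def information_about_mean(n, n_d, sigma2, sigma2_m):
    """Approximate information about the mean when n_d of the n values
    are imputed with variance sigma2_m and the rest have variance sigma2."""
    return n ** 2 / ((n - n_d) * sigma2 + n_d * sigma2_m)


n, n_d, sigma2 = 100, 20, 1.0
full_info = n / sigma2               # information in the intended 100 values
observed_info = (n - n_d) / sigma2   # information in the 80 observed values

# Evaluate the information for imputation variances below, at and above sigma2.
for sigma2_m in (0.5, 1.0, 2.25, 4.0):
    info = information_about_mean(n, n_d, sigma2, sigma2_m)
    print(f"sigma2_m = {sigma2_m:4.2f}  information = {info:6.2f}")
```

With $\sigma^2_m = 1$ the sensitivity analysis recovers the information in all 100 intended observations, and with $\sigma^2_m = 2.25$ it matches the information in the 80 observed values, which are the two thresholds described in the text.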
We believe the ICH E9 (R1) addendum (2017) will lead to sensitivity analysis playing a much more central role; in this context we believe it important for statisticians and regulators to be aware of how, compared to the primary analysis, information can be removed or added in the sensitivity analysis. Our purpose in this paper is to:
1. Consider the information in sensitivity analyses, arguing that sensitivity analysis in a clinical trial should be information-anchored (as defined below) relative to the primary analysis, and
2. Demonstrate that using reference- and δ-based controlled multiple imputation, with Rubin's rules, to perform class-1 sensitivity analyses is information-anchored.
An important practical consequence of our work is that it provides a set of conditions that can be imposed on class-1 sensitivity analyses to ensure that, relative to the primary analysis, they neither create nor destroy statistical information. We believe this provides important reassurance for their use, for example in the regulatory setting.
The plan for the rest of the paper is as follows. Section 2 defines the concept of information-anchoring in sensitivity analysis. Section 3 considers class-1 sensitivity analysis by reference- and δ-based controlled multiple imputation, and presents our main theoretical results on information-anchoring within this setting. Section 4 briefly reviews class-2 sensitivity analyses from this perspective. In Section 5 we present a simulation study which illustrates our theory for information-anchored sensitivity analysis, which is then applied to a trial of training for peer reviewers in Section 6. We conclude with a discussion in Section 7.

Information-Anchored Sensitivity Analysis
We have seen in the simple example above how a sensitivity analysis can change the statistical information about a treatment estimate. We now define information-anchored sensitivity analyses, which hold the proportion of information lost due to missing data constant across the primary and sensitivity analyses.
Suppose that a clinical trial intends to collect data from $2n$ patients, denoted $Y$, in order to estimate a treatment effect $\theta$. However, a number of patients do not give complete data. Denote the observed data by $Y_{\text{obs}}$, and missing data by $Y_{\text{miss}}$. Consistent with the ICH E9 (R1) addendum (2017), we make a primary set of assumptions, under which we perform the primary analysis. We then make a sensitivity set of assumptions, under which we perform the sensitivity analysis. Both primary and sensitivity assumptions (i) specify the distribution $[Y_{\text{miss}} \mid Y_{\text{obs}}]$, (ii) could be true, yet (iii) cannot be verified from $Y_{\text{obs}}$.
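Let $\hat\theta_{\text{obs, primary}}$ be the estimate of $\theta$ under the primary analysis assumption. Further, suppose we were able to observe a realisation of $Y_{\text{miss}}$ under the primary assumption. Putting these data together with $Y_{\text{obs}}$ gives us a complete set of observed data, which actually follows the primary assumption: we denote this by $Y_{\text{primary}}$, and the corresponding estimate of $\theta$ by $\hat\theta_{\text{full, primary}}$. We denote the observed information about $\theta$ by $I(\hat\theta_{\text{obs, primary}})$ and $I(\hat\theta_{\text{full, primary}})$, respectively. Then,
$$\frac{I(\hat\theta_{\text{obs, primary}})}{I(\hat\theta_{\text{full, primary}})} < 1,$$
reflecting the loss of information about $\theta$ due to missing data. Defining corresponding quantities under the sensitivity assumptions for the chosen sensitivity analysis procedure (be this class-1 or class-2) we have
$$\frac{I(\hat\theta_{\text{obs, sensitivity}})}{I(\hat\theta_{\text{full, sensitivity}})} < 1,$$
again reflecting the loss of information about $\theta$ due to missing data, but now under the sensitivity assumptions.
Comparing these leads us to the following definitions:
Information-negative sensitivity analysis: $I(\hat\theta_{\text{obs, primary}})/I(\hat\theta_{\text{full, primary}}) > I(\hat\theta_{\text{obs, sensitivity}})/I(\hat\theta_{\text{full, sensitivity}})$.
Information-anchored sensitivity analysis: $I(\hat\theta_{\text{obs, primary}})/I(\hat\theta_{\text{full, primary}}) = I(\hat\theta_{\text{obs, sensitivity}})/I(\hat\theta_{\text{full, sensitivity}})$.
Information-positive sensitivity analysis: $I(\hat\theta_{\text{obs, primary}})/I(\hat\theta_{\text{full, primary}}) < I(\hat\theta_{\text{obs, sensitivity}})/I(\hat\theta_{\text{full, sensitivity}})$.
When analysing a clinical trial, we believe an information-positive sensitivity analysis is rarely justifiable, implying as it does that the more data are missing, the more certain we are about the treatment effect under the sensitivity analysis. Conversely, while information-negative sensitivity analyses provide an incentive for minimising missing data, there is no natural consensus about the appropriate loss of information. Therefore, we argue that information-anchored sensitivity analyses are the natural starting point. In regulatory work they provide a level playing field between regulators and industry, allowing the focus to be on the average response to treatment among the unobserved patients.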
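To make the comparison concrete, the sketch below classifies a sensitivity analysis by comparing the proportion of information retained (observed over full) under the primary and sensitivity assumptions. It assumes the reading that "information-positive" means the sensitivity analysis retains a larger proportion than the primary analysis; the function name, the tolerance and the input numbers are hypothetical choices of ours.

```python
import math


def classify_sensitivity_analysis(i_obs_primary, i_full_primary,
                                  i_obs_sens, i_full_sens, rel_tol=1e-9):
    """Compare the proportion of information retained under the primary
    and sensitivity assumptions and return the corresponding label."""
    primary_ratio = i_obs_primary / i_full_primary
    sens_ratio = i_obs_sens / i_full_sens
    if math.isclose(primary_ratio, sens_ratio, rel_tol=rel_tol):
        return "information-anchored"
    if sens_ratio > primary_ratio:
        return "information-positive"
    return "information-negative"


# Hypothetical information values: the primary analysis retains 80%.
print(classify_sensitivity_analysis(80, 100, 60, 75))   # retains 80%
print(classify_sensitivity_analysis(80, 100, 90, 100))  # retains 90%
print(classify_sensitivity_analysis(80, 100, 50, 100))  # retains 50%
```

In practice the information values would be estimated (for example as reciprocals of estimated variances), so exact equality is not expected and a tolerance suited to the application would be needed.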
The definitions above are quite general, applying directly to class-1 and class-2 sensitivity analyses, and all types of de jure (on-treatment) and de facto (as-observed) assumptions. We now discuss class-1 sensitivity analyses from the information perspective and present our theory for information-anchoring.

Class-1 Sensitivity Analysis and Theory for Information-Anchoring
While class-1 sensitivity analyses can be performed without using multiple imputation (Lu, 2014; Liu and Pang, 2016; Tang, 2017), multiple imputation is the most flexible approach, and often the simplest to implement (e.g., using the SAS software from www.missingdata.org.uk or Stata software by Cro et al. (2016)). This is generally called controlled multiple imputation, because the form of the imputation for the missing data is controlled by the analyst. So, for example, the analyst can control the imputed data mean to be δ below that under missing at random (MAR). See, for example, Mallinckrodt (2013) Ch. 10, O'Kelly and Ratitch (2014), p. 284-319 and Ayele et al. (2014).
One approach is to obtain information about parameters that control the departure from MAR from experts (Mason et al., 2017), but this is controversial (Heitjan, 2017), and challenging for longitudinal data where multiple parameters are involved. An alternative, as introduced by Little and Yau (1996) and developed and discussed further more recently by, among others, Carpenter et al. (2013); Ratitch et al. (2013); Liu and Pang (2016), is reference-based multiple imputation. In this approach, the distribution of the missing data is specified by reference to other groups of patients. This enables contextually relevant qualitative assumptions to be explored and avoids the need to formally specify numerical sensitivity parameters (these are implicit consequences of the appropriate reference for a patient). Some examples are listed in Table 1. For example, we may explore the consequences of patients in an active arm 'jumping to reference' post-deviation. In practice the appropriate imputation model depends critically on the particular clinical setting and what assumptions are considered credible. Such analyses can be performed using the reference-based MI algorithm in Appendix A implemented in Cro et al. (2016). Overall, this approach is both very flexible and accessible, since patients' missing outcomes are specified qualitatively, by reference to other groups of patients in the study. This explains its increasing popularity (Philipsen et al., 2015; Jans et al., 2015; Billings et al., 2018; Atri et al., 2018).
The above papers all focus on clinical trials with continuous outcome measures that are collected longitudinally, and modelled using the multivariate normal distribution. We consider the same setting, and give criteria for class-1 sensitivity analysis using controlled multiple imputation with Rubin's variance formula to be information-anchored. This shows that most forms of δ- and reference-based imputation proposed in the literature are, to a good approximation, information-anchored. It also shows that, in class-1 settings, uncritical use of the conventional primary analysis variance estimator is often information-positive, which is undesirable in practice.
There are two principal reasons for this. The first is that class-1 sensitivity analyses retain the primary analysis model in the sensitivity analysis. However, in the sensitivity analysis the data assumptions are not wholly compatible with those of the primary analysis model. In particular, variance estimators may behave in unexpected ways. The second reason is that reference-based methods essentially use the data twice, for example, by using data from the reference arm (i) to impute missing data in an active arm and (ii) to estimate the effect of treatment in the reference arm.

Theoretical Results
The presentation of our theoretical results is structured as follows. We begin by describing our data, model, primary analysis and sensitivity analysis. We show in Corollary 2 that, when all data can be fully observed, the expected variance of our treatment estimate $\hat\theta$ under the sensitivity assumptions equals that under the primary assumptions up to a term of order $n^{-2}$. Theorem 1 then defines the information-anchored variance and derives a general expression for the difference between this and the variance from Rubin's rules. Finally, we show, in the remarks following the theorem, that in practice this difference is small.

Table 1

Reference-based controlled MI methods:

Jump to reference (J2R)
Imputes assuming that, following dropout, a patient's mean profile follows that observed in the reference arm. Pre-dropout means come from the randomised arm.

Copy increments in reference (CIR)
Forms post-dropout means by copying increments in the reference arm. Pre-dropout means come from the randomised arm.

Last mean carried forward (LMCF)
Forms post-dropout means by carrying forward the randomised arm mean at dropout.

Copy reference (CR)
The conditional profile given the history is copied from the reference group, i.e. imputes as if randomised to the reference arm; pre- and post-dropout means come from the reference arm.

External information controlled MI methods:

The δ-method
Imputes under randomised-arm MAR and then adds or subtracts a fixed δ.
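The verbal descriptions in Table 1 translate into simple rules for a deviating patient's marginal mean vector. The sketch below is our own illustration of those rules for the means only; it is not the algorithm of Appendix A, which also has to handle the covariance structure and parameter uncertainty. The function name, the `first_missing` argument (the first visit with missing data, counted from 1) and the active-arm means are hypothetical.

```python
def controlled_mean(mu_active, mu_reference, first_missing, method):
    """Marginal mean vector for an active-arm patient whose data are
    missing from visit `first_missing` (1-based) onwards."""
    n_visits = len(mu_active)
    if len(mu_reference) != n_visits:
        raise ValueError("mean vectors must have the same length")
    if not 2 <= first_missing <= n_visits:
        raise ValueError("first_missing must be a post-baseline visit")
    k = first_missing - 1            # 0-based index of first missing visit
    pre = list(mu_active[:k])        # pre-dropout means: randomised arm
    last_active = mu_active[k - 1]
    last_reference = mu_reference[k - 1]
    if method == "J2R":              # jump to the reference-arm means
        post = list(mu_reference[k:])
    elif method == "CIR":            # copy reference-arm increments
        post = [last_active + (m - last_reference) for m in mu_reference[k:]]
    elif method == "LMCF":           # carry the last active mean forward
        post = [last_active] * (n_visits - k)
    elif method == "CR":             # all means from the reference arm
        return list(mu_reference)
    else:
        raise ValueError(f"unknown method: {method}")
    return pre + post


mu_a = [2.0, 2.2, 2.3]    # hypothetical active-arm means
mu_r = [2.0, 1.95, 1.9]   # hypothetical reference-arm means
for method in ("J2R", "CIR", "LMCF", "CR"):
    print(method, controlled_mean(mu_a, mu_r, first_missing=3, method=method))
```

The δ-method is not included because it modifies MAR-imputed values rather than replacing the mean profile.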

Trial Data
Consider a two-arm trial, which includes $n$ patients randomised to an active arm and $n$ patients randomised to a reference arm (total $2n$ patients within the trial). Outcome data are recorded at $j = 1, \ldots, J$ visits, where visit $j = 1$ is baseline. For patient $i$ in treatment arm $z$, where $z = a$ indicates active arm assignment and $z = r$ indicates reference arm assignment, let $Y_{z,i,j}$ denote the outcome at time $j$.
We wish to estimate the treatment effect at the end of the follow-up, time $J$. Our analysis model is the regression of the outcome at time $J$ on treatment and baseline (i.e., ANCOVA). Now suppose a number of patients are lost to follow-up in the active arm (for simplicity, we assume for now the reference arm data are complete). Our primary assumption is MAR.
Our primary analysis uses all the observed values, imputes the missing data under MAR, fits the ANCOVA model to each imputed data set and combines the results (this is essentially equivalent to fitting a mixed model with unstructured mean and covariance matrix to the observed values, see Carpenter and Kenward (2008), Chapter 3).
Our sensitivity analysis uses controlled multiple imputation, as formally defined below. This could include a δ-based method or one of the reference-based methods given in Table 1; all reference-based MI methods can be implemented using the generic algorithm in Appendix A.
For each trial arm, we assume a multivariate normal model, with common covariance matrix, so that for patient $i$ who has no missing values:
$$(Y_{z,i,1}, \ldots, Y_{z,i,J})^T \sim N\big((\mu_{z,1}, \ldots, \mu_{z,J})^T, \Sigma\big), \qquad (1)$$
where $z = a$ for the active patients and $z = r$ for the reference patients. Now suppose all reference group patients follow the protocol, but $n_d = n - n_o$ active patients deviate from the protocol. Suppose it was possible to continue to observe these $n_d$ patients, but now their post-deviation data follows the controlled model: for a patient $i$ who deviates at time $j$, the vector of $J$ outcomes, $Y_i$, satisfies
$$Y_i \sim N\big((\mu_{d,j,1}, \ldots, \mu_{d,j,J})^T, \Sigma\big). \qquad (2)$$
The term 'controlled' means that the analyst controls the post-deviation distribution.
Here, in $\mu_{d,j,k}$, the first index indicates active/deviation, the second the time of deviation, and the third the visit number. Different patients can deviate at different times, and this general formulation allows the pattern of their post-deviation means to differ depending on their deviation time. This encompasses all the settings in Table 1, and others besides.
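Imputation under a multivariate normal model rests on the conditional distribution of the missing visits given the observed ones. The sketch below computes that conditional mean and covariance with NumPy, treating the mean vector and covariance matrix as known; a full MI procedure would also draw these parameters from their posterior, which is omitted here. The function name and all numerical values are hypothetical.

```python
import numpy as np


def conditional_mvn(mu, sigma, observed_idx, y_observed):
    """Mean and covariance of the unobserved components of a multivariate
    normal vector, given the observed components (0-based indices)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    obs = np.asarray(observed_idx)
    mis = np.setdiff1d(np.arange(mu.size), obs)
    s_oo = sigma[np.ix_(obs, obs)]
    s_mo = sigma[np.ix_(mis, obs)]
    s_mm = sigma[np.ix_(mis, mis)]
    # Regression weights s_mo @ inv(s_oo), computed without an explicit inverse.
    weights = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[mis] + weights @ (np.asarray(y_observed, dtype=float) - mu[obs])
    cond_cov = s_mm - weights @ s_mo.T
    return cond_mean, cond_cov


# Hypothetical three-visit example: visits 1 and 2 observed, visit 3 missing.
mu = [2.0, 2.2, 2.3]
sigma = [[0.25, 0.15, 0.10],
         [0.15, 0.25, 0.15],
         [0.10, 0.15, 0.25]]
mean_3, cov_3 = conditional_mvn(mu, sigma, observed_idx=[0, 1], y_observed=[2.1, 2.4])
print(mean_3, cov_3)
```

Under a reference-based assumption, the same calculation would be applied with the mean vector (and, where the method requires it, the covariance matrix) replaced by the controlled version.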
To present the theory, we first consider the case where the primary analysis does not adjust for baseline, extending to the baseline-adjusted case in Corollary 2.
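Proposition 1 For the trial data described above, when the analysis model is a difference in means at the final time point with the usual sample variance estimate in both observed and controlled settings, then:
(a) If all patients follow the protocol and no data are missing, then the expectation of the variance estimate is
$$E[\hat V_{\text{full, primary}}] = \frac{2\sigma^2_{J,J}}{n}.$$
(b) If $n_d$ patients deviate and are observed following the controlled model (2), with $n_{d,j}$ of them deviating at time $j$, the expectation of the variance estimate is
$$E[\hat V_{\text{full, sensitivity}}] = \frac{2\sigma^2_{J,J}}{n} + \sum_{j=2}^{J} \frac{n_o n_{d,j}}{n^3}\,\Delta^2_{d,j} + \sum_{p=2}^{J-1}\sum_{q=p+1}^{J} \frac{n_{d,p} n_{d,q}}{n^3}\,\Delta^2_{d,p,q},$$
where $\Delta_{d,j} = \mu_{a,J} - \mu_{d,j,J}$, $\Delta_{d,p,q} = \mu_{d,p,J} - \mu_{d,q,J}$ and we let $(n - 1) \to n$.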
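The structure of the expectation in part (b) can be checked numerically. The sketch below assumes, as our own reading derived from the stated model rather than taken from the source, that the expectation equals $2\sigma^2_{J,J}/n$ plus pairwise terms $n_o n_{d,j}\Delta^2_{d,j}/n^3$ and $n_{d,p} n_{d,q}\Delta^2_{d,p,q}/n^3$, and compares this with a direct calculation from the variance of a mixture of sub-populations with different final-visit means. All names and numbers are hypothetical.

```python
def pairwise_form(n, sigma2, n_o, n_dev, mu_a_J, mu_dev_J):
    """Assumed closed form: 2*sigma2/n plus pairwise squared-difference
    terms. n_dev and mu_dev_J are dicts keyed by deviation time j."""
    total = 2 * sigma2 / n
    times = sorted(n_dev)
    for j in times:
        total += n_o * n_dev[j] * (mu_a_J - mu_dev_J[j]) ** 2 / n ** 3
    for idx, p in enumerate(times):
        for q in times[idx + 1:]:
            total += n_dev[p] * n_dev[q] * (mu_dev_J[p] - mu_dev_J[q]) ** 2 / n ** 3
    return total


def mixture_form(n, sigma2, n_o, n_dev, mu_a_J, mu_dev_J):
    """Direct calculation: reference-arm variance sigma2, active-arm
    variance sigma2 plus the between-group variance of the mixture."""
    times = sorted(n_dev)
    weights = [n_o / n] + [n_dev[j] / n for j in times]
    means = [mu_a_J] + [mu_dev_J[j] for j in times]
    overall = sum(w * m for w, m in zip(weights, means))
    between = sum(w * (m - overall) ** 2 for w, m in zip(weights, means))
    return sigma2 / n + (sigma2 + between) / n


# Hypothetical trial: 100 per arm, 10 deviate at visit 2 and 20 at visit 3.
args = dict(n=100, sigma2=0.25, n_o=70, n_dev={2: 10, 3: 20},
            mu_a_J=2.3, mu_dev_J={2: 1.9, 3: 2.1})
print(pairwise_form(**args), mixture_form(**args))
```

The two functions agree because, for weights summing to one, the between-group variance of a mixture equals the sum over pairs of the product of the weights times the squared difference in means.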

Corollary 1
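For clinical trials designed to detect a difference of $\mu_{a,J} - \mu_{r,J} = \Delta$, with a significance level of $\alpha$ and power $\beta$, at the final visit, $J$,
$$E[\hat V_{\text{full, sensitivity}}] = E[\hat V_{\text{full, primary}}] + O(n^{-2}). \qquad (3)$$

Proof:
First notice that the standard sample size formula implies
$$n = \frac{2\sigma^2_{J,J}\,(z_{1-\alpha/2} + z_{\beta})^2}{\Delta^2}.$$
Therefore, $\Delta^2$ is $O(n^{-1})$. Further, since in any trial all $\Delta^2_{d,p,q}$ can be written as $\Delta^2_{d,p,q} = \kappa_{d,p,q}\Delta^2$ for some constant $\kappa_{d,p,q}$, we have $\Delta^2_{d,p,q} = O(n^{-1})$. Following the same arguments, $\Delta^2_{d,j} = O(n^{-1})$. Second, notice that $n_o/n$ is the proportion of active patients who complete the trial, and $n_{d,j}/n$ is the proportion who deviate at time $j$; both are $O(1)$. Therefore, each additional term in Proposition 1(b) is $O(n^{-2})$, which gives (3).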

Corollary 2
Under the conditions of Corollary 1, if the primary analysis model is a linear regression of the outcome at the final time point, adjusted for baseline, then (3) still holds.

Proof:
Replace the unconditional variance, $\sigma^2_{J,J}$, with the variance conditional on baseline.
We now use this result in the context of reference-based multiple imputation to calculate the difference between our defined information-anchored variance and Rubin's multiple imputation variance.
Theorem 1 Consider a two-arm trial which includes $n$ patients randomised to an active arm and $n$ patients randomised to a reference arm. Measurement data are recorded at $j = 1, \ldots, J$ visits (where visit 1 is baseline). The primary analysis model is a linear regression of the outcome at the final time point (visit $J$) on baseline outcome and treatment. Suppose all $n$ patients in the reference arm are completely observed on reference treatment over the full duration of the trial (at all $J$ visits) but in the active arm, only $n_o$ are observed without deviation. The remaining $n_d$ patients in the active arm deviate at some point during the trial post-baseline in a monotone fashion (such that $n_o + n_d = n$). Specifically, we assume a proportion $\pi_{d,j} = n_{d,j}/n$ drop out at each visit $j > 1$, and their data are missing post-deviation.
Assume that the primary design-based analysis model satisfies (3), and that the variance-covariance matrix for the data is the same in each arm. For the patient deviation pattern in the active arm beginning at time $j$, let $\bar P_{a,d,j}$ be the $j \times 1$ mean vector of the $n_{d,j}$ responses at times $1, \ldots, (j - 1)$ plus a 1 (to allow for an intercept in the imputation model).
Suppose the primary analysis is performed by MI assuming within-arm MAR. Let $\hat V_{\text{obs, primary}}$ denote the estimated variance for the treatment effect under the primary MAR assumption. Subsequently we perform class-1 sensitivity analysis via reference-based MI, i.e. under (2), using the imputation algorithm in Appendix A. This general formulation includes all the reference-based options in Table 1. As we are doing class-1 sensitivity analysis, the primary analysis model is used to analyse the imputed data.
Then the difference between the information-anchored variance of the sensitivity analysis treatment estimate, denoted by $\hat V_{\text{anchored}}$, which by definition is $(\hat V_{\text{obs, primary}} / \hat V_{\text{full, primary}}) \times \hat V_{\text{full, sensitivity}}$, and Rubin's MI variance, denoted by $\hat V_{\text{Rubin's, MI}}$, is given by expression (4). Here $V_{\text{primary},j}$ is the variance-covariance matrix of the parameter estimates in the primary MAR imputation model for deviation at time $j$ and $V_{\text{sensitivity},j}$ is the variance-covariance matrix of the parameter estimates in the imputation model for deviation at time $j$, defined by the reference-based sensitivity analysis assumption. $\hat B_{\text{primary}}$ is the between-imputation variance and $\hat W_{\text{primary}}$ is the within-imputation variance of the treatment effect in the primary analysis, both under MAR.
Theorem 1 establishes the difference between the information-anchored variance and Rubin's rules variance. To show that class-1 sensitivity analysis by reference-based multiple imputation is information-anchored, we need to consider how close expression (4) is to zero.
The key quantity driving the approximation is the first of the two terms. Notice that for each deviation time $j$, the variance-covariance matrix of the parameters of the on-treatment imputation model is $V_{\text{primary},j} = \Sigma_j/n_o$, where $\Sigma_j$ is the relevant submatrix of the variance-covariance matrix $\Sigma$ of the $J$ observations. The precise form of $V_{\text{sensitivity},j}$ will depend on the sensitivity analysis imputation model. Consider the case where data from the fully observed reference arm are used in the sensitivity imputation (e.g. copy reference). In this case, $V_{\text{sensitivity},j} = \Sigma_j/n$, and
$$V_{\text{primary},j} - V_{\text{sensitivity},j} = \left(\frac{1}{n_o} - \frac{1}{n}\right)\Sigma_j = \frac{n_d}{n\,n_o}\,\Sigma_j,$$
which is of order $n^{-1}$. Applying this line of argument to the other methods in Table 1 suggests that the error in the approximation will be small, and vanish asymptotically.
Thus we have established that class-1 reference-based imputation sensitivity analysis is, to a good approximation, information-anchored. We illustrate this in the simulation study in Section 5.
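The two variance quantities compared in Theorem 1 are straightforward to compute once the ingredients are available. The sketch below implements the standard form of Rubin's rules (within-imputation variance plus $(1 + 1/K)$ times the between-imputation variance) and the information-anchored variance as defined above. The function names and the numbers are hypothetical, and in practice $\hat V_{\text{full, primary}}$ and $\hat V_{\text{full, sensitivity}}$ are not observable and would have to come from theory or simulation.

```python
from statistics import mean, variance


def rubins_rules(estimates, within_variances):
    """Pool K imputation-specific estimates and variances.
    Returns the pooled estimate and Rubin's total variance."""
    k = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(within_variances)      # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (K - 1 divisor)
    return pooled_estimate, within + (1 + 1 / k) * between


def anchored_variance(v_obs_primary, v_full_primary, v_full_sensitivity):
    """Information-anchored variance: the full-data sensitivity variance
    inflated by the primary-analysis ratio of observed to full variance."""
    return v_obs_primary / v_full_primary * v_full_sensitivity


# Hypothetical results from K = 3 imputed data sets.
estimate, total_var = rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_var)

# Hypothetical variances: the primary analysis loses 20% of the information.
print(anchored_variance(0.025, 0.020, 0.021))
```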

Further Comments
(a) In the proof of Theorem 1, to simplify the argument, the variance-covariance matrix of the data $\Sigma$ is assumed known in the imputation model. When, as will generally be the case, it has to be estimated, Carpenter and Kenward (2013), p. 58-59, show that, for the simple case of the sample mean, the additional bias is small, and vanishes asymptotically. This strongly suggests that any additional bias caused by estimating the variance-covariance matrix will be small, and asymptotically irrelevant; this is borne out by our simulation studies below.
(b) For simplicity the theory treated the deviation pattern as fixed. We can replace all the proportions $\pi_{d,j}$ by their sample estimates, and then take expectations over these in a further stage. As our results are asymptotic, the conclusions will be asymptotically equivalent.
(c) δ-method sensitivity analysis: We consider that at the final time point $J$, imputed values for patients who deviate at time $j$ (for $j > 1$) are edited by $(J + 1 - j)\delta$ to represent a change in the rate of response of δ per time point post-deviation. We now evaluate the size of the two terms in (4) separately. For the first term, when δ is fixed, the covariance matrix for the imputation coefficients under the primary analysis and the sensitivity analysis is identical for each missing data pattern $j$; the δ-method simply adds a constant to the imputed values. Consequently $V_{\text{primary},j} = V_{\text{sensitivity},j}$, thus $\pi^2_{d,j}\,\bar P_{a,d,j}\,[V_{\text{primary},j} - V_{\text{sensitivity},j}]\,\bar P^T_{a,d,j} = 0$, and Rubin's rules give a very sharp approximation to the information-anchored variance.
However, when δ is not fixed and we vary δ over the imputation set $K$, that is, we suppose a separate value $\delta_k$ is used for imputation $k$, the sensitivity analysis is information-negative. The extent of this is principally driven by the variance of $\delta_k$. Now consider the second term in (4). When the δ-method is used it is not necessarily the case that (3) holds, since $\Delta^2_{d,j}$ and $\Delta^2_{d,p,q}$, with $\Delta_{d,j} = \mu_{a,J} - \mu_{d,j,J}$ and $\Delta_{d,p,q} = \mu_{d,p,J} - \mu_{d,q,J}$, are not necessarily $O(n^{-1})$. In the δ-based scenario, as outlined in Appendix B.1, $\hat V_{\text{full, sensitivity}} = \hat V_{\text{full, primary}} + Q$ for an additional term $Q$. Thus, for the δ-method the $O(n^{-2})$ component in the second term of (4) is replaced with $Q$. The composition of $Q$ indicates that the information-anchoring performance of Rubin's variance estimate will also depend on the size of δ. Typically, the size of δ will not have a large effect, since the terms in $Q$ are all multiplied by components of the form $n_o n_{d,j}/n^3$ or $n_{d,p} n_{d,q}/n^3$ and thus will vanish asymptotically. Hence with a fixed δ adjustment, the information-anchoring approximation will be excellent.
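The contrast between a fixed δ and a δ that varies across imputations can be seen in a toy calculation. The sketch below uses simulated stand-ins for the per-imputation MAR results rather than a real imputation model, and it assumes, purely for illustration, that $\delta_k$ is drawn from a normal distribution. All numbers and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2018)

K = 1000                       # number of imputations (large, to show the pattern)
J, j, delta = 4, 3, -0.1       # final visit, deviation time, per-visit shift
multiplier = J + 1 - j         # shift applied at visit J is (J + 1 - j) * delta

# Stand-in for the per-imputation mean of the imputed values under MAR.
mar_means = rng.normal(loc=2.0, scale=0.1, size=K)

# Fixed delta: every imputation is shifted by the same constant.
fixed_delta_means = mar_means + multiplier * delta

# Varying delta: a separate delta_k per imputation (normal draw is an assumption).
delta_k = rng.normal(loc=delta, scale=0.5, size=K)
varying_delta_means = mar_means + multiplier * delta_k

var_mar = mar_means.var(ddof=1)
var_fixed = fixed_delta_means.var(ddof=1)
var_varying = varying_delta_means.var(ddof=1)
print(var_mar, var_fixed, var_varying)
```

A constant shift leaves the between-imputation variability unchanged, whereas drawing $\delta_k$ afresh for each imputation inflates it, which is consistent with the information-negative behaviour described above.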
(d) Improved information-anchoring: Remark (b) shows that, provided the underlying variance-covariance matrices of the data are similar, the key error term in the information-anchoring approximation is the difference in precision with which they are estimated. If all $n$ patients are observed in the reference arm and $n_o$ in the active arm, this is $1/n_o - 1/n$. This suggests Rubin's rules will lead to improved information-anchoring if, instead of using all patients in the reference arm to estimate the imputation model for deviators at time $j$, a random $n_o$ are used. We have confirmed this by simulation, but the improvement is negligible when the proportion of missing data is below 40%, a range in which simulations confirm the approximation is typically excellent.
(e) Theorem 1 suggests that, for a given deviation pattern, information-anchoring will be worse the greater the difference between the covariance matrix of the imputation coefficients under the primary and sensitivity analysis. However, we have not encountered examples where this has been a practical concern.
(f) We have not presented formal extensions of our theory to the case when we also have missing data in the reference arm. But this does not introduce any substantial errors in the information-anchoring approximation. With missing data in the reference arm, for each missing data pattern $j$, an additional component is included; it depends on the difference between the variance of the imputation parameters in the primary on-treatment imputation model and the sensitivity scenario imputation model for the reference arm, multiplied by the squared proportion of reference patients with that missing data pattern (denoted $\pi^2_{r,d,j}$). If reference arm data are imputed under within-arm MAR (as under CIR, CR or J2R) these terms will be zero. In the more general case, where different patterns of patients, across different arms, are imputed with different reference-based assumptions, additional non-zero error terms of the form in the summation in (4) will be introduced; but again, for the reasons discussed above, these will typically be small. The covariance between the parameters of the active and reference arm sensitivity scenario imputation models for each missing data pattern also contributes to the sharpness of the approximation. The exact size of these additional error terms again depends on the specific sensitivity scenario and in some cases will be zero (e.g. LMCF). But each covariance term is always multiplied by the proportion of deviators in each arm with the associated missing data patterns ($\pi_{d,j}\pi_{r,d,j}$), $\bar P_{a,d,j}$ and $\bar P^T_{r,d,j}$ (the $j \times 1$ mean vector of the responses at times $1, \ldots, (j - 1)$ for the reference patients deviating at time $j$, plus a 1 to allow for an intercept in the imputation model). Thus these terms will be of a relatively small order in practice, for the reasons discussed above.

Summary
Given a primary design-based analysis model, we have established in Proposition 1 a criterion which defines a general class of reference-based sensitivity analyses. If these sensitivity analyses are performed by MI, we have further established in Theorem 1 that they will be, to a good approximation, information-anchored, in line with the principles we set out in Section 2. We have also shown why the information-anchoring is particularly sharp for the δ-method of MI.

Class-2 Sensitivity Analyses and Information-Anchoring
A full exploration of information-anchoring for class-2 sensitivity analyses is beyond the scope of this article. Here, we focus on likelihood-based selection models (see, for example, Diggle and Kenward, 1994), and use the results of Molenberghs et al. (1998) to make links to pattern mixture models, which allows us to use the results we presented in Section 3.
Continuing with the setting in Section 3, consider a trial with scheduled measurement times of a continuous outcome measure at baseline and over the course of the follow-up. When data are complete, the primary analysis is the ANCOVA of the outcome measure at the scheduled end of follow-up on baseline and treatment group. Equivalent estimates and inferences can be obtained from a mixed model fitted to all the observed data, provided we have a common unstructured covariance matrix and a full treatment-time and baseline-time interaction.
Now suppose patients withdraw before the scheduled end of follow-up, and subsequent data are missing. The mixed model described in the previous paragraph then provides valid inference under the assumption that post-withdrawal data are MAR given baseline, treatment group and available follow-up data. A selection model that allows post-withdrawal data to be MNAR combines this mixed model with a model for the dropout process. Let $R_{i,j} = 1$ if we observe, and $R_{i,j} = 0$ if we miss, the outcome for patient $i$ at scheduled visit $j = 1, \ldots, J$. An illustrative selection model is given by (5), where the superscript 'R' denotes a selection model parameter, and the link function $g$ is typically logit, probit or complementary log-log (the latter giving a discrete time proportional hazards model for withdrawal).
Usually there is little information on the informative missingness parameter $\delta^R_2$ in the data (Rotnitzky et al., 2000; Kenward, 1998), and this information will be highly dependent on the assumed data distribution. Therefore, in applications it is more useful to explore the robustness of inferences to specific, fixed values of $\delta^R_2$ ($\delta^R_2 = 0$ corresponds to MAR).
For each of these specific values of $\delta^R_2$, we may recast the selection model as a pattern mixture model, following Molenberghs et al. (1998). The differences between the observed and unobserved patterns are defined as functions of the fixed $\delta^R_2$. These then become a particular example of the δ-method pattern mixture models considered in Section 3, which we have shown are information-anchored.
More generally, local departures from MAR are asymptotically information-anchored. To see this, denote by θ the parameters in (5), apart from $\delta^R_2$. For a fixed $\delta^R_2$, let $i(\hat\theta; \delta^R_2)$ be the observed information matrix at the corresponding maximum likelihood estimates $\hat\theta$. For regular log-likelihoods and a given data set, as we move away from MAR, the mean value theorem applied to each element of the information matrix gives (6). However, asymptotically the parameter estimates are normally distributed, so the third derivative of the log-likelihood (i.e. the RHS of (6)) goes to zero. Because the above holds both when we use the full data and when we use the partially observed data, it is sufficient to give information-anchoring. This is the basis for our intuition that, for most Phase III trials, class-2 sensitivity analyses can be treated as information-anchored for practical purposes.
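One way to write the expansion this argument relies on is sketched below. It is an assumed form rather than the exact display, with $i_{kl}(\delta)$ a hypothetical shorthand for one element of the observed information matrix evaluated at the maximum likelihood estimates for a fixed $\delta^R_2 = \delta$.

```latex
% Sketch: mean value theorem for one element of the information matrix,
% for some intermediate value \tilde\delta between 0 and \delta^R_2.
i_{kl}(\delta^R_2) - i_{kl}(0)
  = \delta^R_2 \left. \frac{\partial\, i_{kl}(\delta)}{\partial \delta} \right|_{\delta = \tilde\delta}
```

Because the information is itself a second derivative of the log-likelihood, the right-hand side involves third derivatives, which vanish for an exactly quadratic (normal) log-likelihood.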

Simulation Study
We now present a simulation study which illustrates the information-anchoring property of Rubin's variance formula, derived in Section 3. The simulation study is based on a double-blind chronic asthma randomised controlled trial conducted by Busse et al. (1998). The trial compared four doses of the active treatment budesonide against placebo on forced expiratory volume (FEV$_1$, recorded in litres) over a period of 12 weeks. FEV$_1$ measurements were recorded at baseline and after 2, 4, 8 and 12 weeks of treatment. The trial was designed to have 80% power (5% type-1 error) to detect a change of 0.23 litres in FEV$_1$ with 75 patients per arm, assuming a SD of 0.5 litres. We simulated longitudinal data, consisting of baseline and two follow-up time points (time 2 being week 4, and time 3 being week 12), from a multivariate normal distribution whose mean and covariance matrix were similar to those observed in the placebo and lowest active dose arm of this trial: $\mu_{placebo} = [2.0, 1.95, 1.9]$, $\mu_{active} = [2.0, 2.21, \mu_{a,3}]$ (litres).
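The quoted design of the asthma trial can be checked with a standard sample size formula. The sketch below assumes a two-sided, two-sample z-test on the difference in means, which the source does not state explicitly; the function name `n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect, sd, power=0.80, alpha=0.05):
    """Per-arm sample size for a two-sided two-sample z-test of a mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type-1 error
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    return ceil(2 * (sd * (z_alpha + z_beta) / effect) ** 2)


# Design values quoted for the trial: 0.23 litres, SD 0.5 litres, 80% power, 5% error.
n_asthma = n_per_arm(effect=0.23, sd=0.5)
print(n_asthma)
```

Under these assumptions the formula gives just over 74 patients per arm before rounding up, in line with the 75 per arm quoted above.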
To test the approximation (3) we chose a sample size of n = 250 in each arm, giving a power of at least 90% in all scenarios. For each scenario, the analysis model was a linear regression of FEV$_1$ at visit 3 on baseline and treatment, and this was fitted to the full data.
Subsequently, for the active arm, we simulated monotone deviation completely at random. We varied the proportion of patients deviating overall from 0% to 50%. For each overall proportion deviating, around half the patients deviated completely at random before visit 2, and around half deviated completely at random before visit 3. All post-deviation data were set to missing. The reference arm was always fully observed.
For each simulated data set, the primary analysis assumed MAR, and we performed class-1 sensitivity analyses using each of the reference-based methods in Table 1. Fifty imputations were used for each analysis. For the δ-method, the unobserved data were postulated to be worse (than under MAR) by a fixed amount of δ = {0, −0.1, −0.5, −1}, for each time point post-deviation, where δ = 0 is equivalent to the primary, MAR analysis. Thus, for patients who deviated between visits 1 and 2, their MAR imputed observations were altered by δ at visit 2 and by 2δ at visit 3. For patients who deviated between visits 2 and 3, their MAR imputed observation at visit 3 was altered by δ.
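The δ adjustment described above can be sketched as follows. This is a minimal illustration, not the mimix implementation: the function name, the dictionary layout and the MAR-imputed values are all hypothetical.

```python
def delta_adjust(mar_imputed, first_missing_visit, delta):
    """Shift MAR-imputed values by delta for every visit since deviation.

    mar_imputed maps each missing visit number to its MAR-imputed value.
    """
    adjusted = {}
    for visit, value in mar_imputed.items():
        # 1 at the first missing visit, 2 at the next, and so on.
        visits_since_deviation = visit - first_missing_visit + 1
        adjusted[visit] = value + visits_since_deviation * delta
    return adjusted


# Hypothetical MAR-imputed FEV1 values (litres) for two deviation patterns.
early = delta_adjust({2: 2.10, 3: 2.05}, first_missing_visit=2, delta=-0.1)
late = delta_adjust({3: 2.05}, first_missing_visit=3, delta=-0.1)
```

A patient deviating before visit 2 has the visit 2 value shifted by δ and the visit 3 value by 2δ, while a patient deviating before visit 3 has only the visit 3 value shifted by δ.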
One thousand independent replicates were generated for each combination of µ a,3 and deviation.Our results focus on the visit 3 treatment effect and its variance.
In order to minimise the Monte-Carlo variability in our comparisons, we used the same set of 1000 datasets and deviation patterns for each sensitivity analysis.
Within each replication, for each sensitivity scenario, we also drew post-deviation data under this scenario, giving a complete scenario-specific data set. For each replication this allowed us to estimate the treatment effect and $\hat V_{full,sensitivity}$ for each scenario. Then, we calculated the theoretical information-anchored variance, which by the definition in Section 2 is $\hat V_{anchored} = (\hat V_{obs,primary} / \hat V_{full,primary}) \times \hat V_{full,sensitivity}$. Rubin's variance estimate was also calculated. Estimates were averaged over the 1000 simulations. All simulations were performed using Stata version 14 (StataCorp, 2015) and reference-based MI was conducted using the mimix program by Cro et al. (2016).
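The information-anchored variance is a simple rescaling, sketched below. The function name and the numerical inputs are hypothetical and are not taken from the simulation output.

```python
def information_anchored_variance(v_obs_primary, v_full_primary, v_full_sensitivity):
    """Scale the full-data sensitivity variance by the primary-analysis variance ratio."""
    if min(v_obs_primary, v_full_primary, v_full_sensitivity) <= 0:
        raise ValueError("variances must be positive")
    return (v_obs_primary / v_full_primary) * v_full_sensitivity


# Hypothetical variances: missing data inflate the primary variance by 50%,
# so the anchored sensitivity variance is inflated by the same factor.
v_anchored = information_anchored_variance(
    v_obs_primary=0.006, v_full_primary=0.004, v_full_sensitivity=0.005
)
```

With no missing data the ratio is one and the anchored variance equals the full-data sensitivity variance.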

Simulation Results
Figure 2 shows the results, for each of the reference-based sensitivity scenarios in Table 1, and controlled multiple imputation with four values of δ.
The top four panels are for a moderate treatment effect of 0.3 ($\mu_{a,3} = 2.2$), comparable to that found in the asthma trial. We see the results show excellent information-anchoring by Rubin's variance estimator for up to 40% of patients deviating. Notice the information-anchored variance is always greater than $\hat V_{full,sensitivity}$, the variance we would see if we were able to observe data under the sensitivity assumption.
These results are echoed by those with smaller and larger treatment effects (see Appendix C Figure 4). We conclude that, for realistic proportions of missing post-deviation data, reference-based multiple imputation using Rubin's variance estimator can be regarded as information-anchored. This is in contrast to the behaviour of the conventional variance estimator from the primary regression analysis. Across all four reference-based scenarios, this gets smaller (and tends to zero) as the proportion of missing data increases, so yields increasingly information-positive inference as more data are missing! It is also smaller than the variance we would obtain if we were able to observe data under the sensitivity assumption. Therefore (see Carpenter et al. (2014)), we believe this is not generally an appropriate variance estimator for class-1 sensitivity analyses. We return to this point below.

Now consider the lower four panels of Figure 2, which show results for controlled multiple imputation using the δ-method. Again, consistent with the theory in Section 3, these show excellent information-anchoring by Rubin's variance estimator for all missingness scenarios for δ = 0, −0.1, −0.5 litres. Indeed, the information-anchoring approximation is better than for the reference-based methods above because the covariance matrices for the imputation coefficients under MAR and δ-based imputation are identical: term 1 in (4) disappears.
For contextually large δ = −1 litres, the approximation is excellent for up to 40% missing data.For greater proportions of missingness the approximation is not so sharp, and this is caused by the size of the second term in (4), which is larger with a bigger δ and greater proportion of missing post-deviation data.
For the δ-method we also see that the conventional variance estimator from the primary analysis is information-anchored. The reason for different behaviour here than for reference-based methods is that reference-based methods borrow information from another trial arm, and they do this increasingly as the proportion of patients deviating increases. This causes the conventional variance estimator to be information-positive. However, with the δ-method there is no borrowing between arms, so this issue does not arise.
To summarise, the simulations demonstrate our theoretical results, showing that for all the controlled MI methods outlined in Table 1 (reference- and δ-based), in realistic trial settings multiple imputation using Rubin's rules gives information-anchored inference for treatment effects. It is only with very high proportions of missing data (e.g. > 50%) that the information-anchoring performance of Rubin's variance begins to deteriorate. Such high proportions of missing data are unlikely in well designed trials, and would typically be indicative of other major problems.

Analysis of a Peer Review Trial
We now illustrate how the information-anchored theory outlined in Section 3 performs in practice, using data from a single blind randomised controlled trial of training methods for peer reviewers of the British Medical Journal. Full details of the trial are given in Schroter et al. (2004). Following concerns about the quality of peer review, the original trial was set up to evaluate no-training, face-to-face training or a self-taught training package. After consent, but before randomisation, each participant was sent a baseline paper to review (paper 1) and the review quality was measured using the Review Quality Index (RQI). This is a validated instrument which contains eight items and is scored from 1 to 5, where a perfect review would score 5. All 609 participants who returned their review of paper 1 were randomised to receive one of the three interventions.
Two to three months later, participants were sent a further article to review (paper 2). If this paper was reviewed, a third paper was sent three months later (paper 3). Unfortunately, not all of the reviewers completed the required reviews, so a number of review scores were missing. The main trial analysis was conducted under the MAR assumption, using a linear regression of RQI on intervention group adjusted for baseline RQI. The analysis showed that the only statistically significant difference was in the quality of the review of paper 2, where the self-taught group did significantly better than the no-training group.
Therefore, here we focus on examining the robustness of this purportedly significant result to different assumptions about the missing data. Assuming MAR, the analysis found that reviewers in the self-taught group had a mean RQI 0.237 points above the no-intervention group (95% CI 0.10-0.37, p = 0.001). Although this is relatively small, the self-taught intervention is inexpensive and may be worth pursuing. However, Figure 3 shows the quality of the review at baseline for (a) those who went on to complete the second review and (b) those who did not, for each of these two trial arms. The results suggest that a disproportionate number of poor reviewers in the self-taught group failed to review paper 2. This suggests the MAR assumption may be inappropriate, and data may be missing not at random.

Statistical Analysis
The primary analysis model was a linear regression of paper 2 RQI on baseline and intervention group (self-taught vs no-training), and the intervention effect estimate is shown in the first row of Table 2.
We conducted four further analyses:
(a) We multiply imputed the missing RQI data assuming MAR, fitted the primary analysis model to each imputed dataset and combined the results for inference using Rubin's rules. The imputation model for RQI of paper 2 included the variables present in the primary analysis model (RQI at baseline and treatment group).
(b) As it is reasonable to suppose that many of the reviewers in the self-taught group who did not return their second review ignored their training materials, we performed a class-1 sensitivity analysis assuming they 'copied no-training'. We used MI and Rubin's rules for information-anchored inference.
(c) We reproduced a previous sensitivity analysis described by White et al. (2007). They used a questionnaire to elicit experts' prior opinion about the average difference in review quality index between those who did, and did not, return the review of paper 2 (20 editors and other staff at the BMJ completed the questionnaire). The resulting distribution can be summarised as $N(-0.21, 0.46^2)$. We used this to perform a δ-method sensitivity analysis, where, for each imputation k, RQI values in the self-taught arm were imputed under MAR and then had $\delta_k \sim N(-0.21, 0.46^2)$ added. This analysis is expected to be information-negative.
(d) Our fourth analysis used the δ-method via MI for participants in the self-taught arm, but now fixed δ = −0.21 (the mean expert opinion) to obtain an information-anchored analysis.
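The difference between analyses (c) and (d) lies only in how δ is chosen for each imputation, which the sketch below illustrates. The function names, seed and imputed values are hypothetical; the mean −0.21 and SD 0.46 are the elicited values quoted above.

```python
import random


def draw_deltas(n_imputations, mean, sd, seed=2018):
    """One delta per imputation; sd=0 reproduces the fixed-delta analysis."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_imputations)]


def shift_imputed(mar_imputed_values, delta_k):
    """Add delta_k to every MAR-imputed RQI value in one imputed data set."""
    return [value + delta_k for value in mar_imputed_values]


elicited = draw_deltas(50, mean=-0.21, sd=0.46)  # analysis (c): delta varies by imputation
fixed = draw_deltas(50, mean=-0.21, sd=0.0)      # analysis (d): delta fixed at -0.21

# Hypothetical MAR-imputed RQI values for one imputed data set.
shifted = shift_imputed([2.9, 3.1, 2.6], fixed[0])
```

Drawing a new δ for each imputation adds between-imputation variability, which is why analysis (c) is expected to be information-negative while the fixed-δ analysis (d) is not.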

Results
Table 2 shows the results. As theory predicts, rows 1 and 2 show that the primary analysis and analysis assuming MAR using MI give virtually identical results. In row 3, reference-based sensitivity analysis assuming copy no-training reduces the estimated effect to 0.172; compared to the primary analysis the information-anchored standard error (SE) is now very slightly reduced at 0.069. The effect of this is to increase the p-value by a factor of ten to 0.013. In contrast, using the experts' prior distribution (row 4), the point estimate is 0.195, but the standard error is much increased at 0.132, so the p-value is over 100 times greater than in the primary analysis. Lastly (row 5), again using the δ-method, but now fixing δ = −0.21 gives a similar point estimate, but an information-anchored SE of 0.072.
Critically, comparing rows 4 and 5 shows that expert opinion loses a substantial further share of the information beyond that lost due to missing data under the primary analysis. Such information losses are not atypical (Mason et al., 2017). Since trials are often powered with minimal regard to potential missing data, such a loss of information must frequently lead to the primary analysis being overturned. By contrast, information-anchored sensitivity analysis fixes the loss of information across the primary and sensitivity analysis, at a level that is possible to estimate a priori for any given deviation pattern.
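The size of this further loss can be approximated from the standard errors reported for rows 4 and 5. The sketch below assumes that information is measured as the inverse of the squared standard error and uses the rounded values quoted in the text, so the result is indicative only; the function name is hypothetical.

```python
def further_information_loss(se_anchored, se_other):
    """Proportion of information lost relative to the information-anchored analysis."""
    information_ratio = (se_anchored / se_other) ** 2  # information is 1 / SE^2
    return 1 - information_ratio


# Row 5 (fixed delta, SE 0.072) against row 4 (elicited prior, SE 0.132).
loss = further_information_loss(se_anchored=0.072, se_other=0.132)
print(round(loss, 2))
```

On these rounded figures the prior-based analysis retains roughly 30% of the information available in the information-anchored analysis.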

Discussion
The recent publication of the ICH E9 (R1) addendum (2017) is bringing a sharper focus on the estimand. As the addendum recognises, this in turn leads to greater focus on the assumptions underpinning estimands. When we are faced with estimand-relevant protocol deviations, or inter-current events (e.g. rescue medication) and loss to follow-up etc., such assumptions are at best only partially verifiable from the actual trial data. In such settings, a primary analysis assumption is made, and then the robustness of inferences to a number of secondary sensitivity assumptions will ideally be explored.
The assumptions underpinning the primary and sensitivity analyses should be as accessible as possible.This applies not only to assumptions about the typical, or mean, profile of patients post-deviation, but also to assumptions about their precision.
In this article, we have introduced the concept of information-anchoring, whereby the extent of information loss due to missing data is held constant across primary and sensitivity analyses. We believe this facilitates informed inferences and decisions, whatever statistical method is adopted. Information-anchoring allows stakeholders to focus on the assumptions about the mean responses of each patient, or group of patients, post-deviation, without being concerned as to whether we are injecting information into or removing information from the analysis (relative to that lost, due to patient deviations, in the primary analysis). For example, we believe this provides a good basis for discussions between regulators and pharmaceutical statisticians: the former can be reassured the sensitivity analysis is not injecting information, while the latter can be reassured that the sensitivity analysis is not discarding information.
We have differentiated between two different types of sensitivity analysis: class-1 and class-2. In class-1 the primary analysis model is retained in the sensitivity analysis; such sensitivity analyses can be readily (but need not be) carried out by multiple imputation. Controlled MI procedures, which combine a pattern-mixture modelling approach with MI, naturally fall into this first class. These include reference-based MI procedures, which impute missing data under qualitative assumptions for the unobserved data, based on data observed in a specified reference group. The primary analysis model is retained in the sensitivity analyses, fitted to each imputed data set and results combined using Rubin's rules. Consequently the assumptions of the primary analysis model are generally inconsistent with the data generating mechanism postulated by the sensitivity analysis assumption. Thus the usual justification for Rubin's MI rules does not hold. Instead, we have identified a new property of these rules, namely that for a broad class of controlled MI approaches, including both δ- and reference-based approaches, they yield information-anchored inference. In this regard, a practically important corollary of our theory is that the widely used δ-method (and associated tipping-point analysis) is information-anchored with fixed δ adjustment.
While we believe information-anchored sensitivity analyses provide a natural starting point, and will often be sufficient, in certain scenarios it may also be desirable to conduct information-negative sensitivity analysis. In such analyses a greater loss of information due to post-deviation (missing) data is imposed by the analyst in the sensitivity analysis relative to the primary analysis. One way to do this is by prior elicitation (i.e. incorporating a prior distribution on δ), as touched upon in the further comments following Theorem 1 and Section 6. The theory in Section 3 also shows how a greater loss of information can be imposed in sensitivity analysis via reference-based MI if required. This is done by reducing the size of the reference group used to construct the reference-based imputation models.
Whatever approach is taken, careful thought needs to be given, and justification provided, for the additional loss of information being imposed.As we discussed at the end of Section 6, the loss of information with prior elicitation can be substantial.Often it will be difficult to justify an additional amount of information loss to impose.
Conversely, we argue that information-positive sensitivity analysis, where a lower loss of information due to missing data post-deviation is imposed in the sensitivity analysis relative to the primary analysis, is rarely justifiable, if at all. This is because it goes against all our intuition that missing data means we lose (not gain) information: with information-positive sensitivity analyses, we gain more precise inferences the more data we lose!

Our approach to determining the appropriate information in sensitivity analyses (which, as the simple example in the Introduction shows, is under the control of the analyst) contrasts with some recent work. Lu (2014), Tang (2017) and Liu and Pang (2016) each developed alternative implementations of the reference-based pattern mixture modelling approach. Lu (2014) introduced an analytical approach for placebo-based (CR) pattern mixture modelling which uses maximum likelihood and the delta method for treatment effect and variance estimation. Tang (2017) derived different analytical expressions for reference-based models, also via the likelihood-based approach. Liu and Pang (2016) proposed a Bayesian analysis for reference-based methods which estimates the treatment effect and variance from the posterior distribution.
What these papers have in common is that, in the terminology developed here, they essentially choose to apply the primary analysis variance estimator across the sensitivity analyses.While this choice has a long-run justification, for the reference-based multiple imputation estimator, as our simulation results in Figure 2 show (and we have discussed elsewhere (Carpenter et al., 2014)), this choice also means information-positive inferences for reference-based scenarios.This is a consequence of (i) uncongeniality between the imputation and analysis model and (ii) the fact that reference-based methods borrow information from within and across arms.Thus we highlight here that if one of these alternative implementations is employed within sensitivity analysis information-positive inference will be obtained.
What are the implications of this for our approach? Necessarily, the variance estimate arising from the information-anchored sensitivity analysis via reference-based multiple imputation does not have a long-run justification for the reference-based multiple imputation point estimate. However, having determined that the information-anchored variance is appropriate, we can readily inflate the long-run variance of the reference-based multiple imputation estimator by adding appropriate random noise. In this way, having chosen to make our primary and sensitivity analysis information-anchored, we can derive a corresponding point estimator whose long-run variance is the information-anchored variance.
If we wish to do this, we can proceed as follows. Recall that reference-based methods calculate the means of the missing values for each patient as linear combinations of the estimated treatment means at each time point under randomised arm MAR. Assume J follow-up visits, and denote these estimated means by the 2J × 1 column vector $\hat\mu$, with estimated covariance matrix $\hat V$. It follows that, for some 2J × 1 column vector L, the maximum likelihood reference-based treatment estimate is given by $L^T \hat\mu$, with associated estimated empirical variance $\hat\sigma^2_{ML} = L^T \hat V L$. If we denote the information-anchored variance by $\hat\sigma^2_{IA}$, take a draw from $N(0, \hat\sigma^2_{IA} - \hat\sigma^2_{ML})$ and add this to the treatment estimate obtained from the reference-based analysis by MI, this will result in an estimate with the information-anchored variance in a long-run sense. In practice $\hat\sigma^2_{ML}$ could also be estimated using one of the implementations of Lu (2014), Tang (2017) or Liu and Pang (2016). In applications, however, we do not think this step is typically worthwhile. Note too that with the δ-method $\hat\sigma^2_{IA}$ is well approximated by $\hat\sigma^2_{ML}$, so it is not necessary.

This article has focused on the analysis of a longitudinal measure of a continuous outcome. For generalized linear models (GLMs), if we perform controlled MI on the linear predictor scale, then we can apply the theory developed here on the linear predictor scale. This suggests that for GLMs, controlled MI will be approximately information-anchored; preliminary simulations support this, and work in this area is continuing. We note, however, that issues may arise with non-collapsibility when combining the component models in this setting. For survival data, we need to define the reference-based assumptions. This has been done in a recent manuscript we have submitted, which also contains simulation results suggesting promising information-anchoring properties for Rubin's rules in this setting.
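Returning to the noise-inflation step for reference-based estimates, a minimal sketch is given below. The function name and arguments are hypothetical, and the two variances are assumed to have been estimated already by whichever method is preferred.

```python
import random


def noise_inflated_estimate(estimate, var_anchored, var_ml, rng):
    """Add a N(0, var_anchored - var_ml) draw to a reference-based treatment estimate."""
    extra_variance = var_anchored - var_ml
    if extra_variance < 0:
        raise ValueError("the anchored variance must be at least the ML variance")
    # Standard deviation of the added noise is the square root of the variance gap.
    return estimate + rng.gauss(0.0, extra_variance ** 0.5)


rng = random.Random(1)
# Hypothetical inputs: estimate 0.172 with anchored and ML variances on the SE^2 scale.
inflated = noise_inflated_estimate(0.172, var_anchored=0.0048, var_ml=0.0040, rng=rng)
# When the two variances agree, as is approximately the case for the delta-method,
# the draw has zero variance and the estimate is unchanged.
unchanged = noise_inflated_estimate(0.195, var_anchored=0.0052, var_ml=0.0052, rng=rng)
```

The added draw is independent of the data, so the long-run variance of the inflated estimate is the sum of the two components, which is the anchored variance by construction.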
When conducting class-1 sensitivity analyses via MI, a natural question is how many imputations to conduct. As remarked in the proof of Theorem 1 in Appendix B.2, the number of imputations does not materially affect the information-anchoring performance of Rubin's variance estimate. Thus we recommend determining the number of imputations required for the primary analysis (under MAR) based on the required precision; these should estimate the information-anchored variance with similar precision in sensitivity analysis. To establish the number of imputations required to achieve a specific level of precision under MAR, Rubin (1987) showed that the relative variance (i.e. the efficiency) of an estimate using only K imputations compared to an infinite number is approximately (1 + λ/K), where λ is the fraction of missing information. As discussed in Carpenter and Kenward (2008), p. 86-87, 5-10 imputations are sufficient to get a reasonably accurate answer for most applications. For more critical inferences, at least 50-100 imputations are recommended (see Carpenter and Kenward (2013), p. 54-55).
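The approximation (1 + λ/K) is easy to tabulate when choosing K. In the sketch below the function name and the value λ = 0.3 are illustrative assumptions, not values from the trial.

```python
def relative_variance(fraction_missing_information, n_imputations):
    """Approximate variance with K imputations relative to infinitely many."""
    return 1 + fraction_missing_information / n_imputations


# Illustrative fraction of missing information of 0.3.
table = {k: relative_variance(0.3, k) for k in (5, 10, 50, 100)}
for k, ratio in table.items():
    print(k, round(ratio, 3))
```

For this illustrative λ the variance inflation is 6% with 5 imputations and well under 1% with 50 or more.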
Of course, to obtain information-anchored analyses Multiple Imputation does not have to be used.In principle we can perform information-anchored analysis by calculating the variance directly from the information-anchoring formula.However, to do this we need to calculate the expected value of the design-variance when we actually observe data under the sensitivity assumption.When the approach is used with its full flexibility (with different assumptions for different groups of patients) this is awkward.Multiple imputation provides a much more direct, computationally general, accessible approach for busy trialists, without the need for sophisticated one-off programming which is often required to directly fit MNAR pattern-mixture models or other MNAR models.
In conclusion, we believe that sensitivity analysis via controlled MI provides an accessible practical approach to exploring the robustness of inference under the primary assumption to a range of accessible, contextually plausible alternative scenarios. It is increasingly being used in the regulatory world (see, for example, the DIA pages at www.missingdata.org.uk, and the code therein; Philipsen et al. (2015), Jans et al. (2015), Billings et al. (2018), Atri et al. (2018), O'Kelly and Ratitch (2014) and references therein). Our aim has been to provide a more formal underpinning. Information-anchoring is a natural principle for such analysis, and we have shown this is an automatic consequence of using MI in this setting.

Acknowledgements
We are grateful to the Associate Editor and two referees whose comments have led to a greatly improved manuscript. Suzie Cro was supported for her PhD by MRC London Hub for Trials Methodology Research, grant number MC_EX_G0800814. James Carpenter is supported by the Medical Research Council, grant numbers MC_UU_12023/21 and MC_UU_12023/29.

A. Appendix A: Algorithm for reference-based multiple imputation
For a continuous outcome, the generic algorithm of Carpenter et al. (2013) can be summarized as follows:
1. Separately for each treatment arm, take all the observed data and, assuming MAR, fit a multivariate normal (MVN) distribution with an unstructured mean (i.e. a separate mean for each of the baseline and post-randomisation observation times) and variance-covariance matrix, using a Bayesian approach with an improper prior for the mean and an uninformative Jeffreys prior for the covariance matrix.
2. Draw a mean vector and covariance matrix from the posterior distribution for each treatment arm. Specifically, we use the Markov-Chain Monte Carlo (MCMC) method to draw from the appropriate Bayesian posterior, with a sufficient burn-in, and update the chain sufficiently in between to ensure subsequent draws are independent. The sampler is initiated using the Expectation-Maximization (EM) algorithm.
3. Use the draws in step 2 to form the joint distribution of each deviating individual's observed and missing outcome data as required. This can be done under a range of assumptions, in order to explore the robustness of inference about treatment effects. The options presented in Carpenter et al. (2013), each of which translates to a relevant assumption, are described in Table 1.
4. Construct the conditional distribution of missing (post-deviation) given observed outcome data for each individual who deviated, using their joint distribution formed in step 3. Sample their missing post-deviation data from this conditional distribution to create a completed data set.
5. Repeat steps 2-4 K times, resulting in K imputed data sets.
We now describe how step 3 works under 'jump to reference'. This leads to a brief presentation of the approach for the other options. Suppose there are two arms, active (indexed below by a) and reference (indexed below by r). In step 2, denote the current draw from the posterior for the 1 + J reference arm means and variance-covariance matrix by $\mu_{r,0}, \ldots, \mu_{r,J}$ and $\Sigma_r$. Use the subscript a for the corresponding draws from the other arm in question (which will depend on the arm chosen as reference for the analysis at hand).
Under 'jump to reference', suppose patient i is not randomised to the reference arm and their last observation, prior to deviating, is at time $d_i$, $d_i \in (1, \ldots, J-1)$. The joint distribution of their observed and post-withdrawal outcomes is multivariate normal with mean $\tilde\mu_i = (\mu_{a,0}, \ldots, \mu_{a,d_i}, \mu_{r,d_i+1}, \ldots, \mu_{r,J})^T$; that is, post-deviation they 'jump to reference'. We construct the new covariance matrix for these observations as follows. Denote the covariance matrices from the reference arm (without deviation) and the other arm in question (without deviation), partitioned at time $d_i$ according to the pre- and post-deviation measurements, by:

$$\text{reference: } \Sigma_r = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} \quad \text{and other arm: } \Sigma_a = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

We want the new covariance matrix, Σ say, to match that from the active arm for the pre-deviation measurements, and the reference arm for the conditional components for the post-deviation given the pre-deviation measurements. This also guarantees positive definiteness of the new matrix, since $\Sigma_r$ and $\Sigma_a$ are positive definite. That is, we want

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

subject to the constraints

$$\Sigma_{11} = A_{11}, \qquad \Sigma_{21}\Sigma_{11}^{-1} = R_{21}R_{11}^{-1}, \qquad \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = R_{22} - R_{21}R_{11}^{-1}R_{12}.$$

The solution is:

$$\Sigma_{11} = A_{11}, \qquad \Sigma_{21} = R_{21}R_{11}^{-1}A_{11}, \qquad \Sigma_{22} = R_{22} - R_{21}R_{11}^{-1}(R_{11} - A_{11})R_{11}^{-1}R_{12}.$$

Under 'jump to reference' we have now specified the joint distribution for a patient's pre- and post-deviation outcomes, when deviation is at time $d_i$. This is what we require for step 4.

For 'copy increments in reference' we use the same Σ as for 'jump to reference' but now

$$\tilde\mu_i = \left(\mu_{a,0}, \ldots, \mu_{a,d_i},\; \mu_{a,d_i} + (\mu_{r,d_i+1} - \mu_{r,d_i}),\; \mu_{a,d_i} + (\mu_{r,d_i+2} - \mu_{r,d_i}),\; \ldots\right)^T.$$

For 'last mean carried forward', Σ equals the covariance matrix from the randomisation arm. The important change is the way we put together $\tilde\mu_i$. Thus, for patient i in arm a under 'last mean carried forward',

$$\tilde\mu_i = (\mu_{a,0}, \ldots, \mu_{a,d_i}, \mu_{a,d_i}, \ldots, \mu_{a,d_i})^T.$$

Finally, for 'copy reference' the mean and covariance both come from the reference (typically, but not necessarily, control) arm, irrespective of deviation time. A SAS macro implementing this approach can be downloaded from www.missingdata.org.uk (Roger, 2012) and Stata software from https://ideas.repec.org/c/boc/bocode/s457983.html (Cro, 2015; Cro et al., 2016).
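The way the mean vector is assembled under each option can be sketched in a few lines. This is an illustrative reading of the verbal descriptions above, not the SAS or Stata implementation: the function name and the option labels "J2R", "CIR", "LMCF" and "CR" are shorthand assumed here, and index 0 denotes baseline.

```python
def reference_based_mean(mu_active, mu_reference, last_observed, option):
    """Mean vector for a patient in the active arm whose last observation is at last_observed."""
    d = last_observed
    pre = list(mu_active[: d + 1])  # pre-deviation means follow the randomised arm
    if option == "J2R":     # jump to reference: reference means after deviation
        post = list(mu_reference[d + 1:])
    elif option == "CIR":   # copy increments in reference, starting from the active mean at d
        post = [mu_active[d] + (m - mu_reference[d]) for m in mu_reference[d + 1:]]
    elif option == "LMCF":  # last mean carried forward
        post = [mu_active[d]] * (len(mu_active) - d - 1)
    elif option == "CR":    # copy reference: reference means throughout
        return list(mu_reference)
    else:
        raise ValueError(f"unknown option: {option}")
    return pre + post


# Means from the simulation study (litres), with the active visit 3 mean set to 2.2.
mu_a, mu_r = [2.0, 2.21, 2.2], [2.0, 1.95, 1.9]
j2r = reference_based_mean(mu_a, mu_r, last_observed=1, option="J2R")
cir = reference_based_mean(mu_a, mu_r, last_observed=1, option="CIR")
lmcf = reference_based_mean(mu_a, mu_r, last_observed=1, option="LMCF")
cr = reference_based_mean(mu_a, mu_r, last_observed=1, option="CR")
```

For a patient last observed at the first follow-up visit, the final mean is the reference value 1.9 under 'jump to reference', about 2.16 under 'copy increments in reference', and 2.21 under 'last mean carried forward'.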

B.1. Proof of Proposition 1
Here we outline the argument for Proposition 1. Consider the baseline (time 1) and J − 1 follow-up setting where $Y_{z,i,j}$ denotes the continuous outcome measure for patient i in arm z (z = a indicates active arm allocation and z = r reference arm allocation) at time j, for i = 1, ..., n and j = 1, ..., J. Suppose $n_{d,j}$ patients deviate at time j in a monotone fashion, for j > 1, such that $n_d = \sum_{j=2}^{J} n_{d,j}$. Interest lies in the unadjusted mean treatment group difference at time J. Conditioning on $n_{d,j}$ for j > 1, the expected value of the treatment estimate at time J when the post-deviation data can be observed is, estimate. When the deviating patients experience primary on-treatment behaviour post-deviation and are fully observed, the expectation of the variance of the primary on-treatment estimand can be expressed as, Under the conditions of Proposition 1 and using Corollary 1 and 2, the variance estimator for the sensitivity estimand where post-deviation data are fully observed can be expressed as

$$E[\hat V_{full,sensitivity}] = a^T D_P \Sigma D_P^T a + O(n^{-2}). \qquad (8)$$

We now suppose that post-deviation data are unobserved, i.e. the potentially observable primary on-treatment and sensitivity scenario entries in Y are missing for the $n_d$ active patients. We alternatively multiply impute these outcomes, using primary on-treatment (MAR) imputation and imputation under the sensitivity scenario. This gives For this we need appropriate imputation distributions for each missing data pattern under each scenario, with suitable posteriors for the included parameters.
Under our primary on-treatment assumption (MAR), the imputation model for patients deviating at time j, for each j > 1, is formed from the regression of $Y_{a,J,o}$ on $P_{a,o,j}$, where $P_{a,o,j}$ is the design matrix for the imputation model, which contains the values of the 1, ..., j − 1 outcomes and covariates included in the imputation model (excluding treatment) for the $n_o$ observed active patients, along with a vector of 1's to include an intercept in the model. This is appropriate since we are not imputing any interim missing outcomes here: we only consider monotone missing data patterns, and we are interested in the treatment effect at time J. As described by Carpenter and Kenward (2013, p. 77-78), under MAR, each of the regressions will be validly estimated from those observed in the data set. The parameter estimates for the primary on-treatment (MAR) imputation model for the $n_{d,j}$ patients missing outcomes j to J, for each j > 1, are found as $\hat\beta_{primary,j} = (P_{a,o,j}^T P_{a,o,j})^{-1} P_{a,o,j}^T Y_{a,J,o}$ with assumed known covariance matrix $V_{primary,j} = (P_{a,o,j}^T P_{a,o,j})^{-1} \sigma_j^2$. We assume the large sample posterior for the parameters of the primary on-treatment imputation model, denoted $\beta_{primary,j}$, is normal and centered on the ML estimator $\hat\beta_{primary,j}$ with covariance matrix $V_{primary,j}$. That is, $\beta_{primary,j} \mid Y_{a,J,o} \sim N(\hat\beta_{primary,j}; V_{primary,j})$. The primary on-treatment imputation model for active patient i deviating at time j, for each j > 1 and imputation k, can therefore be expressed as $\tilde Y_{a,i,J,k} \mid Y_{a,J,o} = P_{a,d,j,i} \hat\beta_{primary,j} + b_{primary,j,k} + e_{i,j,k}$ for $i \in \{DJ\}$, where $b_{primary,j,k} \sim N(0, V_{primary,j})$, $e_{i,j,k} \sim N(0, \sigma_j^2)$ and $P_{a,d,j,i}$ contains the values of the 1, ..., j − 1 outcomes and covariates included in the imputation model (excluding treatment, plus a 1 for the intercept) for each deviating active patient i, who deviates at time j.
For the sensitivity analysis we conduct imputation under the proposed sensitivity scenario. We assume that the large sample posterior for the imputation parameters for the $n_{d,j}$ patients missing outcomes $j$ to $J$, for each $j>1$, denoted $\beta_{\text{sensitivity},j}$, is normal and centered on the ML estimator $\hat{\beta}_{\text{sensitivity},j}$ with known covariance matrix $V_{\text{sensitivity},j}$. That is, for each $j>1$,
$$\beta_{\text{sensitivity},j} \mid Y_{\text{sensitivity},J} \sim N(\hat{\beta}_{\text{sensitivity},j};\, V_{\text{sensitivity},j}),$$
where $Y_{\text{sensitivity},J}$ consists of the relevant observed outcome data under the particular sensitivity scenario of interest. The imputation model used in the sensitivity analysis for active patient $i$ deviating following time $j$, for each $j>1$ and imputation $k$, can therefore be expressed as
$$\tilde{Y}_{a,i,J,k} \mid Y_{\text{sensitivity},J} = P_{a,d,j,i}\hat{\beta}_{\text{sensitivity},j} + P_{a,d,j,i}\, b_{\text{sensitivity},j,k} + e_{i,j,k} \quad \text{for } i \in \{D_J\},$$
where $b_{\text{sensitivity},j,k} \sim N(0, V_{\text{sensitivity},j})$ and $e_{i,j,k} \sim N(0, \sigma_j^2)$. Under the assumption that the variance-covariance matrix of the baseline and follow-up outcomes is equal across treatment arms, we consequently assume the same residual variance in the primary and sensitivity imputation models for patients deviating at the same time $j$, for each $j>1$.
We are interested in imputation inference for [...]. Since $E[\hat{W}_{\text{primary}}] = E[\hat{V}_{\text{full, primary}}]$, and using (7) and (8), [...], that is, [...]. This gives the required result in the longitudinal trial setting with monotone missingness in the active treatment arm with $K=\infty$.

In practice $K \neq \infty$; however, the information-anchoring approximation results will still hold for finite $K$. For finite $K$ the variance of our MI treatment estimate as estimated by Rubin's rules is
$$\hat{V}_{\text{MI, primary}} = \hat{W}_{\text{primary}} + \left(1+\frac{1}{K}\right)\hat{B}_{\text{primary}} \quad \text{or} \quad \hat{V}_{\text{MI, sensitivity}} = \hat{W}_{\text{sensitivity}} + \left(1+\frac{1}{K}\right)\hat{B}_{\text{sensitivity}}.$$
We will therefore have additional terms in the difference between Rubin's variance estimator and the ideal information-anchored variance, but these will also be very small. They will be of the same order as the terms already presented, multiplied by $K^{-1}$, and hence smaller still. Thus, for the reasons discussed in the main text, the approximation continues to hold with finite $K$.
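As a small illustration of the finite-$K$ formula, the sketch below combines $K$ point estimates and their within-imputation variances using Rubin's rules. It also computes an information-anchored target variance by holding the ratio of MI variance to full-data variance equal to that of the primary analysis. The function names are hypothetical, and the second function assumes the full-data variances are available, which is only the case in theory or in simulation.

```python
from statistics import mean, variance


def rubins_variance(estimates, within_variances):
    """Rubin's rules total variance: W + (1 + 1/K) * B.

    estimates        : the K treatment estimates, one per imputed data set
    within_variances : the K estimated variances of those estimates
    """
    K = len(estimates)
    W = mean(within_variances)   # average within-imputation variance
    B = variance(estimates)      # between-imputation variance (divisor K - 1)
    return W + (1 + 1 / K) * B, W, B


def information_anchored_variance(v_full_sensitivity, v_full_primary, v_mi_primary):
    """Variance that loses the same proportion of information as the primary analysis.

    Illustrative only: scales the full-data sensitivity variance by the
    primary analysis ratio V_MI / V_full.
    """
    return v_full_sensitivity * v_mi_primary / v_full_primary


# Example with K = 3 imputations.
total, W, B = rubins_variance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
anchored = information_anchored_variance(2.0, 1.0, 1.5)
```

Comparing the value returned by `rubins_variance` under the sensitivity scenario with the anchored target is one way to check numerically how closely a given analysis is information anchored.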
We note that when we relax the equal variance by trial arm assumption, we can no longer assume the variance of the residuals in the primary de jure imputation model for patients with missingness pattern j matches the variance of the residuals in the sensitivity de facto imputation model for patients with missingness pattern j, for each missing data pattern j.
In this setting we denote the variance of the residuals in the primary on-treatment imputation model for patients missing outcomes $j,\dots,J$ as $\sigma^2_{P,j}$, and in the sensitivity imputation model as $\sigma^2_{S,j}$, for $j>1$. Then the information-anchoring performance of Rubin's MI variance estimator is driven by $0 \approx$ [...]. The additional components in the difference between Rubin's variance and the ideal information-anchored variance are driven by the degree of difference in the variance structure of the data by trial arm for each missingness pattern. Since the variance structure is unlikely to differ markedly by trial arm for each missingness pattern, and these extra components are each multiplied by $\pi^2_{d,j}/n_{d,j}$, the overall impact will in practice be relatively small.

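Fig. 3. The quality of the baseline review

[...] $\frac{1}{K}\sum_{k=1}^{K} a^T D_P Y_k$ or $\frac{1}{K}\sum_{k=1}^{K} a^T D_S Y_k$. Letting the number of imputations $K \to \infty$, the variance of our MI treatment estimate as estimated by Rubin's rules is
$$\hat{V}_{\text{MI, primary}} = \hat{W}_{\text{primary}} + \hat{B}_{\text{primary}} \quad \text{or} \quad \hat{V}_{\text{MI, sensitivity}} = \hat{W}_{\text{sensitivity}} + \hat{B}_{\text{sensitivity}},$$
where, under the conditions required in the proposition,
$$E[\hat{W}_{\text{primary}}] = E\left[\frac{1}{K}\sum_{k=1}^{K} a^T D_P \Sigma_k D_P^T a\right] \to a^T D_P \Sigma D_P^T a$$
and
$$E[\hat{W}_{\text{sensitivity}}] = E\left[\frac{1}{K}\sum_{k=1}^{K} a^T D_S \Sigma_k D_S^T a\right] \to a^T D_P \Sigma D_P^T a + O(n^{-2}).$$

Under primary (on-treatment) imputation, [...] $(\bar{e}_{j,k} - \bar{e}_j) + \pi_{d,j}\bar{P}_{a,d,j}\, b_{\text{primary},j,k} - \bar{P}_{a,d,j}\bar{b}_{\text{primary},j}$ [...] $e_{i,j,k}$, $\bar{e}_j = \frac{1}{K}\sum_{k=1}^{K}\bar{e}_{j,k}$, $\bar{P}_{a,d,j} = \frac{1}{n_{d,j}}\sum_{i \in D_J} P_{a,d,j,i}$ and $\bar{b}_{\text{primary},j} = \frac{1}{K}\sum_{k=1}^{K} b_{\text{primary},j,k}$, which has expectation [...]. [...] $(\bar{e}_{j,k} - \bar{e}_j) + \pi_{d,j}\bar{P}_{a,d,j}\, b_{\text{sensitivity},j,k} - \bar{P}_{a,d,j}\bar{b}_{\text{sensitivity},j}$ [...].

The information-anchored variance is
$$V_{\text{anchored}} = a^T D_P \Sigma D_P^T a + O(n^{-2}) + \frac{E[\hat{B}_{\text{primary}}]}{E[\hat{W}_{\text{primary}}]}\left\{a^T D_P \Sigma D_P^T a + O(n^{-2})\right\} = a^T D_P \Sigma D_P^T a + O(n^{-2}) + E[\hat{B}_{\text{primary}}] + \frac{E[\hat{B}_{\text{primary}}]}{E[\hat{W}_{\text{primary}}]}\,O(n^{-2}).$$
If Rubin's rules are information anchoring and preserve the information loss in the primary analysis under MAR, then the following holds:
$$E[\hat{W}_{\text{sensitivity}}] + E[\hat{B}_{\text{sensitivity}}] \approx a^T D_P \Sigma D_P^T a + O(n^{-2}) + E[\hat{B}_{\text{primary}}] + \frac{E[\hat{B}_{\text{primary}}]}{E[\hat{W}_{\text{primary}}]}\,O(n^{-2}).$$
That is,
$$a^T D_P \Sigma D_P^T a + O(n^{-2}) + E[\hat{B}_{\text{sensitivity}}] \approx a^T D_P \Sigma D_P^T a + O(n^{-2}) + E[\hat{B}_{\text{primary}}] + \frac{E[\hat{B}_{\text{primary}}]}{E[\hat{W}_{\text{primary}}]}\,O(n^{-2}),$$
[...] $\bar{P}_{a,d,j}\,(V_{\text{primary},j} - V_{\text{sensitivity},j})\,\bar{P}_{a,d,j}^T$ [...] $+\ \bar{P}_{a,d,j}\,(V_{\text{primary},j} - V_{\text{sensitivity},j})\,\bar{P}_{a,d,j}^T + \frac{E[\hat{B}_{\text{primary}}]}{E[\hat{W}_{\text{primary}}]}\,O(n^{-2})$.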

Table 2. Estimated effect of self-training vs no training on the paper 2 Review Quality Index, from the primary and various sensitivity analyses; † indicates information-anchored analysis.