A quantitative framework to inform extrapolation decisions in children

When developing a new medicine for children, the potential to extrapolate from adult efficacy data is well recognized. However, significant assumptions about the similarity of adults and children are needed for extrapolations to be biologically plausible. One such assumption is that of similar exposure–response (E–R‐) relationships. Motivated by applications to antiepileptic drug development, we consider how data that are available from existing trials of adults and adolescents can be used to quantify prior uncertainty about whether E–R‐relationships are similar in adults and younger children. A Bayesian multivariate meta‐analytic model is fitted to existing E–R‐data and adjusted for external biases that arise because these data are not perfectly relevant to the comparison of interest. We propose a strategy for eliciting expert prior opinion on external biases. From the bias‐adjusted meta‐analysis, we derive prior distributions quantifying our uncertainty about the degree of similarity between E–R‐relationships for adults and younger children. Using these we calculate the prior probability that average pharmacodynamic responses in adults and younger children, both on placebo and at an effective concentration, are sufficiently similar to justify a complete extrapolation of efficacy data. A simulation study is performed to evaluate the operating characteristics of the approach proposed.


Introduction
Leveraging existing data to optimize the design of a drug development programme is particularly appropriate when we develop medicines for small or vulnerable populations, such as children. The European Medicines Agency (EMA) defines extrapolation as '... extending information and conclusions available from studies in one or more subgroups of the patient population (source population) ... to make inferences for another subgroup of the population (target population) ...' (European Medicines Agency, 2013). We focus on the extrapolation of adult efficacy data to children. Wadsworth et al. (2016a) reported the findings of a systematic review of statistical methods that are relevant for extrapolating efficacy and other data from adults to children. Gamalo-Siebers et al. (2017) illustrated methods that use adult data to design improved paediatric programmes and Petit et al. (2018) proposed a method for the design and analysis of paediatric dose-finding trials, with the dose range calculated by extrapolating from adult pharmacokinetic data. Weber et al. (2018) compared the use of Bayesian and frequentist methods for combining existing adult and paediatric data to inform decision making. Crippa et al. (2018) developed a one-stage approach to the meta-analysis of aggregated data which can be used for estimating non-linear dose-response models. Beyond extrapolation from adult data, Zheng et al. (2019) proposed a Bayesian hierarchical model synthesizing animal and human toxicity data to learn about the relationship between dose and toxicity risk in a phase I human oncology trial.
To justify the extrapolation of adult efficacy data to children, we must often make strong assumptions about the similarity of age groups in terms of disease progression, response to intervention and exposure-response (E-R-) relationships which we take to link a single outcome measure to a single summary of exposure. These assumptions are made explicit in the paediatric decision tree (see Food and Drug Administration (2003)) where judgements about the plausibility of each aspect of similarity determine whether, in the terminology of Dunne et al. (2011), a 'complete', 'partial' or 'no' extrapolation strategy is adopted. Fig. 1 explains the implications of the extrapolation strategy for the data that are generated in paediatrics. Safety data are required regardless of the extrapolation strategy that is adopted. Dunne et al. (2011) reviewed 370 paediatric studies submitted to the US Food and Drug Administration between 1998 and 2008 to identify cases in which efficacy data were extrapolated: of the 166 drug products that were considered, 14.5% followed a complete extrapolation strategy, 68% a partial extrapolation strategy and 17.5% did not extrapolate. Sun et al. (2017), in an update to this review, considered 388 paediatric studies that were submitted between 2009 and 2014. The proportion of products using partial extrapolation fell to 29%, whereas the proportions using no and complete extrapolation rose to 37% and 34% respectively.
Since 2006, the European Union paediatric regulation (European Union, 2006) has mandated that studies that are intended to support licensing of a medicine for children in the European Union must follow a paediatric investigation plan, which must be agreed ahead of time with the European Medicines Agency's Paediatric Committee. When selecting an extrapolation strategy, sponsors must ask themselves how plausible the assumptions required for extrapolation are, given the data to hand. Hlavin et al. (2016) used a scepticism factor to represent uncertainty about the plausibility of complete extrapolation, where this factor could be established from existing data or expert opinion. This paper presents a framework using existing data to inform a decision on whether to perform a complete or partial extrapolation of efficacy data from adults to children. This decision will determine whether the sponsor will collect only pharmacokinetic data in children to support dose finding, or both pharmacokinetic and pharmacodynamic data. The framework proposed begins with sponsors prespecifying numerical criteria which E-R-curves in adults and children must satisfy to be deemed 'similar'. The sponsor can then use existing information to quantify prior confidence in this degree of similarity.
We propose performing a Bayesian random-effects meta-analysis of existing E-R-data to derive priors for differences between E-R-relationships in adults and children. When studying small populations it is likely that few existing studies will be available for synthesis. Furthermore, 'external biases' (Turner et al., 2009) may be inherent in the existing data if there are differences between the source and target populations, e.g. if existing data are measurements on adults and adolescents but our question is whether E-R-relationships in adults and children aged 2-11 years are similar. This scenario may often arise in practice because drug development in adults and children is typically staggered. Furthermore, adolescents are also often recruited into adult trials in therapeutic areas such as epilepsy (Girgis et al., 2010;French et al., 2012;Marson et al., 2007a,b) and asthma (FitzGerald et al., 2016;O'Byrne et al., 2017). A draft guidance document for the inclusion of adolescents in adult oncology trials has recently been released by the Food and Drug Administration (Food and Drug Administration, 2018). To derive prior distributions for key parameters accounting for external biases, existing data could be downweighted according to a fixed weight (e.g. Ibrahim and Chen (2000), Tan et al. (2003) and Rietbergen et al. (2011)) or dynamically downweighted to a degree reflecting the commensurability of the new and existing data (Ibrahim and Chen, 2000;Hobbs et al., 2011;Neuenschwander et al., 2010). The challenges of dynamic downweighting were noted in Galwey (2017). Alternatively, one could model the external biases and define either empirical priors for the bias parameters (Welton et al., 2009) or elicit expert opinion on them (Turner et al., 2009). We adopt the latter approach here.
Throughout we illustrate the proposed extrapolation framework with applications to antiepileptic drug development. In this setting, there is broad agreement about the acceptability of extrapolating efficacy data in adults with partial onset seizures to older children with partial onset seizures, although there is some uncertainty about what age one can extrapolate down to (European Medicines Agency, 2010;Pediatric News, 2016;French et al., 2004;Wadsworth et al., 2016b). This paper proceeds as follows. In Section 2 we define a Bayesian bias-adjusted multivariate meta-analytic model to synthesize existing E-R-data and propose a quantitative criterion for defining similar E-R-relationships in adults and younger children. Section 3 describes a scheme for eliciting expert opinion on the external biases that may be inherent in the existing data. In Sections 4 and 5, we report a simulation study that was used to evaluate properties of our framework before concluding in Section 6 with a discussion.

Motivation
Suppose that E-R-data are available from $H$ existing trials which recruited adults and adolescents. Let $Y_{ij}$ represent the response of the $i$th subject in study $j$, for $i = 1, \ldots, N_j$ and $j = 1, \ldots, H$. Dropping the $i$- and $j$-subscripts for clarity, let $A$ be a binary indicator of age which takes the value 1 for adolescents and 0 otherwise. Let

$$g\{E(Y \mid C, A, x_1, \ldots, x_K)\} = \gamma_0 + \gamma_C C + \gamma_A A + \gamma_I A C + \sum_{k=1}^{K} \theta_k x_k, \qquad (1)$$

where $C$ is a measure of drug exposure, $x_1, \ldots, x_K$ are baseline covariates (such as weight) influencing response, with coefficients written here as $\theta_1, \ldots, \theta_K$, and $g$ is the link function of the generalized linear model. If we can assume that regression parameters remain constant across studies, the relationship between exposure and the expected pharmacodynamic response, hereafter referred to as the E-R-'relationship' or 'curve', is identical in adults and adolescents if $\gamma_A = \gamma_I = 0$. The assumption of between-trial homogeneity is relaxed in Section 2.3, in which case $\gamma_A$ and $\gamma_I$ are interpreted as mean parameters.
To simplify the presentation of our methods, we shall assume throughout that the pharmacodynamic response of interest is normally distributed and that a generalized linear model is an adequate description of the underlying E-R-relationship:

$$Y = \gamma_0 + \gamma_C C + \gamma_A A + \gamma_I A C + \sum_{k=1}^{K} \theta_k x_k + \epsilon, \qquad (2)$$

where $\theta_1, \ldots, \theta_K$ denote the covariate coefficients and $\epsilon \sim N(0, \sigma^2)$ is a random error term. Linear models have been used to analyse E-R-data for the antiepileptic drugs oxcarbazepine (Nedelman et al., 2007) and topiramate (Girgis et al., 2010), setting $Y = \log(Z + 110)$, where $Z$ is the percentage change from baseline in seizure frequency and $C$ represents the steady state trough concentration under repeated dosing ($C_{\min}$).

Consider now the data that we would accumulate if we performed an E-R-study, indexed by $T$, in adults and younger children. The International Conference on Harmonisation E11 guidance (International Conference on Harmonisation, 2001) defines children as aged 2-11 years, and adolescents as 12-16 or 18 years. If we made a complete extrapolation of efficacy data from adults to younger children, we would not need to perform study $T$, but it is useful to consider the data that it would generate. Suppose that we measure pharmacodynamic responses $Y_{iT}$, for $i = 1, \ldots, N_T$. Again, dropping the subscript $i$ for clarity, let

$$Y_T = \beta_0 + \beta_C C_T + \beta_A A_T + \beta_I A_T C_T + \sum_{k=1}^{K} \theta_{kT} x_{kT} + \epsilon_T, \qquad (3)$$

where $\epsilon_T \sim N(0, \sigma^2)$, $x_{1T}, \ldots, x_{KT}$ are baseline prognostic covariates defined analogously to $x_1, \ldots, x_K$ (with coefficients $\theta_{1T}, \ldots, \theta_{KT}$), $C_T$ is a measure of exposure defined similarly to $C$ and $A_T$ is a binary age covariate taking the value 1 for younger children and 0 otherwise. E-R-relationships in adults and younger children are identical if $\beta_A = \beta_I = 0$. We relate parameters in the source and target populations described by models (2) and (3) as

$$\beta_A = \gamma_A + \delta_A, \qquad \beta_I = \gamma_I + \delta_I. \qquad (4)$$

Here $\delta_A$ and $\delta_I$ represent external biases arising because E-R-curves in adolescents and younger children may differ because of the effects of maturation and physical development on drug absorption, distribution, metabolism and elimination, and on the action of and response to a drug (Kearns et al., 2003).
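To make the linear E-R-model and the transformation Y = log(Z + 110) concrete, the following Python sketch evaluates the expected response and back-transforms it to a median percentage change in seizure frequency. It is only an illustration: it assumes no additional baseline covariates, it uses the adult intercept and slope quoted later in the simulation study (4.4469 and -0.0627, taken there from Girgis et al. (2010)), and the function names are hypothetical.

```python
import math

# Adult intercept and slope quoted in the simulation study (Girgis et al., 2010).
GAMMA_0 = 4.4469    # intercept on the Y = log(Z + 110) scale
GAMMA_C = -0.0627   # change in Y per unit of exposure C


def expected_y(c, a, gamma_a=0.0, gamma_i=0.0):
    """Linear predictor with an age effect and an age-by-exposure interaction.

    c: exposure, a: 1 for the younger age group and 0 for adults.
    No further baseline covariates are included (an assumption of this sketch).
    """
    return GAMMA_0 + GAMMA_C * c + gamma_a * a + gamma_i * a * c


def median_pct_change(c, a, gamma_a=0.0, gamma_i=0.0):
    """Back-transform the expected Y to the percentage-change scale Z."""
    return math.exp(expected_y(c, a, gamma_a, gamma_i)) - 110.0


adult_placebo = median_pct_change(0.0, 0)   # adults at zero exposure
adult_treated = median_pct_change(5.0, 0)   # adults at an exposure of 5 units
```

Because the slope is negative, the back-transformed response at a positive exposure is a larger reduction in seizure frequency than on placebo.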
Stephenson (2005) noted that the responses of adults and children to many drugs have much in common, although there are exceptions, such as warfarin (Takahashi et al., 2000) and cyclosporine (Marshall and Kearns, 1999). An alternative to the additive bias model (4) is a proportional model stipulating β A = δ A γ A and β I = δ I γ I (Turner et al., 2009). We prefer an additive model since there may be differences between adults and younger children even if no differences between adults and adolescents exist. The existing data D E are said to be relevant for learning about likely differences between E-R-relationships in adults and younger children if δ A and δ I are both close to 0.

Extrapolation criterion
We propose criteria evaluating whether a summary measure of the distribution of pharmacodynamic responses in adults and younger children on placebo and at an effective exposure are sufficiently similar. Let $C^*$ denote a level of exposure that is known to be effective in adults, e.g. the adult $EC_{90}$, the exposure at which the expected adult response is 90% of the maximum. It may be more straightforward to specify equivalence margins with differences on a transformed outcome scale in mind. Thus, E-R-curves are said to be similar if

$$|M\{h(Y_T \mid C_T = 0, A_T = 1)\} - M\{h(Y_T \mid C_T = 0, A_T = 0)\}| < \eta_1, \qquad (5)$$

$$|M\{h(Y_T \mid C_T = C^*, A_T = 1)\} - M\{h(Y_T \mid C_T = C^*, A_T = 0)\}| < \eta_2, \qquad (6)$$

where $h(Y_T \mid C_T, A_T)$ is a function of the pharmacodynamic response of a subject with observed exposure $C_T$ in age group $A_T$, and $M$ is a measure of location such as the mean or median. We require the distribution of pharmacodynamic outcomes in adults and younger children to be similar at the adult effective concentration. This is because, if E-R-relationships are judged to be similar between these two age groups, a suitable dose for children would be found by matching exposures. In practice, bounds $\eta_1$ and $\eta_2$ would be set on the basis of clinical judgement. Larger bounds imply that larger differences between the average pharmacodynamic responses of adults and younger children will be tolerated if we incorrectly perform a complete extrapolation and dose younger children targeting the adult effective concentration. Although different equivalence bounds can be applied at $C_T = 0$ and $C_T = C^*$, to simplify we set $\eta_1 = \eta_2$.
The joint prior probability of extrapolation criteria (5)-(6), denoted by p E , can be used to measure the prior plausibility of an assumption that E-R-curves are sufficiently similar in adults and children to justify a complete extrapolation of efficacy data across these age groups. We speculate that p E in excess of 0.8 or 0.9 would be sufficient to support the immediate adoption of a complete extrapolation strategy. Lower probabilities would prompt a sponsor to collect additional E-R-data in younger children to verify similarities and to facilitate dose finding, where the exact sample size could be determined according to an expected value of information calculation (Willan and Pinto, 2005;Wilson, 2015). A very low value of p E could be consistent either with extreme uncertainty about the relevance of the existing data or a strong degree of scepticism about the similarity of E-R-curves. In both cases, the most appropriate strategy would be to plan an E-R-study in younger children sized to support independent dose finding in this age group.
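The probability p_E can be approximated by Monte Carlo once a prior for the age effects in the target population is available. The sketch below is illustrative only. It assumes that h is the identity and M is the mean, so that under a linear model the two criteria reduce to |beta_A| < eta and |beta_A + beta_I C*| < eta; it also assumes a bivariate normal prior for (beta_A, beta_I). All numerical values and function names are hypothetical.

```python
import math
import random


def draw_beta(mean, var_a, var_i, cov_ai, rng):
    """Draw (beta_A, beta_I) from a bivariate normal using its Cholesky factor."""
    l11 = math.sqrt(var_a)
    l21 = cov_ai / l11
    l22 = math.sqrt(var_i - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def prob_similar(mean, var_a, var_i, cov_ai, c_star, eta, n=20000, seed=1):
    """Monte Carlo estimate of P(|beta_A| < eta and |beta_A + beta_I * C*| < eta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        b_a, b_i = draw_beta(mean, var_a, var_i, cov_ai, rng)
        # Both criteria must hold: on placebo and at the effective exposure.
        if abs(b_a) < eta and abs(b_a + b_i * c_star) < eta:
            hits += 1
    return hits / n


# Hypothetical priors: one concentrated near "no age difference", one diffuse.
p_tight = prob_similar((0.0, 0.0), 0.01 ** 2, 0.001 ** 2, 0.0, c_star=10.0, eta=0.1)
p_diffuse = prob_similar((0.0, 0.0), 1.0, 0.1 ** 2, 0.0, c_star=10.0, eta=0.1)
```

A concentrated prior centred on zero gives a probability near 1, whereas a diffuse prior gives a low probability even though it is centred on exactly the same point, which mirrors the remark that a low p_E can reflect uncertainty as well as scepticism.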

Bayesian bias-adjusted meta-analytic model for existing data
We begin the process of quantifying what is known about differences between adults and younger children by performing a Bayesian meta-analysis of adult and adolescent E-R-data to learn about $\gamma_A$ and $\gamma_I$. We assume that individual patient data are available, but aggregate data could be used if maximum likelihood estimates and their standard errors are available for all parameters in the linear predictor of model (2). At the first level of the model, data from study $j$, $j = 1, \ldots, H$, enrolling adults and adolescents are modelled as

$$Y_{ij} = \gamma_{0j} + \gamma_{Cj} C_{ij} + \gamma_{Aj} A_{ij} + \gamma_{Ij} A_{ij} C_{ij} + \epsilon_{ij}, \qquad (7)$$

where $\epsilon_{ij} \sim N(0, \sigma^2)$ and for ease of presentation we assume that the only baseline covariate prognostic for outcome is age. To limit model complexity, we regard $\gamma_{01}, \ldots, \gamma_{0H}$ and $\gamma_{C1}, \ldots, \gamma_{CH}$ as study-specific intercepts and effects of exposure respectively and make no assumption about exchangeability.
For the remaining parameters in model (7), we assume that pairs of study-specific parameters $(\gamma_{A1}, \gamma_{I1}), \ldots, (\gamma_{AH}, \gamma_{IH})$ are exchangeable and are samples from a bivariate normal random-effects distribution with mean $\mu = (\gamma_A, \gamma_I)$ and covariance matrix $\Sigma$. One approach would be to place an inverse Wishart prior on $\Sigma$. However, our investigations found that the results of meta-analyses are very sensitive to the choice of the inverse Wishart scale matrix; decreasing the diagonal elements of this matrix reduces the variances of the marginal posterior distributions of $\gamma_A$ and $\gamma_I$. Gelman (2006) showed that inverse-gamma$(\epsilon, \epsilon)$ priors with $\epsilon \approx 0$ are informative for variance parameters in hierarchical models and suggested that inverse Wishart prior distributions for covariance matrices incur similar issues. To avoid this sensitivity, we adopt an alternative parameterization (Medical Research Council Biostatistics Unit, 2017) for the bivariate normal random-effects distribution which gives the analyst more flexibility in how they specify priors for the variance parameters. For $j = 1, \ldots, H$, we define the reparameterized random-effects distribution (8). Under this representation, $\gamma_{Ij}$ is linked to $\gamma_{Aj}$ through the linear term $\lambda_0 + \lambda_1(\gamma_{Aj} - \gamma_A)$, and we thereby allow for a correlation between $\gamma_{Aj}$ and $\gamma_{Ij}$, for each $j = 1, \ldots, H$.
The meta-analytic model is completed by defining priors for all unknown parameters. For each $j$, $j = 1, \ldots, H$, the study-specific intercept and effect of exposure, $\gamma_{0j}$ and $\gamma_{Cj}$, are assigned independent $N(0, \zeta^2)$ priors. We define average parameters $\gamma_0$ and $\gamma_C$ as $\sum_{j=1}^{H} \gamma_{0j}/H$ and $\sum_{j=1}^{H} \gamma_{Cj}/H$ so that they represent means for the intercept and effect of exposure across the $H$ existing studies. For the residual precision we stipulate $\sigma^{-2} \sim \text{gamma}(a, b)$, with $a$ and $b$ chosen to define a weakly informative prior. For the parameters of random-effects distribution (8), we place an $N(0, 100)$ prior on $\gamma_A$ and specify priors $\xi_1 \sim$ gamma.
In the examples that we have considered, we have chosen hyperparameters to ensure that the prior for the correlation between each pair $(\gamma_{Aj}, \gamma_{Ij})$ has a bucket shape, placing probability mass at $-1$ and 1, and furthermore that prior probability mass is placed on a range of plausible values for the between-trial standard deviations of the $\gamma_{Aj}$- and $\gamma_{Ij}$-parameters (Neuenschwander et al., 2010).
The Bayesian meta-analytic model can be fitted by using Markov chain Monte Carlo (MCMC) sampling. The joint posterior distribution of $(\gamma_0, \gamma_C, \gamma_A, \gamma_I)$ will not be of a standard form. To justify a complete extrapolation decision, the prior probability of criteria (5) and (6) would probably need to be reported in the paediatric investigation plan, trial protocols and journal publications. Using an approximation to the joint posterior distribution which has a closed form would allow reviewers of these documents to reproduce $p_E$ more easily. Otherwise, one would need to rerun the original meta-analysis to generate $p_E$, which would require access to subject level data which may not be publicly available. Therefore, to facilitate communication and reproducibility of the joint posterior, we approximate it as a finite mixture of $K$ four-dimensional multivariate normal distributions (Schmidli et al., 2014) using the flexmix package (Leisch, 2004; Leisch, 2007, 2008):

$$f(\gamma_0, \gamma_C, \gamma_A, \gamma_I \mid Y_1, \ldots, Y_H) \approx \sum_{k=1}^{K} w_k \, \phi_4(\mu_k, \Sigma_k), \qquad (9)$$

where $w_1, \ldots, w_K$ are mixture weights summing to 1, $\phi_4(\mu, \Sigma)$ is the four-dimensional multivariate normal probability density function with mean $\mu$ and variance $\Sigma$, and $Y_1, \ldots, Y_H$ are vectors representing the adult and adolescent data from existing studies $1, \ldots, H$. Increasing $K$ in approximation (9) increases the accuracy of the finite mixture approximation as measured by the Kullback-Leibler divergence (Kullback and Leibler, 1951; Schmidli et al., 2014). However, these increases diminish with $K$ and must be balanced against increases in model complexity. In our investigations, we have found that setting $K = 2$ in approximation (9) is adequate.

If we consider $(\gamma_A, \gamma_I)$ in model (2) to be systematically biased for the parameters $(\beta_A, \beta_I)$ in model (3), then we can elicit expert opinion on the size of these external biases. We assume that prior opinion on the bias parameters can be modelled as a bivariate normal distribution, written as $\delta \sim N_2(\nu, \Pi)$, where $\delta = (\delta_A, \delta_I)$.
By sampling pairs $(\gamma_A, \gamma_I)$ and $(\delta_A, \delta_I)$ from $f(\gamma_0, \gamma_C, \gamma_A, \gamma_I \mid Y_1, \ldots, Y_H)$ and $\phi_2(\nu, \Pi)$ respectively, we can generate samples from the prior distribution of $(\beta_A, \beta_I)$ given the existing data. Fitting these Monte Carlo samples by using maximum likelihood estimation, we obtain the approximate prior (10).
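The sampling step described above can be sketched in a few lines of Python. This is a simplified illustration, not the fitting procedure used in the paper: the posterior draws of (gamma_A, gamma_I) are replaced by placeholder normal draws, the bias prior is taken to have independent components, and the fitted prior is assumed to be a single bivariate normal (the form of the approximate prior (10) is not shown in this text). All numerical values are hypothetical.

```python
import random


def fit_bivariate_normal(pairs):
    """Maximum likelihood mean and covariance (divisor n) of a bivariate normal."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
    v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    return (m1, m2), (v1, c12, v2)


rng = random.Random(2024)
n_draws = 50000
beta = []
for _ in range(n_draws):
    # Placeholder for a posterior draw of (gamma_A, gamma_I); in practice this
    # would come from the MCMC output or its mixture approximation.
    g_a, g_i = rng.gauss(0.05, 0.02), rng.gauss(-0.01, 0.005)
    # Placeholder for a draw of (delta_A, delta_I) from the elicited bias prior,
    # taken here to have independent components for simplicity.
    d_a, d_i = rng.gauss(0.02, 0.03), rng.gauss(0.0, 0.004)
    # Additive bias model: beta = gamma + delta, component by component.
    beta.append((g_a + d_a, g_i + d_i))

beta_mean, beta_cov = fit_bivariate_normal(beta)   # (means), (var_A, cov, var_I)
```

Under the additive bias model the fitted mean is close to the sum of the posterior mean and the bias mean, and the fitted variances are close to the sums of the corresponding variances, so elicited uncertainty about the biases directly widens the prior for (beta_A, beta_I).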

Overview
In this section we describe our proposal for eliciting an individual expert's opinion on δ A and δ I . We envisage eliciting opinion at a face-to-face meeting of experts by using a two-step process: first elicit the beliefs of individuals; then in a group discussion use behavioural aggregation to obtain a consensus opinion (Hampson et al., 2014). If experts cannot agree on a consensus prior, mathematical aggregation of individual expert opinion could be considered. Experts should be subject matter specialists, such as consultant level clinicians with a relevant specialism. We recommend identifying such individuals through research groups and networks, trying to achieve a good coverage of potential viewpoints. In this context, we suggest involving clinicians who specialize in treating adults and those who treat paediatrics so that we can draw on the experience of both groups to interpret the adult and adolescent data appropriately. Our elicitation scheme has four components.
(a) Part 1: present to each expert the fitted dose-response curves for adults and adolescents derived from a meta-analysis of completed trials.
(b) Part 2: elicit each expert's prior modal guess at the dose-response curve in younger children, in light of the data that are presented in part 1.
(c) Part 3: elicit from the expert their uncertainty about their answer to part 2 as a 90% credible interval.
(d) Part 4: use the expert's answers to derive a fitted prior for $\delta_A$ and $\delta_I$. Feed back summaries of fitted priors for the dose-response relationship in younger children. Allow the expert to revise their answers until they are happy that their fitted prior captures their beliefs.
Note that we frame elicitation questions in terms of the dose-response, rather than E-R-relationship, since clinicians are likely to be more familiar expressing beliefs about the former. In our experience, serum concentrations of antiepileptic (and other) drugs are not typically measured in routine clinical practice, so clinicians tend to be more familiar with dose than with concentration. Answers to elicitation questions can be translated to opinions on E-R-parameters assuming that the relationship between dose and exposure is known. This might be derived by using existing E-R-data or through a further elicitation exercise with pharmacometricians.
In our examples, we assume that dose proportionality holds over the dose range of interest. Letting d denote dose, we can write exposure as C = κd, with known κ.

Rationale for the elicitation scheme
We now show that, under certain assumptions, we can deduce the joint prior distribution for $(\delta_A, \delta_I)$ from an expert's conditional opinions on the dose-response curve in younger children given in parts 2 and 3 of the elicitation scheme.
It is reasonable to suppose that, when presented with the fitted adult and adolescent dose-response curves, the expert will take these to be the true response curves for these age groups, disregarding any estimation error. Ignoring for the moment between-study heterogeneity in E-R-parameters, model (2) stipulates that, at dose $d^*$, the fitted average pharmacodynamic response in adults is $F_{1d^*} = \gamma_0 + \gamma_C \kappa d^*$ and the fitted average response in adolescents is

$$\gamma_0 + \gamma_A + (\gamma_C + \gamma_I) \kappa d^*.$$

Therefore, from the presentation of the existing data, at dose $d^*$ an expert can deduce (a) the average response in adults, $F_{1d^*}$, and (b) the difference between the adult and adolescent expected responses, $F_{2d^*} = \gamma_A + \gamma_I \kappa d^*$. Assuming bias model (4) and assuming no drift in the parameters of the adult E-R-relationship so that, in a future E-R-study enrolling adults and younger children, $\beta_0 = \gamma_0$ and $\beta_C = \gamma_C$, model (3) stipulates that the expected response of a younger child given dose $d^*$ would be

$$E(Y_T \mid A_T = 1, d^*) = \beta_0 + \beta_C \kappa d^* + \beta_A + \beta_I \kappa d^* = \gamma_0 + \gamma_C \kappa d^* + (\gamma_A + \delta_A) + (\gamma_I + \delta_I) \kappa d^*.$$

Conditioning on what has been learnt from the existing data, we have

$$E(Y_T \mid A_T = 1, d^*) \mid F_{1d^*}, F_{2d^*} = F_{1d^*} + F_{2d^*} + \delta_A + \delta_I \kappa d^*.$$

Assuming that prior opinion on $(\delta_A, \delta_I)$ is independent of opinion on other E-R model parameters and can be modelled as $N_2(\nu, \Pi)$, with $\nu = (\nu_A, \nu_I)$ and

$$\Pi = \begin{pmatrix} \pi_A^2 & \pi_{AI} \\ \pi_{AI} & \pi_I^2 \end{pmatrix},$$

then at dose $d^*$ we obtain

$$E(Y_T \mid A_T = 1, d^*) \mid F_{1d^*}, F_{2d^*} \sim N\{F_{1d^*} + F_{2d^*} + \nu_A + \nu_I \kappa d^*, \; \pi_A^2 + 2\pi_{AI} \kappa d^* + \pi_I^2 (\kappa d^*)^2\}.$$

In parts 2 and 3 of the elicitation scheme, we ask the expert for their conditional opinion on the average response of younger children on placebo, a 'medium' or a 'high' dose, denoted by $d_0$, $d_M$ and $d_H$ respectively. The proposed wording of the elicitation questions is given in the on-line supplementary appendix A and the Shiny application can be found on GitHub (https://github.com/iwadsworth/ElicitBiasPrior).
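The conditional prior for the expected response of a younger child at a given dose follows directly from the additive bias model: its mean is F_1 + F_2 + nu_A + nu_I kappa d and its variance is pi_A^2 + 2 pi_AI kappa d + pi_I^2 (kappa d)^2. The Python sketch below computes these quantities together with the implied 90% interval. The function name and all numerical inputs are hypothetical, and dose proportionality (C = kappa d) is assumed.

```python
from statistics import NormalDist


def child_response_prior(dose, kappa, gamma, nu, pi):
    """Prior mean, variance and 90% interval for the expected response of a
    younger child at `dose`, conditional on the fitted adult/adolescent curves.

    gamma = (gamma_0, gamma_C, gamma_A, gamma_I)  fitted E-R parameters
    nu    = (nu_A, nu_I)                          prior means of the biases
    pi    = (pi_A^2, pi_AI, pi_I^2)               prior (co)variances of the biases
    """
    g0, gc, ga, gi = gamma
    x = kappa * dose                  # exposure implied by dose proportionality
    f1 = g0 + gc * x                  # fitted average response in adults
    f2 = ga + gi * x                  # adolescent minus adult difference
    mean = f1 + f2 + nu[0] + nu[1] * x
    var = pi[0] + 2.0 * pi[1] * x + pi[2] * x ** 2
    dist = NormalDist(mean, var ** 0.5)
    return mean, var, (dist.inv_cdf(0.05), dist.inv_cdf(0.95))


gamma_hat = (4.4469, -0.0627, 0.03, 0.001)   # hypothetical fitted values
nu_prior = (0.02, 0.0)                        # hypothetical bias means
pi_prior = (0.01, 0.0002, 0.0001)             # hypothetical bias (co)variances

placebo_summary = child_response_prior(0.0, 1.0, gamma_hat, nu_prior, pi_prior)
high_summary = child_response_prior(10.0, 1.0, gamma_hat, nu_prior, pi_prior)
```

On placebo only pi_A^2 contributes to the variance, so opinions at zero dose inform pi_A^2 alone, while opinions at the medium and high doses are needed to separate pi_AI and pi_I^2.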
In practice, $d_M$ and $d_H$ could be chosen on the basis of adult dose-finding studies or, if the drug has already been licensed in adults, using World Health Organization lists of defined daily doses (World Health Organization Collaborating Centre for Drug Statistics Methodology, 2016). If an expert expresses the consistent opinion that the average response in younger children is similar to the fitted average in adolescents, this suggests that they believe that $\delta_A$ and $\delta_I$ are small, i.e. that the existing adult and adolescent data are highly relevant for informing our understanding of likely differences between adults and younger children. Note that the scheme proposed asks for opinions on the relevance of the existing data after an expert has seen how supportive they are of a complete extrapolation of efficacy data from adults to adolescents. To increase the credibility of beliefs that are elicited in this way, one could interview independent experts who are not directly involved with the drug development programme. Furthermore, the assumption that opinion on $(\delta_A, \delta_I)$ is independent of opinion on other E-R model parameters is a pragmatic one which ensures that elicitation questions have a direct interpretation and can be answered by non-statisticians.
By asking an expert for their best guesses at the average pharmacodynamic responses in children on placebo and a 'high' dose, we deduce fitted values of $\nu_A$ and $\nu_I$. To find $\nu_A$, one can subtract the fitted value of $\gamma_0 + \gamma_A$, obtained from the meta-analysis of existing data, from the expert's best guess at the average response on placebo. Meanwhile, $\nu_I$ is obtained by subtracting the fitted value of $\gamma_C + \gamma_I$, also obtained from the meta-analysis, from the slope calculated by dividing the difference between the expert's best guesses at the expected pharmacodynamic responses on a high dose and placebo by $\kappa d_H$. An expert's uncertainty about the average pharmacodynamic response in younger children on dose $d^*$, for $d^* \in \{d_0, d_M, d_H\}$, is established by asking questions to establish the fifth, 25th, 75th and 95th percentiles of their prior distribution for this quantity. Given values of $\nu_A$ and $\nu_I$, we can then adapt the approach of Neuenschwander et al. (2008) to search over configurations of $\pi_A^2$, $\pi_I^2$ and $\pi_{AI}$ to find the triplet which defines a positive definite variance matrix and minimizes the absolute difference between percentiles of the fitted prior and the expert's stated percentiles. To ensure positive definiteness, $\Pi$ is represented in the optimization routine by using the Cholesky decomposition.
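The two calculations in this paragraph can be sketched as follows. The first function applies the subtraction rules for nu_A and nu_I as stated. The second part fits Pi by minimizing the summed absolute difference between fitted and stated percentiles, with Pi built from a Cholesky factor whose diagonal is exponentiated so that positive definiteness holds by construction. The simple random search used here is a stand-in chosen for brevity and is not the search routine of Neuenschwander et al. (2008); the 'expert' percentiles are synthetic, and all names and numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist

LEVELS = (0.05, 0.25, 0.75, 0.95)   # percentiles elicited from the expert


def bias_means(guess_placebo, guess_high, gamma, kappa, d_high):
    """nu_A and nu_I from best guesses on placebo and on the high dose."""
    g0, gc, ga, gi = gamma
    nu_a = guess_placebo - (g0 + ga)
    slope = (guess_high - guess_placebo) / (kappa * d_high)
    return nu_a, slope - (gc + gi)


def pi_from_cholesky(a, l21, b):
    """Return (pi_A^2, pi_AI, pi_I^2) for the factor [[e^a, 0], [l21, e^b]]."""
    l11, l22 = math.exp(a), math.exp(b)
    return l11 ** 2, l11 * l21, l21 ** 2 + l22 ** 2


def fitted_quantiles(mean, exposure, pi):
    """Percentiles of the normal prior for the response at a given exposure."""
    sd = math.sqrt(pi[0] + 2.0 * pi[1] * exposure + pi[2] * exposure ** 2)
    return [mean + NormalDist().inv_cdf(p) * sd for p in LEVELS]


def loss(params, targets):
    """Summed absolute difference between fitted and stated percentiles.
    targets: list of (prior mean, exposure, stated percentiles)."""
    pi = pi_from_cholesky(*params)
    return sum(abs(f - s)
               for mean, x, stated in targets
               for f, s in zip(fitted_quantiles(mean, x, pi), stated))


def fit_pi(targets, start=(0.0, 0.0, 0.0), iters=4000, step=0.3, seed=0):
    """Random local search: keep a perturbed candidate only if it lowers the loss."""
    rng = random.Random(seed)
    best, best_loss = list(start), loss(start, targets)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, step) for p in best]
        cand_loss = loss(cand, targets)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return pi_from_cholesky(*best), best_loss


gamma_hat = (4.4469, -0.0627, 0.03, 0.001)          # hypothetical fitted values
nu_a, nu_i = bias_means(4.50, 3.90, gamma_hat, kappa=1.0, d_high=10.0)

# Synthetic "stated" percentiles generated from a known Pi, for illustration.
true_pi = pi_from_cholesky(math.log(0.1), 0.0, math.log(0.02))
best_guess = {0.0: 4.50, 5.0: 4.20, 10.0: 3.90}     # exposure -> prior mean
targets = [(m, x, fitted_quantiles(m, x, true_pi)) for x, m in best_guess.items()]

start_loss = loss((0.0, 0.0, 0.0), targets)
fitted_pi, final_loss = fit_pi(targets)
```

In practice a derivative-free optimizer would normally replace the random search; the point of the sketch is the parameterization, which lets the search move freely over three unconstrained numbers while every candidate remains a valid variance matrix.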

Example: application to antiepileptic drug development
The prior elicitation protocol and accompanying R Shiny application that are described in the on-line supplementary appendix A underwent several rounds of testing, with one author (IW) asking neurologists with experience of treating adult and/or paediatric epilepsy about dose-response curves for an antiepileptic drug. Testing included face-to-face pilot runs with eight neurologists attending the International League Against Epilepsy British and Irish Chapters Meeting (Dublin, October 2016). The final version of the protocol was also piloted on three neurologists via web conference. The figures in supplementary appendix B are of the application tailored to the antiepileptic drug development application.
A few comments on the application of our elicitation scheme to the antiepileptic drug example are needed. The pharmacodynamic response, $Y = \log(Z + 110)$, is the log-transformed percentage change in seizure frequency from baseline. Since a log-transformed percentage change is difficult to give opinions on, we elicited beliefs on the percentage change in seizure frequency $Z$ instead. It seems natural to think that, if we took an expert's best guess at the relationship between dose and $E(Z)$ and then transformed it, we would obtain their best guess at the relationship between dose and $E(Y)$. Therefore, the prior mode for $E(Y)$ at a particular dose was obtained by transforming the prior mode of $E(Z)$. The percentiles of an expert's prior distribution for $E(Z)$ were similarly transformed to obtain percentiles of their prior for $E(Y)$ (since the transformation was monotonic).
Using our elicitation procedure and Shiny application, bias priors E1 and E2 below were elicited from two epileptologists who were presented with simulated individual participant data on a licensed antiepileptic drug shown in Fig. 2. These were generated from the fitted models that were presented in Girgis et al. (2010). Assuming that κ = 1, the E-R-relationship is equivalent to the dose-response relationship: Prior E1 reflects the opinion that it is most likely that the average pharmacodynamic responses of adolescents and younger children are the same. Prior E2 is consistent with the belief that the E-R-curve in younger children lies slightly above that of adolescents (indicating a worse average response), so differences between E-R-curves in adults and younger children are larger than those between adults and adolescents. However, as can be seen from Fig. 2(b), both experts were uncertain about the dose-response curve in younger children given the existing data. The implications of this for extrapolation decisions are explored in Section 4.

Simulation study
Simulation scenarios were informed by applications to antiepileptic drug development for partial onset seizures (Girgis et al., 2010;Nedelman et al., 2007).

Epilepsy application extrapolation criterion
In all simulation scenarios, E-R-curves were said to be similar in two age groups if differences between median percentage changes from baseline in seizure frequency were less than 10%:

$$|M(Z \mid A = 1, C = 0) - M(Z \mid A = 0, C = 0)| < 10, \qquad (11)$$

$$|M(Z \mid A = 1, C = C^*) - M(Z \mid A = 0, C = C^*)| < 10, \qquad (12)$$

where $M$ represents the median and $C^*$ is the adult $EC_{90}$. Our choices for $\eta_1$ and $\eta_2$ were based on clinical feedback on acceptable differences in average responses. We wrote the similarity criteria in terms of the transformed pharmacodynamic end point to make it easier to elicit similarity bounds. We chose the median as our summary measure of response since, if $Y$ follows a log-normal distribution with median $m_Y$, the median of $Z = \exp(Y) - 110$ is given by $m_Z = \exp(m_Y) - 110$, thus simplifying the mapping of properties from $Z$ to $Y$.
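A short Python sketch shows how the 10% criterion can be checked for a given set of E-R-parameters, using the fact that medians map through the monotone transformation Z = exp(Y) - 110. The value used for the effective exposure is hypothetical (the adult EC_90 is not given in this text), the adult intercept and slope are those quoted in the simulation set-up, and the function names are illustrative.

```python
import math


def median_pct_change(c, a, beta):
    """Median of Z = exp(Y) - 110 at exposure c in age group a.

    beta = (intercept, exposure slope, age effect, age-by-exposure interaction).
    """
    b0, bc, ba, bi = beta
    return math.exp(b0 + bc * c + ba * a + bi * a * c) - 110.0


def criteria_met(beta, c_star, margin=10.0):
    """True if the age groups differ by less than `margin` percentage points
    both on placebo (C = 0) and at the effective exposure C = c_star."""
    diffs = [median_pct_change(c, 1, beta) - median_pct_change(c, 0, beta)
             for c in (0.0, c_star)]
    return all(abs(d) < margin for d in diffs)


C_STAR = 10.0   # hypothetical effective exposure, for illustration only

identical = criteria_met((4.4469, -0.0627, 0.0, 0.0), C_STAR)
small_shift = criteria_met((4.4469, -0.0627, 0.05, 0.0), C_STAR)
large_shift = criteria_met((4.4469, -0.0627, 0.20, 0.0), C_STAR)
```

With these inputs, an age effect of 0.05 on the log scale shifts the placebo median by roughly 4 percentage points and satisfies both criteria, whereas an age effect of 0.20 shifts it by roughly 19 points and fails the placebo criterion.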

Simulating existing E-R-data in adults and adolescents
We simulated the pharmacodynamic responses of adults and adolescents according to model (7). Setting the residual variance $\sigma^2 = 0.0243$ ensured that the transformed response $Z$ lay within $\pm 10\%$ of its median given the patient's age group and level of exposure with probability 0.95. We simulated age group indicators $A \sim \text{Bern}(0.15)$ so that on average 15% of existing trial participants were adolescents. This proportion appears reasonable on the basis of the studies that were cited in Girgis et al. (2010). Furthermore, we assigned 10% of patients in each study to placebo. For patients who were allocated to the drug, we sampled $\log(C_{\min})$ from a normal distribution with mean $\log(2.94)$ and variance 0.921, truncating samples above by $\log(17.27)$. In this way, we generated $C_{\min}$-values with quartiles and first and 99th percentiles similar to those reported by studies cited in Girgis et al. (2010), where $C_{\min}$-values ranged between 0.19 and 17.27 μg ml$^{-1}$.
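The data-generating steps described above can be sketched for a single existing study as follows. This is an illustration under stated assumptions rather than the simulation code used for the paper: placebo allocation is drawn as an independent Bernoulli(0.10) indicator for each patient, the upper truncation is imposed by rejection sampling, the study-specific parameters are supplied by the caller, and the function name is hypothetical.

```python
import math
import random


def simulate_study(n, gamma_j, rng, sigma2=0.0243):
    """Simulate (y, c, a) for n patients in one study under model (7).

    gamma_j = (gamma_0j, gamma_Cj, gamma_Aj, gamma_Ij) are study-specific values.
    """
    g0, gc, ga, gi = gamma_j
    rows = []
    for _ in range(n):
        a = 1 if rng.random() < 0.15 else 0      # adolescent indicator, Bern(0.15)
        if rng.random() < 0.10:                  # placebo patient (assumed Bernoulli)
            c = 0.0
        else:
            # Rejection sampling enforces the truncation above at log(17.27).
            while True:
                log_c = rng.gauss(math.log(2.94), math.sqrt(0.921))
                if log_c <= math.log(17.27):
                    break
            c = math.exp(log_c)
        # Response on the Y = log(Z + 110) scale with normal residual error.
        y = g0 + gc * c + ga * a + gi * a * c + rng.gauss(0.0, math.sqrt(sigma2))
        rows.append((y, c, a))
    return rows


rng = random.Random(7)
data = simulate_study(20000, (4.4469, -0.0627, 0.0, 0.0), rng)
share_adolescent = sum(a for _, _, a in data) / len(data)
share_placebo = sum(1 for _, c, _ in data if c == 0.0) / len(data)
```

If exactly 10% of patients per study are to receive placebo, the Bernoulli draw can be replaced by a fixed allocation list that is shuffled before use.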
For each existing study, study-specific parameters of E-R-model (7) were generated by sampling from a random-effects distribution, setting the population means of the intercept and the effect of exposure to γ_0 = 4.4469 and γ_C = −0.0627, which are the maximum likelihood estimates of these parameters taken from Girgis et al. (2010). Let Δ_P and Δ_C represent the difference between M(Z|A = 1, C) and M(Z|A = 0, C) when C = 0 and C = C* respectively. We chose values for γ_A and γ_I such that Δ_P and Δ_C, when evaluated under these average parameters, spanned a realistic range of differences. We considered pairs (Δ_P, Δ_C) ∈ {(0, 0), (5, 5), (10, 10), (20, 20), (5, 10), (5, 20)}, which correspond to the six pairs of (γ_A, γ_I) labelled in Table 1 as E-R-models S1-S6.
Table 1. Population means of the effects of age, γ_A, and the interaction between age and exposure, γ_I, for adults and adolescents in the six E-R simulation models, with the interpretation of each model
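To make the link between target differences and age effects concrete, the sketch below inverts the median differences at placebo and at C* (written `delta_p` and `delta_c` in the code) for the age effect and the age-by-exposure interaction. It rests on assumptions that are not confirmed by this section: that the median of Y under model (7) is linear, γ_0 + γ_C C + γ_A A + γ_I A C, with exposure entering untransformed, and that the difference is adolescent minus adult. The value of `c_star` is a user-supplied, hypothetical input because the adult EC90 is not stated here.

```python
import math

GAMMA_0 = 4.4469   # population intercept quoted in the text
GAMMA_C = -0.0627  # population exposure effect quoted in the text


def median_z(age, c, gamma_a, gamma_i):
    """Median of Z = exp(Y) - 110 under the assumed linear model for Y."""
    linear = GAMMA_0 + GAMMA_C * c + gamma_a * age + gamma_i * age * c
    return math.exp(linear) - 110.0


def age_effects_from_differences(delta_p, delta_c, c_star):
    """Solve for (gamma_a, gamma_i) giving the requested median differences.

    delta_p: adolescent minus adult median of Z on placebo (C = 0).
    delta_c: adolescent minus adult median of Z at C = c_star.
    """
    gamma_a = math.log(1.0 + delta_p / math.exp(GAMMA_0))
    adult_at_c_star = math.exp(GAMMA_0 + GAMMA_C * c_star)
    gamma_i = ((math.log(adult_at_c_star + delta_c) - GAMMA_0 - gamma_a) / c_star
               - GAMMA_C)
    return gamma_a, gamma_i
```

Under these assumptions, the pair (0, 0) returns age effects of zero, and larger differences map to larger positive age effects; the values in Table 1 may differ if model (7) is parameterized differently.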
The variances of study-specific E-R-parameters were chosen to characterize low, moderate, high and very high levels of between-trial heterogeneity. We chose σ_0^2 such that a study-specific value of M(Z|A = 0, C = 0) lay within ±10% of the median of Z calculated by setting the E-R-model parameters equal to their population means with probability 0.6 (very high heterogeneity), 0.7 (high), 0.8 (moderate) or 0.95 (low). Fixing σ_0^2, σ_C^2 was then set to ensure that the study-specific value of M(Z|A = 0, C = EC90) lay within ±10% of the median of Z calculated by setting the E-R-model parameters equal to their population means with the same probability. We chose σ_A^2 and σ_I^2 to fix the probability that an individual existing trial will be consistent with an assumption of similar E-R-curves in adults and adolescents according to criteria (11)-(12). Specifically, we chose σ_A^2 such that, with probability 0.6, 0.7, 0.8 or 0.95, the true difference in a study between M(Z|A = 0, C = 0) and M(Z|A = 1, C = 0) lay within ±10%. For a particular choice of σ_A^2, we then fixed σ_I^2 such that, with probability 0.6, 0.7, 0.8 or 0.95, the true difference between study-specific values M(Z|A = 0, C = EC90) and M(Z|A = 1, C = EC90) lay within ±10%. Different configurations of the heterogeneity parameters are listed in Table 2.
Simulation scenarios considered different numbers of existing trials (H = 2, 3, 4, 5, 10, 20) and numbers of subjects per trial (N = 30, 170). Numbers of existing trials were chosen to explore a plausible range: Davey et al. (2011) reported that 75% of the 22453 meta-analyses that were listed in the Cochrane database of systematic reviews in 2011 were based on five or fewer studies, and 1% were based on 28 or more. Values of N were informed by four industry-sponsored trials of an antiepileptic drug, the average sample size of which was 168 patients.
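The first calibration step, for the intercept variance, can be illustrated numerically. The sketch below is one possible reading rather than the authors' procedure: it assumes a normal random effect on the intercept only, interprets '±10%' as ±10 percentage points on the Z scale, and finds the standard deviation by bisection. The function names are hypothetical, and the results need not reproduce Table 2.

```python
import math


def normal_cdf(x, sd):
    """CDF of a N(0, sd^2) variable evaluated at x."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def coverage(sd, centre, half_width=10.0):
    """P(|exp(centre + u) - exp(centre)| <= half_width) for u ~ N(0, sd^2).

    Requires exp(centre) > half_width so that the lower limit is defined.
    """
    upper = math.log(1.0 + half_width / math.exp(centre))
    lower = math.log(1.0 - half_width / math.exp(centre))
    return normal_cdf(upper, sd) - normal_cdf(lower, sd)


def solve_sd(target, centre, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection for the sd whose coverage equals target.

    Coverage decreases as sd increases, so the root is bracketed by lo and hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid, centre) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `solve_sd(0.95, 4.4469)` gives the standard deviation for the low heterogeneity level under this reading; the same pattern could be repeated for the other variance components, conditioning on those already fixed.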

Meta-analysis of simulated existing E-R-studies
For each of the 288 simulation scenarios, we simulated 1000 sets of trials and fitted the Bayesian meta-analytic model of Section 2.3 to each data set. All simulations were performed in R (R Development Core Team, 2015), fitting the meta-analytic model by calling OpenBUGS version 3.2.3 (Lunn et al., 2009) using the R2OpenBUGS package (Sturtz et al., 2005). We fitted the Bayesian model by running three chains, each for 30000 iterations including a burn-in of 10000 iterations, with a thinning rate of 5. The coda package (Plummer et al., 2006) was then used to extract posterior samples from the OpenBUGS output.
The meta-analytic model was fitted stipulating the priors that are given in Table 3. Hyperparameters for ξ_1 and ξ_2, defining the variability of the random-effects distribution (8), were chosen so that E(ξ_1) and E(ξ_2) were equal to our choices for the moderate between-trial standard deviation for γ_A and γ_I given in Table 2. Then 95% of the probability mass for the ξ_1-prior lay in the interval (0.012, 0.250), and 95% of the probability mass for the ξ_2-prior lay in the interval (0.001, 0.050). Therefore, low weight was given to very low and high between-trial variances.
Table 3. Prior distributions placed on parameters of model (7) and random-effects distribution (8)
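As a bookkeeping check, the scenario grid and the number of retained posterior draws can be enumerated as below. Two assumptions are made: that the six E-R-models, four heterogeneity levels, six values of H and two values of N are fully crossed (which reproduces the 288 scenarios quoted above), and that thinning is applied to the post-burn-in iterations of each chain.

```python
from itertools import product

# Simulation factors named in the text.
er_models = ["S1", "S2", "S3", "S4", "S5", "S6"]
heterogeneity_levels = ["low", "moderate", "high", "very high"]
n_existing_trials = [2, 3, 4, 5, 10, 20]   # H
n_subjects_per_trial = [30, 170]           # N

# Assumption: all factors are fully crossed.
scenarios = list(product(er_models, heterogeneity_levels,
                         n_existing_trials, n_subjects_per_trial))

# MCMC settings quoted in the text.
n_chains, n_iter, n_burnin, n_thin = 3, 30000, 10000, 5

# Assumption: thinning applies after burn-in within each chain.
retained_per_chain = (n_iter - n_burnin) // n_thin
retained_total = n_chains * retained_per_chain
```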

Meta-analysis of existing data
As can be seen from on-line supplementary Tables ST9-ST16, the Bayesian multivariate meta-analysis of the existing adult and adolescent data produces accurate estimates of γ_A and γ_I, with low bias, low empirical standard deviations and low mean-squared error in most scenarios. In all cases, the accuracy increases with the sample size per existing study. Empirical standard deviations are highest under the highest level of between-trial heterogeneity, but the bias remains small. The intercept and effect of exposure are also estimated with small bias and high precision (the results are not presented).
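The three performance measures mentioned above have standard definitions, sketched here for a vector of point estimates from repeated simulations. The function name and the use of the sample standard deviation (divisor n − 1) are assumptions; the supplementary tables may use slightly different conventions.

```python
import numpy as np


def summarise_estimates(estimates, true_value):
    """Bias, empirical standard deviation and mean-squared error of estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    empirical_sd = estimates.std(ddof=1)             # sample standard deviation
    mse = np.mean((estimates - true_value) ** 2)     # average squared error
    return {"bias": bias, "empirical_sd": empirical_sd, "mse": mse}
```

In a simulation study of this kind, `estimates` would hold, say, the 1000 posterior means of one parameter from a single scenario, and `true_value` the population mean used to generate the data.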

Effective sample sizes of the approximate joint prior for parameters representing differences between adults and younger children
On-line supplementary appendix C explores how the average effective sample size of the bivariate normal mixture approximation to the joint prior for (β_A, β_I) is influenced by an expert's uncertainty about the bias parameters. Under bias prior E1, information from the existing adult and adolescent data is heavily downweighted; for example, if each existing study enrolled 170 patients, assuming low between-trial heterogeneity, the effective sample size of the prior for (β_A, β_I) would be 24. The effective sample size of the prior for (β_A, β_I) increases as prior uncertainty about the external biases decreases. For more details, the reader is referred to supplementary appendix C.

Prior probability that E-R-curves are similar in adults and younger children
First we look at how the prior probability that extrapolation criteria (11)-(12) are satisfied (referred to as p_E) varies with the true E-R-relationship in adults and adolescents. On-line supplementary Tables ST17-ST20 present the means and empirical standard deviations of p_E for a range of scenarios. Figs 3(d)-3(f) illustrate E-R-relationships in adults, adolescents and younger children under simulation models S1, S3 and S4 and bias prior E1. Figs 3(a)-3(c) illustrate how p_E changes as differences between adult and adolescent E-R-relationships increase from none (model S1), to moderate (model S3) to large (model S4) under bias prior E1. A general trend is that larger values of p_E are recorded in scenarios where the true E-R-curves in adults and adolescents are more closely aligned. Under bias prior E1 and models S1 and S3, when differences between adults and adolescents are sufficiently small to satisfy criteria (11) and (12), p_E increases as N increases, although differences diminish with H. Under model S1 and bias prior E1, p_E reaches a maximum of 0.572 when the between-trial heterogeneity is low and data are available from H = 20 existing studies, each having recruited N = 170 subjects.
Fig. 3. (a)-(c) Average prior extrapolation probabilities under bias prior E1 and simulation models S1, S3 and S4 respectively (±1 empirical standard deviation of the observed probabilities) and (d)-(f) median responses in adults, adolescents and younger children under simulation models S1, S3 and S4 respectively: also plotted are the lower bounds of 90% credible intervals for the median response in younger children, consistent with bias prior E1 when its variance matrix is unscaled, or scaled by a factor of 0.5 or 2; credible intervals are calculated conditioning on true values of adult and adolescent E-R-parameters; bars at the placebo and EC90 represent similarity bounds given criteria (11) and (12)
Curves representing cases when the bias prior variance matrix is scaled by a factor of 0.5, 0.01 or 0 show that, if uncertainty about external biases were to be significantly reduced, p_E would increase. If the prior variance matrix is scaled by a factor of 0, we take this to mean that the expert knows that δ_A = ν_A and δ_I = ν_I; therefore, their joint prior places probability mass 1 on the configuration (δ_A, δ_I) = (ν_A, ν_I) and assigns zero probability to all other pairs. For bias prior E1, a scale factor of 0 would reflect the opinion that we are certain that differences between adult and adolescent E-R-curves reflect differences between curves for adults and younger children. There is a question of whether it is plausible that an expert would be sufficiently confident in their beliefs for us to attain a high value of p_E. Suppose that p_E = 0.8 would be sufficient to support a complete extrapolation strategy. From Fig. 3(a), we see that, under model S1 with low between-trial heterogeneity and 170 subjects per trial, if we scale the bias prior variance matrix by 0.5, p_E reaches 0.8 when H > 5. From Fig. 3(d), we can see what this scaling factor would correspond to in terms of the level of confidence that an expert must have in the location of the E-R-curve in younger children. We speculate that experts could have this level of confidence in practice. With enough existing data and strong, but still feasible, expert opinion, a high prior extrapolation probability is plausible.
Prior probabilities of extrapolation under models S2, S5 and S6 are provided in the on-line supplementary materials as Figs SF9-SF11. Similar patterns are seen in the results that are generated under models S2 and S5 to those under model S1, although values of p_E tend to be lower overall, reflecting the larger differences between adult and adolescent E-R-curves. A similar comment applies to results generated under models S4 and S6.
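The effect of scaling the prior variance matrix on the prior extrapolation probability can be illustrated by Monte Carlo. The sketch below is deliberately simplified and is not the authors' implementation: it replaces the bivariate normal mixture with a single bivariate normal prior for the adult-to-younger-children differences, assumes the linear form γ_0 + β_A + (γ_C + β_I) C for the median of Y in younger children, and uses hypothetical values for the prior mean, the covariance matrix and the adult EC90 (`C_STAR`).

```python
import numpy as np

GAMMA_0, GAMMA_C = 4.4469, -0.0627  # adult population means quoted in the text
C_STAR = 10.0                       # hypothetical adult EC90


def prior_extrapolation_probability(mean, cov, scale=1.0, n_draws=100_000,
                                    bound=10.0, seed=0):
    """Monte Carlo estimate of P(both similarity criteria hold) under the prior.

    mean, cov: hypothetical prior mean and covariance for (beta_A, beta_I).
    scale: factor multiplying the covariance matrix (0 gives a point mass).
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    # Draw zero-mean deviations once, then shrink them by sqrt(scale); this is
    # equivalent to sampling from a normal with covariance scale * cov.
    deviations = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)
    beta = mean + np.sqrt(scale) * deviations
    beta_a, beta_i = beta[:, 0], beta[:, 1]

    # Differences in median Z (children minus adults); the constant 110 cancels.
    diff_placebo = np.exp(GAMMA_0 + beta_a) - np.exp(GAMMA_0)
    adult_linear = GAMMA_0 + GAMMA_C * C_STAR
    diff_c_star = np.exp(adult_linear + beta_a + beta_i * C_STAR) - np.exp(adult_linear)

    both_hold = (np.abs(diff_placebo) < bound) & (np.abs(diff_c_star) < bound)
    return float(both_hold.mean())
```

With a prior centred on no difference, for instance `mean=[0, 0]` and `cov=[[0.02, 0], [0, 0.0004]]`, the estimated probability rises as `scale` falls and equals 1 when `scale=0`, mirroring the qualitative pattern described above.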
We have repeated our investigations by using bias prior E2. Comparing results generated under the two priors, we see that the prior probability of extrapolation under prior E2 is lower in all scenarios, demonstrating that it is not only an expert's uncertainty about external biases which influences the probability of extrapolation but also the expert's opinion on the direction of differences between E-R-curves in adolescents and younger children.

Discussion
This paper proposes a quantitative framework for using existing pharmacological data to inform our understanding of likely differences between E-R-relationships in adults and younger children. The prior probability of acceptably small differences between these relationships is used to inform a decision of whether to perform a complete or partial extrapolation of adult efficacy data to younger children. Currently, we propose that prior extrapolation probabilities in excess of 0.8 or 0.9 would support a decision to adopt a complete extrapolation strategy, although further work will explore whether the choice of this cut-off can be refined and formalized through the use of a decision theoretic argument. Such an approach would consider various risks and costs, including the risk to children of incorrectly adopting a complete extrapolation strategy and the costs to patients and the sponsor of failing to perform a complete extrapolation when this is appropriate. When it is unclear whether a complete extrapolation strategy should be adopted or not, one could use an expected value of information analysis (Briggs et al., 2006; Heath et al., 2017) to quantify the value, in terms of improved decision making, of collecting varying amounts of additional E-R-data in younger children given the risks that were outlined above.
When performing the bias-adjusted meta-analysis on which the prior probability of extrapolation is based, it is essential that the studies included should have been identified through a process of systematic review according to a prespecified protocol (Higgins and Green, 2011; Khan et al., 2011). We suggest that the eligibility criteria for the systematic review should include trials that recruit patients with the indication of interest, assess the same drug and collect follow-up data that allow the chosen outcomes to be analysed. To inform expert opinion, during the elicitation meeting we could broaden the scope to present data on related drugs or indications.
Our current approach assumes that E-R-relationships can be captured by models which represent age as a categorical variable, i.e. that assume that there are no important differences within an age group. Although this assumption will never hold exactly, we do expect it to hold approximately for suitably defined age groups. If important differences were expected to occur within an age group, then a more suitable approach would be to consider each homogeneous age group in turn and to select an extrapolation strategy for each by application of the methods that were described in Sections 2 and 3. Although the motivating example for this work has been extrapolating across age groups, a similar framework could be used to inform the extrapolation of efficacy data across ethnic groups or geographic regions, where subgroups in this setting are naturally discrete.
In this paper, we have considered the case of linear E-R-models, though generalized linear models could be accommodated in our framework with appropriate adjustments to the elicitation protocol. However, over the therapeutic window of interest, the E-R-relationship is likely to be approximately linear in many cases, even if the complete E-R-relationship follows an Emax model (Macdougall, 2006), a non-linear model which is often used to model the relationship between exposure and response. If this is not so, extending to non-linear E-R-models could be possible, though one would need to consider (a) how to parameterize the more complex E-R-models for adults, adolescents and younger children, (b) how to represent differences between the various E-R-relationships and to define decision criteria governing extrapolation decisions and (c) how one would devise a scheme to elicit opinion on biases affecting parameters governing the similarity of E-R-relationships.
Additionally, our prior extrapolation probability would probably need to consider a moderate exposure level, such as EC50, along with a placebo and a higher exposure.