Probabilistic index models
Abstract
We present a semiparametric statistical model for the probabilistic index, which can be defined as P(Y ≼ Y*) = P(Y < Y*) + ½P(Y = Y*), where Y and Y* are independent random response variables associated with covariate patterns X and X* respectively. A link function defines the relationship between the probabilistic index and a linear predictor. Asymptotic normality of the estimators and consistency of the covariance matrix estimator are established through semiparametric theory. The model is illustrated with several examples, and the estimation theory is validated in a simulation study.
1. Introduction
Consider the class of studies in which a single response variable is measured together with some covariates. Let Y and X denote the response variable and the d-dimensional covariate respectively, and let fYX and fY|X denote the density functions of the joint distribution and the conditional distribution of Y given X respectively. We use the same notation for the probability mass functions when Y or X is discrete. When Y is a continuous random variable, most statistical methods focus on the conditional mean of Y given X. For example, in linear regression models E(Y|X) = Z^Tβ, where Z is a p-dimensional vector with elements that are functions of the covariates and where β is a p-dimensional parameter vector. Sometimes the complete conditional distribution of Y given X is specified (e.g. the normal regression model), allowing for likelihood-based inference, but this is often relaxed to some mild assumptions on the higher order moments of the conditional distribution, so that the likelihood is no longer defined and semiparametric theory is required for inference.
In this paper the effect of the covariates on the response is instead quantified by the probabilistic index (PI), which we define as

P(Y ≼ Y*) = P(Y < Y*) + ½P(Y = Y*),   (1)

where (Y, X) and (Y*, X*) are independently distributed with density fYX. We introduce the notation P(Y ≼ Y*) for the PI as defined in expression (1). When Y is continuous, P(Y = Y*) = 0 and the PI simplifies to P(Y < Y*). Definition (1) is also meaningful and convenient when the response is ordinal. Our definition implies that P(Y ≼ Y*) + P(Y* ≼ Y) = 1 for both continuous and ordinal responses.
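Definition (1) translates directly into a simple nonparametric estimate: average the indicator contributions over all pairs formed from two samples. The following sketch is illustrative only; the function name and data are not from the paper.

```python
import numpy as np

def empirical_pi(y, y_star):
    """Estimate P(Y < Y*) + 0.5 * P(Y = Y*) by averaging the
    indicators over all pairs (Y_i, Y*_j)."""
    y = np.asarray(y, dtype=float)[:, None]            # column: Y_i
    y_star = np.asarray(y_star, dtype=float)[None, :]  # row: Y*_j
    return np.mean((y < y_star) + 0.5 * (y == y_star))

# Ordinal toy data: the tie (2, 2) contributes 1/2, as definition (1) requires
print(empirical_pi([1, 2], [2, 3]))  # -> 0.875
```

Note that applying the estimator to two samples from the same distribution returns a value near 0.5, in line with the identity P(Y ≼ Y*) + P(Y* ≼ Y) = 1.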
In this paper we propose models for the conditional PI,

P(Y ≼ Y* | X, X*) = m(X, X*; β),   (2)

where m is a known function of the covariate patterns and of an unknown finite-dimensional parameter β. An interesting special case arises when X is a binary design variable which refers to two populations. With X = 0 and X* = 1, model (2) becomes P(Y ≼ Y* | X = 0, X* = 1), which is the parameter of interest in the Wilcoxon–Mann–Whitney (WMW) test. In particular, under the general two-sample null hypothesis H0: F1 = F2, with F1 and F2 the distribution functions of the response in the two populations, the PI equals 0.5 when the response variable is continuous, and thus β = 0. Under mild conditions, the WMW test is consistent against the alternative P(Y ≼ Y* | X = 0, X* = 1) ≠ 0.5
or β≠0. The class of models that is presented here can be considered as an extension of the WMW setting. Just as the t-tests for the covariate effects in a linear regression model embed the two-sample t-test when the model has only one 0–1 dummy covariate, so the tests for covariate effects in the probabilistic index model (PIM) reduce to a WMW-type test in a two-sample design. Our models also extend the work of Brumback et al. (2006), who proposed models for the PI, but with the restriction that Y and Y* are continuous response variables that always belong to two different populations or treatment groups. In terms of our formulation this restriction could be expressed as X and X* being distinct in at least one component which is a binary indicator for two treatment groups. Brumback and colleagues thus provided a WMW-type test for comparing two treatment groups while controlling for one or more covariates. Our methods do not impose any particular restriction on the covariate vector X. Moreover, the methods that are proposed in this paper further improve on Brumback et al. (2006) by being directly applicable to both continuous and ordinal response variables, and by providing a consistent estimator of the variance–covariance matrix of the parameter estimators, so that no computationally intensive bootstrap procedure is required.


Fig. 1. (a) Scatter plot of BDI improvement versus dose, with fits of a linear regression model based on least squares and of a linear regression model based on Huber's robust M-estimator; (b) histogram of the improvements in BDI for small doses; (c) histogram of the improvements in BDI for large doses
Using the methods that are described in this paper, we find a positive estimate of the dose effect β with estimated standard deviation 0.0398. The p-value for testing H0: β = 0 versus H1: β ≠ 0 is smaller than 0.0001, and thus at the 5% level of significance the null hypothesis is rejected. Therefore we conclude that patients who are treated with a larger dose of quetiapine are more likely to benefit from the treatment. In particular, when the dose is increased by 5 g, the estimated PI equals 0.702; i.e., when comparing a group of patients treated with quetiapine with a group that received an extra 5 g of quetiapine, we conclude that, with probability 70.2%, the BDI of a patient from the high dose group shows a larger improvement than that of a patient from the low dose group. At first sight the reader might think that the data could just as well have been analysed with a (linear) regression model but, as illustrated in Fig. 1, the linearity assumption would be violated; a transformation or non-linear regression techniques may resolve this problem. However, Figs 1(b) and 1(c) further demonstrate that the dose affects not only the mean response, but also the variance and the skewness of the BDI distribution. The PI acts here as a quantity that summarizes the covariate effect on the response distribution in a meaningful effect size measure. Another important characteristic of the example is that BDI is basically an ordinal score variable. Although the BDI scale has 64 levels, the mean BDI does not necessarily have an unambiguous interpretation. Regression techniques that focus on the conditional mean of the BDI are thus not to be recommended. The interpretation of the PI, in contrast, applies to both continuous and ordinal variables. Cumulative or adjacent-categories logistic regression models (McCullagh, 1980) may also be used for the analysis of ordinal data; see, for example, Agresti (2007) or Liu and Agresti (2005) for extensive overviews of methods for ordinal data. Some other examples of response variables for which classical regression models are not the most appropriate are briefly discussed in the next paragraph.
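Assuming a logit link (an assumption made here for illustration, since the link is not restated in this paragraph), the reported PI of 0.702 for a 5 g dose difference determines the implied coefficient; the snippet below back-calculates it and converts in both directions.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

pi_5g = 0.702                  # reported PI for a 5 g dose increase
beta_hat = logit(pi_5g) / 5.0  # implied slope under a logit-link PIM

print(round(beta_hat, 3))             # 0.171
print(round(expit(0 * beta_hat), 3))  # equal doses -> PI = 0.5
print(round(expit(5 * beta_hat), 3))  # 0.702, recovering the reported PI
```

The conversion shows how a single fitted coefficient yields an interpretable probability for any chosen dose difference.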
There are many examples of response variables that are measured on an ordinal scale; we name just one more example. In pain management the effectiveness of treatments is often measured on an ordinal scale. Patients may be asked to fill out a questionnaire with questions related to their (subjective) pain experience, resulting in a pain score that has only an ordinal meaning. The scale of Turk et al. (1993), for example, is a 0–10 rating scale. The analysis of pain scores with PIMs would result in probabilities that quantify how likely it is that the pain will decrease as a function of a set of covariates. Pain may also be measured on the visual analogue scale of Wallerstein (1984). For this the patient is presented with a horizontal line of 10 cm, anchored by the words 'no pain' and 'very severe pain' at the two ends. The patient is asked to mark the point on the line that best represents his or her level of pain at that moment. The distance, which is measured in millimetres, between the left-hand end of the line and the point marked by the patient is the numerical value that is used as a measure of pain. This is an example of a response variable that may be interpreted as being ordinal, but it may just as well be considered as a continuous response variable. However, not every variable that is measured on a continuous scale is necessarily an interval or ratio scale variable. For example, a patient with a visual analogue scale pain score of 4 does not necessarily have twice as much pain as someone with a pain score of 2. Thus, again the mean does not have a meaning, but statements involving order comparisons, such as P(Y ≼ Y* | X, X*), do make sense. See Myles et al. (1999) for more details of the visual analogue scale.
PIMs may also turn out to be useful for analysing genuine continuous response variables on a ratio scale for which classical regression models also seem to be appropriate. Beyerlein et al. (2008) observed that a child's body mass index may be affected by several risk factors that, however, do not act only on the mean body mass index. In particular, the skewness of the body mass index distribution may change with covariate patterns. As illustrated in the BDI example, the PI summarizes the covariate effects on the shape of the response distribution, while retaining a very informative interpretation of the covariate effect sizes. Hence, PIMs could be a valuable alternative for body mass index data. Beyerlein et al. (2008) suggested analysing the body mass index data with quantile regression methods. Quantile regression (Koenker, 2005) is another important class of models. It focuses on the quantile distribution of Y given X, QY|X(·|X), say. Without the complete specification of the joint distribution of Y and X, the τth quantile of the distribution of Y given X is modelled as QY|X(τ|X) = Z^Tβτ. These models are also semiparametric, as the distribution of Y given X is not completely specified or parameterized.
The examples of the previous paragraphs already give a flavour of the usefulness of the PIM. In particular, the response variables were defined on an ordered scale, which could be discrete or continuous, for which the mean of the difference Y−Y* did not have a proper interpretation as an effect size, but for which the PI did. More generally, the PIM may be the statisticians’ method of choice whenever the PI is considered as a meaningful parameter for quantifying effect sizes.
In Section 6 three example data sets are worked out in detail to demonstrate the scope of PIMs.
The PI is also closely related to the area under the receiver operating characteristic curve (AUC), where F1 and F2 are the distribution functions of Y|X = x1 and Y|X = x2 respectively. Suppose that Y is a continuous response variable and that F1 and F2 have the same support, S say. Then the area under the curve becomes

AUC = P_{x1x2}(Y < Y*) = ∫_S {1 − F2(y)} dF1(y),   (3)

with (Y, X) and (Y*, X*) independently distributed; usually we shall drop the index x1x2 from the probability operator. In the context of receiver operating characteristic curves, we refer to Dodd and Pepe (2003), who proposed regression models for the area under the curve which have formed the theoretical basis of the work of Brumback et al. (2006), which has been referred to earlier in this section. The PI is also closely related to stochastic ordering. A distribution F1 is said to be stochastically smaller than F2 if and only if F1(y) ≥ F2(y) for all y ∈ S, with strict inequality for at least a subset of S. When F1 is stochastically smaller than F2, equation (3) immediately implies that PI > 0.5. The implication does not hold necessarily in the other direction. Stochastic ordering is thus a stronger property than PI > 0.5, but the PI has the advantage of being a very informative effect size measure, as argued by many researchers; see Acion et al. (2006), Browne (2010), Laine and Davidoff (1996) and Zhou (2008), among others. This is further illustrated in the examples that are included in this paper.
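A small discrete counterexample, constructed here purely for illustration, makes the one-way implication concrete: the PI exceeds 0.5 although neither distribution is stochastically smaller than the other, because the distribution functions cross.

```python
import numpy as np

# Y puts mass 0.6 on 0 and 0.4 on 3; Y* is degenerate at 2.
vals_y,  prob_y  = np.array([0.0, 3.0]), np.array([0.6, 0.4])
vals_ys, prob_ys = np.array([2.0]),      np.array([1.0])

# PI = P(Y < Y*) + 0.5 P(Y = Y*), computed exactly over the support
pi = sum(py * pys * ((y < ys) + 0.5 * (y == ys))
         for y, py in zip(vals_y, prob_y)
         for ys, pys in zip(vals_ys, prob_ys))
print(pi)  # 0.6 > 0.5

# Stochastic ordering would require F1(t) >= F2(t) for all t
grid = np.array([0.0, 1.0, 2.0, 3.0])
F1 = np.array([prob_y[vals_y <= t].sum() for t in grid])    # 0.6 0.6 0.6 1.0
F2 = np.array([prob_ys[vals_ys <= t].sum() for t in grid])  # 0.0 0.0 1.0 1.0
print(bool(np.all(F1 >= F2)))  # False: the CDFs cross, no stochastic ordering
```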
After the class of PI models has been formally defined in Section 2 and the parameter estimation and asymptotic distribution theory are presented in Section 3, we discuss in Section 4 the relationship between the PIM and several other statistical methods such as linear regression, Cox proportional hazards regression, the WMW test, rank regression and the Hodges–Lehmann estimator. Note, however, that these connections to other statistical methods are given only to gain a better understanding of the PIMs and to motivate certain PI model formulations. We do not claim that PIMs should replace other statistical models, but they may be a valuable addition to the statisticians’ toolbox, particularly when the research question allows a natural formulation with the PI as an effect size measure. The validity of the asymptotic theory is assessed in a simulation study in Section 5. More examples are presented in Section 6, and conclusions are formulated in Section 7.
2. The model and its interpretation
We define a probabilistic index model (PIM) as a model of the form

P(Y ≼ Y* | X, X*) = m(X, X*; β),   (4)

where m is a known smooth function of the covariate patterns X and X* and of the p-dimensional parameter β, which must satisfy 0 ≤ m(X, X*; β) ≤ 1 and m(X, X*; β) = 1 − m(X*, X; β), i.e. m must be antisymmetric in the sense that m(X, X*; β) and m(X*, X; β) sum to 1. The former restriction is guaranteed to hold because of the definition of the PI as in expression (1). When m does not satisfy the antisymmetry condition, the model may still be coherent when equation (4) is only defined for all X ≺ X* or X ≼ X*. The former refers to an order relation between the covariate patterns; so does the latter, but it includes X = X*. Suppose that X = (X1, X2) is a vector of dimension 2. Then an example of such an order relation is the lexicographical ordering, i.e. X ≺ X* if X1 < X1*, or X1 = X1* and X2 < X2*. By applying this definition recursively we can extend this order relation to vectors of dimension larger than 2. See Fishburn (1974) for more information about the lexicographical order. To avoid having to distinguish throughout the paper between models for which the antisymmetry condition holds and models for which an order restriction is imposed, we introduce the set Ω of elements (X, X*) for which model (4) is defined. We write Ω for the set of all covariate pairs when no order restriction is imposed, which is further referred to as the 'NO' order restriction. To summarize, the PIM is defined as
P(Y ≼ Y* | X, X*) = m(X, X*; β)  for all (X, X*) ∈ Ω.   (5)

This model expresses restrictions on the conditional distribution of Y given X, but it does not fully specify this distribution. Hence, it is a semiparametric model. When P(Y = Y*) = 0, model (5) may just as well be defined in terms of P(Y < Y* | X, X*).
In the remainder of this paper we consider PIMs for which m has a generalized linear model form, i.e.

g[P(Y ≼ Y* | X, X*)] = Z^Tβ,   (6)

where g is a link function and Z is a p-dimensional vector whose elements are functions of X and X*. Although Z^Tβ may include an intercept or an offset, we sometimes choose to write the linear predictor as β0 + Z^Tβ, where β0 is an offset. If the scope of the PIM includes X = X* and the response is continuous, the offset β0 must be set to a constant so that g^{-1}(β0) = 0.5. The offset thus depends on the link function. For example, when Z = X* − X the offsets for the logit, probit and identity links become β0 = 0, β0 = 0 and β0 = 0.5 respectively.
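The offset condition g^{-1}(β0) = 0.5 is easy to check numerically for the three links mentioned above; a small illustration, with the probit evaluated through the error function:

```python
import math

# Inverse link functions g^{-1}
inv_links = {
    "logit":    lambda eta: 1.0 / (1.0 + math.exp(-eta)),            # expit
    "probit":   lambda eta: 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0))),  # Phi
    "identity": lambda eta: eta,
}
offsets = {"logit": 0.0, "probit": 0.0, "identity": 0.5}

# With Z = X* - X the linear predictor vanishes when X = X*,
# so g^{-1}(beta_0) must equal 0.5 for the model to be coherent.
for name, g_inv in inv_links.items():
    print(name, g_inv(offsets[name]))  # each line ends in 0.5
```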
3. Parameter estimation and statistical inference
3.1. Parameter estimation
Define the pseudo-observation indicator I(Y ≼ Y*) = I(Y < Y*) + ½I(Y = Y*), in which I(Y < Y*) and I(Y = Y*) denote the usual indicator functions evaluated for the events Y < Y* and Y = Y* respectively. The PIM (5) can then be written as the conditional mean model

E{I(Y ≼ Y*) | X, X*} = m(X, X*; β),  (X, X*) ∈ Ω.   (7)
When (Y1, X1), …, (Yn, Xn) denotes a sample of n independent identically distributed (IID) random variables with joint density function fYX, model formulation (7) suggests that the β parameter vector can be estimated by using the set of pseudo-observations Iij = I(Yi ≼ Yj) for all i, j = 1, …, n for which (Xi, Xj) ∈ Ω. In particular, model (7) resembles a conditional moment semiparametric model (see for example Chamberlain (1987), Newey (1988) or chapter 4 of Tsiatis (2006)), in which the conditional mean of the pseudo-observations is specified. We therefore propose to estimate the parameters by solving the estimating equations
Σ_{(i,j) ∈ In} A(Zij; β){Iij − m(Xi, Xj; β)} = 0,   (8)

where In is the set of indices (i, j) for which (Xi, Xj) ∈ Ω, and A(Zij; β) is a p-dimensional vector function of the regressors Zij. Let β̂ denote the estimator. Although perhaps more efficient choices for A exist, we shall consider only

A(Zij; β) = {∂m(Xi, Xj; β)/∂β} V^{-1}(Zij; β),   (9)

where V(Zij; β) = νm(Xi, Xj; β){1 − m(Xi, Xj; β)} is a working variance, with ν a scale parameter. This choice corresponds to the quasi-likelihood estimating equations as used, for example, in the analysis of longitudinal data (Liang and Zeger, 1986; Zeger and Liang, 1986), where they are also referred to as generalized estimating equations. In the present setting, however, the conditional mean does not refer to the mean of the conditional distribution of the response, but it refers to the mean of the pseudo-observations. Moreover, despite the close relationship between our method of estimation and generalized estimating equations, the asymptotic distributional properties of the estimator β̂
do not follow immediately from these theories, for the pseudo‐observations Iij have a more complicated dependence structure than, for example, block independence as in clustered or longitudinal data. Lemmas 1 and 2 of Section 3.2 state that the pseudo‐observations have the sparse correlation structure of Lumley and Hamblett (2003). This result makes the semiparametric theory of Lumley and Hamblett (2003) directly applicable to our setting. Theorems 1 and 2 that we present in Section 3.3 summarize the most important distribution theory results for the PIM.
When m satisfies the antisymmetry condition, the solution of equations (8) for the NO order restriction is identical to the solution for a lexicographical order restriction. The lexicographical ordering is then preferred over the NO order restriction, for only half of the pseudo-observations are needed. This also demonstrates that the estimator is independent of the order in which the covariates appear in the definition of the lexicographical ordering.
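To make the procedure concrete, the sketch below fits a single-covariate PIM with logit link by solving an estimating equation of the form (8) with Newton-Raphson. For simplicity it uses the plain choice A(Zij; β) = Zij rather than the quasi-likelihood weights; all names and data are illustrative, not from the paper.

```python
import numpy as np

def fit_pim_logit(y, x, n_iter=25):
    """Solve sum_ij Z_ij * (I_ij - expit(Z_ij * beta)) = 0 over all
    pseudo-observations i != j, with Z_ij = x_j - x_i and
    I_ij = I(y_i < y_j) + 0.5 * I(y_i = y_j)."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    i, j = np.where(~np.eye(len(y), dtype=bool))     # all pairs i != j
    z = x[j] - x[i]
    pseudo = (y[i] < y[j]) + 0.5 * (y[i] == y[j])
    beta = 0.0
    for _ in range(n_iter):                          # Newton-Raphson updates
        mu = 1.0 / (1.0 + np.exp(-beta * z))
        score = np.sum(z * (pseudo - mu))
        hess = -np.sum(z * z * mu * (1.0 - mu))
        beta -= score / hess
    return beta

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = x + rng.normal(0, 1, 200)   # normal linear model: alpha = 1, sigma = 1
print(fit_pim_logit(y, x))      # positive: larger x makes larger y more likely
```

The fitted β is positive here because larger covariate values shift the response distribution upwards, i.e. PI > 0.5 for x* > x.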
3.2. Sparse correlation
In this section we show that the pseudo-observations are sparsely correlated, but we start by defining sparse correlation in the context of pseudo-observations. A more general definition can be found in Lumley and Hamblett (2003).
Definition 1. Let {Iij} denote a set of pseudo-observations. For each pseudo-observation Iij a set of pairs of indices Sij is defined such that (k,l) ∉ Sij and (i,j) ∉ Skl implies that Iij and Ikl are independent. Let Mnij denote the number of pairs in Sij, let Mn = max_{ij} Mnij and let mn denote the size of the largest subset T such that (k,l) ∉ Sij for all pairs (i,j), (k,l) ∈ T. Then the set of pseudo-observations is called sparsely correlated if we can choose Sij so that mnMn = O(Nn), with Nn the number of pseudo-observations.
In the following lemmas we demonstrate that the pseudo‐observations are sparsely correlated when no order restriction or the lexicographical order restriction is imposed.
Lemma 1 (sparse correlation: NO order restriction). The NO ordered pseudo‐observations have the sparse correlation structure.
Proof. Each pseudo-observation Iij is correlated with 4n−7 other pseudo-observations. Indeed, let k = 1, …, n with k ≠ i and k ≠ j; then Iij is correlated with Iik, Iki, Ijk and Ikj, with Iji, and with itself. Thus Mnij = 4(n−2) + 2 = 4n−6 and Mn = 4n−6. The largest set of pseudo-observations that are mutually independent consists of any Iij and all other Ikl with i, j, k and l mutually distinct. The size of this set is thus ⌊n/2⌋, i.e. the largest integer not larger than n/2. Suppose that n is even. Then

mnMn = (n/2)(4n−6) = 2n² − 3n.

Since Nn = n(n−1) and 2n² − 3n ≤ 2Nn, lemma 1 holds for n even. Similarly, when n is odd, mnMn = {(n−1)/2}(4n−6) ≤ 2Nn.
Lemma 2 (sparse correlation: lexicographical order restriction). The lexicographical ordered pseudo‐observations have the sparse correlation structure.
Proof. The lexicographical pseudo-observations Iij for which Xi ≺ Xj can be obtained by sorting the data (Y, X) on the basis of the lexicographical ordering on X and then considering the pseudo-observations Iij with i < j. Each pseudo-observation Iij is correlated with 2n−4 other pseudo-observations. Indeed, Iij is correlated with
- (a) Iik where k = i+1, …, n and k ≠ j,
- (b) Ikj where k = 1, …, j−1 and k ≠ i,
- (c) Iki where k = 1, …, i−1,
- (d) Ijk where k = j+1, …, n.

Together with Iij itself this gives Mnij = 2n−3 and Mn = 2n−3. The largest set of pseudo-observations that are mutually independent consists of any Iij and all other Ikl with i, j, k and l mutually distinct. The size of this set is thus ⌊n/2⌋. Suppose that n is even. Then

mnMn = (n/2)(2n−3) = n² − 3n/2.

Since Nn = n(n−1)/2 and n² − 3n/2 ≤ 2Nn, lemma 2 holds for n even. Similarly, when n is odd, mnMn = {(n−1)/2}(2n−3) ≤ 2Nn.
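The pair counts that drive both proofs can be verified by brute force for a small n; an illustrative check, not part of the paper:

```python
from itertools import combinations, permutations

def n_dependent(pairs, p):
    """Number of other pseudo-observations sharing an index with pair p."""
    return sum(1 for q in pairs if q != p and set(q) & set(p))

n = 6
no_pairs = list(permutations(range(n), 2))    # NO restriction: all i != j
lex_pairs = list(combinations(range(n), 2))   # lexicographical: i < j

# Lemma 1: each I_ij is correlated with 4n-7 others; lemma 2: with 2n-4
print(all(n_dependent(no_pairs, p) == 4 * n - 7 for p in no_pairs))    # True
print(all(n_dependent(lex_pairs, p) == 2 * n - 4 for p in lex_pairs))  # True
```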
3.3. Asymptotic normality of the parameter estimators
The true parameter β0 is defined as the solution of

E[A(Zij; β){Iij − m(Xi, Xj; β)}] = 0.   (10)

The regularity conditions in the statement of theorem 1 imply the existence of β0.
Theorem 1 (asymptotic normality). Consider the PIM (7) with predictors Zij taking values in a bounded subset of R^p. We make the following assumptions.
Assumption 1. The pseudo‐observations are sparsely correlated, with mn as in lemma 1 or lemma 2.
Assumption 2. The link function g and the variance function V have three continuous derivatives.
Assumption 3. The true parameter β0, as defined by equation (10), is in the interior of a convex parameter space.

Assumption 5.
.
Then, as n→∞, √n(β̂ − β0) converges in distribution to a multivariate Gaussian distribution with zero mean and some positive definite variance–covariance matrix Σ.

4. Relationship with other methods
In this section we show how the PIMs are related to other statistical methods. In Sections 4.1 and 4.2 we demonstrate that the parameters of linear regression models and Cox proportional hazard models have simple relationships with the parameters of a PIM with particularly chosen link functions and linear predictors. The connection between hypothesis tests in the semiparametric PIM framework and the WMW rank test is explored in Section 4.3, and the link between the PIM parameter estimators and rank regression is the topic of Section 4.4. We do not suggest that the PIM methodology is a direct competitor of these other methods, but by understanding these relationships the reader may gain a better appreciation of the PIMs’ position in the landscape of statistical models, and he or she may find arguments for choosing one or other link function.
4.1. Linear regression models


Consider the classical normal linear regression model Y = α0 + αX + ε, with ε ~ N(0, σ²) independent of X, and consider the PI for this class of regression models. Conditionally on X and X*, the difference Y* − Y is normally distributed with mean α(X* − X) and variance 2σ², so that

P(Y ≼ Y* | X, X*) = Φ{α(X* − X)/(√2σ)}.   (11)

This relationship for linear regression models immediately suggests the link function g(·) = √2σΦ^{-1}(·), for which a PIM with linear predictor Z = X* − X and β = α is obtained.

With the probit link function (g(·) = Φ^{-1}(·)) and with Z = X* − X, a simple relationship between α and β is established: β = α/(√2σ), which expresses that β is proportional to α. Under the normality, linearity and homoscedasticity assumptions of the regression model we therefore conclude that β also has an interpretation in terms of the effect of X on the conditional mean of the response. When the regression model assumptions do not hold, the parameter β in the PIM still has the interpretation in terms of the PI.
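The probit relationship is easy to validate by simulation: under the normal linear model the exact PI is Φ{α(x* − x)/(√2σ)}, and the empirical fraction of pairs with Y < Y* should agree. The parameter values below are arbitrary.

```python
import math
import numpy as np

alpha, sigma = 1.0, 1.0
x, x_star = 0.2, 0.8
rng = np.random.default_rng(7)

n = 400_000
y = alpha * x + rng.normal(0, sigma, n)
y_star = alpha * x_star + rng.normal(0, sigma, n)

# Exact PI: Y* - Y ~ N(alpha (x* - x), 2 sigma^2)
t = alpha * (x_star - x) / (math.sqrt(2.0) * sigma)
pi_exact = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # Phi(t)
pi_mc = np.mean(y < y_star)
print(round(pi_exact, 3), round(pi_mc, 3))  # close agreement
```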

When the residual variance is not constant, say var(ε|X) = σ²(X), the same argument gives

P(Y ≼ Y* | X, X*) = Φ[α(X* − X)/√{σ²(X) + σ²(X*)}].   (12)
This expression illustrates that the effect of X on the distribution of Y diminishes as X increases, at least in terms of the PI. In the normal regression model, the increasing residual variance does not affect the covariate effect on the mean response, whereas it results in a negative effect modulation in terms of the PI. This is further illustrated with a real data example in Section 6.3. This was also noted by Brumback et al. (2006) and it suggests that we should take care in interpreting the α‐parameter in a normal regression model with non‐constant variance because the importance of the covariate effect may actually depend on the covariate value.
4.2. Cox proportional hazards model
Holt and Prentice (1974) studied P(T1i < T2i | X1i, X2i), where T1i and T2i are paired survival times (e.g. from twin studies) with covariates X1i and X2i. Under the assumption of proportional hazards, in the absence of censored or tied data, they found that

P(T1i < T2i | X1i, X2i) = expit{β(X1i − X2i)},

in which the parameter β originates from the hazard function λ(t|X) = λ0(t) exp(βX). Note, however, that in the PIMs that are presented in this paper it is assumed that all observations are mutually independent, whereas Holt and Prentice (1974) developed their method for paired response variables (paired survival times).
Also the marginal likelihood formulation of Kalbfleisch and Prentice (1973), which is related to the ranks of the survival times, is closely related to a PIM, and the parameters are again interpretable in the proportional hazards model. Write the hazard function as λ(y|X) = λ0(y)γ(X), in which λ0(y) is the baseline hazard function that does not depend on the covariate X. Thus, within the class of proportional hazards models the survival function is of the form

S(y|X) = S0(y)^{γ(X)},   (13)

where S0 denotes the baseline survival function and the relation holds on the support of Y. Straightforward algebra then gives

P(Y < Y* | X, X*) = γ(X)/{γ(X) + γ(X*)} = expit[log{γ(X)} − log{γ(X*)}].

This illustrates that the PIM with a logit link and with Z = X* − X arises naturally from a widely applicable class of distributions. A straightforward example is the exponential distribution with rate parameter γ, which has survival function S(y) = exp(−γy). Equation (13) is satisfied with S0(y) = exp(−y) and γ(X) = exp(X^Tβ).
Equation (13) characterizes this class of distributions through its survival function, but its form immediately suggests that, for any distribution for which expression (13) holds with some positive function γ, a PIM also results.
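For the exponential example the algebra can be checked directly: with γ(X) = exp(Xβ) the PI equals γ(X)/{γ(X) + γ(X*)}, which is expit{(X − X*)β}. An illustrative simulation with arbitrary parameter values:

```python
import math
import numpy as np

beta = 0.7
x, x_star = 1.0, 2.0
g, g_star = math.exp(beta * x), math.exp(beta * x_star)

pi_exact = g / (g + g_star)  # = expit((x - x_star) * beta)
assert abs(pi_exact - 1.0 / (1.0 + math.exp(-(x - x_star) * beta))) < 1e-12

rng = np.random.default_rng(3)
n = 400_000
y = rng.exponential(1.0 / g, n)            # rate gamma(x): mean 1/gamma
y_star = rng.exponential(1.0 / g_star, n)  # larger hazard -> shorter survival
print(round(pi_exact, 3), round(np.mean(y < y_star), 3))  # close agreement
```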
4.3. Two‐sample problem
Consider a sample of independent observations (Yi, Xi), i = 1, …, n, with Yi continuous. Without loss of generality assume that the sample of Y observations does not contain ties and that the observations are ordered so that the first n1 observations belong to the first group and the last n2 = n − n1 to the second. Let Xi = 0 if 1 ≤ i ≤ n1 and Xi = 1 if n1 + 1 ≤ i ≤ n. Consider the PIM with identity link,

P(Y ≼ Y* | X, X*) = β0 + β(X* − X),  (X, X*) ∈ Ω.   (14)

The offset β0 = 0.5 is not strictly necessary, because the scope of the model does not include X = X*. However, by having it in the model, the traditional two-sample null hypothesis becomes equivalent to β = 0, which is the default null hypothesis in most statistical software. The order relation restriction in Ω implies that only X = 0 and X* = 1 are allowed, so that the model can be reformulated in a more convenient form. We use the notation Y(1) and Y(2) to denote two independent observations from the first (X(1) = 0) and the second (X(2) = 1) group respectively. The model is now reformulated as

P{Y(1) ≼ Y(2)} = β0 + β = 0.5 + β.   (15)
With

MW = Σ_{i=1}^{n1} Σ_{j=n1+1}^{n} I(Yi < Yj)

denoting the Mann–Whitney test statistic, we see immediately that β̂ = MW/(n1n2) − 0.5. The traditional Mann–Whitney test, however, is usually based on the standardized test statistic {MW − E0(MW)}/σ0, where σ0 is the standard deviation of MW under the two-sample null hypothesis H0: F1 = F2, with F1 and F2 the distribution functions of Y(1) and Y(2) respectively. Under this restrictive null hypothesis E0(MW) = n1n2/2 and σ0² = n1n2(n + 1)/12
. Using a variance which is obtained under the null hypothesis is related to score tests, whereas using a variance estimator that is more generally consistent is related to the Wald test. The advantage of using a more generally consistent variance estimator is that the test may then also be used for testing the null hypothesis H0: P{Y(1) ≼ Y(2)} = 0.5 versus H1: P{Y(1) ≼ Y(2)} ≠ 0.5 (i.e. H0: β = 0 versus H1: β ≠ 0). Such a variance estimator was proposed by Fligner and Policello (1981) and, using the equality β̂ = MW/(n1n2) − 0.5, their results immediately give a variance estimator for β̂, which can be written as
(16)

It can easily be shown that the sandwich variance estimator of theorem 2 gives exactly the same expression. The PIM and the inference based on the estimating equations thus include the Wald-type WMW test of Fligner and Policello (1981). We refer to chapter 9 of Thas (2009) for more information about the use of the WMW test in a semiparametric setting.
We started this section by assuming that the response Y is continuous, resulting in a simplification of P(Y ≼ Y*) to P(Y < Y*) and of I(Yi ≼ Yj) to I(Yi < Yj). However, when the continuity assumption on Y is dropped and ties are allowed, the relationship with the WMW test statistics still holds, but with midranks instead of ranks.
For the K-sample problem, the PIM can be similarly parameterized so that each parameter, βkl say, corresponds to MWkl/(nknl), with MWkl the Mann–Whitney test statistic for comparing groups k and l, and nk (or nl) the sample size of group k (or l), k < l and k, l = 1, …, K. The equivalence between a PIM with this parameterization and the Kruskal–Wallis test is based on an equivalent representation of the Kruskal–Wallis statistic in terms of Mann–Whitney statistics; see Fligner (1985) for more details.
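The identity between the Mann–Whitney statistic and the estimated PI, including the midrank handling of ties, can be illustrated numerically; the midranks are computed from scratch so that the check is self-contained, and the data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)
y1 = rng.integers(0, 5, 30).astype(float)  # group X = 0, ordinal with ties
y2 = rng.integers(1, 6, 40).astype(float)  # group X = 1
n1, n2 = len(y1), len(y2)

# Empirical PI, with ties contributing 1/2
pi_hat = np.mean((y1[:, None] < y2[None, :]) + 0.5 * (y1[:, None] == y2[None, :]))

# Mann-Whitney statistic from pooled midranks:
# MW = (sum of midranks of group 2) - n2 (n2 + 1) / 2
z = np.concatenate([y1, y2])
midrank = ((z[None, :] < z[:, None]).sum(1) + (z[None, :] <= z[:, None]).sum(1) + 1) / 2
mw = midrank[n1:].sum() - n2 * (n2 + 1) / 2

print(pi_hat, mw / (n1 * n2))  # identical: MW/(n1 n2) estimates the PI
```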
4.4. Rank regression and Hodges–Lehmann estimators
Consider the linear model Y = αX + ε. Rank regression estimates α by minimizing the dispersion of the ranked residuals,

D(α) = Σ_{i=1}^{n} {R(Yi − αXi)/(n + 1) − 1/2}(Yi − αXi),   (17)

where R(Yi − αXi) denotes the rank of the residual Yi − αXi among the n residuals. The estimate of α is thus obtained by solving the estimating equation (based on the partial derivative of expression (17))

Σ_{i=1}^{n} Xi {R(Yi − αXi)/(n + 1) − 1/2} = 0.   (18)

Since R(Yi − αXi) = 1 + Σ_{j≠i} I(Yj − αXj < Yi − αXi) in the absence of ties, equation (18) is equivalent to

Σ_{i} Σ_{j≠i} Xi {I(Yj − αXj < Yi − αXi) − 1/2} = 0.   (19)

Consider now the PIM with identity link and linear predictor 1/2 + β(Xj − Xi), and with a very simple index function A(Zij; β) = Zij = Xj − Xi. The estimating equation (8) becomes

Σ_{i} Σ_{j} (Xj − Xi){Iij − 1/2 − β(Xj − Xi)} = 0,   (20)

the left-hand side of which may be written as

Σ_{i} Σ_{j} (Xj − Xi)(Iij − 1/2) − β Σ_{i} Σ_{j} (Xj − Xi)².   (21)

By comparing the two estimating equations (19) and (20), with the left-hand side of the latter replaced by expression (21), we note that the major difference is that in rank regression the linear predictor appears within the indicator function, whereas for the PIM estimation method the linear predictor appears outside the indicator function. Thus in rank regression the parameter α is estimated as the value α̂ so that, after subtracting α̂Xi from the responses Yi, the estimated PI of the residuals equals 1/2. The estimator of β in the PIM makes, on average for each (i, j), the estimated PI deviate from 1/2 by β̂(Xj − Xi). Another interesting observation is that the scores Xi and Xj are interchangeable in the PIM estimating equation. This also holds true asymptotically in the estimating equation (19) of the rank regression estimator. Thus pseudo-observations with equal covariate patterns do not contribute to the estimation of the parameter.


In the two-sample layout the Hodges–Lehmann estimator of the location shift is the value Δ̂ for which the Mann–Whitney statistic computed on Y(1) and Y(2) − Δ̂ equals n1n2/2, i.e. the estimated PI of the shifted samples equals 1/2, with MW/(n1n2) as in Section 4.3.
5. Simulation study
A generic problem in the set-up of simulation studies for the evaluation of semiparametric methods is that a semiparametric model encompasses a large class of data-generating models. Moreover, in the class of data-generating distributions of the PIMs there may be a complicated relationship between the parameters of both models. Here, we have chosen to generate data with a normal linear regression model, an exponential generalized linear model and a multinomial regression model. For the first two models the relationship with the PIM is provided in Section 4, and for the last more details will be given later. Since for each of the three settings the data-generating model is known, their parameters can also be estimated by means of maximum likelihood. Variances of the maximum likelihood estimators and powers of the Wald tests using the maximum likelihood estimators will also be reported in this section. These variances and powers need to be interpreted as optimistic benchmarks, as they give only an impression of the parametric lower bound of the variances and upper bound of the powers. Moreover, it is unfair to compare variances and powers from a semiparametric method with their counterparts from a parametric method, because the former methods will usually only be applied when the data-generating mechanism is unknown or incompletely specified so that no parametric statistical analysis is advised. Finally, we remind the reader that we have introduced PIMs as a flexible class of semiparametric models to be used when the focus is on the PI as an effect size measure. In the absence of strong parametric assumptions no parametric methods can be used for this purpose.
All computations have been performed with the R software (R Development Core Team, 2010) and all PIMs are defined for the lexicographical order relation because they all satisfy the antisymmetry condition; see Section 3 for more information.
5.1. Checking asymptotic properties of the estimators
The theoretical properties of the estimators of Section 3 are evaluated in a simulation study. Since a PIM does not represent a unique data‐generating model we simulate data from two models for which we have established a relationship with the PIMs: a normal linear model and an exponential model.
5.1.1. Normal linear model
Data are generated from the linear model

Yi = αXi + σ(Xi)εi,   (22)

where the εi are IID N(0, 1). Sample sizes of n=25, n=50 and n=200 are considered. The predictor X takes equally spaced values in the interval [0.1, u] where u=1 or u=10. The parameter α equals 1 or 10. Table 1 presents the results for a constant standard deviation, i.e. σ(X)=σ, with σ=1 or σ=5. The corresponding PIM is given by

Φ^{-1}[P(Y ≼ Y* | X, X*)] = β(X* − X),

with β = α/(√2σ). The parameter β is estimated by solving the estimating equations (8), and this estimator is further referred to as the PIM estimator. Table 1 shows for each simulation setting the true β-parameter and the average of the simulated estimates. The latter is an approximation of the true mean of the estimator. Table 1 also reports the average of the simulated sandwich variance estimates, which is an approximation of the expectation of the sandwich estimator, and the sample variance of the 1000 estimates β̂, which is an approximation of the true variance of the estimator β̂. The empirical coverages of 95% confidence intervals are also reported. As a result of the identity β = α/(√2σ), β can also be estimated through the estimation of α and σ in model (22) by means of least squares and maximum likelihood. In the normal linear regression model least squares and maximum likelihood give the same point estimator of α, but their estimators of the residual variance σ² differ by a factor (n−1)/n. Hence, the methods give different estimators of β, particularly in small samples.
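The factor (n−1)/n between the two variance estimators translates into a factor √{n/(n−1)} between the two plug-in estimators of β; a quick illustration for model (22) without intercept, with arbitrary data:

```python
import math
import numpy as np

rng = np.random.default_rng(5)
n = 25
x = np.linspace(0.1, 1.0, n)
y = 1.0 * x + rng.normal(0.0, 1.0, n)   # model (22) with alpha = 1, sigma = 1

alpha_hat = (x @ y) / (x @ x)           # least squares = ML point estimate
resid = y - alpha_hat * x
s2_ls = resid @ resid / (n - 1)         # unbiased residual variance
s2_ml = resid @ resid / n               # ML variance: factor (n-1)/n smaller

beta_ls = alpha_hat / math.sqrt(2.0 * s2_ls)
beta_ml = alpha_hat / math.sqrt(2.0 * s2_ml)
print(beta_ml / beta_ls)  # sqrt(n/(n-1)), about 1.02 for n = 25
```

The discrepancy vanishes as n grows, which is why the two plug-in estimators differ noticeably only in small samples.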
Table 1. Simulation results for model (22) with constant variance†

| α | u | σ | β | β̂ | S²(β̂) | Σ̂ | EC | β̂LS | S²(β̂LS) | β̂ML | S²(β̂ML) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n=25 | |||||||||||
| 1 | 1 | 1 | 0.707 | 0.736 | 0.33900 | 0.27877 | 92.0 | 0.729 | 0.06814 | 0.744 | 0.07098 |
| 1 | 1 | 5 | 0.141 | 0.130 | 0.32438 | 0.27008 | 92.8 | 0.135 | 0.05817 | 0.138 | 0.06059 |
| 1 | 10 | 1 | 0.707 | 0.721 | 0.00990 | 0.01184 | 93.0 | 0.729 | 0.01214 | 0.745 | 0.01265 |
| 1 | 10 | 5 | 0.141 | 0.149 | 0.00332 | 0.00248 | 90.2 | 0.145 | 0.00106 | 0.148 | 0.00111 |
| 10 | 1 | 1 | 7.071 | 7.309 | 1.55061 | 1.22519 | 85.7 | 7.320 | 1.36451 | 7.471 | 1.42136 |
| 10 | 1 | 5 | 1.414 | 1.463 | 0.40365 | 0.29884 | 88.7 | 1.444 | 0.10516 | 1.474 | 0.10954 |
| n=50 | |||||||||||
| 1 | 1 | 1 | 0.707 | 0.736 | 0.16640 | 0.15048 | 92.9 | 0.718 | 0.03465 | 0.725 | 0.03536 |
| 1 | 1 | 5 | 0.141 | 0.148 | 0.14905 | 0.14542 | 93.5 | 0.148 | 0.02759 | 0.150 | 0.02815 |
| 1 | 10 | 1 | 0.707 | 0.714 | 0.00615 | 0.00634 | 94.4 | 0.714 | 0.00568 | 0.721 | 0.00580 |
| 1 | 10 | 5 | 0.141 | 0.147 | 0.00148 | 0.00139 | 93.4 | 0.145 | 0.00052 | 0.146 | 0.00054 |
| 10 | 1 | 1 | 7.071 | 7.224 | 0.78701 | 0.67363 | 89.1 | 7.171 | 0.59224 | 7.244 | 0.60433 |
| 10 | 1 | 5 | 1.414 | 1.465 | 0.18646 | 0.16191 | 92.5 | 1.439 | 0.05014 | 1.454 | 0.05117 |
| n=200 | |||||||||||
| 1 | 1 | 1 | 0.707 | 0.716 | 0.03803 | 0.03942 | 95.3 | 0.710 | 0.00798 | 0.712 | 0.00802 |
| 1 | 1 | 5 | 0.141 | 0.145 | 0.04048 | 0.03817 | 94.8 | 0.145 | 0.00673 | 0.146 | 0.00676 |
| 1 | 10 | 1 | 0.707 | 0.709 | 0.00179 | 0.00170 | 94.3 | 0.709 | 0.00128 | 0.710 | 0.00128 |
| 1 | 10 | 5 | 0.141 | 0.141 | 0.00037 | 0.00036 | 95.6 | 0.141 | 0.00013 | 0.142 | 0.00013 |
| 10 | 1 | 1 | 7.071 | 7.110 | 0.19105 | 0.17489 | 93.2 | 7.089 | 0.14540 | 7.107 | 0.14613 |
| 10 | 1 | 5 | 1.414 | 1.427 | 0.04400 | 0.04308 | 95.0 | 1.421 | 0.01164 | 1.424 | 0.01170 |
- †β is the true parameter; Av(β̂) the average of the β‐estimates according to the semiparametric PIM theory; var(β̂) the sample variance of the simulated β̂; Av(σ̂²_s) the average of the sandwich variance estimates according to the semiparametric PIM theory; EC the empirical coverage of a 95% confidence interval for β; Av(β̂_LS) the average of the least squares estimates; var(β̂_LS) the sample variance of the simulated β̂_LS; Av(β̂_ML) the average of the maximum likelihood estimates; var(β̂_ML) the sample variance of the simulated β̂_ML.
From Table 1 we conclude that the PIM estimator of β is nearly unbiased, particularly for sample sizes of 50 and more. A similar conclusion holds for the sandwich variance estimator. The empirical coverages of the 95% confidence intervals are close to their nominal level for sample sizes of 50 and more. The simulation study also reveals that the sampling distribution of β̂ is close to normal (results not shown). As expected the parametric estimators are more efficient but, when α or the range of X increases, the difference in efficiency decreases.

| α | u | σ | β | Av(β̂) | var(β̂) | Av(σ̂²_s) | EC | Av(β̂_LS) | var(β̂_LS) | Av(β̂_ML) | var(β̂_ML) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n=25 | |||||||||||
| 1 | 1 | 1 | 1 | 1.052 | 0.34771 | 0.27673 | 91.2 | 1.097 | 0.12945 | 1.053 | 0.10286 |
| 1 | 1 | 5 | 0.2 | 0.192 | 0.31399 | 0.26122 | 92.8 | 0.206 | 0.09299 | 0.198 | 0.08389 |
| 1 | 10 | 1 | 1 | 1.045 | 0.05487 | 0.03584 | 90.1 | 1.096 | 0.05970 | 1.051 | 0.03285 |
| 1 | 10 | 5 | 0.2 | 0.206 | 0.02317 | 0.01884 | 92.2 | 0.219 | 0.01163 | 0.209 | 0.00963 |
| 10 | 1 | 1 | 10 | 9.268 | 0.50991 | 1.75345 | 93.9 | 10.987 | 4.94362 | 10.563 | 2.79136 |
| 10 | 1 | 5 | 2 | 2.080 | 0.46761 | 0.32145 | 88.4 | 2.169 | 0.27392 | 2.086 | 0.17884 |
| 10 | 10 | 5 | 2 | 2.088 | 0.13541 | 0.10231 | 85.5 | 2.209 | 0.23559 | 2.114 | 0.12025 |
| n=50 | |||||||||||
| 1 | 1 | 1 | 1 | 1.032 | 0.17125 | 0.15259 | 92.9 | 1.044 | 0.06014 | 1.026 | 0.05177 |
| 1 | 1 | 5 | 0.2 | 0.210 | 0.14692 | 0.14205 | 94.4 | 0.214 | 0.03981 | 0.211 | 0.03839 |
| 1 | 10 | 1 | 1 | 1.025 | 0.02554 | 0.01967 | 90.0 | 1.039 | 0.02407 | 1.019 | 0.01525 |
| 1 | 10 | 5 | 0.2 | 0.208 | 0.01086 | 0.01034 | 94.4 | 0.212 | 0.00533 | 0.208 | 0.00464 |
| 10 | 1 | 1 | 10 | 9.410 | 0.22462 | 0.95066 | 96.0 | 10.471 | 1.99398 | 10.244 | 1.18719 |
| 10 | 1 | 5 | 2 | 2.063 | 0.20438 | 0.17953 | 92.5 | 2.093 | 0.11833 | 2.056 | 0.08404 |
| 10 | 10 | 5 | 2 | 2.046 | 0.06469 | 0.05539 | 91.4 | 2.089 | 0.08120 | 2.047 | 0.04754 |
| n=200 | |||||||||||
| 1 | 1 | 1 | 1 | 1.010 | 0.03905 | 0.04005 | 95.1 | 1.010 | 0.01361 | 1.006 | 0.01161 |
| 1 | 1 | 5 | 0.2 | 0.204 | 0.03891 | 0.03740 | 95.2 | 0.206 | 0.00939 | 0.205 | 0.00921 |
| 1 | 10 | 1 | 1 | 1.006 | 0.00568 | 0.00557 | 93.6 | 1.013 | 0.00557 | 1.005 | 0.00345 |
| 1 | 10 | 5 | 0.2 | 0.198 | 0.00271 | 0.00275 | 95.8 | 0.201 | 0.00118 | 0.200 | 0.00111 |
| 10 | 1 | 1 | 10 | 9.576 | 0.04093 | 0.26446 | 97.1 | 10.098 | 0.47093 | 10.051 | 0.28679 |
| 10 | 1 | 5 | 2 | 2.016 | 0.05006 | 0.04843 | 94.1 | 2.022 | 0.02577 | 2.014 | 0.01907 |
| 10 | 10 | 5 | 2 | 2.007 | 0.01548 | 0.01465 | 94.1 | 2.020 | 0.01913 | 2.008 | 0.01061 |
- †β is the true parameter; Av(β̂) the average of the β‐estimates according to the semiparametric PIM theory; var(β̂) the sample variance of the simulated β̂; Av(σ̂²_s) the average of the sandwich variance estimates according to the semiparametric PIM theory; EC the empirical coverage of a 95% confidence interval for β; Av(β̂_LS) the average of the least squares estimates; var(β̂_LS) the sample variance of the simulated β̂_LS; Av(β̂_ML) the average of the maximum likelihood estimates; var(β̂_ML) the sample variance of the simulated β̂_ML.
5.1.2. Exponential model
Let the responses Yi be IID Exponential{γ(Xi)} with
(23)
(24)
. As a result of the identity β=α, the parameter β can also be estimated on the basis of the semiparametric proportional hazards theory, resulting in
. The R package survival (Therneau and Lumley, 2010) is used for fitting the proportional hazards model. The estimator of β based on maximum likelihood theory is denoted by
. From Table 3 we conclude that the PIM estimator of β and the sandwich variance estimator are nearly unbiased, particularly for sample sizes of 50 and more. The empirical coverages of the 95% confidence intervals are close to their nominal level for sample sizes of 50 and more. For large ranges of X the efficiency of the PIM estimator is close to the efficiency of the semiparametric proportional hazards estimator.
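Behind this comparison lies the closed form P(Y ≤ Y*) = γ/(γ + γ*) for independent exponential variables with rates γ and γ*; on the logit scale the PI is then linear in log γ − log γ*, which is what makes a logit-link PIM compatible with a log-linear rate model. A quick Monte Carlo check with hypothetical rates:

```python
import random

random.seed(7)
g, g_star = 2.0, 0.5  # hypothetical exponential rates gamma(x) and gamma(x*)
n = 200_000

# Monte Carlo estimate of the probabilistic index P(Y <= Y*).
hits = sum(random.expovariate(g) <= random.expovariate(g_star) for _ in range(n))
pi_mc = hits / n

pi_exact = g / (g + g_star)  # closed form for independent exponentials
# logit(pi_exact) = log(g) - log(g_star), so a log-linear model for the
# rate translates into a PIM with a logit link.
```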
| α | u | σ | β | Av(β̂) | var(β̂) | Av(σ̂²_s) | EC | Av(β̂_PH) | var(β̂_PH) | Av(β̂_ML) | var(β̂_ML) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n=25 | |||||||||||
| −2 | 1 | 1 | −2 | −2.226 | 1.19067 | 0.89060 | 90.4 | −2.178 | 0.87454 | −1.963 | 0.10657 |
| 0.1 | 10 | 1 | 0.1 | 0.110 | 0.00902 | 0.00630 | 91.1 | 0.110 | 0.00720 | 0.104 | 0.00130 |
| n=50 | |||||||||||
| −2 | 1 | 1 | −2 | −2.083 | 0.54166 | 0.47159 | 93.7 | −2.083 | 0.41978 | −1.986 | 0.05564 |
| 0.1 | 10 | 1 | 0.1 | 0.103 | 0.00337 | 0.00333 | 95.0 | 0.103 | 0.00262 | 0.103 | 0.00060 |
| n=200 | |||||||||||
| −2 | 1 | 1 | −2 | −2.023 | 0.12394 | 0.12220 | 94.7 | −2.018 | 0.08917 | −1.999 | 0.01460 |
| 0.1 | 10 | 1 | 0.1 | 0.098 | 0.00090 | 0.00087 | 94.6 | 0.100 | 0.00072 | 0.100 | 0.00015 |
- †β is the true parameter; Av(β̂) the average of the β‐estimates by using the semiparametric PIM theory; var(β̂) the sample variance of the simulated β̂; Av(σ̂²_s) the average of the sandwich variance estimates by using the semiparametric PIM theory; EC the empirical coverage of a 95% confidence interval for β; Av(β̂_PH) the average of the semiparametric proportional hazards estimates; var(β̂_PH) the sample variance of the simulated β̂_PH; Av(β̂_ML) the average of the maximum likelihood estimates; var(β̂_ML) the sample variance of the simulated β̂_ML.
5.2. Power
We consider the PIM

g{P(Y ≼ Y*)} = β1(X1* − X1) + β2(X2* − X2), (25)

where X1 and X1* are 0–1 dummy variables that, for example, code for two treatment groups, active treatment and placebo, say, and X2 and X2*
refer to a continuous covariate, age, say. The no‐treatment‐effect null hypothesis
is of interest. It expresses that, among patients of the same age, the chance that a treated patient's response is better than the response of an untreated patient is 50%. To our knowledge hardly any statistical tests have been described in the literature for this problem. In Section 1 we have discussed the most important competitors. In this simulation study we have opted for the test of Brumback et al. (2006). Their test is also semiparametric, but it is limited to testing the no‐treatment‐effect null hypothesis in the presence of covariates, whereas our framework allows for a broad range of extensions. Their method can be embedded in a particular PIM,
(26)
. Their test is based on the test statistic δ̂1/S1, where δ̂1 is their estimator of δ1 and S1 is an estimator of the standard error of δ̂1, which is obtained by the bootstrap. For computational reasons we limit the bootstrap procedure to 200 runs.
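A generic sketch of how a bootstrap standard error such as S1 can be computed (hypothetical data and a placeholder estimator; the paper limits the bootstrap to 200 runs):

```python
import random
import statistics

random.seed(3)

def estimator(sample):
    # Placeholder for the delta-1 estimator; here simply the sample mean.
    return statistics.fmean(sample)

data = [random.gauss(0.5, 1.0) for _ in range(50)]
B = 200  # number of bootstrap runs, as in the simulation study

boot_estimates = []
for _ in range(B):
    resample = [random.choice(data) for _ in range(len(data))]
    boot_estimates.append(estimator(resample))

s1 = statistics.stdev(boot_estimates)  # bootstrap standard error
z = estimator(data) / s1               # Wald-type test statistic
```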
Three simulation scenarios are described next. For each scenario a parametric or a semiparametric test is included as a competitor test.
- (a)
are IID
, 1). The data are analysed by least squares in a marginal linear model with conditional mean
, by the PIM (25) with probit link function and by Brumback's bootstrap test based on equation (26) with probit link. The least squares results serve as an indication of the best powers that can be expected. The geepack R package (Højsgaard et al., 2005) is used to fit the marginal model.
- (b)
are IID Exponential
. The data are analysed by partial likelihood in a proportional hazards model with hazard function
, by the PIM (25) with logit link and by Brumback's bootstrap test based on equation (26) with logit link. The powers with the partial likelihood method may be considered as corresponding to a semiparametric competitor of the PIM, although the proportional hazards model does not coincide with the class of PIMs: they express different restrictions on the distribution fY|X.
- (c)
are IID Logistic(
), for which the latent response variable Zi is discretized into four ordered categories as described in section 6.2 of Agresti (2007). The resulting ordinal response is denoted by Yi. The data are analysed by maximum likelihood in the proportional odds model
, by the PIM (25) with logit link and by Brumback's bootstrap test based on equation (26) with logit link. The R package MASS (Venables and Ripley, 2002) is used to fit the proportional odds model.
The following design is considered. The covariate X1 is a 0–1 balanced dummy variable, X2 is equally spaced over [0.1,10] and α1 takes the values 0, 0.5 and 1 whereas α2 is fixed at 1. Sample sizes of 20, 50 and 200 are considered. All tests described above are applied for testing
the no‐treatment‐effect null hypothesis against its two‐sided alternative in each of the three models. All tests are applied at the 5% level of significance. Table 4 shows the empirical powers based on 1000 Monte Carlo simulation runs. For a small sample size (n=20) Brumback's test breaks down completely, showing virtually no power, and the tests based on the PIM are liberal. The tests based on least squares are also liberal under model 1. When n=50 all tests have sizes that are not too far from the nominal level of 5%, but the PIM‐based tests are often still slightly liberal and Brumback's test is often still conservative (although not for model 3). When n=200 all tests are nearly unbiased. The powers of the tests in the PIM framework are generally larger than those of Brumback's test. The tests based on least squares (model 1), partial likelihood (model 2) and maximum likelihood (model 3) are slightly more powerful, as expected.
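The empirical powers in Table 4 come from a Monte Carlo loop of the following shape (schematic only: a two-sample z-test on hypothetical normal data stands in for the fitted PIM and competitor models):

```python
import math
import random

random.seed(11)

def one_trial(n, alpha1):
    """Simulate one data set and test H0: alpha1 = 0 at the 5% level.
    A two-sample z-test with known unit variance stands in for the models."""
    half = n // 2
    y0 = [random.gauss(0.0, 1.0) for _ in range(half)]     # placebo group
    y1 = [random.gauss(alpha1, 1.0) for _ in range(half)]  # treated group
    se = math.sqrt(1.0 / half + 1.0 / half)
    z = (sum(y1) / half - sum(y0) / half) / se
    return abs(z) > 1.96

def empirical_power(n, alpha1, runs=1000):
    rejections = sum(one_trial(n, alpha1) for _ in range(runs))
    return 100.0 * rejections / runs  # percentage, as reported in Table 4

size = empirical_power(200, 0.0)   # should sit near the nominal 5% level
power = empirical_power(200, 0.5)  # grows with alpha1 and with n
```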
| α1 | PIM (1) | LS (1) | BT (1) | PIM (2) | PL (2) | BT (2) | PIM (3) | ML (3) | BT (3) |
|---|---|---|---|---|---|---|---|---|---|
| n=20 | |||||||||
| 0.0 | 7.6 | 9.5 | 0.0 | 9.7 | 4.3 | 0.0 | 10.8 | 4.5 | 2.2 |
| 0.5 | 15.0 | 27.3 | 0.0 | 22.7 | 16.2 | 0.0 | 14.1 | 7.3 | 2.8 |
| 1.0 | 45.9 | 72.3 | 0.2 | 42.3 | 44.4 | 0.0 | 25.3 | 16.8 | 4.8 |
| n=50 | |||||||||
| 0.0 | 5.7 | 6.4 | 2.0 | 8.1 | 6.4 | 3.3 | 7.7 | 5.1 | 4.9 |
| 0.5 | 35.3 | 50.6 | 24.4 | 30.1 | 38.4 | 17.5 | 18.3 | 15.6 | 12.9 |
| 1.0 | 89.5 | 97.5 | 78.7 | 76.0 | 89.2 | 57.6 | 39.7 | 37.5 | 33.4 |
| n=200 | |||||||||
| 0.0 | 4.7 | 5.3 | 4.2 | 4.8 | 4.7 | 4.1 | 7.1 | 6.1 | 6.4 |
| 0.5 | 93.4 | 98.0 | 91.0 | 77.1 | 93.3 | 75.3 | 36.8 | 37.5 | 35.6 |
| 1.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 88.5 | 88.8 | 87.4 |
- †Empirical powers (%) for data‐generating models 1–3. For each scenario the power of the PIM is compared with a traditional regression technique (least squares, LS; partial likelihood, PL; or maximum likelihood, ML) and with Brumback's bootstrap test, BT.
6. Examples
To illustrate the interpretation of the PIM we present several examples. In Section 6.1 we present the data analysis for a continuous response and two predictors showing interaction. The example of Section 6.2 has an ordinal response variable and two predictors with no interaction. An example data set with a continuous heteroscedastic response variable and one single continuous regressor is presented in Section 6.3. All PIMs are defined for the lexicographical order relation and they all satisfy the antisymmetry condition; see Section 3 for more information. For notational convenience we drop the conditioning in the PI notation. All hypothesis tests are performed at the 5% level of significance and all computations are performed with the R software (R Development Core Team, 2010).
6.1. Childhood respiratory disease study
The ‘Childhood respiratory disease study’ is a longitudinal study following pulmonary function in children. We consider only the part of this study that was provided by Rosner (1999). The response variable is the forced expiratory volume FEV, which is an index of pulmonary function measured as the volume of air expelled after 1 s of constant effort. Along with FEV (litres), the covariates AGE (years), HEIGHT (inches), SEX and SMOKING status (1 if the child smokes; 0 if the child does not smoke) are provided for 654 children of ages 3–19 years. See Rosner (1999), page 41, for more information. The primary focus is on the analysis of the effect of smoking status on pulmonary function. Fig. 2 displays FEV as a function of the AGE and SMOKING status; note that all very young children are non‐smokers. The WMW test is a natural choice. However, it is believed that age may be a potential confounder, and thus the effect of smoking on FEV should be adjusted for age. This is illustrated in Fig. 3, which shows density estimates of the FEV distributions for several combinations of smoking status and age. Fig. 3 also suggests an interaction between age and smoking status. It is also of interest to quantify the effect of age.

FEV as a function of AGE for (a) smokers and (b) non‐smokers

Kernel density estimates of the FEV‐distributions for smokers (
) and non‐smokers (
) of age (a) 12 years, (b) 13 years, (c) 14 years and (d) 15 years: the densities are estimated by using a Gaussian kernel with a bandwidth of 0.5; beneath each kernel density plot is a rug plot to identify better the individual sample observations that are used for the density estimation
We first fit the linear regression model

E(FEV | AGE, SMOKE) = α0 + α1 AGE + α2 SMOKE + α3 AGE × SMOKE. (27)

Table 5 gives the model fit with ordinary least squares. Since the residual plot (which is not shown) indicates non‐constant variance of the error, we also analyse the data by using weighted least squares (see Table 5). The weights were obtained by fitting the absolute residuals of ordinary least squares in a linear regression model with the fitted values of ordinary least squares as the regressor.
| Parameter | Estimate | Standard error | p‐value |
|---|---|---|---|
| Linear regression model ordinary least squares | |||
| Intercept (α0) | 0.25 | 0.083 | 0.002 |
| AGE (α1) | 0.24 | 0.008 | <0.001 |
| SMOKE (α2) | 1.94 | 0.41 | <0.001 |
| AGE * SMOKE (α3) | −0.16 | 0.03 | <0.001 |
| Linear regression model weighted least squares | |||
| Intercept (α0) | 0.32 | 0.054 | <0.001 |
| AGE (α1) | 0.24 | 0.007 | <0.001 |
| SMOKE (α2) | 1.84 | 0.51 | <0.001 |
| AGE * SMOKE (α3) | −0.15 | 0.03 | <0.001 |
| PIM | |||
| AGE (β1) | 0.61 | 0.03 | <0.001 |
| SMOKE (β2) | 5.31 | 1.04 | <0.001 |
| AGE * SMOKE (β3) | −0.46 | 0.08 | <0.001 |
With weighted least squares the effect of smoking on the mean level of FEV, while controlling for age, is estimated as 1.84−0.15 AGE. If we consider, for example, the age categories 12, 13, 14 and 15 years from Fig. 3, the effect of smoking on the mean FEV is estimated by 0.01, −0.14, −0.29 and −0.45 respectively, and the 95% confidence intervals are given by [−0.19,0.21], [−0.33,0.05], [−0.49,−0.09] and [−0.68,−0.21]. Thus for the ages of 14 and 15 years the mean FEV of non‐smokers is significantly larger. When the smoking status is fixed, the mean FEV is estimated to change by 0.24−0.15 SMOKE when age increases by 1 year. For non‐smokers this effect is thus estimated by 0.24 with a 95% confidence interval of [0.22,0.25], whereas for smokers this is 0.082 with 95% confidence interval [0.009,0.156]. Fig. 3 suggests that, while controlling for age, smoking not only affects the mean. The effect of smoking is also visible in higher order moments. The PI is well suited to quantify effects that do not act on one single moment of the response distribution.
Next we fit the PIM

logit{P(FEV ≼ FEV*)} = β1(AGE* − AGE) + β2(SMOKE* − SMOKE) + β3(AGE* SMOKE* − AGE SMOKE). (28)

The model has no intercept, because, when AGE* = AGE and SMOKE* = SMOKE, the model must give P(FEV ≼ FEV*) = 1/2. The parameter estimates are presented in Table 5. For a fixed age, the probability of having a smaller FEV as a non‐smoker, as compared with a smoker, is estimated as expit(β̂2 + β̂3 AGE)
. This illustrates that the effect of smoking on the PI depends on age. For the age categories 12, 13, 14 and 15 years from Fig. 3, the estimated probabilities of having a smaller FEV for a non‐smoker are 46%, 35%, 26% and 18% respectively, with 95% confidence intervals [35%,57%], [26%,45%], [18%,35%] and [11%,27%]. Thus if the age increases it becomes less likely that smokers have a larger FEV than non‐smokers. This effect is significant at the 5% level of significance for ages of 13, 14 and 15 years.
In contrast, if the smoking status is fixed, the probability of having a larger FEV when age increases by 1 year is estimated as expit(β̂1 + β̂3 SMOKE)
. Thus for non‐smokers this probability is estimated by expit(0.61)=65% whereas for smokers this drops to expit(0.15)=54%. The 95% confidence intervals are given by [63%,66%] and [50%,57%] respectively.
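The two probabilities quoted above follow directly from the Table 5 estimates (a small arithmetic check; any tiny discrepancy with the text reflects rounding of the published coefficients):

```python
import math

def expit(x):
    # Inverse logit, as used for the PIM with logit link.
    return 1.0 / (1.0 + math.exp(-x))

b1, b3 = 0.61, -0.46  # AGE and AGE*SMOKE estimates from Table 5

p_nonsmoker = expit(b1)       # probability of a larger FEV per extra year, non-smokers
p_smoker = expit(b1 + b3)     # expit(0.15), the same probability for smokers
```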
The PIM, just like any parametric or semiparametric regression model, expresses restrictions on the joint distribution of the response and the covariates. As for any other regression model, it is important to assess the validity of the model for a given data set. For this purpose we propose a simple graphical diagnostic tool which is based on a lack‐of‐fit method for logistic regression models (Hosmer and Lemeshow, 1980; Lemeshow and Hosmer, 1982; Hosmer et al., 1988). When the model fits the data well, we expect that the predicted probabilities are close to the observed (empirical) probabilities. Thus a plot of the former versus the latter could serve for graphical model fit assessment. Hosmer and Lemeshow (1980) proposed to calculate the empirical probabilities within groups of observations. In particular, observations with similar predicted probabilities are grouped by partitioning the [0,1] interval of the predicted probabilities on the basis of their deciles. For each interval the average predicted probability and the empirical probability are calculated. Fig. 4 shows the diagnostic plot; it suggests that the PIM fits the data well. As the pseudo‐observations are not mutually independent, the distribution theory of the Hosmer–Lemeshow goodness‐of‐fit test does not directly apply to our setting.
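The decile-grouping step of this diagnostic can be sketched as follows (illustrative data: `pred` holds model-predicted PIs for the pseudo-observation pairs and `obs` the corresponding 0/1 indicators):

```python
import random

random.seed(5)

# Hypothetical pseudo-observations: a predicted PI per pair and the
# corresponding 0/1 indicator; well calibrated by construction.
pred = [random.random() for _ in range(2000)]
obs = [1 if random.random() < p else 0 for p in pred]

# Partition the pairs into 10 groups using deciles of the predicted PI.
order = sorted(range(len(pred)), key=lambda i: pred[i])
size = len(order) // 10
groups = [order[k * size:(k + 1) * size] for k in range(10)]

points = []
for g in groups:
    avg_pred = sum(pred[i] for i in g) / len(g)  # average predicted PI
    emp = sum(obs[i] for i in g) / len(g)        # empirical PI in the group
    points.append((avg_pred, emp))
# Plotting these points: values near the diagonal indicate a good fit.
```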

Diagnostic plot for the respiratory disease data: the plot shows the average predicted PI according to the fitted PIM versus the empirical PI; the grouping is based on the deciles of the predicted PI
6.2. Mental health study
The mental health study concerns an ordinal mental impairment score MI, a dichotomous socio‐economic status SES (1 for high; 0 for low) and a life‐events index LI. We fit the PIM

logit{P(MI ≼ MI*)} = β1(SES* − SES) + β2(LI* − LI), (29)

and, for comparison, the cumulative logit model

logit{P(MI ⩽ j | SES, LI)} = μj + α1 SES + α2 LI, j = 1, 2, 3. (30)

| Parameter | Estimate | Standard error | p‐value |
|---|---|---|---|
| PIM (29) | |||
| SES (β1) | −0.74 | 0.34 | 0.03 |
| LI (β2) | 0.20 | 0.07 | 0.006 |
| Cumulative logit model (30) | |||
| Intercept 1 (μ1) | −0.28 | 0.64 | 0.66 |
| Intercept 2 (μ2) | 1.21 | 0.66 | 0.07 |
| Intercept 3 (μ3) | 2.21 | 0.72 | 0.002 |
| SES (α1) | 1.11 | 0.61 | 0.07 |
| LI (α2) | −0.32 | 0.12 | 0.008 |

Diagnostic plot for the mental health data: the plot shows the average predicted PI according to the fitted PIM versus the empirical PI; the grouping is based on the deciles of the predicted PI
The PIM analysis shows that, at the 5% level of significance, SES and LI have significant effects on the MI‐score in terms of the PI. With β̂1 = −0.74 we conclude that, of people with equal LI, someone with a high socio‐economic status has an estimated probability of expit(−0.74)=32% of having a larger MI‐score than someone with a low socio‐economic status; a 95% confidence interval is given by [20%,48%]. People with a low socio‐economic status are thus more likely to be mentally impaired than others with a high socio‐economic status, while all having the same LI. The effect of LI on MI can be estimated by the probability expit{β̂2(LI* − LI)}
. In particular, among people with the same SES, those with an LI of 1 unit smaller than the LI of another group of people have a smaller MI‐score with estimated probability expit(0.2)=55%, with a 95% confidence interval of [51%,59%]. Thus, the larger the LI, the more likely someone is to be mentally impaired.
(31)

6.3. Food expenditure data set
(32)
(p<0.001 and 95% confidence interval [0.34,0.44]). This analysis supports Engel's hypothesis. Indeed if the household income is 500 BEF then the probability of larger food expenditure with a household income of 600 BEF is estimated as 76% with a 95% confidence interval of [74%,79%]. When we compare households with 1200 and 1300 BEF this estimated probability drops to 69% with a 95% confidence interval of [66%,71%]. This is an example of the negative effect modification of the increasing error variance (see Section 4.1). Figs 6(b) and 6(c) illustrate this phenomenon. As the data set contains no two households with exactly the same income, an observation with income u is assigned to income v if |u−v|<50 BEF.

(a) Scatter plot of the food expenditure data with a fitted linear regression line, and non‐parametric Gaussian kernel smoother density estimates with bandwidths (b) 20 and (c) 50 of the food expenditure for household incomes (b) 500 (
) and 600 (
) and (c) 1200 (
) and 1300 BEF (
): beneath each kernel density plot is a rug plot to identify better the individual sample observations that are used for the density estimation; the notation P{Y(500)<Y(600)} and P{Y(1200)<Y(1300)} is used as a compact notation for the PI
The diagnostic plot is presented in Fig. 7; it convincingly shows a very good fit.

Diagnostic plot for the food expenditure data: the plot shows the average predicted PI according to the fitted PIM versus the empirical PI; the grouping is based on the deciles of the predicted PI
7. Conclusion
We have introduced a general class of semiparametric models for the PI. The models apply to continuous and ordinal response variables. The parameters of the PIM have direct and informative interpretations that have been illustrated on four data sets. The PIM framework may be considered as a generalization of the area under the curve regression models of Dodd and Pepe (2003) and of the related covariate‐corrected WMW test of Brumback et al. (2006). It extends these methods by providing a more flexible model formulation that
- (a)
not only applies to the comparison of response variables for two treatment groups,
- (b)
is not restricted to continuous responses and
- (c)
includes a consistent estimator of the covariance matrix of the parameter estimators without relying on the bootstrap method.
The asymptotic theory that we have presented is based on the work of Lumley and Hamblett (2003), using the concept of sparse correlation. The estimating equations make use of the score function of regression models under the working independence condition. Although this choice results in consistent and asymptotically normally distributed parameter estimators, it does not guarantee semiparametric efficient estimators. In future research we plan to improve the methods further by the construction of efficient score functions. The results of our simulation study demonstrate that the theoretical properties of the parameter and variance estimators apply well to moderately sized samples, and that the powers of our tests are quite good.
The semiparametric PIMs are flexible, but, as for all regression models, they impose some restrictions on the conditional distribution of the response variable. Therefore we have proposed a simple graphical diagnostic tool that is based on the ideas of Hosmer and Lemeshow (1980). The development of more formal lack‐of‐fit tests for the PIMs may be an interesting direction for future research. In particular, we believe that the ideas of Deschepper et al. (2006) and Hart (1997) may be helpful. Another restriction in the present definition of the PIMs is the linearity in the predictor, which in our current model is formulated in the spirit of generalized linear models (McCullagh and Nelder, 1989). Future extensions may involve non‐parametric regression terms which may be estimated by means of kernel smoothers, splines or any other type of non‐parametric estimator, eventually resulting in PIMs that resemble generalized additive models (Hastie and Tibshirani, 1990) for the PI.
In the present paper PIMs are only defined for use with mutually independent observations. Extensions to clustered and longitudinal data would also be very useful. This may involve the introduction of random‐effect terms in the linear predictor, or it may be accomplished through extensions of the estimating equations.
Finally, we want to stress that PIMs are not to be considered as a competitor of other classes of statistical models. We rather think that the PIM framework is a valuable addition to the statisticians’ toolbox which may be used whenever the PI is chosen as a meaningful scale for the formulation of the research question.
Acknowledgements
This research was supported by Interuniversity Attraction Pole research network grant P6/03 of the Belgian Government (Belgian science policy). The authors also thank Stijn Vansteelandt for interesting discussions, and the referees for very constructive comments.
References
Discussion on the paper by Thas, De Neve, Clement and Ottoy
Thomas Alexander Gerds (University of Copenhagen)
I am pleased to welcome this paper to the Society. At a first glance the probabilistic index model (PIM) is the instrument that has always been missing in my toolbox: a multiple‐regression model which generalizes the Wilcoxon rank sum test. If it is as indicated, we can now leave the multiple (normal) linear regression model and use a PIM as a robust alternative. I believe that this class of models will have a significant influence on applied statistical work. Let me outline two arguments.
- (a)
PIMs will be used by young statisticians. Let us think about a young statistician as someone working with a default toolbox equipped with default tools composed according to their type and place of education. Such a person aims to apply the correct tool to a given problem and may believe that it is wrong, say, to apply a t‐test when the outcome is a discrete variable. Hence, a young statistician applies a PIM when the task is to do multiple‐regression analysis of apparently not normally distributed outcomes. However, a PIM certainly cannot solve all the problems of experienced statisticians who know how to apply the wrong tools and still arrive at sound conclusions.
- (b)
PIMs will find their way into medical statistics. The generic argument of Thas and his colleagues is that their models provide effect measures which have an intuitive interpretation. They also make an important connection to area under the curve regression. The relationship can be extended to the concordance index which generalizes the area under the curve. The concordance index is widely used to assess the discrimination ability of risk prediction models (Harrell et al., 1996). It is usually defined as the probability that the risk predicted for person i is greater than that for person j given that the event occurs earlier for person i, i.e. C = P(Ri > Rj | Ti < Tj), where Ri denotes the predicted risk and Ti the event time of person i.
From the PIM perspective a more natural formulation is to condition the order of the outcome on the order of the predicted risks: P(Ti < Tj | Ri > Rj).
This latter formulation is appealing since one does not condition on the future. It can be noted that if both the predicted risks and the event times are continuous variables then P(Ri > Rj) = P(Ti < Tj) = 1/2, and hence the two formulations are equivalent. A potentially interesting new application of PIMs is to test whether a biomarker X improves the predictive ability of the prediction model with a suitably formulated ‘concordance index model’, e.g.
Thas and his colleagues discover a fascinating relationship between the PIM and the Cox model in Section 4.2 but they do not deal with censored data. In survival analysis we observe only min(Ti, Ci),
where Ci is the censoring time. A necessary first step is to truncate the pseudovalue at a time t where the probability of being uncensored is positive:
Still, the value of the pseudovalue is unknown for pairs where
. To deal with censored data one possibility would be to apply inverse probability weighting. Here I propose a different approach. The idea is to construct a pseudovalue for the pseudovalue following Andersen et al. (2003). Apart from correction terms it is given as follows, where
is the Kaplan–Meier estimate calculated with all the data and
is the Kaplan–Meier estimate when the data from the jth patient have been removed. I conjecture that if censoring is independent of the covariates and the event times then one can argue by using a second‐order von Mises expansion of the Kaplan–Meier estimator as in Graw et al. (2009) to show that
Under the usual regularity conditions on the link function, estimating equations based on the pseudo‐pseudovalue will be asymptotically consistent. Note that in uncensored data the pseudo‐pseudovalues are equal to the pseudovalues. I close my discussion with a couple of remarks.
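The leave-one-out construction behind these pseudovalues can be sketched with a minimal Kaplan-Meier estimator (an illustrative sketch on hypothetical data; it omits Andersen et al.'s correction terms):

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from (time, event-indicator) data."""
    s = 1.0
    for u in sorted(set(tt for tt, e in zip(times, events) if e == 1)):
        if u > t:
            break
        at_risk = sum(1 for tt in times if tt >= u)
        deaths = sum(1 for tt, e in zip(times, events) if tt == u and e == 1)
        s *= 1.0 - deaths / at_risk
    return s

def km_pseudovalues(times, events, t):
    """Jackknife pseudovalues n*S_hat(t) - (n - 1)*S_hat_(-j)(t)."""
    n = len(times)
    full = km_survival(times, events, t)
    pvals = []
    for j in range(n):
        loo_t = times[:j] + times[j + 1:]
        loo_e = events[:j] + events[j + 1:]
        pvals.append(n * full - (n - 1) * km_survival(loo_t, loo_e, t))
    return pvals

# With no censoring the pseudovalue for subject j reduces to I(T_j > t).
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 1, 1]
pv = km_pseudovalues(times, events, 5.0)
```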
- (i)
Thas and his colleagues treat ties rigorously throughout their paper. A potentially important further distinction is between ties that occur due to observational imprecision and real ties where the underlying characteristics are equal for some individuals.
- (ii)
In Section 6, pseudocalibration plots are used to assess goodness of fit. Thas and his colleagues note a problem with the arbitrary grouping that is inherent in these plots and in the Hosmer–Lemeshow test. To avoid arbitrary grouping one could use non‐parametric smoothing (Le Cessie and Van Houwelingen, 1991), or measure the calibration by the expected value of a strictly proper scoring rule (Gneiting and Raftery, 2007). For example, one could measure calibration by the average mean‐squared pseudoerror
The PIM should score below the benchmark of 25% which is obtained when 50% chance is predicted for the event
independently of i and j.
- (iii)
To improve the interpretation of PIMs further one could introduce an offset into the probabilistic index: P(Y* ≼ Y − ɛ). Then the regression parameters in a suitably defined PIM would express the effects of predictor variables on the probability that the outcome will be reduced by at least ɛ, which could be a clinically meaningful change. In summary, I think that Thas and his colleagues have provided us with a new hammer for the default toolbox. It gives me great pleasure to propose the vote of thanks.
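The mean-squared pseudoerror of remark (ii) is simple to compute: average (I(Yi ≼ Yj) − p̂ij)² over the pairs, and note that the constant prediction of 50% scores exactly the 25% benchmark (a small check on hypothetical pairs):

```python
import random

random.seed(2)

# Hypothetical predicted PIs and 0/1 pseudo-observations for 1000 pairs.
pred = [random.uniform(0.1, 0.9) for _ in range(1000)]
obs = [1 if random.random() < p else 0 for p in pred]

def mean_squared_pseudoerror(predicted, observed):
    # Brier-type strictly proper score averaged over the pairs.
    return sum((o - p) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

score_model = mean_squared_pseudoerror(pred, obs)
score_benchmark = mean_squared_pseudoerror([0.5] * len(obs), obs)
# An informative PIM should score below the 25% benchmark.
```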
Stephen Senn (Centre de Recherche Public de la Santé, Strassen)
We need many ways of looking at data and a technical exploration of an alternative approach to modelling with a chance for discussion should always be welcome to this Society. As such it is a pleasure to second the vote of thanks for this interesting paper. It is, however, the tradition of this Society for seconders to be critical and although I think that it will be good for the applied statistician to know that these techniques exist and have been developed in detail I also think that it will usually be wise for the statistician not to use them (Senn, 2011).
Before explaining why, I draw attention to some connections. The authors use the term probabilistic index (PI). They refer to the fact that individual exceedence probability has been used before but do not give the reference, which I now provide (Senn, 1997). Recently Buyse (2010) has proposed a multivariate version called the proportion in favour of treatment. More important, however, is that, in the context of longitudinal data and factorial experiments, there is an extensive treatment of relative treatment effects using normalized empirical distribution functions in the beautiful book by Brunner et al. (2001). I urge the authors to study this as I believe that they will find many interesting connections to their work. The indicator function that is defined at the beginning of Section 3.1 is essentially, of course, the Heaviside function H(d),d=Y*−Y, and this raises the possibility of a close connection to the very extensive theory of counting processes applied to survival analysis. Of course the authors themselves develop a connection in Section 4.2 and, indeed, Kalbfleisch and Prentice (1980) to whom they refer used H(d).
I illustrate my reasons for distrusting the PI as a measure of effect by looking at the first example. First, note that violation of the linearity assumption for this example is a red herring. The technique proposed does nothing to deal with this. If the conditional distribution of the response depends on the dose in the way implied, then there is no universal effect of a 5‐g change whether measured by the PI or more conventionally. Furthermore, the reference to the ordinal nature of the Beck depression inventory is also misleading. This is a sum of 21 items and since change from baseline is used a 21st linear operation has been added unnecessarily to the 20 already performed in its construction. If the Beck depression inventory change score is truly ordinal it only is so because it is interval. Furthermore, the use by the authors of change scores immediately raises a worrying issue. Suppose that we take the simplest case where we assume that although baseline is related to outcome it is unrelated to dose (if this is not so then any modelling approach will have to tread delicately). If this is so then it makes no difference (in expectation) to the conditional estimate of the ‘effect’ of dose by using conventional least squares whether we use raw outcomes only, or differences from baselines only or condition on the baselines by using them as a covariate. However, even in the best behaved of cases the PI would be quite different since it is, essentially, a signal‐to‐noise ratio: the degree of overlap depends not just on the signal but also on the noise.
The authors seem to see this as a good thing. I cannot agree. Consider a placebo‐controlled trial of an angiotensin‐converting enzyme inhibitor in hypertension with diastolic blood pressure as the outcome measure. Any one of the following will change the value of the PI even if the effect as conventionally measured is stable across patients: narrowing or broadening the inclusion criteria; taking more precise measurements; using the average of a number of measurements; using the difference from baseline; stratifying. Can this be a good thing? Can physicians, let alone patients, interpret the resulting PI? Consider the authors’ first example. It is certainly a challenge to explain what this 70.2% is. Is it the probability that a randomly chosen patient will improve his or her value if given 5 g extra?: no. Is it the probability that such a patient will benefit from taking 5 g extra?: no. Is it an inherent property of the treatment?: no. It is a combined property of the treatment, the variability of the way that we measure it and the variability of the patients we happen to have recruited into a study for which we did not use random sampling. I shall repeat what I have said in discussion of such measures previously: only those who misunderstand them will find them simple (Senn, 2006).
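Senn's signal‐to‐noise point is easy to make concrete under normality (a hypothetical sketch, not from the paper): if treatment shifts a normal outcome by a fixed δ while the residual standard deviation σ varies, the PI is Φ{δ/(σ√2)} and drifts towards 0.5 as the noise grows.

```python
from math import erf, sqrt

def normal_pi(delta, sigma):
    """P(Y < Y*) when Y ~ N(mu, sigma^2) and Y* ~ N(mu + delta, sigma^2):
    the difference Y* - Y is N(delta, 2*sigma^2), so PI = Phi(delta/(sigma*sqrt(2)))."""
    z = delta / (sigma * sqrt(2.0))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A fixed treatment effect of 5 units, measured with different noise levels:
for sigma in (4.0, 8.0, 16.0):
    print(sigma, round(normal_pi(5.0, sigma), 3))
```

A fixed 5‐unit effect gives a PI of roughly 0.81, 0.67 or 0.59 depending only on σ: the degree of overlap depends on the noise as well as the signal.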
Of course, all statisticians use measures that are not collapsible: odds ratios and hazard ratios (Ford et al., 1995; Gail et al., 1984; Robinson and Jewell, 1991) are cases in point. However, one justification for using such measures is that they should be interpreted ultimately in terms of predictions (Lee and Nelder, 2004; Senn, 2004). Whatever problems such measures have the PI will have much worse.
However, now that my grumble is over, I return to my opening comments. I found this paper interesting and stimulating, as I am sure will do many who read it and as did all who heard it delivered. In encouraging us to think again it increases our understanding of what we are doing in statistical modelling. I applaud the authors’ final remarks that the approach to analysis is not a superior substitute for conventional approaches but a possible alternative or perhaps supplement on occasion. If this is so one cannot but welcome this exposition and exploration and I am very pleased to second the vote of thanks.
The vote of thanks was passed by acclamation.
Ingrid Van Keilegom (Université catholique de Louvain)
I first congratulate the authors for this interesting paper and important contribution to the area of semiparametric regression modelling. The model proposed has many links with other known models in the literature and has the advantage of allowing the response to be discrete and even ordinal.
(33)
where E(ɛ|Z) = 0 and var(ɛ|Z) = σ²(Z). This shows that the probabilistic index model (PIM) is a special case of a transformation model. The transformation methodology has been quite successful and a large literature exists on this subject for parametric models; see for example Carroll and Ruppert (1988) among many others. To estimate β, we can now proceed as follows. Define

and
, and then define the estimator of β by solving the system of equations

The asymptotic distribution of the resulting estimator can be obtained from Chen et al. (2003), who developed primitive conditions for the asymptotic normality of any semiparametric Z‐estimator.
(34)
By writing the PIM in this way, it becomes a special case of the model that was studied by Grigoletto and Akritas (1999), except that the function φ depends on X here.
Since models (33) and (34) are well known and have been well studied in the literature, they can be helpful in determining the semiparametric efficiency bounds of the PIM. However, the estimation of these models builds almost inevitably on the estimation of conditional functions (h and σ² for model (33), and φ for model (34)), which can be a difficult task involving for example the delicate choice of smoothing parameters, whereas the estimation method that is proposed by the authors does not rely on any smoothing methods.
Lori E. Dodd (National Institute of Allergy and Infectious Diseases, Bethesda) (© US Government)
The probabilistic index (PI) arises naturally where relative orderings of outcomes from pairs of observations can be assigned. PI‐like indices have been used extensively in psychophysics, in which subjective readers may be unable to assign scores directly but can rank pairs of images with respect to ‘signal’ or ‘noise’ in what are referred to as two‐alternative forced choice experiments (Green and Swets, 1966). In clinical trials, the PI has been proposed as a clinically intuitive way of combining multiple outcomes (Follmann, 2002). For example, a rule‐based method to combine the outcomes of death and hospitalization might proceed as follows.
- (a)
Death is the worst outcome; earlier death worse than later.
- (b)
Among survivors, hospitalization for disease is the worst outcome, with early hospitalization worse than later.
Patient outcomes can be naturally ordered and covariate effects evaluated by using ‘pairwise ordering regression’ (POR) (Follmann, 2002), which is similar to probabilistic index models (PIMs).
Thas and his colleagues nicely demonstrate the relationship between PIMs and standard linear regression, proportional hazards (PH) and rank regression models. For PHs, the PIM covariate effects provide an alternative interpretation that may be easier to explain to collaborators. Follmann (2002) showed a similar connection between logistic PIMs and PH models in POR, but POR allows for censoring. Alternative models of the PIM can be obtained by considering the PIM as an expected placement value—i.e. the expected ‘place’ in the conditional survivor distribution function,
(Pepe and Cai, 2004; Cai and Dodd, 2008). This interpretation suggests alternative estimating equations that may be more efficient than those developed by Thas and his colleagues.
(35)
(36)
It was not clear whether the authors would expect a lack of coherency, in the sense that
for (SES, SES*) = (1, 0). However, the fitted PIs for this case demonstrate coherency for all (SES, SES*). Now, consider two further orderings: SES* ≤ SES and no ordering. Within a given ordering, coherency holds, as can be seen in Table 7. Results are presented in terms of β1 and β2 because they describe the effect on SES and SES* respectively.
| | SES ≤ SES* | SES* ≤ SES | No ordering |
| (β1, β2) | (−0.04, 0.60) | (−0.53, 1.09) | (0.70, 0.89) |
The estimates in Table 7 imply different relationships between the covariates and the PI. What is the preferred ordering? It may be SES ≤ SES* because the PI is defined as P(MI ≼ MI*), but the motivation should be made more explicit.
I conclude with two final cautionary notes about PIs. First, it is well known that, when receiver operating characteristic curves cross, conclusions about covariate effects on the PI become more complex, as this phenomenon can mask true covariate relationships on SY|X. Graphical procedures displaying receiver operating characteristic curve regression models may provide a complementary tool for diagnosing this phenomenon (Pepe, 2000). Additionally, Hand (1992) cautioned against using the PI for causal effects and provided examples for which the PI would lead to the incorrect conclusion about which of two treatments is better.
Wicher Bergsma (London School of Economics and Political Science) and Marcel Croon, Jacques A. Hagenaars and Andries van der Ark (Tilburg University)
We would like to point out the relationship to Bergsma et al. (2009), where probabilistic index models (PIMs) were introduced under the name of Bradley–Terry‐type models, and full maximum likelihood for fitting and testing with categorical variables was used. Below we also point out possible interpretational problems with certain PIMs, and how to avoid these.
Let Y1, …, Yn be ordinal random variables (not necessarily independently or identically distributed). Being ordinal, the Yi are only meaningful comparatively, i.e. an individual Yi has no meaning. However, a set of meaningful sufficient statistics is




We see that models based on the Lij or the PIij are truly ordinal, in contrast with, for example, McCullagh's logistic models and normal threshold models, which assume that ordinal data are realizations of some underlying interval‐level variable.
However, a problem is that it is possible that

and
, i.e. the inequality relation is intransitive. For PIM (31) in the paper an intransitive solution arises if
,
>0, in which case MIi<
and
.

(37)
(38)
(39)
Bergsma et al. (2009) considered a very broad class of models, which includes PIMs, and derived multinomial ‘maximum’ likelihood equations. These equations apply to PIMs for the case that the response variable is categorical. However, the Lagrangian algorithm that was described there (and implemented in Bergsma and Van der Ark (2009)) appears to suffer from numerical problems when covariates are continuous. We wonder how a full likelihood method could be implemented for the continuous case.
Stijn Vansteelandt (Ghent University and London School of Hygiene and Tropical Medicine)
I thank the authors for an interesting and stimulating paper. When interest lies in the effect of treatment A (1, treatment; 0, no treatment) on outcome Y, covariate‐adjusted probabilistic indices have been suggested to avoid attenuation of the estimated treatment effect (Brumback et al., 2005), to boost its precision (Brumback et al., 2005) or to adjust for confounding by Thas and his colleagues. I shall reflect on these various suggestions.
I remind the reader that covariate adjustment is a subtle consideration, even when the treatment is randomly assigned. The unadjusted analysis then targets the marginal probabilistic index
P{Y(0) ≼ Y*(1)}, which expresses how likely it is, if one picks two random individuals and randomly chooses to treat one (in which case we observe Y*(1)) but not the other (in which case we observe Y(0)), for the untreated individual to score lower. The adjusted analysis targets the same comparison, but for two random individuals with the same covariate value L. As one adjusts for increasingly more baseline predictors of the outcome, the covariate‐adjusted probabilistic index will tend to move increasingly further from 0.5, and to come increasingly closer to the within‐subject comparison P{Y(0) ≼ Y(1)}, which expresses how likely it is that a random individual would score lower if untreated than if treated. Although such a within‐subject comparison may be the statistician's ultimate dream, interpretation as such is always hindered by the fact that one will never know how close the approximation is.
Adjustment for confounding may now alternatively happen by calculating
(40)
This has the advantage that it relies instead on a model for the propensity score P(A|L), which is arguably easier to specify than the dependence of the probabilistic index on the covariate values of two individuals. Using semiparametric efficiency theory under the model defined by the sole restrictions of a propensity score model, more efficient estimators of the marginal probabilistic index can be constructed. Application of a semiparametric efficient estimator would guarantee that the adoption of auxiliary covariate information boosts precision. When the exposure is randomly assigned, the resulting inference would be (asymptotically) distribution free, because of the estimator's reliance on the known randomization probabilities P(A|L) (see for example expression (40) and Zhang et al. (2008)), in contrast with inference under (covariate‐adjusted) probabilistic index models which does not exploit that knowledge.
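The propensity‐weighting logic behind expression (40) can be sketched as follows. This is a Hájek‐type sketch only in the spirit of that expression, not a reproduction of it; the function name and toy data are ours, and ties are ignored.

```python
def ipw_pi(data, prop):
    """Inverse-probability-weighted estimate of the marginal probabilistic
    index P{Y(0) < Y(1)}.  `data` holds (a, l, y) triples (treatment,
    covariate, outcome); `prop(l)` is the known propensity P(A=1 | L=l).
    A Hajek-type sketch; ties would need a 1/2 contribution, omitted here."""
    num = den = 0.0
    for a_i, l_i, y_i in data:
        for a_j, l_j, y_j in data:
            if a_i == 0 and a_j == 1:          # untreated vs treated pair
                w = 1.0 / ((1.0 - prop(l_i)) * prop(l_j))
                num += w * (y_i < y_j)
                den += w
    return num / den

# Under simple 1:1 randomization the weights are constant and the estimate
# reduces to the Mann-Whitney proportion:
data = [(0, 0, 1.0), (0, 1, 3.0), (1, 0, 2.0), (1, 1, 4.0)]
print(ipw_pi(data, lambda l: 0.5))
```

With the constant propensity 0.5 this returns the plain proportion of concordant untreated–treated pairs; a covariate‐dependent propensity reweights the same pairs.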
The following contributions were received in writing after the meeting.
David Draper (University of California, Santa Cruz)
The authors have offered an interesting semiparametric approach to regression modelling based on their probabilistic index (PI). However, I do not see that this technique offers significant gains when compared with existing Bayesian non‐parametric fitting methods. Consider, for instance, the authors’ example examining the relationship between Beck depression inventory (BDI) improvement and dose of quetiapine in Section 1, which is illustrated in the paper's Fig. 1(a). The authors point out correctly that their PI analysis is superior to a naive linear regression, in two ways: their approach attempts to capture non‐linearity in a particular way, and it also attempts to respond to the evident heteroscedasticity. However, Fig. 8 presents the results from fitting a treed Gaussian process model (Gramacy and Lee, 2008) to this data set, using the freeware R function btgp ‘straight out of the box’, with no special tuning or other user intervention. This is a Bayesian non‐parametric technique that finds an optimal partition of the X‐space, for fitting Gaussian process regression models to the separate regions identified by the partition. The treed Gaussian process analysis automatically adapts to the heteroscedasticity and non‐linearity in this data set, and in so doing it reveals an important scientific finding that was not discovered with the authors’ PI approach: the improvement in BDI is constant in the quetiapine dose up to about 19 g, above which it is approximately linear with a slope of about 0.5 BDI points per gram.

Fig. 8. Bayesian treed Gaussian process fit to the relationship between DOSE and BDI improvement in Fig. 1: estimated underlying mean function; 90% uncertainty bands; optimal partition
In obtaining these results, I did not instruct btgp to find a specified number of partition sets, as defined by the dose variable, or where to locate the ‘change‐points’; the algorithm correctly deduced that the optimal number of separate Gaussian process models to fit in this case is 2. I say ‘correctly’ and ‘optimal’ because—in an analysis not presented here in more detail, because of space limitations—I generated 100 simulated data sets, each with 49 observations, matching the structure of the BDI improvement by dose data set (with one change‐point randomly located between 16 and 22 g, a constant relationship to the left of the change‐point at a value varying randomly from 2 to 8 BDI improvement points, a linear relationship to the right of the break point with a slope varying randomly from 0.25 to 0.75, and heteroscedasticity values chosen randomly from ranges similar to those in the observed data set), and btgp identified the correct structure in 93 of these 100 replications.
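Draper's simulation design can be sketched as a data‐generating function. The dose range and the noise scale below are our own guesses; the comment states only that the heteroscedasticity was chosen from ranges "similar to those in the observed data set".

```python
import random

def simulate_bdi_dataset(n=49, rng=random):
    """One synthetic data set following Draper's recipe: a change-point
    uniform on [16, 22] g, a flat segment at a level in [2, 8] to its left
    and a linear segment with slope in [0.25, 0.75] to its right.
    The dose range and noise scale are assumptions, not from the comment."""
    cp = rng.uniform(16.0, 22.0)
    level = rng.uniform(2.0, 8.0)
    slope = rng.uniform(0.25, 0.75)
    data = []
    for _ in range(n):
        dose = rng.uniform(0.0, 40.0)          # hypothetical dose range
        mean = level if dose <= cp else level + slope * (dose - cp)
        sd = 1.0 + 0.1 * dose                  # assumed heteroscedasticity
        data.append((dose, rng.gauss(mean, sd)))
    return data
```

One hundred calls to this function give data sets of the kind on which Draper reports that btgp recovered the correct two‐region structure 93 times out of 100.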
Michael P. Fay (National Institute of Allergy and Infectious Diseases, Bethesda) (© US Government)
Although Thas and his colleagues discussed the k‐sample case, it is helpful to compare the probabilistic index models (PIMs) and linear models in the simple three‐sample case to point out some non‐intuitive behaviour of the PIMs. Let Y(a) be a random response from group a, and the associated covariate be X(a), a 3×1 vector with the ath element equal to 1 and the others 0. The associated linear model has E(Y|X)=XTμ, where μ = (μ1, μ2, μ3)T, and the model imposes no additional structure on the means. For comparing groups a and b in the linear model, we use the difference μa − μb. So knowing μ allows us to obtain any pairwise comparison between the groups.
Now consider a PIM for the three‐sample case. Let Y(a) and Y(b) be independent responses from groups a and b, and let Pab = P{Y(a) ≼ Y(b)}. Suppose that our model of Pab is logit(Pab) = {X(b) − X(a)}Tβ, where β = (β1, β2, β3)T. For this model, the comparison between groups a and b gives logit(Pab) = βb − βa ≡ βab. As with the linear model, knowing β we can model all three pairwise comparisons, β12, β23 and β13, and if we know two of the three pairwise comparisons we can obtain the third (since β13 = β12 + β23). Further, since there are only three unique pairs for comparisons, it would appear that three parameters would not impose any additional structure. This is not true, since there are distributions for which the PIM model above does not fit the data.
Consider three discrete distributions, each with three possible values which occur with equal probability. Here are the possible values: group 1 (1,5,9), group 2 (2,6,7) and group 3 (3,4,8). Then P12 = P23 = 5/9 and P13 = 4/9. This is an example of the intransitivity of the PI (see for example Brown and Hettmansperger (2006)). If we try to fit the model logit(Pab) = βb − βa to this scenario, then there are no values of the parameter vector β such that the fitted probabilities match P12 = P23 = 5/9 and P13 = 4/9. So with these three distributions our model is misspecified.
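The pairwise indices in this example are quickly verified by enumeration (the helper name is ours):

```python
from itertools import product

def pi_index(a, b):
    """Empirical probabilistic index P(Y_a < Y_b) for two discrete
    distributions with equiprobable support points (no ties occur here)."""
    pairs = list(product(a, b))
    return sum(x < y for x, y in pairs) / len(pairs)

g1, g2, g3 = (1, 5, 9), (2, 6, 7), (3, 4, 8)
p12, p23, p13 = pi_index(g1, g2), pi_index(g2, g3), pi_index(g1, g3)
print(p12, p23, p13)  # P12 = P23 = 5/9 but P13 = 4/9 < 1/2: intransitive
```

Group 1 tends to beat group 2, group 2 tends to beat group 3, yet group 3 tends to beat group 1, which no model with β13 = β12 + β23 can reproduce.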
Asymptotically, with equal numbers in the three groups, it seems that the estimates of β1, β2 and β3 would all approach 0. For large samples, would we erroneously conclude that P12 ≈ 1/2 in the presence of the third group, but conclude that P12 ≈ 5/9 if we did not observe the third group? Perhaps diagnostic plots are important even in very simple cases where we might not use them in linear models.
Dean Follmann (National Institutes of Allergy and Infectious Diseases, Bethesda) (© US Government)
I very much liked this paper. It gave a thoughtful development of a flexible probabilistic index model (PIM) approach, explored connections with other methods, had nice theoretical results and gave three substantial examples. I also am hopeful that this approach becomes part of an applied statistician's toolbox because I think that there are settings where it will be the perfect choice.
In this comment I wanted to expand on an aspect of this approach that I became painfully aware of when working on a similar method (Follmann, 2002). Under a simple version of a PIM, one postulates that the probability that outcome i is better than j is given by a logistic regression with intercept 0 and covariate equal to the difference of the two covariate vectors. I applied this pairwise logistic approach (PLA) to a clinical trial by using standard software and waited for the result. After a while, I quit waiting as I realized what the hitherto esoteric expression O(n²) (the order of the number of terms in the PLA likelihood) truly meant for a data set with n=4228. And, even if I were patient, I would have had to wait even longer for the covariance estimate based on O(n³) operations. Being impatient and needing an example, I decided to analyse a subgroup of 645 diabetics to illustrate the method. Unfortunately, this is not a universal solution to the problem of large n.
If we assume a proportional hazards (PH) model for the outcomes, then the pairwise logistic regression model obtains. The PLA does not imply a PH model, and thus the PH model requires a stronger assumption. But there are tempting reasons to make this assumption. First, we can just run Cox regression software on the data. Under no censoring this should involve O(n) terms for the partial likelihood. Another reason is that, under the PH model, partial likelihood gives more efficient estimates than the PLA. To crystallize these points, I conducted one small simulation in R, for the two‐group setting with n=20 and then n=200 per group, X=0 or X=1 the group indicator, exponentially distributed outcomes and no censoring. On the basis of 1000 replications, the ratio of mean‐squared errors for the pairwise to partial likelihoods was 1.66 (n=20) and 1.31 (n=200), whereas the ratio of computation times was about 14.3 (n=20) and 1.15×10⁴ (n=200). The PH assumption has real advantages and it is not exactly clear what additional flexibility the weaker assumption of the PLA buys us. And the PH model still allows us the nice PIM interpretation of our parameters.
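The O(n²) arithmetic behind this comment is simple to spell out (a sketch; 645 and 4228 are the sample sizes mentioned above, the helper name is ours):

```python
def pairwise_terms(n):
    """Number of unordered pairs (i, j): each contributes one term to the
    pairwise logistic likelihood, so the cost grows as O(n^2), against
    O(n) partial-likelihood terms under no censoring."""
    return n * (n - 1) // 2

for n in (645, 4228):
    print(n, pairwise_terms(n))
```

The diabetic subgroup already needs about 2×10⁵ likelihood terms, and the full trial nearly 9×10⁶, before the O(n³) covariance step is even attempted.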
Vanda Inácio (Lisbon University), Miguel de Carvalho (Ecole Polytechnique Fédérale de Lausanne and Universidade Nova de Lisboa) and Antónia Amaral Turkman (Lisbon University)
(41)
(42)
, for t ∈ T, with β = α/(σ√2) being a functional parameter in this context. To estimate the functional PIM in equation (42) we only need to estimate α and σ. Cardot et al. (1999) proposed to estimate α on the basis of functional principal components, using the estimator

are the eigenfunctions associated with the K largest eigenvalues
of the empirical covariance operator of the sample
For further details see Cardot et al. (1999). Estimation of the PIM in equation (42) is completed after obtaining

Recently, Inácio et al. (2012) have extended receiver operating characteristic curve regression methodology to the functional context. They investigated how the accuracy of gamma glutamyl transferase, as a diagnostic test to detect metabolic syndrome, is affected by the nocturnal arterial oxygen saturation, which was measured densely over the patient's sleep. It would be interesting to study this relationship by means of a (functional) PIM. For example, it would be interesting to use an estimate of the probability in model (42) as an index to compare the gamma glutamyl transferase values of someone with a ‘high’ curve of arterial oxygen saturation against someone with a ‘low’ curve of arterial oxygen saturation.
We illustrate our thoughts by means of a numerical experiment, where we simulated 100 independent data sets (sample size 100) according to model (41); Fig. 9(a) gives an idea of the shape of the predictor curves X(t), whereas Fig. 9(b) represents a hypothetical difference curve X(t)−X*(t). The true probability in model (42) under our simulated scenario is 0.710 and its average estimate (2.5%, 97.5% simulation quantiles) is 0.712 (0.677,0.746).

Fig. 9. (a) 100 simulated predictor trajectories and (b) hypothetical difference curve X(t)−X*(t)
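Under normal errors, the PI of model (42) is Φ of the integrated weighted difference. A numerical sketch follows, with a hypothetical coefficient function α(t) and difference curve chosen by us (not those of Fig. 9), and the integral approximated by a midpoint Riemann sum.

```python
from math import erf, sin, pi as PI, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def functional_pi(alpha, diff, sigma, T=1.0, n=1000):
    """PI of model (42): Phi( integral of alpha(t)*diff(t) dt / (sigma*sqrt(2)) ),
    with the integral approximated by a midpoint rule on [0, T]."""
    h = T / n
    integral = sum(alpha((k + 0.5) * h) * diff((k + 0.5) * h) for k in range(n)) * h
    return Phi(integral / (sigma * sqrt(2.0)))

# Hypothetical coefficient function and difference curve (our choices):
alpha = lambda t: sin(PI * t)
diff = lambda t: 0.5 * t            # X*(t) - X(t)
print(round(functional_pi(alpha, diff, sigma=0.25), 3))  # ≈ 0.674
```

Scaling the difference curve or the error standard deviation σ moves this probability towards 1 or back towards 0.5, mirroring the simulation summary above.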
Tom King and Lara E. Harris (Southampton University)
For subjective listening ratings, Wolfe and Firth (2002) showed the need for modelling personal response scales. The ABX listening test remains a popular approach for subjective listening experiments for this reason and other bias problems (Zielinski et al., 2008). This is a type of two‐alternative forced choice test that was mentioned by Dodd such that listeners are presented with two excerpts A and B and asked to identify which is X. In a more general version, listeners are asked to identify which of A and B are most similar to X, repeating these tests for multiple iterations of A and B from a finite list of excerpts.
Standard approaches to analysing results test null hypotheses of no audible difference by using exact binomial probabilities (Leventhal, 1986). These also allow for an estimate of the proportion of correct identifications to be made (Burstein, 1989), assuming equal allocation of forced ‘don't knows’. Multiple comparisons mean losing power without borrowing strength by using covariates. A density could be estimated by using more advanced methods to estimate a ranking but this would be opaque to many working in audio. Non‐parametric methods might be able to test for a preferred ranking but would not afford much insight into the relative support for different rankings, or the influence of covariates.
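The exact binomial calculation described above can be sketched as follows (a generic one‐sided test of guessing; the 12‐of‐16 example is ours, not from the comment):

```python
from math import comb

def abx_pvalue(correct, trials, p0=0.5):
    """One-sided exact binomial p-value: the probability of at least
    `correct` successes in `trials` Bernoulli(p0) trials under the null
    hypothesis of guessing (p0 = 1/2 for an ABX test)."""
    return sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
               for k in range(correct, trials + 1))

# e.g. 12 correct identifications out of 16 ABX trials:
print(round(abx_pvalue(12, 16), 4))  # ≈ 0.0384
```

Each listener‐by‐excerpt comparison gets its own test of this kind, which is exactly the multiple‐comparisons cost, without covariates, that a PIM could avoid.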
The probabilistic index model should be ideal for this type of data. The question in this instance is to test preference of bass reproduction through digital simulation of a number of loudspeaker designs. The probabilistic index model should be able to incorporate all the relevant covariates and to estimate specific preferences and to estimate design preferences as well as identifying preference variation. More details are given in Harris et al. (2012).
A. J. Lawrance (University of Warwick, Coventry)
I enjoyed this paper at the meeting but, in spite of the attractive presentation and a little reflection afterwards, I still have a few points of query. As a person without previous knowledge of the area, it is still not clear to me why a probabilistic index model (PIM) is in general a natural non‐parametric regression way to go which stands on its own two feet. I do understand that quite a few well‐known methods can be cast in the PIM way and be extended via a PIM, but this does not make it natural. The topic is regression so one would expect to see a connection to the conditional distribution of response given covariates, even if not fully specified. It seems very curious that this appears to be absent, at least on the surface, and even more so that the PIM focuses on the difference distribution of two independent response variables. That seems a very awkward way to relate to the conditional regression distribution. Nor do I know what information is being neglected by a PIM by this formulation. The lack of a connection to the conditional distribution would appear to be the reason why no sort of likelihood is available. Finally, to ride my graphical hobby horse, can I plead for common scales in comparative graphs such as in Fig. 3 and between Figs 6(b) and 6(c)? Discussion at the meeting illustrated high regard for the work and I quite expect the authors to be able to answer all my main points satisfactorily, and I look forward to the revelations.
Chenlei Leng (National University of Singapore) and Guang Cheng (Purdue University, West Lafayette)
We congratulate the authors on developing an interesting class of semiparametric models, i.e. probabilistic index models (PIMs), that directly relates the probabilistic index to the covariates. The construction of a PIM is well motivated by the ordinal response variable. We shall comment on the semiparametric efficiency issues.
The PIM is essentially a special case of the semiparametric conditional moment model. The authors thus propose to estimate β on the basis of the quasi‐likelihood estimating equation (8) in the presence of the nuisance function fYX. For the longitudinal data modelled in the marginal generalized estimating equation framework, i.e.
for i=1,…,n and j=1,…,mi, it is not difficult to derive the efficient score function of β as
(43)
,
and
, by only assuming the conditional moment restrictions and bounded mi. The semiparametric efficiency bound trivially follows from expression (43). However, efficiency bound calculation in this paper is non‐trivial owing to the more complicated dependence structure, i.e. sparsely correlated data. We suggest that the authors modify equation (8) as an (approximate) efficient score function, according to Hansen (1985), who considered the efficiency bound under weakly dependent data, which may be solved to obtain the efficient estimate.
(44)
for the m(·) in the PIM of interest. Linear transformation models (44) have a long history. Bickel and Ritov (1997) proposed an efficient estimation of β based on rank statistic methods. Cheng et al. (1995) proposed a class of estimating equations for β under possibly right‐censored observations. Moreover, Han (1987) even allowed F to be unknown and gave the maximum rank correlation estimate
(45)
Thomas Lumley (University of Auckland)
The authors use the results of Lumley and Hamblett (2003) in their proofs, I believe from my suggestion when I visited Ghent. Since that time, I have found out that related central limit theorem results were proved much earlier in the probability literature where the concept that we called ‘sparse correlation’ is described in terms of ‘graph‐structured dependence’. In particular, Baldi and Rinott (1989) gave a bound for the departure from normality of a sum of random variables in terms of the maximal degree of the dependence graph.
Jorge Mateu (University Jaume I, Castellón) and Carlos Diaz‐Avalos (National University of Mexico, Mexico City)
The authors present in a clear manner the definition and theoretical issues related to probability index models, and how they can be used to model the effect of covariates, with emphasis on the cases of categorical non‐ordered covariates. We were pleased to read the clear review of the subject that they gave and the examples shown in the paper. These are enlightening and motivate the reader to follow the subject further.
We believe that the methods shown in the paper are applicable in the area of spatial analysis. The advent of geographical information systems now makes information about spatial covariates easily available, and for spatial variables of interest, say Z, models of the type E[Z(u)]=Xβ where
are becoming common. Testing
by using probability index models is an attractive choice if one is interested in deciding whether, at some set of points
, a spatial random variable Z(u) is below a prescribed threshold value Z* representing an upper limit for water quality, for instance. Z(u) may represent a random field, a Markov random field or the intensity function of a spatial point process. Another application could be in testing the significance of spatial covariates. This is an issue of interest in several fields, such as plant ecology. To our knowledge, in the field of spatial point process modelling little has been done regarding significance tests for covariates included in the parametric models for the intensity function. Few references (Rathbun et al., 2004; Waagepetersen, 2007) have considered such problems from the fully parametric point of view, but these base their significance tests on confidence intervals resulting from asymptotic assumptions that may not be realistic in applications, so the power of the tests may be overestimated.

Joseph W. McKean (Western Michigan University, Kalamazoo)
Thas and his colleagues have presented an interesting procedure for semiparametric models. Their probability index model (PIM) relates the simple Wilcoxon–Mann–Whitney probability P(Y<Y*) to a linear function of predictors through a link function. Although, as the authors caution, it is not necessarily a competitor to robust procedures for linear or specified non‐linear models, it seems useful for a wide variety of semiparametric models. I confine my remarks to rank‐based estimation and a few remarks on pseudonorms.
The authors compare their PIM procedure with several procedures, including rank‐based (rank regression) procedures for linear models. These estimates for Wilcoxon scores are obtained by minimizing the dispersion function given in expression (17). This is equivalent to minimizing a pseudo‐norm of the residuals as discussed in section 5 of McKean and Schrader (1980); see, also, chapter 3 of Hettmansperger and McKean (2011). Abebe et al. (2010) extended these rank‐based estimates to a general estimating equation model which was discussed in Liang and Zeger (1986); see, also, section 5.5 of Hettmansperger and McKean (2011) for a sketch of this development. On the basis of their asymptotic theory, as well as empirical studies, these rank‐based generalized estimating equation estimates are robust and highly efficient. An appropriate choice of weights results in estimates that are robust in factor space. Also, the theory holds for general scores, so optimal procedures for skewed as well as symmetric error distributions are feasible. Although the asymptotic theory assumes continuous responses, the estimates can be obtained for discrete responses. Hence, a comparison of these rank‐based generalized estimating equation estimates with the authors’ PIM estimates over continuous and discrete response models should prove interesting.


So least squares estimates are invariant to observations with the same vector of covariates, similar to rank‐based and PIM estimates.
I thank the authors for their presentation of the PIM procedure. I look forward to applying it to data sets on which I am consulting and to comparing it with other procedures.
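As a concrete illustration of the rank‐based estimation that McKean describes, the following sketch minimizes Jaeckel's dispersion function with Wilcoxon scores for a single slope. This is our own toy grid search under an assumed data-generating model, not McKean's or the authors' implementation; all variable names are ours.

```python
import numpy as np

def wilcoxon_dispersion(beta, x, y):
    """Jaeckel's dispersion with Wilcoxon scores:
    D(beta) = sum_i a(R(e_i)) * e_i, where e_i are residuals,
    R(e_i) their ranks and a(i) = sqrt(12) * (i/(n+1) - 1/2)."""
    e = y - beta * x
    n = len(e)
    ranks = np.argsort(np.argsort(e)) + 1
    scores = np.sqrt(12.0) * (ranks / (n + 1.0) - 0.5)
    return float(np.sum(scores * e))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.standard_t(df=3, size=200)  # heavy-tailed errors

# D(beta) is convex and piecewise linear in beta, so a fine grid search
# suffices for this one-dimensional illustration
grid = np.linspace(0.0, 4.0, 2001)
disp = np.array([wilcoxon_dispersion(b, x, y) for b in grid])
beta_hat = grid[np.argmin(disp)]
```

Because only the ranks of the residuals enter the score function, the estimate is robust to heavy-tailed errors such as the t(3) noise used here.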
Hannu Oja (University of Tampere)
I congratulate the authors for an interesting and inspiring piece of work. It is always good to have new and different tools for statistical inference. The points that I consider important for further analysis and development of the approach are as follows.
- (a)
The dependence between Y and X is described through a function
such that
It is remarkable that
for all strictly increasing functions g, and therefore the tests and estimates for unknown HY|X should depend on
only through their ranks
. To find a realistic parametric model for
in a practical data analysis is a demanding task indeed.
- (b)
A natural next step could be to consider triples
instead of pairs and to define
and so on. Finally, the partial likelihood function that is used for Cox's proportional hazard model is, in the continuous case, the probability
where
are observed inverse ranks, i.e.
.
- (c)
The estimating equations in expression (8) use variances of Iij but ignore the non‐zero covariances between Iij and
. More efficient estimates could be obtained if the whole variance–covariance matrix was used to give weights for
. This is what is planned for future research.
- (d)
I am a little worried about how you define the true β‐parameter β0. In my mind, the true population value to be estimated should depend only on the conditional distribution fY|X or the joint distribution fY,X or
. It should not be a function of the sequence of design values (Xn).
I hope that the authors will be interested to develop their approach further.
Emanuel Parzen and Subhadeep Mukhopadhyay (Texas A&M University, College Station)
We are inspired by this outstanding paper about the probabilistic index (PI) to discuss an extension, the comparison mid‐probability index (CMPI). Our research (extending research by Parzen (1979, 1994, 2004) on non‐parametric quantile data modelling) is currently developing (Mukhopadhyay et al., 2011; Parzen and Mukhopadhyay, 2012) comprehensive approaches to the classification–dependence problem: observe continuous or discrete variables (Y dimension 1; X dimension p); model the conditional distribution of Y given X, the dependence between Y and X, and influential subsets of X.
To unify continuous and discrete cases, define the mid‐distribution function Fmid(y;Y)=Pr(Y<y)+0.5 Pr(Y=y). Define
where Y* and Y are independent and identically distributed. The authors’ PI compares conditional distributions Y|X and
. Our index compares the conditional distribution
with the unconditional distribution Y. When Y is continuous and X is binary, the CMPI estimates the Wilcoxon statistic; when Y is binary and X continuous, the CMPI estimates Pr(Y=1|X).
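The claimed connection between the CMPI and the Wilcoxon statistic can be checked empirically. The sketch below is our own toy construction (not the discussants' code): for continuous data and a binary X with equal group sizes, the average mid-distribution value of one group within the pooled sample is an exact linear function of the Mann–Whitney estimate of P(Y0<Y1).

```python
import numpy as np

def fmid(points, y):
    """Empirical mid-distribution function Fmid(t; Y) = P(Y < t) + 0.5 * P(Y = t)."""
    y = np.asarray(y, dtype=float)
    return np.array([np.mean(y < t) + 0.5 * np.mean(y == t)
                     for t in np.atleast_1d(points)])

rng = np.random.default_rng(1)
y0 = rng.normal(0.0, 1.0, 300)   # responses with X = 0
y1 = rng.normal(1.0, 1.0, 300)   # responses with X = 1
pooled = np.concatenate([y0, y1])

# average mid-distribution value of group 1 within the pooled sample
avg_mid_1 = float(np.mean(fmid(y1, pooled)))

# Mann-Whitney estimate of P(Y0 < Y1), computed from all n0 * n1 pairs
mw = float(np.mean(y0[:, None] < y1[None, :]))

# for equal group sizes and continuous (tie-free) data the identity
# avg_mid_1 = 0.5 * mw + 0.25 holds exactly
```

The identity follows by splitting the pooled comparisons into between-group comparisons (which give the Mann–Whitney count) and within-group comparisons (whose average mid-rank is fixed).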
Step 1: construct (from sample distributions) marginally orthonormal score functions
and
the variance of Fmid(Y). Construct Sj(Y), j>1, by the Gram–Schmidt method from powers of S1(Y), and discrete Legendre polynomials. For vectors X, k integers k′, construct Sk(X) as the product of
of each component X′ of X.
Step 2: compute score co‐moments LP(j,k;Y,X).
Step 3: measure the dependence (mutual information) of Y and X non‐parametrically by the sum of squares of LP(j,k;Y,X).
Step 4: the parametric logistic regression model for the CMPI regresses on influential Sk(X) identified from the largest co‐moments LP(j,k;Y,X); choose sufficient statistics before parameters.
Step 5: the copula density function of (Y,X) is non‐parametrically estimated by maximum entropy (exponential model) density estimation. The copula density is interpreted as the joint density of (Fmid(Y;Y), Fmid(X;X)); Fmid(X;X) has components
marginal mid‐distributions.
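The first steps above can be sketched numerically. The code below is our own minimal interpretation under stated assumptions: S1 is the centred mid-distribution transform, higher scores come from Gram–Schmidt on powers of S1, and dependence is measured by the sum of squared score co-moments. The function and variable names are ours, not the discussants'.

```python
import numpy as np

def lp_scores(v, m=3):
    """Orthonormal score functions in the spirit of Step 1: S1 is the centred
    mid-distribution transform; S2, ..., Sm are built by Gram-Schmidt
    orthogonalization of the powers of S1."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    f = np.array([np.mean(v < t) + 0.5 * np.mean(v == t) for t in v])
    s1 = f - f.mean()
    cols = [np.ones(n)]                 # constant column, so scores are centred
    for j in range(1, m + 1):
        c = s1 ** j
        for b in cols:                  # Gram-Schmidt step against earlier columns
            c = c - (c @ b) / (b @ b) * b
        cols.append(c / np.sqrt(np.mean(c ** 2)))   # unit empirical variance
    return np.column_stack(cols[1:])    # drop the constant column

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = x ** 2 + 0.5 * rng.normal(size=500)   # dependent, but linearly uncorrelated

sy, sx = lp_scores(y), lp_scores(x)
lp = sy.T @ sx / len(x)                   # matrix of score co-moments LP(j, k)
dependence = float(np.sum(lp ** 2))       # Step 3: sum of squared co-moments
```

The example uses a quadratic dependence that an ordinary correlation coefficient misses entirely, while the co-moment measure picks it up through the second-order score of X.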
The authors deserve our appreciation for a path‐breaking and inspiring paper. Our comments aim to outline additional tools for the statistician’s toolbox of modern applied statistics, looking at data as well as modelling them.
Details and graphs can be obtained from www.stat.tamu.edu/deep/discussionPIM.pdf.
Emilio Porcu (Universidad de Valparaiso) and Alessandro Zini (University of Milano Bicocca)
We congratulate the authors for their nice paper: we have the following comments.
- (a)
The proposed semiparametric class of models enables us to understand better the statistical meaning of the parameters of a wide range of classical models or classes of them (generalized linear models and generalized additive models), as well as their relationships with both applied estimators (Mann–Whitney and Wilcoxon–Mann–Whitney) and estimating techniques (quasi‐likelihood).
- (b)
The class of probabilistic index models does not seem to be nested within several wide classes of models.
- (c)
In some situations, it contains models that take the heteroscedasticity of the data into account, unlike other traditional models.
- (d)
In other contexts, the proposed probabilistic index model appears equivalent to a competitor, but its convergence to the asymptotic behaviour is faster.
- (e)
Referring to points (b) and (d), the authors also propose a simple new graphical tool to evaluate the specification or misspecification of a candidate model.
- (f)
In our opinion, the efficiency of the estimators, as opposed to the consistency established by the authors, is a lesser question than the correct specification or misspecification of models, even in small samples.
The authors may want to consider the following points.
- (a)
A general and crucial point concerns the choice, in the specification of a model, of the discrete (natural) scale on which some phenomenon is subjectively measured: the authors assume, for simplicity, scales on the integers, subjectively chosen by both researchers and patients, implicitly claiming ‘granularity’. But who or what guarantees equidistance between the subjective choices? From our perspective, this matter should be taken into account (endogenously) by the model. We sketch two potential ways here.
- (i)
When considering the term β(X*−X) in the function m, the following choices may be taken into account:
and
for γ ∈ R, where the former choice underlies some Box–Cox transformation. The latter choice, which is consistent with a direction of future research mentioned in the authors’ conclusions about non‐linearity in modelling dependence on covariates, may be an interesting alternative for specific problems. Both alternatives pose the problem of equidistance between categories.
- (ii)
For a discussion about the choice of the scale, a useful reference may be Zini (2008), where the implications about ordering are discussed, though in the authors’ specific context.
Mark A. van de Wiel (VU University, Amsterdam)
I congratulate the authors for an excellent paper on this exciting and potentially very useful class of regression models. The authors show the wide applicability of probabilistic index models (PIMs) in various examples. Below I address a few issues.
First of all, a philosophical one: PIMs are definitely useful for ordinal responses, in particular because the ordering is then the only meaningful property of the response. To some extent this also holds for (medical) survival data, at least in many settings where modelling of absolute survival is hopeless. However, I believe that the use of PIMs for well‐characterized continuous responses is limited. It seems to me that we should use the ‘richness of the continuity’ for the response and not only its ordering. Of course, this may lead to more complex models (e.g. including heteroscedasticity), but these should give more insight on how the covariate impacts the response than does a PIM.
The ‘competition’ with parametric models becomes even more important when considering a paired or clustered setting (which the authors briefly mention in Section 7). In an unpaired setting, the power of the PIM‐based test relative to that of parametric counterparts is relatively good, because all ½n(n−1) pairs are used in the PIM statistic. A well‐known example is the high asymptotic power of the two‐sample Wilcoxon test with respect to a two‐sample t‐test. However, this relative power drops dramatically in a paired setting when only the ordering within pairs (or clusters) can be used, unless additional distributional assumptions are made.
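The use of all pairs in the PIM statistic can be sketched directly: form the pseudo-observations I(Y_i < Y_j) over all pairs and fit a logistic model in the pairwise predictor X* − X. This is our own working-independence toy fit (the authors' estimator additionally requires a sandwich variance to account for the correlation between overlapping pairs); the variable names and the simulated model are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x = rng.uniform(0.0, 1.0, n)
y = x + rng.normal(scale=0.5, size=n)          # location-shift data

# pseudo-observations I(Y_i < Y_j) over all ordered pairs with i != j
i, j = np.where(~np.eye(n, dtype=bool))
pseudo = (y[i] < y[j]).astype(float)
z = x[j] - x[i]                                # pairwise predictor X* - X

# fit logit{P(Y < Y*)} = beta * (X* - X) by Newton-Raphson,
# treating the pseudo-observations as if they were independent
beta = 0.0
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-beta * z))
    step = np.sum((pseudo - p) * z) / np.sum(p * (1.0 - p) * z ** 2)
    beta += step
```

With n = 80 observations the fit already uses n(n−1) = 6320 pseudo-observations, which is the source of the good unpaired power discussed here.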
My final concern is the relatively poor control of the type I error for the asymptotic test in the case of small to moderate sample sizes (Table 4). It should be possible to obtain better small sample results, even when multiple covariates are present. I understand the authors’ wish to avoid the bootstrap, but it would have been nice to have these results for their setting. When concentrating on one β, it seems that these could be obtained by assuming asymptotic (joint) normality for the other parameters under the assumption that β=0, which defines a sampling model, and then comparing
with its bootstrap counterparts. Alternatively, approximations that use higher moments, such as Edgeworth expansions or saddle point approximations, could be explored.
Wang Zhou (National University of Singapore)
It was my pleasure to read this important and interesting paper that proposes probabilistic index models. I shall make two comments.
My first comment is about the inference of the parameter β. In theorem 1, the authors derive the asymptotic normality for their estimators βn, which satisfies equation (8). However, normal approximations are often too rough to be useful in practice for small to moderate sample sizes. To improve the inference, one may consider using some other techniques. We propose to use the empirical likelihood method.

. This is a U‐structured estimation equation. So we can use the jackknife empirical likelihood (see Jing et al. (2009)) for inference on β. To be more specific, let

, the statistic computed on the original data set with the ith observation removed. The jackknife pseudovalues

, a standard empirical likelihood ratio can then be constructed on
as follows:

One can prove that
as n→∞ under mild conditions, where p is the dimension of β.
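The jackknife empirical likelihood recipe of Jing et al. (2009) can be sketched in a simplified setting. The toy below uses a one-sample U-statistic with kernel I(Y1+Y2>0) rather than the PIM estimating equations, and it is our own construction under stated assumptions: leave-one-out statistics give pseudovalues, and a scalar Lagrange multiplier is found by Newton's method.

```python
import numpy as np

def u_stat(y):
    """One-sample U-statistic with kernel h(a, b) = I(a + b > 0)."""
    n = len(y)
    gt = (y[:, None] + y[None, :]) > 0
    np.fill_diagonal(gt, False)        # exclude the diagonal terms i == j
    return np.sum(gt) / (n * (n - 1))

def jel_stat(y, theta):
    """-2 log jackknife empirical likelihood ratio at theta."""
    n = len(y)
    t_full = u_stat(y)
    t_loo = np.array([u_stat(np.delete(y, k)) for k in range(n)])
    v = n * t_full - (n - 1) * t_loo   # jackknife pseudovalues
    d = v - theta
    lam = 0.0                          # Lagrange multiplier, by Newton's method
    for _ in range(50):
        w = 1.0 + lam * d
        lam += np.sum(d / w) / np.sum(d ** 2 / w ** 2)
    return 2.0 * np.sum(np.log(1.0 + lam * d))

rng = np.random.default_rng(4)
y = rng.normal(0.3, 1.0, 60)
u = u_stat(y)
```

Because the pseudovalues average exactly to the U-statistic, the log-ratio vanishes at theta = u and grows as theta moves away, which is what the chi-squared calibration exploits.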
My second comment is about Section 4.3, the two‐sample problem. At the beginning of Section 4.3, the authors assume that
, are independent and identically distributed. So n1 and n2 should be random. Their MW is different from the classical Mann–Whitney test statistic in which the two sample sizes n1 and n2 are fixed.
The authors replied later, in writing, as follows.
First we thank the discussants for reading our paper and for taking time to prepare interesting and very insightful comments. After having prepared for writing this rejoinder, we are more than ever aware that the probabilistic index model (PIM) method can be looked at from so many angles that it will take us, and hopefully also many other researchers, quite some time to disentangle its colourful set of flavours. For brevity we cannot respond to all the comments in detail.
We have organized this rejoinder as follows. Instead of replying to each discussant separately, we have tried to arrange our answers by topic. As not all issues could be grouped, at the end we shall briefly give some feedback to particular questions or problems.
Efficiency
Several discussants make suggestions for improving the efficiency of the parameter estimators. Ingrid Van Keilegom proposes to embed a PIM in the transformation model, because efficient estimators in such models have been described. In particular, she defines her model (33) with
. Within the transformation model framework, a PIM is presented as
. Note that this construction resembles the PIM formulation using expected placement values, as suggested by Lori Dodd. Efficient estimators can be obtained when h(·) is known, but under certain conditions h(·) may be replaced by a consistent estimator. Ingrid Van Keilegom recognizes that this may not be simple in our setting. At this point we refer to Cai and Dodd (2008), who, in developing regression methods for the partial area under the receiver operating characteristic curve, came across a similar problem: the conditional distribution function
must be replaced by a consistent estimator in the estimating equation. Although this is feasible under additional smoothness conditions, we deliberately did not want to proceed along this path initially, because we fear that sparseness in the covariate space may obstruct its use in real data settings. In the next section we briefly describe a simpler version of a PIM, for which the estimation of the nuisance function h(·) becomes easier without having to introduce stringent smoothness conditions.
Marginal probabilistic index model
(46)
. We refer to model (46) as a marginal PIM. The transformation model with
now reduces to the marginal PIM. As before, h(·) is unknown, but now its estimation is straightforward without requiring many additional assumptions. In particular,
in which
is the empirical distribution function of the response variable. It would be interesting to study this model further in the transformation model setting to find semiparametric efficient estimators. Later in our rejoinder we come back to the interpretation of this model, and its relationship to the Kruskal–Wallis rank test. Finally, we also note that this marginal PIM resembles very closely the comparison mid‐probability index of Emanuel Parzen and Subhadeep Mukhopadhyay. At this point we would like to take the opportunity to thank them for their very stimulating contribution (which has been made available on their Web site) and their deep insights into the non‐parametric modelling of the comparison mid‐probability index.
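A minimal sketch of the plug-in just described, in a mid-distribution variant of our own choosing: the unknown h(·) is replaced by the empirical (mid-)distribution function of the pooled response, giving each observation an estimated marginal probabilistic index. The names and the simulated model are ours.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(size=n)
y = 1.5 * x + rng.normal(scale=0.5, size=n)

def h_hat(points, y):
    """Plug-in h: the empirical mid-distribution function of the pooled response."""
    return np.array([np.mean(y < t) + 0.5 * np.mean(y == t)
                     for t in np.atleast_1d(points)])

# estimated marginal probabilistic index of each observation: the probability
# that a draw Y* from the marginal response distribution falls below (ties
# counted half) the observed Y_i
pv = h_hat(y, y)
```

These placement-style values average to exactly one half by construction, and a subsequent regression of a link-transformed pv on X would give a marginal-PIM-style fit.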
, where Yi has distribution function Fi and Y* has the marginal distribution function

K‐sample problem and transitivity
(47)
. This corresponds to a dummy coding of Xj and Xk and with
as used for most examples in the paper. Let
. Then, for all j<k<l,
(48)
(49)
and
for all k=2,…,K. With this model, equation (48) no longer holds and transitivity is no longer guaranteed. A hypothesis test that all the βjk are 0 may therefore be used as a test for transitivity. Whereas transitivity is often a desired property, or at least a convenient characteristic that, for example, always holds in location–shift models, there are settings in which it is not guaranteed or in which detecting intransitivity is even of interest. Several discussants (Lori Dodd, Wicher Bergsma, Tom King and Dean Follmann) refer to examples of studies in which the response data come immediately in the form of pairwise orderings (pseudo‐observations) that do not necessarily satisfy transitivity. This suggests that PIMs may also be useful for this type of application. It is also worth noting that some study designs ensure that the pairwise orderings (pseudo‐observations) are mutually independent, so that sparse correlation is no longer an issue.
independent parameters. For both PIMs the marginal PIM becomes

, with
the estimator of βj in equation (47) (equivalent to the non‐parametric estimator of
.
Model (47) is also used by Michael Fay for illustrating that a PIM sometimes may be misspecified. We agree with him, but we remark that his example only demonstrates that sometimes transitivity does not hold. The saturated model (49) will fit his data. Goodness‐of‐fit methods may also be used for assessing the quality of the fit. Recently we (De Neve et al., 2012) have developed a new method for assessing the fit of a PIM.
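That pairwise probabilistic indices need not be transitive can be seen with a classical toy example, Efron's nontransitive dice (our own illustration, not Fay's example): every die in the cycle beats the next with probability 2/3, so no ordering of the four groups is consistent with all the pairwise indices.

```python
from itertools import product

# Efron's nontransitive dice: around the cycle A -> B -> C -> D -> A,
# each die beats the next one with probability 2/3
dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def prob_index(a, b):
    """Exact probabilistic index P(Y < Y*) for two discrete groups,
    counting ties with weight one half."""
    pairs = list(product(a, b))
    return sum((u < v) + 0.5 * (u == v) for u, v in pairs) / len(pairs)

# each pairwise PI around the cycle exceeds 1/2, so transitivity fails
cycle = [prob_index(dice["B"], dice["A"]),
         prob_index(dice["C"], dice["B"]),
         prob_index(dice["D"], dice["C"]),
         prob_index(dice["A"], dice["D"])]
```

A saturated model such as equation (49) can fit such data exactly, whereas any model that implies transitivity cannot.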
Model formulation
In connection with the usefulness of goodness‐of‐fit methods, we also refer to the proposal of Emilio Porcu and Alessandro Zini. They suggest considering terms
or
in the PIM. Again a model assessment will be required to evaluate the adequacy of the model.

With this model specification, the natural least squares criterion would be the squared pseudonorm that is provided by Joe McKean in his contribution. Joe McKean gives also the corresponding L1‐pseudonorm, and he concludes that this demonstrates that least squares, rank‐based and PIM estimates share the property that observations with the same covariate patterns do not contribute to the estimate.

. In this way

.
Finally, we remark that Lori Dodd’s models (35) and (36) with her lexicographical orderings SES≤SES* and SES*≤SES do not agree with our model formulation of equation (31). We explicitly restricted the PIM to a strict lexicographical ordering SES<SES* so that we do not run into the problem that she encountered.
Computation
and covariance matrix estimates
. Finally we combine the estimates into


We can demonstrate that these estimators are asymptotically equivalent to the original estimators (with fixed s<∞). The order of the computation time for estimating β reduces from O(n²) to O(n²/s), i.e. a reduction by a factor s.
The method that is described in the previous paragraph may also turn out to be useful when further research focuses on using computationally intensive methods for inference in PIMs. For example, Mark van de Wiel and Wang Zhou suggest adopting a bootstrap or an empirical likelihood procedure.
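The subsample-splitting scheme described above can be sketched as follows. This is a toy scalar version with our own combination rule (inverse-variance weighting, which is one natural choice; the authors' combination formula is not reproduced here), using a naive working-independence variance for each subsample fit.

```python
import numpy as np

rng = np.random.default_rng(6)
n, s = 600, 4
x = rng.uniform(size=n)
y = x + rng.normal(scale=0.5, size=n)

def toy_pim_fit(xs, ys):
    """Pairwise logistic fit of I(Y_i < Y_j) on X_j - X_i for one subsample,
    returning the estimate and a naive working-independence variance."""
    m = len(ys)
    i, j = np.where(~np.eye(m, dtype=bool))
    pseudo = (ys[i] < ys[j]).astype(float)
    z = xs[j] - xs[i]
    beta = 0.0
    for _ in range(25):
        p = 1.0 / (1.0 + np.exp(-beta * z))
        info = np.sum(p * (1.0 - p) * z ** 2)
        beta += np.sum((pseudo - p) * z) / info
    return beta, 1.0 / info

# s disjoint subsamples: total pairwise work is O(n^2 / s) instead of O(n^2)
parts = np.array_split(rng.permutation(n), s)
fits = [toy_pim_fit(x[k], y[k]) for k in parts]

# combine the s subsample estimates by inverse-variance weighting
w = np.array([1.0 / v for _, v in fits])
betas = np.array([b for b, _ in fits])
beta_comb = float(np.sum(w * betas) / np.sum(w))
```

Each subsample of size n/s contributes (n/s)² pairs, so the total pairwise work across the s fits is n²/s, matching the stated reduction by a factor s.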
Estimating equations
Hannu Oja suggests adopting estimation equations to account for the pairwise non‐zero covariances between the pseudo‐observations Iij and Ikl. A possible way forward is to use pseudolikelihood by constructing the product of all bivariate distributions of two pseudo‐observations. We are currently exploring this path for PIMs for clustered data (a collaboration with Stijn Vansteelandt and Fanghong Zhang from Ghent University).
Inspired by the relationship between the PIM and Cox proportional hazard models, Thomas Gerds suggests further developing the PIM framework by allowing for censored data. He proposes two strategies. The first involves inverse probability weighting and the second makes use of pseudo‐pseudovalues. Here we mention only that inverse probability weighting has already been described by Cheng et al. (1995) in a class of semiparametric linear transformation models that generalize Cox proportional hazard models that make use of estimating equations similar to ours. In passing we note that the discussants Chenlei Leng and Guang Cheng made this observation too. Given the importance of missing and censored data we would very much welcome further research along the lines suggested by Thomas Gerds.
Starting from the same linear transformation model as Cheng et al. (1995) do, Chenlei Leng and Guang Cheng suggest going even one step further by not having to specify the distribution function of the additive error term in the transformation model; this would result in a maximum rank correlation estimator as in Han (1987). We have two remarks. First, with only a single covariate X, there will be no unique maximum rank correlation estimator, because the order relation restriction on the covariates will make
for all positive β and
for all negative β (or the other way around). Perhaps this problem disappears when X contains multiple regressors. Finally, using the relationship between the error distribution and the link function, maximum rank correlation estimates may also be advertised as an appropriate method for situations in which the link function is left unspecified.
Stijn Vansteelandt argues that covariate adjustment may have several disadvantages when the primary focus is on the probabilistic index as the effect size of a treatment. For example, the interpretation of the treatment effect changes with covariates selected in the model, and the variance of the treatment effect parameter estimator may be inflated by adding covariates. As an alternative solution, he proposes to adjust for confounding by changing the estimating equation of the marginal probabilistic index by incorporating the propensity score. This approach seems to have many advantages for comparative studies, and we sincerely hope that this method will be further developed.
Unstructured responses
Wang Zhou argues that our two‐sample setting (Section 4.3) is different from the classical setting in the sense that we allow the sample sizes n1 and n2 to be random. We understand the source of the misunderstanding. We should actually have added that the Xi are subject to the restriction
.
Jorge Mateu and Carlos Diaz‐Avalos, and Vanda Inácio and colleagues suggest extensions that are related to functional data analysis. We welcome their suggestions and we encourage further research in this important area. We also thank Thomas Lumley for pointing us to a reference that describes an asymptotic theory that sheds a different light on the concept of sparse correlation. In the interest of further extending and generalizing the PIM method to more complicated data structures, we believe that it will be necessary to find a more general asymptotic theory that can deal with the type of weak dependences that we encounter.
Throughout the paper we have stressed several times that we consider the PIM to be a valuable additional tool in the statistician's toolbox. When, however, the scientific focus is on the mean response, other regression techniques are favourable. This is, for example, illustrated by David Draper, who reanalysed the Beck depression inventory data with a treed Gaussian process model. Another method for an informative analysis of this data set is semiparametric quantile regression (Koenker, 2005).
Finally we turn to Stephen Senn. We understand his concerns related to the use of the probabilistic index as an effect size measure. His criticism applies, however, to most of the methods that focus on the probabilistic index. We hope that further research can solve the issues that he raises.