Exploring the existence of a stayer population with mover–stayer counting process models: application to joint damage in psoriatic arthritis

Summary Many psoriatic arthritis patients do not progress to permanent joint damage in any of the 28 hand joints, even under prolonged follow‐up. This has led several researchers to fit models that estimate the proportion of stayers (those who do not have the propensity to experience the event of interest) and to characterize the rate of developing damaged joints in the movers (those who have the propensity to experience the event of interest). However, when fitted to the same data, the paper demonstrates that the choice of model for the movers can lead to widely varying conclusions on a stayer population, thus implying that, if interest lies in a stayer population, a single analysis should not generally be adopted. The aim of the paper is to provide greater understanding regarding estimation of a stayer population by comparing the inferences, performance and features of multiple fitted models to real and simulated data sets. The models for the movers are based on Poisson processes with patient level random effects and/or dynamic covariates, which are used to induce within‐patient correlation, and observation level random effects are used to account for time varying unobserved heterogeneity. The gamma, inverse Gaussian and compound Poisson distributions are considered for the random effects.


Introduction
In psoriatic arthritis, several researchers (Aguirre-Hernández and Farewell (2004), Solis-Trapala and Farewell (2005) and recently O' Keeffe et al. (2012)) have considered the existence of a stayer population (those who do not have the propensity to experience the event of interest) with regard to clinical joint damage, after having discussed the clinical plausibility of such a population with Professor Dafna Gladman, who has established the largest and most comprehensively studied cohort of psoriatic arthritis patients in the world. If a subpopulation of stayers exists, the identification of characteristics that are associated with being a stayer would provide further understanding of the disease, as would accounting for such a population when characterizing the rate of developing damaged joints in the movers (those who have the propensity to experience the event of interest). Since these studies, there has been even more empirical evidence to suggest that a stayer population may exist; many patients have remained damage free even after further follow-up. Thus it now seems particularly appropriate to provide a more comprehensive investigation into the existence of a stayer population.
Because the outcome is the occurrence of permanently damaged joints over time, mover-stayer counting process models will be considered as a useful framework for analysis. In the literature, the models for the movers are typically constructed based on Poisson processes with patient or observation level random effects; these act multiplicatively on the intensity for greater flexibility. In addition to these commonly proposed models, models with both patient and observation level random effects and where the random effects are of a compound Poisson (CP) form will also be considered. This paper demonstrates that, when fitted to the same data (real psoriatic arthritis data and simulated data), these models can provide widely varying conclusions on the existence and proportion of a stayer population, thus demonstrating that, as is commonly done in the literature, a single analysis of a stayer population should not generally be adopted uncritically. In light of these results and given the prevalence of longitudinal count data with excess 0s, this paper aims to provide greater understanding of mover-stayer counting process models, particularly relating to estimation of the stayer proportion, by comparing the inferences, performance and model features of many models fitted to real and simulated data sets. We note that these results are analogous to those obtained on cure rate models (Boag, 1949;Farewell, 1977Farewell, , 1982Farewell, , 1986Gordan, 1990;Ghitany et al., 1994) where it has been shown that the estimate of the cured fraction can be quite sensitive to the choice of the survival distribution (Yu et al., 2004). Aguirre-Hernández and Farewell (2004) and Solis-Trapala and Farewell (2005) have previously used mover-stayer counting process models to investigate the existence of a stayer population that does not develop damaged joints. They considered models for the movers based on introducing independent gamma random effects into Poisson process (Kingman, 1992) models at the observation and patient level respectively. By comparing inferences with their respective non-mover-stayer model, neither of these investigations found any convincing evidence for a stayer population. The former analysis was performed at the total joint level, using the overall total joint count, whereas the latter was performed at the total joint level, using only the counts for the hand joints. O' Keeffe et al. (2012) took a different approach based on the use of mover-stayer multistate models at the individual hand joint level. They considered several patient level random-effects distributions, the gamma, inverse Gaussian (IG) and CP distribution, and demonstrated large discrepancies in the estimates for the probability of being a stayer (in the hands) across the different random-effects distributions. In particular, the use of patient level gamma random effects in the model for the movers produced little evidence for a stayer population, whereas more convincing evidence was found when patient level IG and CP random effects were used instead. Although the analyses performed by Aguirre- Hernández and Farewell (2004), Solis-Trapala and Farewell (2005) and O' Keefe et al. (2012) were on different versions of the same data set and the modelling approach of O' Keeffe et al. (2012) was different (multistate models as opposed to counting process models), the discrepancies across the various randomeffects distributions may suggest that the results of Aguirre-Hernández and Farewell (2004) and Solis-Trapala and Farewell (2005) were linked to the choice of gamma random effects and that, if a different random-effects distribution was chosen, vastly different results and conclusions may have occurred. This paper builds on this important issue (different random-effects distributions providing vastly different conclusions on a stayer population) observed by O'Keeffe et al. (2012) by also considering other random-effects structures, so that the results are more general or relevant to the count data setting, and additionally considering model performance and simulation studies to provide greater understanding.
The next section introduces the psoriatic arthritis data on which this analysis is based.

Psoriatic arthritis and the data
Psoriatic arthritis is an inflammatory arthritis associated with psoriasis. A basic measure of disease progression in psoriatic arthritis is the number of permanently damaged joints. Apart from the damage process being permanent (once a joint becomes damaged it will remain so), damaged joints can ultimately lead to disability and a reduced quality of life (Husted et al., 2001;Kane et al., 2003). It is therefore important to prevent or slow the damage process where possible. The main strategy employed by clinicians is to reduce activity in the joints (swelling and/or pain in the joints) as this is widely believed to result in or to cause joint damage (O'Keeffe et al., 2011). This paper focuses on the 28 joints in the hands (14 joints in each hand), which can result in severe disability if damaged. Analyses will be based on data from the University of Toronto psoriatic arthritis clinic which, since its inception in 1978 until the start of 2013, has followed over 1000 patients with clinic visits scheduled approximately 6-12 months apart. At each clinic visit, a physical examination, routine blood and urine tests and a rheumatological assessment which includes a count of active and damaged joints are performed.
To produce a more homogeneous set of patients, we considered the 757 patients who entered the clinic with no damaged hand joints and had more than a single clinic visit. Of these patients 422 (55.75%) were male and 335 were female (44.25%). At clinic entry, the mean age was 42.19 years, with a standard deviation of 12.48 years. The mean and median number of clinic visits per patient were 11.27 and 7, and this ranged from 2 to 57. The mean age at onset of arthritis was 36.76 years, with a standard deviation of 13 years. The mean follow-up time was 9.46 years, with an interquartile range of 11.15 years. The mean and median intervisit times were 0.84 and 0.54 years, with a standard deviation of 1.19 years.
While in the clinic, a large percentage, 72% (524 patients), of these patients remained damage free in the hand joints. Of the patients (233 patients) who developed damaged joints, the mean rate of gaining damage was 0.53 joints per year (total number of damaged joints at the last clinic visit or follow-up time in the clinic).
For this subset of the data, we consider models to estimate the proportion of stayers and to characterize the rate of developing damage in the movers. Although the development of damaged hand joints is not formally a recurrent events process (because there is a finite number of hand joints), the models for the movers are based on Poisson processes, as an approximation, with various random-effects structures and distributional assumptions. Heuristically, recurrent events methodology was considered as a reasonable approximation because there are few occasions where a large number of damaged hand joints have been observed, and therefore these observations will contribute less in the analysis. More formal justifications are provided in Sections 3.4 and 5.2.

Patient level random-effects models
Let D ij = N ij+1 − N ij be the number of damaged joints that patient i has developed between the jth and .j + 1/th clinic visit. A natural first choice would be to assume that D ij is Poisson distributed with mean where u i is a realization of the patient level random effect U i which induces correlation between the observations of a patient, {t ij } m i j=0 are the times at which the clinic visits occurred, λ 0 is a constant baseline intensity and β and z ij are column vectors of regression coefficients and covariates evaluated at the jth clinic visit respectively. To account for a subpopulation of stayers, the distribution of U i is taken to have a mixture distribution. Specifically the mover-stayer random effects densities for U i are of the form where π i is the probability that patient i is a stayer and f.u i / is a truncated random-effect density which integrates to 1 − π i when patient i is a mover. The corresponding marginal likelihood contribution from patient i is then where c iAE = 0 if patient i remained damage free while in the clinic and c iAE = 1 otherwise. The likelihood is then constructed by taking the product of all likelihood contributions from patients in the data set. Models corresponding to a likelihood of this form will be referred to as Poisson mover-stayer models and further qualification, when needed, will be through the addition of the type of random-effects distribution used. Subsequently, the three different random-effects distributions that were used by O'Keeffe et al. (2012) will be considered. The first two random-effects distributions are of the form where g.u i / has either a gamma density with rate and shape parameter 1=θ or an IG density with mean 1 and shape parameter ψ, i.e.
These two distributions will be referred to as the mover-stayer gamma and mover-stayer IG models respectively. The third mover-stayer random-effects density is a CP density of the form where N i is a Poisson random variable with rate parameter ρ i and X j (j = 1, : : : , N i ) are independently and identically distributed gamma random variables with shape and rate parameters 1 and ν respectively. The density is given by where I 1 .h/ is a modified Bessel function of the first kind, i.e.
: Aalen (1992) and Moger (2004Moger ( , 2005 have provided applications of the CP distribution to survival studies. Another commonly used random-effects distribution for the positive part is the log-normal distribution. It can, however, be shown that this distribution is very similar to an IG distribution and hence we do not consider its use here as it affords less tractability. See Chhikara and Folks (1977) for more details.
It is also worth noting that there is a more general distribution which has the gamma, IG and CP distributions nested in its family. The power variance family function is a three-parameter family that was introduced by Hougaard (1986) and extended in Hougaard (2000). Its density and Laplace transform are where m > −1, ν > 0 and mρ > 0. When ρ → ∞ and m → 0 at a rate such that ρm → η, the gamma distribution results. For the cases where m = − 1 2 and m > 0, the IG and CP distributions arise. In certain more tractable situations, i.e. when its Laplace transform is only required, the parameter m can be estimated directly, thus providing information on the existence (m > 0) or non-existence (m 0) of a stayer population. Note that the special case of m = 1 corresponds to the CP distribution that is used in this paper.

Observation level random-effects models
The Poisson mover-stayer models are derived by conditioning on a single random effect for each patient to give conditionally independent contributions from a patient's data. This induces a correlation structure between the observations of a patient. This patient-specific random effect U i might be thought of as accounting for missing time invariant variables that generate heterogeneity across patients. It is, however, plausible that such unobserved variables are not constant with time, especially when the follow-up periods are as long as in the psoriatic arthritis data set. For example a patient's prescribed medication, which is administrated at each clinic visit, can dramatically change with time. Another commonly adopted approach would be to retain an individual-specific probability of being at no risk of damage, π i , but to condition each observation from patients at risk on a different random effect U ij drawn from an appropriate distribution. This would allow the unobserved heterogeneity across patients to vary with time also.
If this approach is adopted, some adjustment for the correlation between the observations on the same patient, if they are at risk of damage, can be induced by inclusion of the observed dynamic variable N ij , which specifies the attained number of damaged joints at the jth visit. Thus the assumption becomes one of independence between incremental damage observations from the same patient given this observed dynamic variable and the patient being in the at-risk group.
More formally, D ij is assumed Poisson distributed with mean where c i = 1, with probability 1 − π i , 0, with probability π i , indicates whether the ith patient is a mover (c i = 1) or stayer (c i = 0). The corresponding marginal likelihood contribution from patient i is then The dynamic variable N ij enters through being an element in z ij . Note that the integration over the random effects that is required in the marginal likelihood contribution of patient i is performed at the observation structural level and therefore m i integrations are required. This contrasts with the marginal likelihood for the Poisson mover-stayer models which contained a patient level random effect for which a single integration was performed over the contributions from all observations on a single patient. Although these models are again of the mover-stayer structure, they will be referred to as zero-inflated models for consistency with the literature. Note that the 0-inflation is at the patient and not observation level (i.e. c i and not c ij is considered) and therefore a patient can only be either a stayer or a mover throughout their follow-up. Further qualification of the zero-inflated models will again be through the marginal models that are used for patients in the at-risk category.

Model fitting
For the Poisson mover-stayer models, the gamma and IG distributional parts of the moverstayer random-effects distributions were parameterized to have unit means to avoid identifiability problems with the baseline intensity. An alternative, but mathematically equivalent, approach to avoid non-identifiability was taken for the CP random-effects distribution which has an expectation of ρ i =ν; ρ i and ν were allowed to vary freely with λ 0 constrained to 1. The three Poisson mover-stayer models corresponding to these three random-effects distributions are referred to as the Poisson mover-stayer gamma, Poisson mover-stayer IG and the Poisson mover-stayer CP models. The zero-inflated models were examined when g.u ij / was taken separately to be gamma and IG distributions, resulting in the zero-inflated negative binomial (ZINB) and zero-inflated Poisson IG (ZIPIG) models as they are commonly known in the literature. The same parameterizations for the gamma and IG distributions from Section 3.1 were used, although their rate and shape parameters are denoted by θ nb ij and ψ pig ij respectively. Here the subscripts on the dispersion parameters indicate the possible dependence on explanatory variables from the ith patient at t ij . The CP distribution was not considered as an observation level random-effects distribution because a mass at zero exists in this distribution and hence would represent the unrealistic scenario that a patient can be a stayer or mover at one point in time and a mover or stayer at another. This is unlike the case when patient level random effects are considered, where a single random effect specifies whether a patient is a mover or stayer.
Parameter estimation for these and all subsequent models in this paper was achieved by maximizing the log-likelihood function by using the 'Broyden-Fletcher-Goldfarb-Shanno' optimization technique (Broyden, 1970) in the statistical package R (R Core Team, 2015). The analytic solutions to the integrals in the likelihood for the random-effects distributions that are considered in this section can be found in the on-line appendix A. The confidence intervals for all parameters and quantities reported are 95% Wald intervals obtained from evaluating and then inverting the observed information matrix at the maximum likelihood estimates.
The number of currently active (swollen and/or painful) joints, permanently attained damaged joints, age at onset of arthritis (in years) and duration of arthritis (in years) were considered as explanatory variables for the mean functions, and intercept-only models were considered for the dispersion parameters, i.e. θ nb ij = θ nb and ψ pig ij = ψ pig . All models were fitted with π i = π so that the existence of a stayer population could be more simply investigated, specifically, through testing the null hypothesis H 0 : π = 0 for the Poisson mover-stayer gamma, Poisson mover-stayer IG and the zero-inflated models. Under the null hypothesis, the asymptotic distribution of the likelihood ratio test for these models (against their non-mover-stayer counterpart) is a 50:50 mixture of a point mass at zero and a χ 2 1 -distribution (Self and Liang, 1987). A test of H 0 : π = 0 for the Poisson mover-stayer CP model is equivalent to testing H 0 : exp.−ρ/ = 0 (or H 0 : ρ = ∞). However, under this null hypothesis, the parameter ν becomes irrelevant (see the on-line appendix B for details), which therefore results in the asymptotic distribution of the likelihood ratio statistic being intractable. For this model, consideration will initially be given to the 95% Wald interval ofπ to reason about the existence of a stayer population. It is also worth noting that models where π i depends on explanatory variables can be developed. However, this would make sense only if reasonable evidence exists for a subpopulation of stayers. Table 1 shows the results from the Poisson mover-stayer (patient level random-effects) and zeroinflated (observation level random-effects) models fitted to the Toronto psoriatic arthritis data described in Section 2. Within the class of Poisson mover-stayer and zero-inflated models, the regression coefficient estimates are quite similar; most estimates lie in the corresponding 95% Wald interval of the other models. The current number of active joints, duration of arthritis and age at onset of arthritis demonstrate significant positive associations with damage progression in the Poisson mover-stayer models, whereas the current number of active joints is seen to have a significant positive association in the zero-inflated models. The Poisson mover-stayer models suggest that the attained number of damaged joints has a negative association with damage progression whereas the zero-inflated models suggest a positive association. After accounting for correlation through the multiplicative patient level random effect, it is natural to postulate that the negative association is indicative of there being fewer joints having the propensity to become damaged. This would suggest that different results may have been obtained if the number of joints that are at risk of damage was accounted for in the model (beyond inclusion of this dynamic covariate). For this reason, truncated Poisson models and multistate models at the individual joint level were fitted; these models directly account for the number of joints that are at risk of damage at each time point. Reassuringly, similar results were produced including the significant negative or positive association of the attained number of damaged joints when patient or observation level random effects were included. As mentioned, the attained number of damaged joints was primarily introduced to provide information about the history of a patient due to true contagion and is therefore introducing correlation between the patient observations. As the patient level random effect is also designed to reflect partly this type of correlation (in addition to capturing time invariant unobserved heterogeneity), the effect of this dynamic covariate will probably be confounded with the random effects. For a detailed discussion on the connection between frailty models and models with dynamic covariates see Aalen et al. (2008). Fig. 1 shows plots of the profile log-likelihoods for π. From Fig. 1, the profile log-likelihood for the Poisson mover-stayer gamma model is seen to be a monotonically decreasing function, which implies that the maximum is attained at the boundary, in particular at π = 0, and not at the value that is produced from the numerical optimization procedure (reported in Table 1). The Poisson mover-stayer gamma model therefore suggests the non-existence of a stayer population. As the optimization procedure did not converge at the maximum, a more relevant confidence interval (as opposed to a Wald interval) can be computed from the profile likelihood based on values of π in which the null hypothesis H 0 : π = 0 cannot be rejected. Such an interval was calculated as .0, 0:063/. The optimization routine for the other models (Poisson mover-stayer IG and CP models and zero-inflated models) seem to have converged at the maximums of their respective profile log-likelihoods. Furthermore, as the stayer proportions and their respective confidence intervals are estimated far from zero, these models are consistent with a stayer population. A likelihood ratio test of H 0 : π = 0 (as described in Section 3.3) resulted in p < 0:001 for the Poisson mover-stayer IG and zero-inflated models therefore indicating convincing evidence for a stayer population. From Table 1, it can, however, be seen that the stayer proportion estimates are widely varying across these models.

Model performance
Because the fitted models are non-nested, standard asymptotic theory does not support formal hypothesis testing through comparison of likelihood values. Nevertheless, the zero-inflated models are seen to have vastly larger likelihood values than the Poisson mover-stayer models. This would indicate, informally, that the zero-inflated models provide a much improved fit relative to the Poisson mover-stayer models. If, however, the goodness of fit to the data is of specific, standalone, interest (in addition to providing a means of model comparison), likelihood values are less informative. Therefore, we compare the observed and estimated increments of damaged joints. Let be the estimated probability that patient i develops d additional damaged joints between t ij and t ij+1 . HereΛ ij andπ are the maximum likelihood estimates that were obtained in Section 3.4. For the Poisson mover-stayer models, D ij is assumed to have a Poisson distribution conditional on u i . To obtain values of e ij .d/ for these models, u i is estimated by using the empirical Bayes estimate, specifically,û whereφ is the maximum likelihood estimate for the set of parameters of the random-effects distribution and f.u i |λ 0 ,β,φ, z i , t i / denotes the predictive distribution of u i , which takes the form Here L i .λ 0 ,β|u i ; z i / is the conditional likelihood contribution from the ith patient. For the zero-inflated models, D ij is assumed to have a Poisson distribution conditional on u ij and c i . Hence D ij has a negative binomial (NB) or Poisson IG distribution conditional on c i if u ij is drawn from a gamma or IG distribution respectively. Values of e ij .d/ can then be obtained by using these marginal distributions and estimating c i by its empirical Bayes estimate, i.e.
The observed and estimated changes in joint counts are displayed in Table 2. It is evident from Table 2 that none of the Poisson mover-stayer models provide particularly close agreements between the observed and estimated values. The category with increments of one damaged joint is considerably overestimated by all three models which then results in the majority of categories with larger increments of damaged joints being severely underestimated. However, the zero-inflated models demonstrate much closer agreement, as also suggested by the likelihood values, which may indicate that time varying unobserved heterogeneity is strongly present in the data. The category corresponding to an increment of one damaged joint is overestimated to a much lesser degree by these models and as a result categories with larger incremental damage are underestimated less severely. One general statistic that accounts for the overall model The Pearson statistic distribution under the null hypothesis that the fitted models are the data-generating mechanisms can be used to assess more formally the goodness of fit of the fitted models. The exact form of this distribution is obtained from summing the independent but non-identically distributed e ij .d/s which are of a multinomial form. This, however, results in an intractable distribution. Alternatively, this distribution can be estimated by using a parametric bootstrap algorithm such as that described in Aguirre-Hernández and Farewell (2004), in which a bootstrap data set s was simulated by first simulating c s i from a Bernoulli distribution with success probability 1 −π for each patient. If the realized value c s i was 0, the patient was a simulated stayer and therefore the simulated values d s ij were 0 for all observed time intervals for this patient. Otherwise, each value d s ij was simulated givenΛ ij andθ nb from an NB orψ pig from a Poisson IG model. To simulate this d s ij from these distributions, a value u s ij from either a gamma or IG distribution with parameterθ nb orψ pig is first sampled; then d s ij is drawn from a Poisson distribution with mean u s ijΛ ij . The Pearson statistic can then be calculated for the bootstrap data set by refitting the same model structure onto this data set and calculating the corresponding o s .d/ and e s .d/ for all d.
This procedure was repeated 1000 times, hence producing 1000 realizations of the Pearson statistic under the null hypothesis that the fitted model is the underlying data-generating mechanism. The p-values for the models proposed are then calculated as being the proportion of these realizations greater than the Pearson statistic evaluated on the observed data set. The p-values for the ZINB and ZIPIG models were calculated as 0.098 and less than 0.001 with the means of the bootstrap goodness-of-fit distribution being 16.56 and 15.77 respectively. These results suggest that there is little evidence of lack of agreement between the observed and estimated incremental joint damage from the ZINB model. However, there is more evidence of lack of agreement from the ZIPIG model.

Model features influencing estimation of π
In the previous sections, and as is commonly done in the literature, the parameter π was used to investigate the existence and proportion of a stayer population as it represents a statistically coherent framework. For example, one could test H 0 : π = 0 or examine its estimated confidence interval. In regard to the PA data, widely varying estimates of this parameter resulted when different models for the movers were specified. This, at the very minimum, would suggest a single analysis of a stayer population through π should not be adopted uncritically and that greater consideration of this parameter would be worthwhile. This section considers why different values of π may have resulted by investigating the interpretability of such a parameter across the fitted models.
Across the fitted models, it would be natural to postulate that patients who are highly likely or unlikely to be stayers have similar stayer probability estimates and that the main discrepancies from these estimates have resulted from disentangling slow transitioning movers and stayers. This may suggest, because of the widely varying estimates of π, that the PA data contained some patients who were slow transitioning movers and that the fitted models distinguished very differently between slow transitioning movers and stayers. If this is so, it would be intuitive to examine features of the fitted models, especially the model structure for the movers which generate the possibility and proportion of slow transitioning movers.
In the motivating application, both the Poisson mover-stayer IG and CP models strongly suggest the existence of a stayer population. For these models, the continuous part of the patient level random-effects densities are such that g.u i / → 0 when u i → 0, regardless of the parameter ψ (IG), and g.u i / → ρν exp.−ρ/ as u i → 0 (CP). Thus it appears less likely that these distributions can place a large proportion of mass arbitrarily close to zero. If a slow transitioning mover population exists, the Poisson mover-stayer IG and CP models may therefore struggle to represent these patients adequately in the model for the movers, and hence these models may attribute slow transitioning movers with high stayer probabilities instead, i.e. the stayer proportions will be overestimated. Regarding the CP distribution, the parameter ρ, as discussed, governs both the overall baseline transition intensity ρ=ν and the stayer proportion exp.−ρ/. An overall slow or fast transition intensity, e.g. when there are many or few slow transitioning movers, as indicated by ρ will then also enforce a high or low stayer probability (through ρ) even if few or many stayers exist. This is not so for the other models as a slow or fast overall baseline transition intensity .1 − π/λ 0 through λ 0 will not force π to take a certain value.
The class of zero-inflated models also strongly suggests the existence of a stayer population. These models, unlike the others, do not contain patient level random effects and therefore are less able to represent specific patients with constantly low propensity of developing damage, after adjusting for covariates. Hence, for these zero-inflated models, representing an arbitrary number of slow transitioning movers may also be problematic, particularly more so than with patient level random-effect models, and this will again probably result in slow transitioning movers being attributed high stayer probabilities.
Of the fitted models, only the Poisson mover-stayer gamma model suggests the non-existence of a stayer population sinceπ = 0. This model was constructed by using patient level gamma random effects in the model for the movers. Under the current parameterization for U i Λ ij , if θ > 1, the gamma distribution is such that g.u i / → ∞ as u i → 0, and therefore this distribution can place a large proportion of mass arbitrarily close to zero. Implicitly, the gamma distribution can represent a large proportion of patients with small transition intensities, namely the slow transitioning movers. In contrast with the other models, the Poisson mover-stayer gamma model would seem to be at less risk of overestimating the stayer proportion; the stayer probabilities for the slow transitioning movers are less likely to be overestimated. However, if sufficient followup information is not available, it is clear that identifiability issues will arise because stayers can be reasonably represented either through π or the left-hand tail of the gamma distribution. In the motivating application, the fitted Poisson mover-stayer gamma model was such that θ = 6:19 .5:11, 7:5/ with a small average estimated transition intensity. Thus this model seems to have inferred a slow transitioning mover population, instead of attributing these patients as stayers.
In summary, although π is defined as those with no propensity to develop damage (i.e. λ = 0), this parameter seems to be heavily dependent on the model for the movers, at least for the fitted models, perhaps because it generates the possibility and proportion of slow transitioning movers. At a minimum, this would suggest that a more appropriate interpretation of π would be the proportion of patients who are at no or at least minimal risk of damage, which cannot be adequately explained by the fitted model for the movers, although the fitted model for the movers may not necessarily fit well. This would be particularly relevant for the motivating application, especially when considering the differences between the Poisson mover-stayer gamma model and the others, because the model structures for the movers specified vastly different probability distributions for the risk of damage. 0.3 (0.039, 0.04) -0 †The second column displays the mean and standard deviation SD of the estimated values of π together with the mean standard error SE. The third column displays the mean and standard deviation of the estimated difference between π from each model and π from the true model. The fourth column displays the percentage of simulations whereπ ≈ 0.

Simulation study
To formalize ideas and to investigate estimation of π further by using the considered (and commonly advocated) models, we perform some simulation studies. These studies will represent the scenarios where there are few or many slow transitioning movers in addition to a stayer population. In the first scenario, where few patients are slow transitioning movers, a lesser distinction between estimated values of π is expected, although possibly not for the Poisson mover-stayer CP model. In the second scenario, where many slow transitioning movers exists, there should be a greater distinction between the estimates of π, specifically between the Poisson mover-stayer gamma model and the other models which are less able to represent an arbitrary number of slow transitioning movers through the left-hand tail of their mover distribution.
In scenario 1, we simulate from the Poisson mover-stayer gamma, Poisson mover-stayer IG, Poisson mover-stayer CP, ZINB and ZIPIG models with no covariates and {π = 0:3, λ =  there would be more slow transitioning movers than in scenario 1. Simulations from the Poisson mover-stayer models were performed by first simulating u s i for each patient. The damage joints increments d s ij were then simulated from a Poisson distribution with mean u s i Λ ij . For the Poisson mover-stayer gamma and IG models, u s i was obtained from c s i v s i where c s i is a realization from a Bernoulli distribution with success probability 1 − π and v s i is a realization from either a gamma or IG distribution with mean 1 and variance θ or 1=ψ respectively. For the Poisson moverstayer CP model, u s i was obtained by simulating from a gamma distribution with rate ν and shape n s i where n s i is generated from a Poisson distribution with mean ρ. Simulations from the zero-inflated models were performed by simulating d s ij from a Poisson distribution with mean c s i u s ij Λ ij where c s i is as before and u s ij is a realization from either a gamma or IG distribution with mean 1 and variance θ nb or 1=ψ pig respectively.
In both scenarios, 1000 data sets were simulated with each containing 200 patients. Each patient was assumed to have 11 clinic visits annually. The Poisson mover-stayer and zero-inflated models were then fitted to each data set.
Tables 3 and 4 display the results of the simulation study under scenario 1 and scenario 2 respectively. Only results on estimates of π are displayed although the baseline intensities and dispersion parameters were also estimated. In both scenarios, there were some similar characteristics to those seen in the motivating application. In particular, the zero-inflated models generally provide the highest stayer proportion estimates, and furthermore the estimates were similar across the ZINB and ZIPIG models. However, across both scenarios, it is clear that the stayer proportion estimates are considerably more biased in scenario 2, as indicated by the difference in stayer proportion estimates from each model and the true model (the third column in Tables 3 and 4). This is especially so when the Poisson mover-stayer gamma and IG models are the true model. These results suggest that the model for the movers is particularly influential in the estimation of π when there are many slow transitioning movers. Indeed, no model, on average, could accurately estimate the true stayer proportion in all the settings in scenario 2. A noteworthy observation can be made when the zero-inflated models are the true models. In this case, the Poisson mover-stayer CP model estimates considerably smaller stayer proportions than the other models which are in close agreement with one another. The zero-inflated models in these simulation settings cannot generate patient-specific propensities for damage since patient level random effects or dynamic covariates are not included in these models. Moreover, they can generate only independent observation-specific propensities of gaining damage. Therefore, there is a lesser need for the patient level random effects in the Poisson mover-stayer models, which is reflected through a larger estimated value ofν (equivalently the continuous part of the CP distribution is estimated to have smaller variance) in the Poisson mover-stayer CP model. If the overall baseline intensity in the Poisson mover-stayer CP model,ρ=ν, is to be similar to the other models,ρ will then be required to take a larger value. This causes the estimated stayer proportion exp.−ρ/ to take a smaller value.
When the Poisson mover-stayer gamma and IG models are the true models in scenario 2, Table 4 indicates that the Poisson mover-stayer gamma model has estimatedπ ≈ 0 in 28:7% and 58:9% of simulated data sets respectively, whereas none of the other models estimateπ ≈ 0 for any of the same simulated data sets. Similarly, when the zero-inflated models are the true model in scenario 2, the Poisson mover-stayer gamma model provides the greatest proportion of estimates whereπ ≈ 0. As discussed, the Poisson mover-stayer gamma model has greater ability to represent an arbitrary number of slow transitioning movers, and thus more information, e.g. in the form of longer follow-up or a smaller proportion of slow transitioning movers (as in scenario 1), is required for the Poisson mover-stayer gamma model to identify a non-zero stayer proportion, if one exists, when compared with the other models. Conversely, as indicated in scenario 2, particularly when the Poisson mover-stayer gamma model is the true model, other models may greatly overestimate the stayer proportion with a relatively narrow confidence interval when many slow transitioning movers are contained in the data. This observation was also verified even when no stayer population exists (results from this simulation study are not shown). Overall, these results emphasize the difficulty of estimating a stayer proportion when many slow transitioning movers exists. Models which are more able to represent an arbitrary number of slow transitioning movers may be the most appropriate in estimating the stayer proportion, because they are at less risk of overestimating the stayer proportion. But with less informative data (because of many slow transitioning movers and limited follow-up) these models may suggest, in the extreme case, thatπ ≈ 0 even if a stayer population exists, as seen in scenario 2 where π = 0:3. Models which are less able to represent slow transitioning movers require significantly less informative data to identify a non-zero stayer proportion; however, such an estimate may be an overestimate of the true stayer proportion in the presence of slow transitioning movers.

Model description
The simulation studies demonstrated that widely varying stayer proportion estimates, as seen in the motivating application, may result from the choice of structure and distributional assumption of the random effects, the existence of many slow transitioning movers and lack of information in the data. Nevertheless, the lack of fit with the Poisson mover-stayer models provides an additional complicating factor in interpreting the models for this application. As the zero-inflated models provide a much improved fit to the data, which suggests that time varying unobserved heterogeneity could be present in the PA data, a natural progression would be to provide this same adjustment for the Poisson mover-stayer models. This, as before, can be achieved through the incorporation of observation level random effects. If this approach is adopted, the patient level random effects will primarily be designed to introduce correlation between patient observations, as opposed to previously, where it was also designed to capture unobserved heterogeneity.
Let U i and U ij be multiplicative patient level mover-stayer and observation level random effects respectively. Assume that D ij is Poisson distributed with mean Then, under the usual assumption that U i and U ij are independent, the marginal likelihood for patient i is where c i AE is as before and These models were examined with the three patient level mover-stayer random-effects distributions that were described in Section 3.1 and with the gamma distribution (with parameter θ nb as before) for the observation level random effects. The marginal likelihood was evaluated by first computing the integrations over the gamma observation level random effects, resulting in NB models conditional on U i , then using the integrate command to perform the integrations over the patient level random effects. These models will be referred to as NB mover-stayer models, and further qualification, when needed, will again be through the addition of the type of patient level random-effects distribution that is used. Note that the ZINB model is obtained from the NB mover-stayer gamma and NB mover-stayer IG models when θ and 1=ψ = 0 respectively, whereas the class of Poisson mover-stayer models is obtained when θ nb = 0. The same explanatory variables are again considered for regression purposes. Table 5 displays the results from fitting the three NB mover-stayer models. Covariate effects estimated from this model structure also remain relatively stable across different random-effects distributional assumptions. Furthermore, apart from the attained number of damaged joints, the covariate effects are comparable with those produced by the ZINB model. The number of damaged joints attained now demonstrates a much smaller negative effect size than in the Poisson mover-stayer models (when gamma observation level random effects are not included). This may have resulted because the attained number of damaged joints and the patient level random effects are more confounded than before; the patient level random effect is now primarily designed to introduce correlation between patient observations. This would also suggest that there is (at most) little effect of accounting for the number of joints at risk of damage for these models. Fig. 2, which displays the profile log-likelihood for π, demonstrates that the numerical optimization routine converged at the maximum of the profile log-likelihoods of the NB moverstayer IG and NB mover-stayer CP models, but not for the NB mover-stayer gamma model. The estimated value of π from the NB mover-stayer IG and CP models were 0:3 .0:18, 0:45/ and 0:34 .0:26, 0:43/ respectively and are therefore in much closer agreement than when observation level random effects were not included, i.e. compared with the difference between the Poisson mover-stayer IG and Poisson mover-stayer CP models. Both models again provide strong evidence for the existence of a stayer population through the size and confidence interval ofπ. A test of the null hypothesis H 0 : π = 0 for the NB mover-stayer IG model resulted in p < 0:001, thus providing convincing evidence to reject this null hypothesis. The maximum of the profile log-likelihood in Fig. 2 for the NB mover-stayer gamma model can again be seen asπ = 0, with a 95% likelihood ratio interval calculated as .0, 0:086/. Even after accounting for time varying unobserved heterogeneity, the NB mover-stayer gamma model suggests the non-existence of a stayer population and therefore its inferences through π are again vastly different from those from the NB mover-stayer IG and CP models.

Results
A (generalized) likelihood ratio test of H 0 : θ nb = 0 resulted in p < 0:001 for each of the fitted NB mover-stayer models, therefore suggesting that it is necessary to account for time varying unobserved heterogeneity. The large decrease in log-likelihood values when compared with the Poisson mover-stayer models also suggest a much improved fit to the data. A comparison of the observed and estimated incremental joint damage from these models now follows.

Comparison of the observed and estimated incremental joint damage
As in Section 3.5, e ij .d/ can be obtained through whereΛ ij andθ nb are the maximum likelihood estimates obtained in the previous subsection and whereû i (which is a function ofπ i ) is the empirical Bayes estimate of the random effect as described in Section 3.5. In parentheses are the Pearson statistic contributions. These are obtained by squaring the difference between the observed and estimated changes and then dividing by the estimated changes. Table 6 displays the observed and estimated incremental joint damage from the NB moverstayer models. These models demonstrate a much improved fit to the data when compared with the Poisson mover-stayer models, which is consistent with the respective likelihood values. From Table 6, it is interesting to observe that these models provide similar agreements between estimated and observed increments of damaged joints across all categories. In particular, there is some evidence that all three models still overestimate the category corresponding to one incremental joint damage. Overall, the Pearson statistic was calculated as 37.07, 38.06 and 35.89 for the fitted NB mover-stayer gamma, IG and CP models respectively.

Does a stayer population exist in the psoriatic arthritis data?
This paper was motivated by the desire to obtain empirical evidence to support the existence or non-existence of a stayer population who cannot develop damaged hand joints. For this purpose, we fitted multiple mover-stayer models to a comprehensive psoriatic arthritis data set and analysed their inferences, performance and features. As a result, reasonable fitting models and greater understanding about the interpretation of inferences were obtained. Overall, such an analysis has instilled greater confidence in the conclusions drawn.
When there are many slow transitioning movers (relatively to the number of patients), the results of Section 4 indicate that estimating the stayer proportion can be problematic. Unfortunately, this may have been so for the psoriatic arthritis data since similar results were observed to those in scenario 2 of the simulation study. In particular, there were large discrepancies in the estimated values of π between the fitted models with patient level gamma random effects and the other fitted models. The primary difficulty results from the other fitted models (those where the distribution of the patient level random effects, if included, were not chosen to be gamma) being less able to represent an arbitrary number of slow transitioning movers, as discussed in Section 4, and therefore their estimated value of π may not represent only the proportion of stayers but also the slow transitioning movers. Although the models with patient level gamma random effects, which are less likely to overestimate the stayer proportion, may suggest the non-existence of a stayer population even if a stayer population does exist; because π may become non-identifiable for these models when there are many slow transitioning movers. This was seen in the simulation study under multiple settings from scenario 2, particularly when the Poisson mover-stayer IG model was the true model and 58:9% of the simulated data sets produced values ofπ ≈ 0 even though the true value was π = 0:3. The widely varying stayer proportion estimates of 0 (0, 0.086), 0.31 (0.18, 0.45) and 0.34 (0.26, 0.43) from the NB mover-stayer gamma, IG and CP models respectively, which all provided reasonable fits to the psoriatic arthritis data, therefore suggest that there is inconclusive evidence to support the existence of a stayer population who cannot develop damaged hand joints. Indeed, longer follow-up may demonstrate that the non-zero stayer proportion estimates from the NB mover-stayer IG and CP models were in fact overestimates of the true stayer proportion, which may be 0, due to the slow transitioning movers, whereas the NB mover-stayer gamma model may have suggested the non-existence of a stayer population, even if one truly did exist, because of identifiability issues due to lack of follow-up information. It is, however, important to note that the NB mover-stayer models are in strong agreement regarding the existence of a minimal risk population, as suggested by the non-zero estimated values of π and its corresponding confidence interval from the NB mover-stayer IG and CP models as well as the estimated shape of the gamma distribution (θ = 3:11 .2:24, 4:33/) from the NB mover-stayer gamma model. Such a population, in itself, may be of most clinical interest especially because these patients are at a lower risk of developing immediate damage. Identifying patients in this subset can then be facilitated through introducing covariates into the model for π within the reasonable fitting models, as discussed in Section 3.3.

Discussion
Mover-stayer counting process models are commonly used in the literature. In many instances, a single model is fitted (sometimes including the non-mover-stayer version) and conclusions regarding a stayer population are drawn based solely on that model. This paper demonstrates that, when fitted to the same data, different mover-stayer models can lead to widely varying stayer proportion estimates even if all the models provide reasonable fits to the data, e.g. the fitted NB mover-stayer models in the motivating application. Thus, if such an analysis is adopted uncritically, specific conclusions on a stayer population may be reached even though different models fitted to the same data may produce vastly different results.
A model's ability to account adequately for slow transitioning movers is a worthwhile consideration when estimation of a stayer proportion is of interest, i.e. whether the model for the movers can adequately allow for an arbitrary number of patients who have constantly low propensity to experience the event of interest, if these patients exist. Section 4 provides a discussion for some of the most commonly used models. Models which are less able to account for an arbitrary number of slow transitioning movers may not be able to provide an accurate estimate of the stayer proportion, especially if there are many slow transitioning movers, because their estimated value of π may not only represent stayers, as intended, but also patients who have a minimal risk of the event of interest (which is not adequately accounted for in the model for the movers). Fitting multiple mover-stayer models which are more and less able to account for many slow transitioning movers can therefore contribute greater understanding or confidence regarding the composition of stayers and slow transitioning movers, which will probably be of clinical interest in many cases since neither are likely to develop immediate damage. The simu-lation study demonstrated a potential pitfall regarding the use of mover-stayer models on less informative data, e.g. because of relatively short follow-up, particularly the possibility that these models suggest the non-existence of a stayer population even if such a population exists. Such an observation may be seen as counterintuitive if the lack of sufficient information in the data is thought to manifest itself through large standard errors as opposed to greatly altering the point estimate. This consideration may therefore be useful, especially if it leads to more appropriate interpretation ofπ. It should be noted, as was clearly seen in the simulation study, that models which are more able to represent an arbitrary number of slow transitioning movers will probably require more informative data to suggest a non-zero stayer proportion, and therefore this potential pitfall will be most relevant for these models such as those with patient level gamma random effects. Additionally, it is useful to examine model goodness of fit as this facilitates understanding important characteristics of the distribution of events in the movers. Accounting for these characteristics, such as observation level unobserved heterogeneity, may result in more reasonable fitting models therefore instilling greater confidence in the inferences obtained, and may also provide more similar estimates of π between competing models.
This work considered practical models that can be fitted reasonably easily, because of closed form marginal likelihoods or marginal likelihoods that can be evaluated by using a single integration per patient, that accommodated important characteristics of the data, such as within-patient correlation and time varying unobserved heterogeneity. Thus this work considered models that have commonly been used, or are likely to be used in the future, to demonstrate potential pitfalls that may occur if a single analysis regarding the existence of a mover-stayer scenario is adopted uncritically. A natural extension would consist of developing more flexible models, e.g. by using random-effects structures that account for time varying unobserved heterogeneity, such as those proposed in Aalen et al. (2008) and Unkel et al. (2014). Most of these random-effects structures when used in conjunction with intermittently observed Poisson processes and where the follow-up is extensive, such as in the psoriatic arthritis data, will, however, result in an intractable marginal likelihood, and therefore further research will probably be required. This work also did not focus on predicting a new patient's outcomes but instead focused on understanding the outcomes from a specific set of patients. Thus the goodness-of-fit test proposed, which utilized in-sample prediction, did not penalize a model for overfitting the data. More appropriate model assessment criteria for prediction, if it is of interest, would involve out-of-sample prediction or other forms of penalty functions to identify overly complex models.
The results in this paper suggest that the models fitted, taken together, do not provide conclusive evidence for the existence of a stayer population who cannot develop damaged hand joints. However, there is convincing evidence of a subpopulation of patients who are at least at minimal risk of damage, where minimal risk is characterized by the fitted models. An interesting area of future work, at least from a clinical perspective, would be to characterize minimal risk directly on the basis of clinical considerations, perhaps through specifying a threshold for the damage progression rate, and then to develop models which can estimate the proportion of patients who belong to this user-specified characterization. At a minimum, this will ensure that the target of inference will be of clinical interest and perhaps estimation of π will be less model dependent because π will be characterized by using a range of values of the damage progression rate and not based solely on zero.