A new strategy for diagnostic model assessment in capture–recapture

Common to both diagnostic tests used in capture–recapture and score tests is the idea that starting from a simple base model it is possible to interrogate data to determine whether more complex parameter structures will be supported. Current recommendations advise that diagnostic tests are performed as a precursor to a model selection step. We show that certain well‐known diagnostic tests for examining the fit of capture–recapture models to data are in fact score tests. Because of this direct relationship we investigate a new strategy for model assessment which combines the diagnosis of departure from basic model assumptions with a step‐up model selection, all based on score tests. We investigate the power of such an approach to detect common reasons for lack of model fit and compare the performance of this new strategy with the existing recommendations by using simulation. We present motivating examples with real data for which the extra flexibility of score tests results in an improved performance compared with diagnostic tests.


Introduction
This paper considers model selection for capture-recapture data that are obtained from open populations of wild animals. Capture-recapture studies involve the capture and unique marking of individuals, which are then released into the population and subsequent attempts are made to recapture them. The resulting data can be recorded as individual encounter histories for each animal, which take the form of vectors with elements 0 and 1, indicating non-capture and capture respectively. The encounter history data can often be conveniently summarized in terms of an upper triangular matrix, which is known as an m-array, with elements $m_{ij}$ denoting the number of individuals released at occasion $t_i$ and next recaptured at occasion $t_j$, concatenated with a column vector with elements $v_i$ denoting the numbers of individuals released at occasion $t_i$ which were never captured again. The $i$th row of the matrix has a multinomial distribution with index $R_i$ denoting the number of individuals released at occasion $t_i$, $i = 1, \ldots, T$. We write $m = \{m_{ij}\}$ and $v = \{v_i\}$.
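As an illustration of how encounter histories reduce to the m-array summary, the following minimal Python sketch tabulates releases, next recaptures and never-recaptured counts. The function name `build_m_array` and the toy histories are hypothetical, and occasions are indexed from 0 rather than 1.

```python
def build_m_array(histories):
    """Summarise 0/1 encounter histories as an m-array (0-based occasions).

    Returns (R, m, v) where
      R[i]    = number of individuals released at occasion i,
      m[i][j] = number released at i and next recaptured at j (j > i),
      v[i]    = number released at i and never captured again.
    """
    T = len(histories[0])
    R = [0] * T
    v = [0] * T
    m = [[0] * T for _ in range(T)]
    for h in histories:
        caught = [k for k, x in enumerate(h) if x == 1]
        # Every capture is a release; pair it with the next capture, if any.
        for a, b in zip(caught, caught[1:]):
            R[a] += 1
            m[a][b] += 1
        if caught:
            # The final capture is a release that is never followed by a recapture.
            R[caught[-1]] += 1
            v[caught[-1]] += 1
    return R, m, v


# Toy data: four animals, four occasions (made up for illustration).
histories = [
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]
R, m, v = build_m_array(histories)
for i in range(len(R)):
    print(R[i], m[i], v[i])
```

Each row satisfies $R_i = \sum_j m_{ij} + v_i$, which is the multinomial index referred to above.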
The Cormack-Jolly-Seber (CJS) model is the benchmark model for such data when age structure is not considered. It is defined in terms of two sets of parameters: $\phi_i$ is the probability that an individual that is alive at time $t_i$ survives until time $t_{i+1}$, and $p_i$ is the probability that an individual that is alive at time $t_i$ is captured at that time. We write $\phi = \{\phi_1, \ldots, \phi_{T-1}\}$ and $p = \{p_2, \ldots, p_T\}$. The likelihood is then a product multinomial distribution, over the rows of the m-array, defined by
$$L(\phi, p; m, v) \propto \prod_{i=1}^{T-1} \left( \chi_i^{v_i} \prod_{j=i+1}^{T} \eta_{ij}^{m_{ij}} \right), \qquad (1)$$
where, for $i < j$,
$$\eta_{ij} = \phi_i \left\{ \prod_{k=i+1}^{j-1} (1 - p_k) \phi_k \right\} p_j,$$
and $\eta_{ij} = 0$ for $i \ge j$. We define $\chi_i = 1 - \sum_{j=i+1}^{T} \eta_{ij}$. If all $m_{ij} > 0$ $\forall\, i < j$ then the CJS model is parameter redundant with deficiency of 1, since $\phi_{T-1}$ and $p_T$ only ever occur in the cell probabilities as a product. The other parameters and this product have explicit maximum likelihood estimates; see for example , page 70. We note that, if some $m_{ij} = 0$, for $i < j$, the parameter redundancy of the model may change; see for example Cole et al. (2012).
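The following Python sketch evaluates the CJS cell probabilities and the product-multinomial log-likelihood. It assumes the standard form $\eta_{ij} = \phi_i \{\prod_{k=i+1}^{j-1} (1 - p_k)\phi_k\} p_j$ for $i < j$ and $\chi_i = 1 - \sum_j \eta_{ij}$. The function names and the 0-based indexing are illustrative choices, not part of the original analysis programs.

```python
import math


def cjs_cell_probs(phi, p):
    """CJS m-array cell probabilities with 0-based occasions.

    phi[i] is survival from occasion i to i + 1 (length T - 1);
    p[j] is capture at occasion j (length T; p[0] is never used).
    Returns (eta, chi) with eta[i][j] for i < j and chi[i] = 1 - sum_j eta[i][j].
    """
    T = len(p)
    eta = [[0.0] * T for _ in range(T)]
    chi = [1.0] * T
    for i in range(T - 1):
        alive_unseen = phi[i]  # survived to i + 1 without being recaptured yet
        for j in range(i + 1, T):
            eta[i][j] = alive_unseen * p[j]
            if j < T - 1:
                alive_unseen *= (1.0 - p[j]) * phi[j]
        chi[i] = 1.0 - sum(eta[i])
    return eta, chi


def cjs_log_likelihood(phi, p, m, v):
    """Product-multinomial log-likelihood, up to an additive constant."""
    eta, chi = cjs_cell_probs(phi, p)
    T = len(p)
    ll = 0.0
    for i in range(T - 1):
        if v[i] > 0:
            ll += v[i] * math.log(chi[i])
        for j in range(i + 1, T):
            if m[i][j] > 0:
                ll += m[i][j] * math.log(eta[i][j])
    return ll


# phi_{T-1} and p_T enter only through their product, so swapping
# their values leaves every cell probability unchanged.
eta_a, chi_a = cjs_cell_probs([0.5, 0.4], [0.0, 0.5, 0.5])
eta_b, chi_b = cjs_cell_probs([0.5, 0.5], [0.0, 0.5, 0.4])
print(eta_a[0][2], eta_b[0][2])
```

The last lines illustrate the parameter redundancy noted in the text: two different parameter vectors with the same product $\phi_{T-1} p_T$ give identical cell probabilities.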
The likelihood of equation (1) can be factorized to give a term involving the model parameters and one which provides the distribution of data conditional on a set of sufficient statistics. The second of these terms may be used to assess model adequacy; see Davison (2003), page 177. Pollock et al. (1985) derived a goodness-of-fit test for the Jolly-Seber capture-recapture model. The CJS model is a special case of the Jolly-Seber model and thus this goodness-of-fit test is also a goodness-of-fit test for the CJS model. Burnham (1991) showed that the Jolly-Seber goodness-of-fit test can be expressed as the product of two conditionally independent terms, which lead to the diagnostic tests that are now known as test 2 and test 3. We describe these in detail in the next section. The diagnostic tests do not require any model fitting and it is thus recommended that these are performed as a preliminary step, before model selection, which may result in a simplified set of models for consideration; see Lebreton et al. (1992) and Pradel et al. (2005).
The CJS model of equation (1) has been extended in many directions, which creates a problem for model selection. Two generalizations which relate to the diagnostic tests which we shall encounter later are a model incorporating trap dependence and a model accommodating transient individuals, which are individuals which pass through the study area and are therefore encountered only once. Structurally the transience model is equivalent to a capture-recapture model with two age classes for survival, with all individuals marked as young.
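The trap-dependent model is defined in terms of three sets of parameters: $\{\phi_i\}$ and $\{p_i\}$ as before, and $p^*_i$ is the probability that an individual alive at time $t_i$ is captured at that time, given that it was also caught at occasion $t_{i-1}$. We write $p^* = \{p^*_2, \ldots, p^*_T\}$. The likelihood is then a product multinomial distribution, over the rows of the m-array, defined by
$$L(\phi, p, p^*; m, v) \propto \prod_{i=1}^{T-1} \left\{ \chi_i^{v_i} \prod_{j=i+1}^{T} \left(\eta^{TD}_{ij}\right)^{m_{ij}} \right\},$$
$$\eta^{TD}_{i,i+1} = \phi_i\, p^*_{i+1}, \qquad \eta^{TD}_{ij} = \phi_i (1 - p^*_{i+1}) \left\{ \prod_{k=i+1}^{j-1} \phi_k \right\} \left\{ \prod_{k=i+2}^{j-1} (1 - p_k) \right\} p_j \quad \text{for } j > i + 1,$$
and $\eta^{TD}_{ij} = 0$ for $i \ge j$ and $\chi_i = 1 - \sum_{j=i+1}^{T} \eta^{TD}_{ij}$.
The standard m-array is not sufficient for fitting a model for transience. We generalize the m-array by defining $m_{\{0\}ij}$ to be the number of individuals that are captured for the first time at occasion $t_i$ and next recaptured at occasion $t_j$ and $m_{\{1\}ij}$ to be the number of previously captured individuals which are captured at occasion $t_i$ and next recaptured at occasion $t_j$. $v_{\{0\}i}$ denotes the numbers of newly marked individuals that were released at occasion $t_i$ which were never captured again, and $v_{\{1\}i}$ denotes the numbers of previously marked individuals that were released at occasion $t_i$ which were never captured again. We write $m_{\{k\}} = \{m_{\{k\}ij}\}$ and $v_{\{k\}} = \{v_{\{k\}i}\}$ for $k = 0, 1$. The transience model is then defined in terms of three sets of parameters: $\{\phi_i\}$ and $\{p_i\}$ as before, and $\phi^*_i$ is the probability that a newly marked individual that is alive at time $t_i$ survives until time $t_{i+1}$. We write $\phi^* = \{\phi^*_1, \ldots, \phi^*_{T-1}\}$. The likelihood is then a product multinomial distribution, over the rows of the extended m-array, defined by
$$L(\phi, \phi^*, p; m_{\{0\}}, m_{\{1\}}, v_{\{0\}}, v_{\{1\}}) \propto \prod_{k=0}^{1} \prod_{i=1}^{T-1} \left\{ \chi_{\{k\}i}^{v_{\{k\}i}} \prod_{j=i+1}^{T} \eta_{\{k\}ij}^{m_{\{k\}ij}} \right\},$$
where, for $i < j$, $\eta_{\{0\}ij} = \phi^*_i \left\{ \prod_{k=i+1}^{j-1} (1 - p_k) \phi_k \right\} p_j$, $\eta_{\{1\}ij} = \phi_i \left\{ \prod_{k=i+1}^{j-1} (1 - p_k) \phi_k \right\} p_j$ and $\chi_{\{k\}i} = 1 - \sum_{j=i+1}^{T} \eta_{\{k\}ij}$.
In current practice, tests of whether trap dependence or transience is required within the model use appropriately constructed contingency tables, which have the benefit of reducing model fitting but the weakness of low power and disconnection from the parametric modelling framework. These tests can alternatively be considered as diagnostics regarding the omission of particular components or as steps in a selection procedure of which components should be included. Within this paper we propose alternative likelihood-based methods.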
We might expect diagnostic tests to be related to score tests and we demonstrate that, whereas two important diagnostic tests are, others are not. In addition the two approaches that we compare within this paper differ in mode of application and thus have the potential to produce different results. Model selection procedures using these two approaches are compared in this paper, and clear conclusions result.
The methods that are proposed in this paper can be applied to any capture-recapture data; the approach is shown to be at least as good as existing methods, and in fact it often outperforms other approaches because of the improvement in statistical power.
Motivating examples are introduced in Section 2 and within Section 3 the connection between score and diagnostic tests is established. Section 4 describes the two model selection strategies and compares them by using simulation. The analyses of the two case-studies that are described in Section 2 are presented in Section 5 and the paper ends with discussion and recommendations in Section 6.
The programs that were used to analyse the data can be obtained from http://wileyonlinelibrary.com/journal/rss-datasets

Motivating examples
We consider two motivating capture-recapture data sets. The first is a large study of breeding great cormorants Phalacrocorax carbo sinensis from Denmark. The cormorant data have been fully analysed in Hénaux et al. (2007). The cormorants provide a complex case-study for which it is unknown a priori what behavioural traits may be exhibited by the population. The data consist of capture histories from 862 breeding birds, captured at an established single colony over a period of 11 breeding seasons. The cormorants are only initially captured at the time of marking and are then subsequently resighted in the breeding colony.
The second is a set of capture-recapture data on the humpback whale Megaptera novaeangliae population in the South Pacific. These data have been analysed by Madon et al. (2013). The capture-recapture data are compiled from genetic records and here we consider just the female genetic data for illustration, which have capture histories from 101 individuals, collected over a period of seven encounter occasions.
In both cases, identifying behavioural responses, such as transience or trap response, may provide important biological insight into the animals being studied. If such responses are ignored within a model, then biases would result in the estimates of the parameters of interest, and therefore it is essential to fit appropriate models to the data.

Equivalence of score tests and diagnostic tests
Diagnostic tests for capture-recapture data have become a standard preliminary tool before model fitting and consist of a number of contingency table tests based on summary statistics. They are commonly used because of readily available computer software, RELEASE, which can be run from within program MARK (White and Burnham, 1999) and U-CARE (Choquet et al., 2009). Once the preliminary diagnostic tests have been conducted, the traditional approach then relies on fitting all biologically plausible models (excluding those which have been ruled out by the diagnostic tests), comprising a model set which can be prohibitively large for successful implementation. An alternative step-up model selection strategy using score tests has been successfully used for ring recovery models (Catchpole and Morgan, 1996) and multistate capture-recapture models (McCrea and . For comparing nested models, score tests are asymptotically equivalent to likelihood ratio tests under the null hypothesis, but they are simpler in not requiring models to be fitted under the alternative hypothesis to conduct tests. See for example Morgan (2008), page 101.
Both diagnostic and score tests share the common feature of checking whether particular aspects of models need to be included in a model selection procedure, starting from a simple base model and without fitting more complex models unless the data suggest otherwise. It is therefore natural to explore the relationships that might exist between the two types of test. Smyth (2003) showed that the Pearson goodness-of-fit test for a 2 × 2 contingency table is mathematically equivalent to a score test and we outline a proof of this in Appendix A.1. We now use this result to demonstrate how specific important diagnostic tests for capture-recapture data can be expressed as score tests. Throughout the paper we adopt the notation that is used in the software U-CARE.
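The equivalence for a 2 × 2 table can be checked numerically. The sketch below is not the proof of Appendix A.1; it assumes a simple parameterization in which row 1 is Binomial($n_1$, $q$) and row 2 is Binomial($n_2$, $q + \delta$), and computes the score statistic for $H_0: \delta = 0$ using only the null estimate of $q$. The function names and the table entries are made up for illustration.

```python
def pearson_x2(a, b, c, d):
    """Pearson X^2 (no continuity correction) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def score_statistic(a, b, c, d):
    """Score statistic for H0: delta = 0, with row 1 ~ Bin(n1, q), row 2 ~ Bin(n2, q + delta).

    Only the null model is fitted: q_hat is the pooled first-column proportion.
    """
    n1, n2 = a + b, c + d
    q_hat = (a + c) / (n1 + n2)
    # Score for delta evaluated at the null maximum likelihood estimate.
    u = (c - n2 * q_hat) / (q_hat * (1.0 - q_hat))
    # Expected information for delta, adjusted for the nuisance parameter q.
    info = (n2 - n2 ** 2 / (n1 + n2)) / (q_hat * (1.0 - q_hat))
    return u ** 2 / info


table = (30, 70, 45, 55)
print(pearson_x2(*table), score_statistic(*table))
```

Both functions return the same value for any table with positive margins, and the score statistic never requires the alternative model to be fitted.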

Diagnostic test 2
Test 2 involves comparing the future histories of individuals that are captured and not captured at a given capture occasion, and thus tests whether capturing individuals affects the probability of future encounters (Pradel, 1993). This test is performed through a series of paired contingency table tests, examining differences between individuals that were captured at occasion $t_i$ and those not captured at occasion $t_i$ but which are known to be alive then, thus detecting a behavioural response to capture. The tests for capture occasion $t_i$ are denoted by test 2.CT(i) and test 2.CL(i), for $i = 2, \ldots, T - 1$. The contingency table corresponding to test 2.CT(i) compares whether capture at occasion $t_i$ affects time of subsequent capture and is generally given by Table 1.
We now consider the model probabilities that are associated with this test. If $p^*_{i+1}$ denotes the probability that an individual is captured at occasion $t_{i+1}$ given that it was also captured at occasion $t_i$, a score test for immediate trap dependence at occasion $t_i$ would examine $H_0: p^*_{i+1} = p_{i+1}$. An $X^2$-test of homogeneity based on the expected values of the contingency table tests whether the expected cell probabilities are the same for individuals that were captured and not captured at occasion $t_i$. The two sets of probabilities are equal if and only if $p^*_{i+1} = p_{i+1}$, and therefore, by Smyth (2003), test 2.CT(i) is equivalent to a score test. The peeling-pooling algorithm of Burnham (1991) demonstrates how $p_{i+1}$ is estimated solely from the components of the m-array that is used within test 2.CT(i), which means that the score test of $H_0: \phi_t, p_2, \ldots, \{p_{i+1} = p^*_{i+1}\}, \ldots, p_T$ versus $H_1: \phi_t, p_2, \ldots, p_{i+1}, p^*_{i+1}, \ldots, p_T$, where $p_{i+1} \ne p^*_{i+1}$, is equivalent to test 2.CT(i).
Test 2.CL(i) tests for differences between the expected time of recapture between those captured and not captured at occasion $t_i$, for those individuals that were captured after time $t_{i+1}$. Thus, this component test should intuitively be equivalent to a score test of a delayed trap dependence, such that capture at occasion $t_i$ affects capture at occasion $t_{i+2}$, as the test compares whether capture at occasion $t_i$ affects the probability of capture at occasion $t_{i+2}$ or later. However, in this case the score test of a long-term trap effect and test 2.CL(i) are not equivalent. This is because the parameter $p_{i+2}$ appears in cell probabilities corresponding to cells which are not included in the contingency table for test 2.CL(i). It is, however, possible to perform a score test of long-term trap effect following capture and one approach to doing so is discussed in Appendix A.2.

Diagnostic test 3
Test 3 compares the future encounter histories of 'new' and 'old' individuals, where new individuals are those which have not been previously captured and old individuals are those which have been encountered before their current capture, and thus tests for differences in survival probability of new and old individuals. The standard m-array that was presented earlier conditions on the time of last capture, and therefore the past encounters of particular individuals are not recorded within this format. It is therefore necessary to use the generalized m-array that was introduced in Section 1, which includes information on whether individuals are new or old. We note that, at occasion $t_1$, all released individuals will be new. Test 3 is constructed as a series of contingency table tests based on the generalized m-array components, and comparisons are made between new and old individuals that are released at occasion $t_i$ through tests 3.SR(i) and 3.Sm(i). The contingency table that is associated with component test 3.SR(i) is given by Table 2.
The probabilities that are associated with the contingency table for test 3.SR(i) depend on $\phi^*_i$ for the newly marked individuals and on $\phi_i$ for the previously marked individuals, with all other parameters in common. Pradel et al. (1997) described this as a test for transient individuals. Therefore, test 3.SR(i) is equivalent to a score test of $H_0: \phi^*_i = \phi_i$. As with test 2.CL(i), there is no clear score test relationship with the remaining component test 3.Sm(i), which is in line with the lack of ecological interpretation for this component test (Pradel et al., 2005).
Because of independence of the component diagnostic tests at occasion $t_i$, test statistics can be summed over $i$, resulting in tests 2.CT, 2.CL, 3.SR and 3.Sm. It is these summed test statistics which are often presented in practice. Component test statistics 2.CT and 2.CL can also be added, which result in test 2, and similarly test statistics 3.SR and 3.Sm can be added to form test 3. A global goodness-of-fit test results from the sum of the four tests; however, generally they are reported individually to diagnose departures from model assumptions. Further description of diagnostic tests for capture-recapture data can be found in McCrea and Morgan (2014), chapter 9. Table 3 summarizes the equivalences between diagnostic tests and score tests and presents the parameter structures under the null and alternative hypotheses.
Table 3. Diagnostic tests and their equivalent score tests†
Test 2.CT(i): null hypothesis $\phi_1, \ldots, \phi_{T-1}, p_2, \ldots, \{p_{i+1} = p^*_{i+1}\}, \ldots, p_T$; alternative hypothesis $\phi_1, \ldots, \phi_{T-1}, p_2, \ldots, p_{i+1}, p^*_{i+1}, \ldots, p_T$
Test 2.CT: $\phi_1, \ldots, p_T$
Tests 2.CL(i) and 2.CL: no equivalent score test
Tests 3.Sm(i) and 3.Sm: no equivalent score test
†The parameters under the null and alternative hypothesis are provided for the score tests.

Model selection strategies
In Section 3 we demonstrated the equivalence of components of two important diagnostic tests to specific score tests, and this relationship motivates us to examine whether the diagnoses of trap dependence and transience can be incorporated in a step-up model-selection approach. We shall compare the performance of two alternative strategies.
(a) The traditional diagnostic tests based on the CJS model are conducted and then the potential model set is determined by the results of these tests. If none of the diagnostic tests are significant, the model set will consist of the CJS model with all combinations of time dependent and constant parameters. If any of the diagnostic tests is significant, then the model set will incorporate potential trap dependence (if test 2 was significant) or transience (if test 3 was significant), or combinations of both if tests 2 and 3 were each significant. Once the model set has been determined, all models in the set are fitted and are compared by using the Akaike information criterion (AIC).
(b) The second strategy is a score test approach which tests for trap dependence and transience during the step-up algorithm that is adopted. The score test approach starts with the simplest model with constant survival and capture parameters and tests for each parameter dependence in turn, including tests for trap-dependent capture probabilities and transience in survival probabilities as well as time dependence in parameters. This is an important difference compared with strategy (a) which assumes time dependence throughout. Starting with a CJS model with constant parameters, a path is followed through the model set by selecting the model with the most significant score test and then fitting that model, which becomes the model under the null hypothesis for the next level of tests. The procedure stops at the stage when all score tests are non-significant.
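The control flow of strategy (b) can be sketched as a generic step-up loop. This is a skeleton under stated assumptions rather than the programs used for the analyses: `step_up_selection`, `extensions` and `toy_score_test` are hypothetical names, and the p-values in `TOY_P` are invented purely so that the loop can be run.

```python
def step_up_selection(base_model, candidate_extensions, score_test, alpha=0.05):
    """Generic step-up search driven by score tests.

    candidate_extensions(model) -> list of models with one extra dependence;
    score_test(null_model, alt_model) -> p-value computed from the fitted null model.
    Both callables are user supplied (hypothetical interfaces).
    """
    current = base_model
    path = [current]
    while True:
        tests = [(score_test(current, alt), alt) for alt in candidate_extensions(current)]
        if not tests:
            break
        best_p, best_alt = min(tests, key=lambda t: t[0])
        if best_p >= alpha:
            break  # every score test is non-significant: stop
        current = best_alt  # this model is fitted and becomes the next null model
        path.append(current)
    return current, path


# Toy illustration with invented p-values, one per added dependence.
EFFECTS = ["phi(t)", "p(t)", "p(trap)", "phi(trans)"]
TOY_P = {"phi(trans)": 0.001, "p(trap)": 0.02, "phi(t)": 0.30, "p(t)": 0.60}


def extensions(model):
    return [model | {e} for e in EFFECTS if e not in model]


def toy_score_test(null_model, alt_model):
    (added,) = alt_model - null_model
    return TOY_P[added]


selected, path = step_up_selection(frozenset(), extensions, toy_score_test)
print(sorted(selected), len(path))
```

In a real analysis the p-value for a given dependence changes with the null model, so `score_test` would recompute the score statistic after each fit; only the models along the selected path are ever fitted.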
The simulation study compares the powers of these two strategies and investigates the power of the score test approach to detect trap dependence and transience for a variety of parameter structures.
The simulations that we present here and the applications in the next section have generally used a level of significance of 0.05 for each of the score tests, although different significance levels are examined in Section 4.2. As discussed in McCrea and  there is an issue of multiple testing with step-up approaches; however, within the model set that we consider here the number of models being compared is relatively small and therefore not formally correcting significance levels, e.g. through a Bonferroni correction, is unlikely to cause problems in practice. Further, McCrea and Morgan (2011) suggested the use of step-down tests in conjunction with step-up tests because of the complexity of the model space that they were working in. Again, this is unlikely to be a problem for the models of this paper. We present illustrative simulation results for diagnostic and score tests; however, we have drawn the same conclusions for a wide range of parameter values, and the power simulation results for the diagnostic tests which we have run as part of our performance comparisons are in line with the results of Pollock et al. (1985).
We note that throughout the remainder of the paper we use standard capture-recapture notation; for example a model which includes trap-dependent capture probabilities (as described in Section 3.1) is denoted by $p(\mathrm{trap})$, and $\phi(\mathrm{trans})$ denotes that the model incorporates transient survival probabilities, as described in Section 3.2. Time dependence in capture and survival is denoted by $p(t)$ and $\phi(t)$ respectively. Interactions of parameter-dependence are denoted by '*'.

Simulation investigating power
We have shown that performing component diagnostic tests 2.CT and 3.SR is equivalent to performing score tests where the model under the null hypothesis is the CJS model, with time-dependent survival and capture probabilities. However, for some data the survival and/or capture probability parameters may not vary with time, resulting in some of the parameters of the null model for these two diagnostic tests being superfluous. We therefore investigate the effect of such superfluous parameters on the power of the tests.

Detecting trap dependence
We simulate data with $R_i = 500$, for $i = 1, \ldots, T$ with $T = 10$, assuming a constant survival probability, $\phi = 0.6$, and we assume that the capture probability $p$ is constant for individuals that were not captured at the previous occasion, and $p^* = p + \beta$ for individuals that were captured at the previous occasion. Therefore, $\beta$ determines the 'trap effect': $\beta < 0$ indicates trap shyness, whereas $\beta > 0$ indicates trap happiness. We define the structure of the capture-recapture models that we are considering by using a '·' to denote a probability which is constant over time and a 't' to denote time-dependent probabilities. We consider the performance of two tests of trap dependence: (a) a score test of $H_0: \phi(\cdot), \{p(\cdot) = p^*(\cdot)\}$ versus $H_1: \phi(\cdot), p(\cdot), p^*(\cdot)$ and (b) the diagnostic test 2.CT, which is equivalent to a score test of $H_0: \phi(t), \{p(t) = p^*(t)\}$ versus $H_1: \phi(t), p(t), p^*(t)$. We observe from Fig. 1 that the score test has a much higher power to detect trap happiness than the diagnostic test under these conditions, with $\beta$ ranging from 0 to 0.1 in increments of 0.01 for values of $p = 0.2, 0.4, 0.6, 0.8$.
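A data-generating mechanism of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration, not the simulation code behind Fig. 1: it releases a fixed number of newly marked animals at each occasion (rather than fixing the total releases $R_i$), and the function name `simulate_histories` and its arguments are hypothetical.

```python
import random


def simulate_histories(n_release, T, phi, p, beta, seed=0):
    """Simulate encounter histories with immediate, additive trap dependence.

    n_release newly marked animals are released at each of occasions 0..T-2
    (an assumed design). Survival is phi per interval; capture probability is
    p + beta if the animal was caught at the previous occasion, and p otherwise.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= p + beta <= 1.0):
        raise ValueError("p and p + beta must both lie in [0, 1]")
    rng = random.Random(seed)
    histories = []
    for first in range(T - 1):
        for _ in range(n_release):
            h = [0] * T
            h[first] = 1
            caught_prev = True  # marking counts as a capture
            for t in range(first + 1, T):
                if rng.random() >= phi:
                    break  # died before occasion t
                prob = p + beta if caught_prev else p
                caught_prev = rng.random() < prob
                h[t] = 1 if caught_prev else 0
            histories.append(h)
    return histories


histories = simulate_histories(n_release=50, T=10, phi=0.6, p=0.4, beta=0.1)
print(len(histories), histories[0])
```

Setting `beta=0` gives data from a constant-parameter CJS model, which is the null case needed to estimate the type 1 error of any test applied to the simulated data.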
We have also looked at the power of the score test $H_0: \phi(\cdot), \{p(\cdot) = p^*(\cdot)\}$ versus $H_1: \phi(\cdot), p(\cdot), p^*(\cdot)$, when the survival and/or capture probabilities are time dependent, with additive trap happiness $\beta$. For each iteration of each simulation run, the time-dependent survival probability was simulated as $\phi_t \sim U(0.5, 0.7)$ and time-dependent $p_t \sim U(0.2, 0.5)$. When constant, $p = 0.2$ and $\phi = 0.7$. The power results in this case are displayed in Fig. 2. We observe that there is an increased type 1 error for the score test of trap dependence (when $\beta = 0$) when there is time dependence in the survival and/or capture probabilities. However, in practice, within the step-up strategy the score test of trap dependence is performed at the same time as the score test for time dependence, and the path resulting from the most significant test statistic would be followed. We display boxplots of the p-values resulting from the score tests for time-dependent survival, trap dependence and time-dependent capture probability when $\beta = 0$ in Fig. 2 and we note that the score test for time dependence is more significant than the score test for trap dependence and therefore time dependence will be included first, and a subsequent test for trap dependence at the next step will not have an inflated type 1 error. We note that, if the step-up score test selects time dependence in both capture and survival probabilities, the model under the null hypothesis becomes $H_0: \phi(t), p(t)$ and the score tests for the next set of tests will be exactly equivalent to the diagnostic tests for trap dependence and transience and so the two model selection strategies coincide.

Pooling the diagnostic test
The component diagnostic tests are stratified by occasion of capture (for test 2) or occasion of first capture (for test 3). When a stepwise score test approach is carried out, the initial null model assumes no time dependence, and so we constructed a contingency table test which ignored temporal effects.
We devised a pooled contingency table test, which adds the cell entries of each of the 2 × 2 2.CT(i) contingency tables, and then computed a single test statistic from the pooled data. A similar pooled test can be constructed for transience, by pooling the 2 × 2 3.SR(i) contingency tables.
The power curves for the case of $p = 0.8$, for $-0.1 \le \beta \le 0.1$, are displayed in Fig. 3. We see that the pooled contingency table approach has an intermediate power to detect trap dependence, with an improvement compared with the standard diagnostic tests, but has less power than the score test approach.
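The pooling step itself is simple arithmetic, as the following Python sketch shows. The three component tables are invented for illustration and the function names are hypothetical; the sketch contrasts the summed component statistic (one degree of freedom per table) with the single pooled statistic (one degree of freedom in total).

```python
def pearson_x2(table):
    """Pearson X^2 (no continuity correction) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def pool_tables(tables):
    """Add the cell entries of several 2 x 2 tables."""
    pooled = [[0, 0], [0, 0]]
    for t in tables:
        for r in range(2):
            for s in range(2):
                pooled[r][s] += t[r][s]
    return pooled


# Invented component tables, one per capture occasion.
component_tables = [
    [[10, 20], [15, 15]],
    [[12, 18], [14, 16]],
    [[8, 22], [13, 17]],
]

summed_x2 = sum(pearson_x2(t) for t in component_tables)  # df = number of tables
pooled = pool_tables(component_tables)
pooled_x2 = pearson_x2(pooled)                            # df = 1
print(pooled, round(pooled_x2, 4), round(summed_x2, 4))
```

Pooling is only appropriate when the cell probabilities are believed to be constant across occasions, which is the situation assumed by the initial null model of the step-up approach.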

Detecting transience
The power of the tests for transience is presented in Fig. 4. We simulate data, with $R_i = 500$, for $i = 1, \ldots, T$ with $T = 10$, assuming a constant capture probability $p = 0.8$ and constant survival probability $\phi = 0.7$ for individuals that were previously captured and $\phi^* = \phi + \gamma$ for newly captured individuals. Since we assume that transient individuals are less likely to be caught again, we consider values of $\gamma$ between $-0.1$ and 0. We observe that the power of the diagnostic test is lower than that of the equivalent score test, and interestingly the power of the pooled diagnostic test is very similar to the power of the score test in this case.

Simulation comparing strategies
A simulation study has been run to compare the overall performance of the two alternative model selection approaches for varying sample sizes and levels of significance. Data were simulated from a model with constant capture probability of 0.4; previously marked individuals had a survival probability of 0.7, and new individuals had a marginally higher survival probability of 0.8. The sample size was varied through the values of $R_i$, which ranged from 100 to 500. At smaller sample sizes, the power of the diagnostic test was not as good as the score test approach (in line with the earlier power simulations), and in over 50% of cases it failed to detect the difference in survival probabilities between new and old individuals (Table 4). Only when the ecologically unrealistic sample size of $R_i = 500$ and the level of significance of 5% were used did the diagnostic test outperform the score test. We see that the 5% level of significance should be reduced when the sample size increases considerably. These simulations, and others that we have run, suggest that current recommendations promoting the use of diagnostic tests to rule out the need for trap dependence or transience within a candidate model set may result in important effects being ignored.

Cormorants
The results from the stepwise score test approach are displayed in Table 5. We note that the AIC values and likelihood ratio tests have been computed only for comparison. Tests that were conducted within a single level of the model selection procedure are denoted with the same letter (with A representing the first stage of models, B the second stage etc.) and the model under the null hypothesis at each level is denoted with a 0. The procedure selects a model with transience, time-dependent survival probability and trap-dependent capture probability. We note that the p-values for the significant score tests are very small and thus the choice of a conservative level of significance is not important. The diagnostic tests indicate that both trap dependence and transience are significant (Table 6), which is identified by the significance of tests 2.CT and 3.SR respectively. Consideration of the AICs of the models incorporating both trap dependence and transience indicates the optimal model to be $\phi(\mathrm{trans} * t), p(\mathrm{trap})$, agreeing with the score test approach. The score test approach has been more straightforward since only four models have been fitted, compared with nine for the diagnostic approach, and the diagnosis of trap dependence and transience has been conducted within the model selection stage rather than during a preliminary testing step.

Humpback whales
None of the diagnostic tests is significant for these data. In particular test 3.SR results in $p = 0.38$; however, some evidence of transience is provided by a one-sided test of the signed square root of the Pearson $X^2$-statistics ($p = 0.04$); see Madon et al. (2013) for details. Using the standard diagnostic test conclusions, the relevant model set for consideration would require the four models $\phi(\cdot), p(\cdot)$; $\phi(t), p(\cdot)$; $\phi(\cdot), p(t)$; and $\phi(t), p(t)$ to be compared, and the model with the smallest AIC is the simplest model with constant capture and survival probabilities. The AIC values for three of these four models are presented in Table 7 for comparison.
Using a stepwise score test approach the transience is detected at the first stage of model selection ($p = 0.02$) and the model selected has a very simple structure, of transience in survival probabilities and a constant capture probability (Table 7). This model also has the lowest AIC value of all the fitted models. Here it is clear that there is insufficient evidence that the parameters in the model are time dependent and therefore the score test approach has greater power to detect the transience than the diagnostic test approach.

Discussion and conclusions
We have demonstrated the equivalence of components of the diagnostic tests to specific score tests, which has motivated an alternative strategy for detecting trap dependence and transience.
Drawing conclusions from diagnostic tests can be challenging for particular applications. For example, a significant test for trap dependence within a population which is not physically captured may in fact be due to spatial heterogeneity of the survey region; see for example Lahoz-Monfort et al. (2011). We note that overdispersion may be calculated based on the significant diagnostic tests and then a modified AIC might be used for model selection. Using our new strategy means that such an initial evaluation is not possible; however, McCrea et al. (2011) have presented a general method for assessing absolute goodness of fit following a step-up model selection procedure and appropriate corrections can be made at this stage to the resulting standard errors in the model.
The basic diagnostic tests have been extended to diagnostic tests for joint recapture and recovery data. Similarly there are tests for multistate capture-recapture data as presented in Pradel et al. (2003). None of these tests will have a direct equivalence to a score test because the contingency tables are generally larger than 2 × 2 for the joint recapture and recovery case and because contingency table tests for mixtures are used for the multistate case. However, the strategy that is proposed in this paper still holds for these more complex data structures, as the tests for effects on recovery probability, emigration, memory, trap effects and transience can all be included in the basic model set and a step-up approach can be used to explore the large model space. The lack of power of the diagnostic test of memory for multistate capture-recapture data was detected in Cole et al. (2014) and the lack of power of diagnostic tests for single-site capture-recapture data has been demonstrated here by using simulation.
The stepwise score test approach has been shown to work well on both simulated and real data sets and may detect important biological traits which diagnostic tests lack the power to identify. Consequently, our recommendation is to incorporate all possible parameter dependences (time, trap dependence, transience and possibly age if known) within a candidate model set and to explore that model set during the model selection procedure. An efficient way to proceed is to use score tests; however, likelihood ratio tests or the AIC could be used as comparative measures, although they would require the fitting of more models.

A.2. Using score tests to detect long-term trap effects
Although test 2.CL does not have a direct equivalence to a CJS parameterized score test, it is often intuitively described as a test for long-term trap effect on capture probability. Test 2.CT and the equivalent score test examine differences in capture probability at occasion $t_{i+1}$ between individuals which were captured at occasion $t_i$ and those which were not captured at occasion $t_i$. However, biologically, the effect of capture may last for more than one sampling occasion, and such effects were considered for closed populations in Cormack (1989).
One possible way of modelling such a trap effect is through the use of a logistic-linear relationship between the capture probability and the length of time since previous capture. To specify such a model, suppose that we define the probability that an individual is captured at occasion $t_j$, given that it was last captured at occasion $t_i$, as a logistic-linear function of the time since last capture, with slope parameter $\beta$. Under $H_0: \beta = 0$, the model assumes that the capture probability does not depend on the occasion of last capture; however, under $H_1: \beta \ne 0$, the model includes either increasing probability with time since last capture (trap shyness) or decreasing probability with time since the last capture (trap happiness). Other models for a long-term trap effect would be possible. The use of score tests for examining the significance of temporal covariates for ring recovery models was considered in Catchpole et al. (1999) and the formulation extends to capture-recapture models.
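To make the logistic-linear idea concrete, the sketch below uses one possible parameterization, which is an assumption and not necessarily the form used in the paper: $\mathrm{logit}(p_{ij}) = \alpha + \beta (j - i - 1)$, so that the capture probability at the occasion immediately after a capture has logit $\alpha$ and changes by $\beta$ on the logit scale for each further occasion without capture. The function name `capture_prob` and the intercept $\alpha$ are hypothetical.

```python
import math


def capture_prob(alpha, beta, i, j):
    """Capture probability at occasion j for an animal last captured at occasion i < j.

    Assumed form: logit(p_ij) = alpha + beta * (j - i - 1), so beta = 0 removes
    any dependence on the time since last capture.
    """
    if j <= i:
        raise ValueError("j must be later than i")
    linear_predictor = alpha + beta * (j - i - 1)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Under H0 (beta = 0) the probability is the same whatever the gap.
print([round(capture_prob(0.0, 0.0, 1, j), 3) for j in (2, 3, 4)])
# With beta > 0 the probability increases with time since last capture.
print([round(capture_prob(0.0, 0.5, 1, j), 3) for j in (2, 3, 4)])
```

With this parameterization, a score test of $\beta = 0$ needs only the fit of the model without a long-term trap effect, in keeping with the other score tests in the paper.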