Beta–negative binomial auto‐regressions for modelling integer‐valued time series with extreme observations

The paper introduces a general class of heavy‐tailed auto‐regressions for modelling integer‐valued time series with outliers. The specification proposed is based on a heavy‐tailed mixture of negative binomial distributions that features an observation‐driven dynamic equation for the conditional expectation. The existence of a stationary and ergodic solution for the class of auto‐regressive processes is shown under general conditions. The estimation of the model can be easily performed by maximum likelihood given the closed form of the likelihood function. The strong consistency and the asymptotic normality of the estimator are formally derived. Two examples of specifications illustrate the flexibility of the approach and the relevance of the theoretical results. In particular, a linear dynamic equation and a score‐driven equation for the conditional expectation are studied. The score‐driven specification is shown to be particularly appealing as it delivers a robust filtering method that attenuates the effect of outliers. Empirical applications to the series of narcotics trafficking reports in Sydney and the euro–pound sterling exchange rate illustrate the effectiveness of the method in handling extreme observations.


Introduction
Time series data with integer-valued observations are often encountered in empirical applications. Classical continuous response models, such as auto-regressive moving average models, are not suited to the modelling of such series. Over the last few decades, researchers have developed methods that properly account for the discreteness of the data. A standard approach is to consider observation-driven models that feature time variation in the intensity parameter of the Poisson distribution (Fokianos et al., 2009; Ferland et al., 2006; Davis et al., 2003). A limitation of the Poisson distribution is that it imposes equidispersion, i.e. that the mean equals the variance. Equidispersion is typically restrictive in empirical applications and therefore overdispersed distributions, such as the negative binomial distribution, are often considered (Zhu, 2011; Davis and Wu, 2009; McKenzie, 1986). Another extension that has been considered in the literature is the use of zero-inflated distributions, which are suited for time series with large numbers of 0s (Zhu, 2012). We refer the reader to the recent literature for an overview of further developments.
Extreme observations, or outliers, are often present when analysing time series data. The study of time series with outliers has a long history that dates back to Fox (1972). Ignoring extreme observations in the data set leads to statistical models that offer a poor description of the series of interest. Additionally, statistical inference can be problematic in the presence of outliers. Some recent studies have focused on the analysis of integer-valued time series with outliers. Barczy et al. (2010, 2012) and Chen et al. (2019) discussed the estimation of integer-valued auto-regressive models and binomial auto-regressive models that are contaminated with innovational and additive outliers. Hall (2001, 2003) derived the extremal index of integer-valued moving averages. Scotto et al. (2018) studied the properties of a first-order max-integer-valued auto-regressive model. Zhu et al. (2015, 2016) considered influence analysis of outliers for integer-valued generalized auto-regressive conditional heteroscedastic (INGARCH) models. Silva et al. (2019) discussed outlier detection in a Bayesian framework.
A standard modelling approach for data sets with outliers is to describe extreme observations by means of heavy-tailed distributions. There is a vast literature on modelling continuous-valued time series with heavy-tailed distributions. Student's t-distribution is often used for this purpose, and robust specifications of the dynamic component of the model are employed to attenuate the effect of outliers (Creal et al., 2011; Harvey and Luati, 2014). In contrast, the current literature on integer-valued time series is less well developed. One of the few contributions is the first-order integer-valued auto-regressive model of Qian et al. (2020). In the current paper, we introduce a general class of observation-driven models for integer-valued time series data with extreme observations. The approach is based on a heavy-tailed mixture of negative binomial distributions, known as the beta-negative binomial (BNB) distribution. The class of models features a dynamic location parameter and a BNB conditional distribution, which describes extreme observations as tail events. We derive conditions for stationarity, ergodicity and finiteness of moments for the proposed class of stochastic processes. Additionally, we show that inference can be easily performed by maximum likelihood (ML), given that the likelihood function is available in closed form. The strong consistency and the asymptotic normality of the ML estimator are proved under general conditions. We consider and study two specifications of the dynamic component of the model. The first is a simple linear auto-regression for the conditional mean. The second is based on the generalized auto-regressive score (GAS) framework of Creal et al. (2013) and Harvey (2013). This second specification delivers a robust filter that attenuates the effect of extreme observations on the conditional expectation of the BNB process. Finally, we present two empirical applications to the time series of police reports on narcotics trafficking in Sydney, Australia, and to the number of tick changes of the euro-pound sterling exchange rate. The results illustrate the ability of the proposed approach to model time series data with extreme observations.

The paper is structured as follows. Section 2 provides a brief review of the BNB distribution. Section 3 introduces the class of BNB auto-regressive processes and discusses their stochastic properties. Section 4 derives the asymptotic properties of the ML estimator. Section 5 introduces the linear specification of the model and discusses its properties. Section 6 introduces the score-driven specification. Section 7 presents the empirical applications. Section 8 concludes.

Preliminaries
We start by reviewing some properties of the BNB distribution, which will be useful in the rest of the paper. The BNB distribution arises as a beta mixture of negative binomial distributions. In particular, let Y conditional on P have a negative binomial distribution, Y | P ∼ NB(r, P), with dispersion parameter r and success probability P. Assume further that P has a beta distribution, P ∼ Beta(α, β), with shape parameters α and β. Then, the marginal distribution of Y is BNB with the following probability mass function (PMF):

\[ \Pr(Y = y) = \frac{\Gamma(y + r)}{\Gamma(y + 1)\,\Gamma(r)}\, \frac{B(\alpha + r, \beta + y)}{B(\alpha, \beta)}, \qquad y \in \mathbb{N}, \]

where Γ(·) denotes the gamma function and B(·, ·) the beta function. The parameter α is the tail parameter of the BNB, which determines the heaviness of the right-hand tail: the smaller α, the heavier the tail. Throughout the paper, we parameterize the BNB distribution in terms of its mean. More specifically, we set β = (α − 1)λ/r. In this way, the parameter λ represents the mean of the BNB distribution when the mean is finite, which is the case when α > 1. We say that Y ∼ BNB(λ, r, α) has a BNB distribution with mean λ > 0, dispersion parameter r > 0 and tail parameter α > 1 if

\[ \Pr(Y = y) = \frac{\Gamma(y + r)}{\Gamma(y + 1)\,\Gamma(r)}\, \frac{B\{\alpha + r, (\alpha - 1)\lambda/r + y\}}{B\{\alpha, (\alpha - 1)\lambda/r\}}, \qquad y \in \mathbb{N}. \]

The BNB distribution enables us to account for extreme observations, which can be seen as tail events. Furthermore, we note that the BNB distribution can approximate arbitrarily well the negative binomial distribution as well as the Poisson distribution. As the tail parameter diverges, α → ∞, the BNB distribution BNB(λ, r, α) converges to a negative binomial distribution with dispersion parameter r and success probability λ/(r + λ). Further, as the dispersion parameter also diverges, r → ∞, the BNB converges to a Poisson distribution with mean λ. For a more detailed review of the BNB distribution, we refer the reader to Wang (2011).
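To make the mean parameterization concrete, the following Python sketch evaluates the BNB PMF through log-gamma and log-beta functions and numerically illustrates the two limit results stated above. It is a minimal illustration, not code from the paper; the function name `bnb_logpmf` is ours, and SciPy's negative binomial is parameterized by the success probability r/(r + λ), the complement of the convention used in the text.

```python
import numpy as np
from scipy.special import gammaln, betaln
from scipy.stats import nbinom, poisson

def bnb_logpmf(y, lam, r, alpha):
    # Mean parameterization: beta = (alpha - 1) * lambda / r, valid for alpha > 1
    beta = (alpha - 1.0) * lam / r
    return (gammaln(y + r) - gammaln(y + 1) - gammaln(r)
            + betaln(alpha + r, beta + y) - betaln(alpha, beta))

y = np.arange(50)
lam, r = 4.0, 3.0

# alpha -> infinity: BNB(lam, r, alpha) approaches the negative binomial with mean lam
print(np.abs(np.exp(bnb_logpmf(y, lam, r, 1e7)) - nbinom.pmf(y, r, r / (r + lam))).max())

# alpha, r -> infinity: the BNB approaches the Poisson distribution with mean lam
print(np.abs(np.exp(bnb_logpmf(y, lam, 1e7, 1e7)) - poisson.pmf(y, lam)).max())
```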

Beta-negative binomial auto-regressive models
Consider a time series of counts {y_t}_{t∈Z} with the conditional distribution

\[ y_t \mid \mathcal{F}_{t-1} \sim \mathrm{BNB}(\lambda_t, r, \alpha), \tag{2} \]

where r > 0, α > 1 and F_t denotes the σ-field generated by {y_t, y_{t−1}, …}. The conditional mean process λ_t = E(y_t | F_{t−1}) is specified by the following stochastic recurrence equation (SRE):

\[ \lambda_{t+1} = g_\theta(y_t, \lambda_t), \tag{3} \]

where g_θ(·, ·) is a parametric updating function that maps N × R₊ into R₊, and θ ∈ R^n is a parameter vector. We denote by κ ∈ R^{n+2} the entire parameter vector of the model, κ = (ξᵀ, θᵀ)ᵀ, where ξ = (r, α)ᵀ. Note that, in the above formulation, for simplicity of exposition, λ_t is assumed to be F_{t−1}-measurable. Below, we show that, under a drift condition and a semicontraction condition on g_θ, the model's equations admit a stationary and ergodic solution and λ_t is F_{t−1}-measurable. The BNB auto-regressive model specified in equations (2) and (3) can describe extreme observations in time series data by means of the heavy-tailed BNB conditional PMF. A small value of the tail parameter α indicates that extreme observations are more likely to occur. The mth conditional moment of the BNB auto-regressive process, E(y_t^m | F_{t−1}), is finite if and only if α > m. However, as we shall discuss below, finiteness of unconditional moments requires further conditions on the updating function g_θ. In Sections 5 and 6, some examples of specifications of the updating function are presented. We shall see that a robust updating function may be desirable to reduce the effect of outliers on the conditional mean λ_t. We also note that the specification of the conditional mean λ_t in equation (3) can be extended to a higher order by including more lags of λ_t and y_t. The conditional mean of the BNB auto-regression of order (p, q) can be expressed as

\[ \lambda_{t+1} = g_\theta(y_t, \ldots, y_{t-p+1}, \lambda_t, \ldots, \lambda_{t-q+1}). \]

However, throughout the paper, for simplicity of exposition and ease of notation, we shall focus on the first-order specification (p = q = 1) given in equation (3).
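As an illustration of equations (2) and (3), the following Python sketch simulates a path from a BNB auto-regression for a user-supplied updating function g_θ, exploiting the beta mixture representation of Section 2 to draw from the conditional BNB distribution. It is a hedged sketch under our own naming; the linear update used in the demonstration anticipates the INGARCH specification of Section 5.

```python
import numpy as np

def simulate_bnb_ar(g, lam1, r, alpha, T, rng=None):
    # y_t | F_{t-1} ~ BNB(lam_t, r, alpha) is drawn via the mixture
    # P_t ~ Beta(alpha, (alpha - 1) * lam_t / r), y_t | P_t ~ NB(r, P_t),
    # and the conditional mean follows the recursion lam_{t+1} = g(y_t, lam_t)
    rng = np.random.default_rng() if rng is None else rng
    y, lam = np.empty(T, dtype=int), np.empty(T)
    lam[0] = lam1
    for t in range(T):
        p = rng.beta(alpha, (alpha - 1.0) * lam[t] / r)
        y[t] = rng.negative_binomial(r, p)  # NumPy: p is the success probability
        if t + 1 < T:
            lam[t + 1] = g(y[t], lam[t])
    return y, lam

# Demonstration with a linear update; the parameter values are illustrative
y, lam = simulate_bnb_ar(lambda y_t, l_t: 0.5 + 0.4 * l_t + 0.3 * y_t,
                         lam1=2.0, r=3.0, alpha=5.0, T=1000)
```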
In the rest of the section, we study the stochastic properties of the BNB auto-regression that is described by equations (2) and (3). We derive conditions on the updating function g θ that ensure that the process is strictly stationary and ergodic and that guarantee the existence of the first two unconditional moments of y t . The first result that we obtain is the stationarity and ergodicity of the process and existence of the first moment under a drift condition and a semicontraction condition. The proof of theorem 1 is based on the approach of Doukhan and Neumann (2019) for semicontractive GARCH-type processes.
Theorem 1 (stationarity and ergodicity). Consider the BNB auto-regressive process given by equations (2) and (3), and assume that

(a) the drift condition holds:
\[ g_\theta(y, \lambda) \le c_0 + c_1 y + c_2 \lambda \quad \text{for all } (y, \lambda) \in \mathbb{N} \times \mathbb{R}_+, \tag{4} \]
where c₀, c₁ and c₂ are positive constants such that c₁ + c₂ < 1, and

(b) the semicontraction condition holds:
\[ |g_\theta(y, \lambda) - g_\theta(y, \lambda^*)| \le b\,|\lambda - \lambda^*| \quad \text{for all } \lambda, \lambda^* \in \mathbb{R}_+, \tag{5} \]
for any y ∈ N, where b is a positive constant such that b < 1.
Then {(y_t, λ_t)}_{t∈Z} admits a stationary and ergodic solution with a finite first moment, E(y_t) < ∞. Furthermore, λ_t is F_{t−1}-measurable.
The proof of theorem 1 is given in Appendix A.1. We highlight that the conditions of theorem 1 are weaker than the contraction condition typically considered in the literature on integer-valued processes, which is given by |g_θ(y, λ) − g_θ(y*, λ*)| ≤ c₁|y − y*| + c₂|λ − λ*|, with c₁ + c₂ < 1 (see, for instance, Davis and Liu (2016) for an application to exponential families). More specifically, the above contraction condition immediately implies that conditions (4) and (5) are satisfied.
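To spell the implication out: setting y* = y in the contraction condition yields condition (5) with b = c₂ < 1, while fixing an arbitrary λ₀ ∈ R₊ and comparing (y, λ) with (0, λ₀) yields

\[ g_\theta(y, \lambda) \le g_\theta(0, \lambda_0) + c_1 y + c_2 |\lambda - \lambda_0| \le \{ g_\theta(0, \lambda_0) + c_2 \lambda_0 \} + c_1 y + c_2 \lambda, \]

so that the drift condition (4) holds with c₀ = g_θ(0, λ₀) + c₂λ₀ (our elaboration of the claim in the text).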
The drift condition (4) in theorem 1 requires that the sum of the coefficients c₁ and c₂ is smaller than 1 to ensure stationarity, ergodicity and finiteness of the first unconditional moment of y_t. However, a stricter condition is needed to obtain finiteness of the second moment. The next result imposes sufficient conditions on c₁ and c₂ to obtain E(y_t²) < ∞, and hence the weak stationarity of the BNB auto-regressive process.
Theorem 2 (weak stationarity). Let the conditions of theorem 1 hold. Furthermore, assume that α > 2 and that the drift condition (4) holds with

\[ c_1^2\, \frac{(r+1)(\alpha-1)}{r(\alpha-2)} + 2\, c_1 c_2 + c_2^2 < 1. \]

Then y_t has a finite second moment, E(y_t²) < ∞. Hence {y_t}_{t∈Z} is weakly stationary.

The proof of theorem 2 is given in Appendix A.1. We note that, besides a stricter drift condition, theorem 2 also requires that the tail parameter α be greater than 2. This is needed because the conditional second moment of y_t is finite if and only if α > 2.
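For intuition on the role of α > 2 and on the quadratic coefficient in the condition above, the standard moment formulas of the BNB distribution give, under the mean parameterization β = (α − 1)λ_t/r (a derivation we add for completeness):

\[
\operatorname{Var}(y_t \mid \mathcal{F}_{t-1}) = \frac{\lambda_t(\lambda_t + r)(\alpha + r - 1)}{r(\alpha - 2)},
\qquad
E(y_t^2 \mid \mathcal{F}_{t-1}) = \frac{(r+1)(\alpha-1)}{r(\alpha-2)}\,\lambda_t^2 + \frac{\alpha + r - 1}{\alpha - 2}\,\lambda_t,
\]

both of which are finite only for α > 2; combining the second expression with the drift condition (4) delivers the second-moment restriction of theorem 2.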
The results in theorems 1 and 2 are of key importance to derive the asymptotic properties of the ML estimator. In particular, as we shall see in Section 4, the strict stationarity conditions in theorem 1 are sufficient for the consistency of the ML estimator. However, the additional conditions in theorem 2 are needed for the asymptotic normality to hold. In Sections 5 and 6, theorems 1 and 2 will be employed to establish the stochastic properties of a linear and a score-driven BNB auto-regression.

Maximum likelihood estimation
In this section, we discuss the estimation of the BNB auto-regression by ML. We derive conditions to ensure consistency and asymptotic normality of the ML estimator. We assume that a realized path {y_t}_{t=1}^T from the BNB auto-regressive process in equations (2) and (3) with true parameter value κ = κ₀ is observed, where T denotes the sample size. The likelihood function is available in closed form through a prediction error decomposition. In particular, the average log-likelihood function is

\[ \hat{L}_T(\kappa) = \frac{1}{T}\sum_{t=1}^{T} \hat{l}_t(\kappa), \qquad \hat{l}_t(\kappa) = \log p\{y_t \mid \hat{\lambda}_t(\theta), r, \alpha\}, \]

where p denotes the conditional PMF, given by

\[ p\{y_t \mid \hat{\lambda}_t(\theta), r, \alpha\} = \frac{\Gamma(y_t + r)}{\Gamma(y_t + 1)\,\Gamma(r)}\, \frac{B\{\alpha + r, (\alpha - 1)\hat{\lambda}_t(\theta)/r + y_t\}}{B\{\alpha, (\alpha - 1)\hat{\lambda}_t(\theta)/r\}}. \]

The filtered time-varying parameter λ̂_t(θ) is obtained recursively using the observed data,

\[ \hat{\lambda}_{t+1}(\theta) = g_\theta(y_t, \hat{\lambda}_t(\theta)), \tag{6} \]

where the recursion is initialized at a fixed point λ̂₁(θ) ∈ R₊. We note that initializing the recursion in expression (6) is needed since the observed data start from time t = 1. This is quite standard in the literature on observation-driven models. Finally, the ML estimator κ̂_T is defined as the maximizer of the likelihood function,

\[ \hat{\kappa}_T = \arg\sup_{\kappa \in K} \hat{L}_T(\kappa), \tag{7} \]

where K = Ξ × Θ, with Ξ ⊂ (0, ∞) × (2, ∞) and Θ ⊂ R^n being compact parameter sets.
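The closed-form likelihood makes the estimator straightforward to implement. The following Python sketch evaluates the average log-likelihood for the linear updating function of Section 5 by running the filter in expression (6) and maximizes it numerically. It is a minimal illustration under our own naming: `bnb_logpmf` is the helper from the Section 2 sketch, and the box bounds only approximate the compact set K (the stationarity constraint φ + τ < 1 is not enforced here).

```python
import numpy as np
from scipy.optimize import minimize

def avg_loglik(kappa, y, lam1):
    # kappa = (r, alpha, omega, phi, tau); filter initialized at the fixed point lam1
    r, alpha, omega, phi, tau = kappa
    lam, ll = lam1, 0.0
    for t in range(len(y)):
        ll += bnb_logpmf(y[t], lam, r, alpha)  # contribution of y_t
        lam = omega + phi * lam + tau * y[t]   # recursion (6) with a linear update
    return ll / len(y)

res = minimize(lambda k: -avg_loglik(k, y, lam1=np.mean(y)),
               x0=np.array([3.0, 5.0, 0.5, 0.4, 0.3]),
               bounds=[(1e-4, None), (2.0 + 1e-4, None),        # r > 0, alpha > 2
                       (1e-4, None), (0.0, 1.0 - 1e-4), (1e-4, 1.0 - 1e-4)],
               method="L-BFGS-B")
kappa_hat = res.x  # ML estimate of (r, alpha, omega, phi, tau)
```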

Consistency
To establish the consistency of the ML estimator, we first derive the stochastic limit properties of the filtered parameter λ̂_t defined in expression (6). Note that λ̂_t(·) is a stochastic function that maps Θ into R₊. The stability of the filtered parameter λ̂_t is often referred to in the literature as invertibility (Straumann and Mikosch, 2006; Blasques et al., 2018). Because of the initialization, λ̂_t evaluated at the true parameter value, λ̂_t(θ₀), does not correspond to the true conditional mean λ_t. In what follows, we show that {λ̂_t}_{t∈N} converges exponentially almost surely and uniformly in Θ to a stationary and ergodic sequence of functions {λ̃_t}_{t∈N} such that λ̃_t(θ₀) = λ_t with probability 1. (A sequence of random variables {x̂_t}_{t∈N} converges exponentially almost surely to another sequence {x_t}_{t∈N} if there is a constant c > 1 such that c^t |x̂_t − x_t| → 0 almost surely as t → ∞.) We start by imposing a continuity condition on the updating function g_θ, which ensures that θ ↦ λ̂_t(θ) is continuous on the compact set Θ with probability 1.
Next, we assume that the drift condition and the semicontraction condition hold for all θ in the parameter set Θ, which contains the true parameter value θ₀. We note that this assumption is not restrictive since, in general, Θ can be defined as a compact ball around the true parameter vector θ₀.
Proposition 1 (invertibility). Let assumptions 1 and 2 hold. Then the filter {λ̂_t(θ)}_{t∈N} converges exponentially almost surely and uniformly over Θ to a unique stationary and ergodic sequence of functions {λ̃_t(θ)}_{t∈N}.

Proposition 1 plays a crucial role in ensuring that the likelihood function l̂_t(κ), which depends on the approximate filter λ̂_t, converges to a stationary and ergodic limit l_t(κ) = log p{y_t | λ̃_t(θ), r, α}, which depends on the limit filter λ̃_t. In this way, l_t(κ₀) corresponds to the true conditional log-PMF of the BNB auto-regressive model.
Next, building on the invertibility result, we impose some additional conditions to obtain the strong consistency of the ML estimator. The next assumption imposes a lower bound on the updating function.
Finally, we consider an identification condition on the parametric updating function g_θ to ensure that any parameter value θ ∈ Θ, θ ≠ θ₀, generates a path λ̃_t(θ) that is observationally different from the true λ_t. Furthermore, we impose that λ_t is not a degenerate random variable to guarantee the identification of the parameter r of the BNB PMF. In particular, the BNB PMF y ↦ p(y | λ_t, r, α) is the same for r = r̃ and r = (α − 1)λ_t/r̃, for any r̃ > 0. Therefore, if λ_t is constant, the parameter r is not identified. We refer to the proof of lemma 8 in the on-line supplement for a more detailed discussion. We also note that the assumption that λ_t is not constant is typically implied by the identification condition on g_θ; see, for example, the specifications of λ_t in Sections 5 and 6.
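The lack of identification of r for constant λ_t stems from a symmetry of the BNB PMF, which can be checked by expanding the beta functions into gamma functions (our verification of the statement above):

\[
p(y \mid \lambda_t, r, \alpha)
= \frac{\Gamma(y+r)}{\Gamma(y+1)\,\Gamma(r)}\,
  \frac{B(\alpha + r, \beta + y)}{B(\alpha, \beta)}
= \frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+r)\,\Gamma(y+r)\,\Gamma(\beta+y)}
       {\Gamma(\alpha)\,\Gamma(y+1)\,\Gamma(r)\,\Gamma(\beta)\,\Gamma(\alpha+r+\beta+y)},
\]

with β = (α − 1)λ_t/r. The right-hand side is invariant under swapping r and β, so the parameter pairs (r, β) and (β, r) generate the same PMF, which corresponds to the mapping r ↦ (α − 1)λ_t/r when the mean λ_t is held fixed.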
Under the assumptions that were stated above, we obtain the strong consistency of the ML estimator.
Theorem 3 (consistency). Let assumptions 1-4 hold and let κ₀ ∈ K. Then the ML estimator defined in equation (7) is strongly consistent, i.e. κ̂_T → κ₀ almost surely as T → ∞. The proof is given in Appendix A.1.

Asymptotic normality
We now focus on deriving the asymptotic normality of the ML estimator. First, we require that the data-generating process has a finite second moment. A finite second moment is needed to ensure the existence of some moments for the derivatives of the log-likelihood, which are used to apply a central limit theorem to the score of the log-likelihood function.
The next assumption is needed to ensure that the Fisher information matrix is positive definite.
Finally, the next assumption requires some regularity conditions on the updating function g_θ. In particular, we impose that the updating function is twice continuously differentiable and has some of its derivatives bounded by linear combinations of their arguments. Note that ‖·‖ denotes the L₁-norm when applied to a vector and the operator norm induced by the L₁-norm when applied to a matrix. Furthermore, we consider the following shorthand notation for the derivatives of the updating function: g_θ^d(y, λ) denotes the second-order partial derivative with respect to the elements of the vector d. For instance, if d = (θ, θ), then g_θ^{θθ}(y, λ) = ∂²g_θ(y, λ)/∂θ∂θᵀ.

Assumption 7. The function (θ, λ) ↦ g_θ(y, λ) is twice continuously differentiable with respect to both θ and λ. Furthermore, for any θ ∈ Θ the following inequalities hold.

The next result delivers the asymptotic normality of the ML estimator. The proof is given in Appendix A.1.

The model
An intuitive and simple way to specify the conditional mean λ_t is to consider a linear auto-regressive process driven by past observations. Count processes with a linear specification of the conditional mean are often referred to in the literature as INGARCH models (Ferland et al., 2006). We define the BNB-INGARCH model through the equations

\[ y_t \mid \mathcal{F}_{t-1} \sim \mathrm{BNB}(\lambda_t, r, \alpha), \qquad \lambda_{t+1} = \omega + \phi \lambda_t + \tau y_t, \tag{8} \]

where ω > 0, φ ≥ 0 and τ > 0 are static parameters to be estimated. These parameters are restricted in sign to guarantee that λ_t is strictly positive with probability 1. A linear auto-regressive specification for the conditional expectation λ_t is often considered for Poisson and negative binomial auto-regressions; see Ferland et al. (2006), Fokianos et al. (2009) and Zhu (2011). We refer to these models as Po-INGARCH and NB-INGARCH respectively. We can immediately see that BNB-INGARCH can approximate arbitrarily well both Po-INGARCH and NB-INGARCH, since the BNB distribution converges to the negative binomial distribution as α → ∞ and to the Poisson distribution as, additionally, r → ∞. The next result relies on theorems 1 and 2 to derive conditions for strict and weak stationarity of the BNB-INGARCH process (8).
Theorem 5 (stationarity of the BNB-INGARCH process). Consider the BNB-INGARCH process (8) and assume that φ + τ < 1. Then {(y_t, λ_t)}_{t∈Z} admits a stationary and ergodic solution with E(y_t) < ∞. Assume further that α > 2 and that

\[ \tau^2\, \frac{(r+1)(\alpha-1)}{r(\alpha-2)} + 2\,\phi\,\tau + \phi^2 < 1. \tag{9} \]

Then y_t has a finite second moment, E(y_t²) < ∞, and {y_t}_{t∈Z} is weakly stationary.

The proof is given in the on-line supplementary material. Finally, we derive the strong consistency and asymptotic normality of the ML estimator of the BNB-INGARCH model by appealing to theorems 3 and 4.
Theorem 6. Let the observed series {y_t}_{t=1}^T be generated by the BNB-INGARCH process (8) with parameter value κ₀ = (r₀, α₀, ω₀, φ₀, τ₀)ᵀ. Furthermore, let κ₀ ∈ K, where K is a compact parameter set such that φ + τ < 1, ω > 0, φ ≥ 0 and τ > 0 for any κ ∈ K. Then the ML estimator (7) is strongly consistent: κ̂_T → κ₀ almost surely as T → ∞. Assume further that κ₀ satisfies condition (9), α₀ > 2 and κ₀ ∈ int(K). Then the ML estimator is asymptotically normally distributed. The proof of theorem 6, which is given in the on-line supplement, is obtained by checking that assumptions 1-7 are satisfied. The small sample properties of the ML estimator are studied in a Monte Carlo experiment in the next section.
We also note that a BNB-INGARCH process of order (p, q) can be obtained by adding lags of λ_t and y_t in the specification of the conditional mean:

\[ \lambda_{t+1} = \omega + \sum_{i=1}^{q} \phi_i\, \lambda_{t-i+1} + \sum_{i=1}^{p} \tau_i\, y_{t-i+1}, \]

where ω > 0, φ_i ≥ 0, for i = 1, …, q, and τ_i > 0, for i = 1, …, p. Furthermore, the linear specification of λ_t discussed in this section can be extended to include some form of non-linearity. Following Wang et al. (2014), we can consider a self-excited threshold auto-regressive process for λ_t of the form

\[ \lambda_{t+1} = (\omega_1 + \phi_1 \lambda_t + \tau_1 y_t)\, 1\{y_t \le k\} + (\omega_2 + \phi_2 \lambda_t + \tau_2 y_t)\, 1\{y_t > k\}, \]

where ω_i > 0, φ_i ≥ 0, τ_i > 0, for i = 1, 2, and k ∈ N. From the theory developed in Section 3, we can derive conditions for stationarity and ergodicity and for the finiteness of the second moment of y_t. In particular, from theorem 1, we obtain that φ_m + τ_m < 1 is sufficient for the stationarity and ergodicity of the BNB self-excited threshold auto-regressive process, where φ_m = max{φ₁, φ₂} and τ_m = max{τ₁, τ₂}. Moreover, from theorem 2, we obtain that the additional condition obtained by setting c₁ = τ_m and c₂ = φ_m in the drift restriction of theorem 2 is sufficient to ensure that E(y_t²) < ∞.

Monte Carlo simulation study
We investigate the small sample properties of the ML estimator by means of a Monte Carlo simulation experiment. We generate samples from the BNB-INGARCH model (8) for various sample sizes (T = 250, 500, 1000, 2000). The parameters are then estimated by using the ML estimator (7). Fig. 1 reports the kernel densities of the ML estimates. Additional simulation results for different parameter values are given in the on-line supplementary material. The parameters r and α are reparameterized in terms of their inverses. This is done because, especially in small samples, a given realized path from the BNB-INGARCH process may not contain outliers, in which case the estimate of α may become arbitrarily large since the likelihood function is flat for large α.
The results confirm that the estimates are consistent, since their distribution collapses towards the true parameter values as the sample size increases. Furthermore, the distributions of the ML estimators of δ, where δ = ω/(1 − φ − τ), and of τ seem symmetric for all sample sizes. This suggests that the asymptotic normal distribution is an accurate approximation. The results show a more skewed distribution for the estimates of φ, r and α, especially for small T. We also highlight the presence of a small bump in the distribution of the estimates of r around the point r = (α₀ − 1)δ₀/r₀. This is not surprising since, given the identification discussion for r in Section 4, we expect the distribution of the estimates of r to have a second, lower mode in small samples. Finally, we note that the parameters ω, φ and τ, which determine the dynamic properties of λ_t, can also be estimated by alternative methods such as quasi-ML (QML). The Poisson QML estimator is often considered in the literature and delivers consistent parameter estimates (Ahmad and Francq, 2016). We provide additional simulation results, reported in the on-line supplement, to compare the accuracy of ML with Poisson QML. The results show that ML performs substantially better than Poisson QML in terms of mean-squared error, especially for small values of the tail parameter α. This finding is quite intuitive since we expect ML based on the BNB PMF to outperform competing estimation methods when outliers are present in the data.

A score-driven beta-negative binomial auto-regression
The BNB-INGARCH model presented in Section 5 accounts for extreme observations by means of the heavy tail of the conditional BNB distribution. However, the SRE for λ_t is not robust against outliers: an extreme value of y_t can have an arbitrarily large effect on λ_{t+1}. In practice, it is often desirable to have a robust SRE that attenuates the effect of extreme observations on λ_t. We propose a robust specification based on the GAS framework of Creal et al. (2013) and Harvey (2013). The conditional mean λ_t is specified as an auto-regressive process with innovation given by the score of the predictive likelihood. Score-driven models are widely used in the literature to specify robust updating functions in models with heavy-tailed distributions (Harvey and Luati, 2014; Opschoor et al., 2017). The BNB auto-regression with a score-driven SRE, which we label BNB-GAS, is specified by the equations

\[ y_t \mid \mathcal{F}_{t-1} \sim \mathrm{BNB}(\lambda_t, r, \alpha), \qquad \log \lambda_{t+1} = \omega + \phi \log \lambda_t + \tau s_t, \tag{10} \]

where ω ∈ R, φ ≥ 0 and τ > 0. The score innovation s_t = ∂ log p(y_t | λ_t, r, α)/∂ log λ_t is given by

\[ s_t = \gamma \lambda_t \left\{ \psi(\gamma \lambda_t + y_t) - \psi(\gamma \lambda_t) + \psi(\alpha + \gamma \lambda_t) - \psi(\alpha + r + \gamma \lambda_t + y_t) \right\}, \tag{11} \]

where γ = (α − 1)/r and ψ(·) denotes the digamma function, i.e. ψ(x) = ∂ log Γ(x)/∂x. The exponential link function in expression (10) ensures that λ_t is strictly positive with probability 1. We refer to the score s_t as the innovation of the process since E(s_t | F_{t−1}) = 0 with probability 1. The peculiarity of BNB-GAS is that the functional form of the score innovation reduces the effect of outliers. Fig. 2 illustrates the sensitivity of s_t to y_t for various values of the tail parameter α. We can see that the effect of large values of y_t on s_t is attenuated and that the degree of attenuation depends on the tail parameter α: the smaller α, the more robust the score innovation s_t. This behaviour of s_t is quite intuitive since a small α induces heavy tails in the conditional PMF of y_t and therefore generates outliers in the observed time series; see also Harvey and Luati (2014) for a similar interpretation in the context of Student's t-distribution. Furthermore, as shown in the proof of theorem 7 below, the score innovation is bounded by a constant, |s_t| ≤ α + r + 1. Therefore, s_t is robust in the sense that it does not diverge as y_t → ∞. BNB-GAS can approximate arbitrarily well some existing models proposed in the literature. As α → ∞, BNB-GAS becomes a score-driven model with a negative binomial distribution; see Gorgi (2018) for an application of the GAS framework with negative binomial distributions. As, additionally, r → ∞, the model becomes a Poisson auto-regressive model, which belongs to the class of models of Davis et al. (2003).
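A short Python sketch of the score innovation and one step of the BNB-GAS recursion follows; it mirrors expressions (10) and (11) above and shows numerically how s_t flattens out for large y_t. The function names are ours and purely illustrative.

```python
import numpy as np
from scipy.special import digamma

def score_innovation(y, lam, r, alpha):
    # s_t in expression (11), with gamma = (alpha - 1) / r
    g = (alpha - 1.0) / r
    return g * lam * (digamma(g * lam + y) - digamma(g * lam)
                      + digamma(alpha + g * lam)
                      - digamma(alpha + r + g * lam + y))

def gas_step(y, lam, omega, phi, tau, r, alpha):
    # One step of recursion (10): log lam_{t+1} = omega + phi * log lam_t + tau * s_t
    return np.exp(omega + phi * np.log(lam) + tau * score_innovation(y, lam, r, alpha))

# The impact of y_t on s_t is bounded: huge observations move lam_t only moderately
for y_t in (5, 50, 5000):
    print(y_t, score_innovation(y_t, lam=5.0, r=3.0, alpha=5.0))
```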
We now focus on the stochastic properties of BNB-GAS. The next theorem gives sufficient conditions for the existence of a stationary and ergodic solution for the BNB-GAS process.
The proof of theorem 7, which is given in the on-line supplement, is obtained by an application of theorem 1. The inequality in expression (12) is a parameter restriction that ensures that the semicontraction condition (5) holds. Given the complex functional form of s_t, it is not straightforward to obtain a sharper upper bound for the Lipschitz coefficient b in condition (5). We note that, if inequality (12) is not satisfied, this does not mean that the process is non-stationary, since the semicontraction condition in theorem 1 is sufficient but not necessary for stationarity. Condition (12) may be restrictive in practice; for instance, it is not satisfied for the parameter values estimated in the empirical applications in Section 7. However, this condition highlights that the stationarity region is not degenerate.
Remark 1. Under the conditions of theorem 7, the BNB-GAS process admits a stationary and ergodic solution with E(y_t^m) < ∞ if and only if α > m. This result holds because λ_t takes values in a compact set with probability 1, which implies that E(λ_t^m) < ∞ for any m ∈ N. Therefore, E(y_t^m) < ∞ if and only if E(y_t^m | F_{t−1}) < ∞ with probability 1, namely α > m.

Finally, we note that the BNB-GAS model (10) can be generalized by considering a different rescaling of the score innovation s_t. The conditional Fisher information is often used for this purpose; see Creal et al. (2013) for a discussion. A simple rescaling for BNB-GAS can be obtained by multiplying the score innovation by λ_t^{−k}, k ∈ [0, 1], to make the magnitude of the innovation depend on the level of λ_t. In the limit case α, r → ∞, the resulting class of models corresponds to the class of Poisson processes of Davis et al. (2003).

Police reports on narcotics trafficking
In this section, we present an empirical application to the monthly number of police reports on narcotics trafficking in Sydney, Australia. The time series runs from January 1995 to December 2016 and is part of the New South Wales data set of police reports, which is available from http://www.bocsar.nsw.gov.au/. The data set can be found in the section 'Datasets' under the name 'Recorded crime by offence' with geographic breakdown by 'LGA'. The code and data to reproduce the empirical results are available from https://rss.onlinelibrary.wiley.com/hub/journal/14679868/seriesb-datasets. The series presents some extreme observations: in particular, the number of narcotics trafficking reports is exceptionally high in August 2000, March 2008 and May 2015. Therefore, BNB auto-regressive models seem particularly suited to describe the auto-correlation structure and to account for the outliers in the data.
Besides BNB-INGARCH (8) and BNB-GAS (10), we consider two negative binomial specifications: one with a linear updating function and one based on the GAS framework, which we label NB-INGARCH and NB-GAS respectively. As discussed in Sections 5 and 6, these two models are limit cases, α → ∞, of BNB-INGARCH and BNB-GAS. We note that NB-INGARCH shares the same specification of λ_t as BNB-INGARCH. Therefore, under correct specification of BNB-INGARCH, the parameter estimates from NB-INGARCH may be seen as negative binomial QML estimates. Theoretical results on negative binomial QML were derived in Aknouche et al. (2018), where a two-stage QML estimator was considered. In contrast, BNB-GAS and NB-GAS do not have the same specification for λ_t, and only the conditional mean of BNB-GAS is robust against outliers. Model selection between BNB and negative binomial models, and among different specifications of λ_t, can be implemented by using information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), or by using a validation sample. We note that, although the BNB models can approximate arbitrarily well their corresponding negative binomial versions as α → ∞, a formal testing procedure for the null hypothesis α⁻¹ = 0 does not follow from the asymptotic results in Section 4. This is because α⁻¹ = 0 is a boundary point of the parameter space.

Table 1 reports the estimates of the models and several measures of in-sample fit and forecasting accuracy: the last five columns contain the log-likelihood, AIC, BIC, the log-score criterion and the mean absolute error (MAE); standard errors, reported in parentheses, are obtained from the inverse of the observed Fisher information; the parameter δ is δ = ω/(1 − φ − τ) for the INGARCH models and δ = ω/(1 − φ) for the GAS models, and it equals the constant conditional expectation for the static models; values with the best performance are highlighted in italics. A more detailed explanation of how the standard errors and the criteria are computed is given in the on-line supplement. The models BNB-static and NB-static are BNB and negative binomial models with constant conditional expectation, i.e. λ_t is a static parameter to be estimated. The log-score criterion is used to measure the accuracy of PMF forecasts (Geweke and Amisano, 2011) and the MAE is used to measure the accuracy of point forecasts. The log-score criterion and the MAE are obtained from the one-step-ahead predictions of the observations in the second half of the sample (the last 132 observations). A recursive estimation procedure is considered, where the models are re-estimated for each prediction using all previous observations. More specifically, the prediction of y_t at a given time point t = s + 1 is obtained by estimating the models using the observations from time t = 1 to time t = s. In this way, the estimation window expands as more observations become available. More details on how the predictions are obtained are given in the on-line supplement.

The results show that the BNB specifications give a better in-sample fit, since they have lower values of AIC and BIC. The relevance of the BNB distribution can also be elicited from the relatively low estimates of the tail parameter α, which is estimated to be around 5 with a standard error of about 0.8 for both BNB-INGARCH and BNB-GAS. This means that the estimated conditional distribution of the data is heavy tailed, with only four or five finite moments. The BNB models deliver more accurate PMF forecasts, since they have a larger log-score criterion than the negative binomial models. This finding indicates that BNB models can deliver more accurate predictions of the probability of occurrence of certain events when the estimated tail parameter α is relatively small. The most accurate point forecasts are given by BNB-GAS, and all the other models have a similar performance in terms of MAE. This suggests that the use of a BNB conditional distribution is not relevant for point forecasts. However, the robust specification of λ_t of BNB-GAS is beneficial in this application. An illustration of how BNB-GAS attenuates the effect of outliers is given in the on-line supplementary material.
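For concreteness, the recursive forecasting exercise can be sketched as follows in Python (our illustration: `fit_bnb_ingarch` stands for an estimation routine such as the ML sketch of Section 4 and is hypothetical, `bnb_logpmf` is the PMF helper from Section 2, and the point forecast is the filtered conditional mean).

```python
import numpy as np

def recursive_forecasts(y, s0):
    # Expanding-window one-step-ahead forecasts of y_{s+1} for s = s0, ..., T - 1;
    # returns the average log-score and the MAE over the evaluation sample
    logscore, abserr = [], []
    for s in range(s0, len(y) - 1):
        r, alpha, omega, phi, tau = fit_bnb_ingarch(y[: s + 1])  # fit on y_1, ..., y_s
        lam = np.mean(y[: s + 1])            # filter initialization
        for t in range(s + 1):               # run the filter up to time s + 1
            lam = omega + phi * lam + tau * y[t]
        logscore.append(bnb_logpmf(y[s + 1], lam, r, alpha))
        abserr.append(abs(y[s + 1] - lam))   # point forecast is the conditional mean
    return np.mean(logscore), np.mean(abserr)
```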
Diagnostic analyses, which are reported in the supplement, show that the BNB specifications are well suited to model the series. In particular, the Pearson residuals show no significant autocorrelation and parametric resampling confirms that the fitted model can properly capture the auto-correlation structure of the time series. Furthermore, the probability integral transformation histogram indicates that the BNB conditional PMF can accurately describe the distribution of the data.

Euro-pound sterling exchange rate
We present a second empirical application, where we analyse the euro to British pound (EUR/GBP) exchange rate on December 12th, 2019, the day on which general elections were held in the UK. We consider the number of tick changes per minute of EUR/GBP from 9.00 a.m. to 9.00 p.m. (Greenwich Mean Time). As is well known, prices of financial assets change over a discrete grid, since the minimum price variation is determined by the tick size (Koopman et al., 2017). The series is constructed using high frequency prices from the foreign exchange market. In particular, the closing price of each minute for EUR/GBP is considered and the number of tick changes is computed by dividing the absolute price variation by the tick size, which is 10⁻⁵. More specifically, denoting by p_t the closing price at minute t, the number of tick changes of the exchange rate at minute t is n_t = |p_t − p_{t−1}| × 10⁵. The sample size of the series is 720 observations, which corresponds to the number of minutes in the 12 h of the sample. The data were obtained from http://www.histdata.com/. The web site provides high frequency transaction data for several currencies traded in the foreign exchange market. The EUR/GBP data can be found in the section 'Download free Forex data' under 'M1 (1 minute bar) data', selecting the desired format of the data file. Fig. 4 shows the plot and the empirical auto-correlation functions of the series. The sample mean is 13.2 and the sample variance is 224.2. The series displays several extreme observations and the auto-correlation functions indicate that there is significant auto-correlation in the data. These features suggest that BNB auto-regressions are well suited to modelling the series, describing the extreme observations by means of the heavy tail of the BNB distribution. We note that studying and predicting the probability of tail events in the prices of financial assets is of practical interest for measuring and forecasting financial risk (Embrechts et al., 2013).
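A minimal sketch of the series construction follows, assuming a pandas Series of one-minute closing prices (the column name and file layout are our assumptions about the downloaded data):

```python
import pandas as pd

TICK = 1e-5  # tick size of EUR/GBP

def tick_changes(close: pd.Series) -> pd.Series:
    # n_t = |p_t - p_{t-1}| / tick size, rounded to the nearest integer count
    n = (close.diff().abs() / TICK).round()
    return n.dropna().astype(int)

# Example usage: n = tick_changes(bars["close"]) for a DataFrame 'bars' of M1 data
```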
Table 2 reports the estimation results, following the same design as in the previous empirical application; the columns and notes mirror those of Table 1, with δ defined analogously and standard errors again obtained from the inverse of the observed Fisher information. The log-score criterion and the MAE are obtained from the one-step-ahead predictions of the observations in the second half of the sample (the last 360 observations). The results show that the BNB models have a better in-sample fit in terms of AIC and BIC. Furthermore, PMF forecasts from the BNB models are more accurate, as shown by the log-score criterion. However, the point forecast accuracy is similar across the models and the BNB models do not dominate the negative binomial models. This further confirms that the key advantage of the BNB models lies in describing extreme observations as tail events and hence delivering more accurate PMF predictions. In contrast, the accuracy of point predictions depends mostly on the specification of λ_t. BNB-INGARCH and NB-INGARCH have the same specification for λ_t, and therefore a similar point forecast accuracy is to be expected. BNB-INGARCH performs better than BNB-GAS in sample as well as out of sample; therefore, the robust conditional mean of BNB-GAS does not provide benefits in this application. Overall, the dynamic models with a time-varying conditional mean λ_t perform better than the static models.
Also for this application, diagnostic analyses, which are reported in the on-line supplement, show that the BNB models are a good fit for this data set in terms of both modelling the auto-correlation of the series and describing the distribution of the data.

Conclusion
This paper introduces a general framework for modelling integer-valued time series with outliers. The paper proposes a class of observation-driven models that are based on a mixture of negative binomial distributions, known as the BNB distribution. The stochastic properties of the models and the asymptotic theory of ML estimation are formally discussed. Various specifications of the conditional mean are considered and studied. Two empirical applications illustrate the practical relevance of the approach. Further research may focus on extending the proposed method to more flexible specifications. Relevant developments may include equipping the model with a zero-inflated BNB distribution to handle time series with a large number of 0s and introducing time variation in the parameters r and α to account for smooth changes in the shape of the distribution. Another relevant line of research may concern the development of a formal testing procedure to select between BNB and negative binomial models, which would enable the detection of heavy tails in integer-valued time series data.

Acknowledgements
The author thanks two referees and the Associate Editor for their useful comments, which helped to improve the paper.

A.1. Proofs

A.1.1. Proof of theorem 1
The existence of a stationary and ergodic solution follows directly from an application of theorem 2.2 of Doukhan and Neumann (2019) by checking that conditions A1-A3 in Doukhan and Neumann (2019) are satisfied. Furthermore, the F_{t−1}-measurability of λ_t is also implied by the proof of theorem 2.2 of Doukhan and Neumann (2019), since it is shown there that λ_t can be expressed as a measurable function of {y_{t−1}, y_{t−2}, …}. Below we show that conditions A1-A3 hold. It is straightforward to see that condition A1 is implied by the drift condition in equation (4) and condition A2 is implied by the semicontraction condition (5). Condition A3 requires an upper bound on the total variation distance between two BNB distributions with different means. In particular, denoting by BNB_{λ₁} the probability measure of a BNB distribution with mean λ₁, BNB(λ₁, r, α), and by BNB_{λ₂} the probability measure of a BNB distribution with mean λ₂, BNB(λ₂, r, α), condition A3 requires the following upper bound to hold for some δ > 0:

\[ \| \mathrm{BNB}_{\lambda_1} - \mathrm{BNB}_{\lambda_2} \|_{TV} \le C\, |\lambda_1 - \lambda_2|^{\delta}, \]

where C is a positive constant,

and where the total variation distance between two probability measures P₁ and P₂ on N is defined as ‖P₁ − P₂‖_{TV} = sup_{A⊆N} |P₁(A) − P₂(A)|. We obtain that the above upper bound on the total variation distance holds with δ = 1 by lemma 7 in Appendix A.2.
Finally, we conclude the proof by showing that E‖λ̃_t‖_Θ < ∞. By assumption 2, we obtain that the stated bound holds with probability 1. Therefore, since c₂ < 1, for sufficiently large t we obtain the required inequality, and by stationarity the inequality holds for any t. Therefore, the desired result follows by noting that E(y_t) < ∞ by theorem 1.
Then, given conditions (a) and (b) and the compactness of K, the consistency κ̂_T → κ₀, almost surely, follows immediately by standard arguments that go back to Wald (1949).
(a) An application of the triangle inequality yields

\[ \sup_{\kappa \in K} |\hat{L}_T(\kappa) - E\, l_t(\kappa)| \le \sup_{\kappa \in K} |\hat{L}_T(\kappa) - \tilde{L}_T(\kappa)| + \sup_{\kappa \in K} |\tilde{L}_T(\kappa) - E\, l_t(\kappa)|, \tag{16} \]

where L̃_T(κ) = (1/T) Σ_{t=1}^T l_t(κ). Therefore, the desired result follows if we can show that both terms on the right-hand side of inequality (16) vanish almost surely and uniformly as T diverges.