The Hot Hand in Professional Darts

We investigate the hot hand hypothesis in professional darts in a near-ideal setting with minimal to no interaction between players. Considering almost one year of tournament data, corresponding to 167,492 dart throws in total, we use state-space models to investigate serial dependence in throwing performance. In our models, a latent state process serves as a proxy for a player's underlying ability, and we use autoregressive processes to model how this process evolves over time. We find a strong but short-lived serial dependence in the latent state process, thus providing evidence for the existence of the hot hand.


Introduction
In sports, the concept of the "hot hand" refers to the idea that athletes may enter a state in which they experience exceptional success. For example, in basketball, players are commonly referred to as being "in the zone" or "on fire" when they hit several shots in a row. However, in their seminal paper, Gilovich et al. (1985) analyzed basketball free-throw data to find no support for a hot hand, hence coining the notion of the "hot hand fallacy". Since then, there has been mixed evidence, with some papers claiming to have found indications of a hot hand phenomenon and others disputing its existence.
Two main types of approaches have been used to investigate potential hot hand patterns, namely 1) analyses of success rates conditional on the outcomes of previous attempts (see, e.g., Gilovich et al., 1985; Dorsey-Palmateer and Smith, 2004; Miller and Sanjurjo, 2014), and 2) approaches that use a latent variable to describe the underlying ability (or "hotness") of a player (see, e.g., Sun, 2004; Wetzels et al., 2016; Green and Zwiebel, 2017). Within 1), the hot hand is understood as a causal relationship, where success increases the probability of success in subsequent attempts. In contrast, 2) focuses on correlation in players' abilities, allowing for periods where players experience elevated success rates. In this paper, we focus on the latter, since this approach is more aligned with colloquial expressions such as "being in the zone". More specifically, using state-space models, we evaluate serial dependence in a latent state process, which can be interpreted as a player's varying ability.
Notably, Miller and Sanjurjo (2016) highlight a subtle selection bias that may sneak into analyses of sequential data and challenge the findings of Gilovich et al. (1985).
Aside from mathematical fallacies, we note that many existing studies considered data, e.g. from baseball or basketball, that we believe are hardly suitable for analyzing streakiness in performance. For example, when analyzing hitting streaks of a batter in baseball, other factors such as the performance of the pitcher are also important but hard to account for. The same applies to basketball, where several factors affect the probability that a player makes a shot, e.g. the position (of a field goal attempt) or the effort of the defense. In particular, an adjustment of the defensive strategy to focus more strongly on a player during a hot hand streak can conceal a possible hot hand phenomenon (Bocskocsky et al., 2014).
To overcome these caveats, here we investigate whether there is a hot hand effect in professional darts, a setting with a high level of standardization of individual throws. In professional darts, well-trained players repeatedly throw at the dartboard from the exact same position and effectively without any interaction between competitors, making the course of play highly standardized. We consider a very large data set, with n = 167,492 throws in total, which allows for comprehensive inference on the existence and the magnitude of the hot hand effect.

Data
Data was extracted from http://live.dartsdata.com/, covering all professional darts tournaments organized by the Professional Darts Corporation (PDC) between April 2017 and January 2018. In these tournaments, players start each leg with 501 points, and the first player to reach exactly zero points wins the leg. To win the match, a player must be the first to win a pre-specified number of legs (typically between 7 and 15). In our analysis, we include all players who played at least 50 legs during the time period considered.
At the beginning of legs, players consistently aim at high numbers to quickly reduce their points. The maximum score in a single throw is 60 as in a triple 20 (T20), but the data indicate the outcomes triple 19 (T19), triple 18 (T18), triple 17 (T17), triple 16 (T16), triple 15 (T15), and bullseye (Bull), to be targeted in the initial phase of a leg as well. Thus, in the initial phase of a leg we regard any throw to land in the set H = {T15, T16, T17, T18, T19, T20, Bull} as success. A leg is won once a player reaches exactly 0 points, such that players do not target H towards the end of legs, but rather numbers that make it easier for them to reduce to 0. To retain a high level of standardization and comparability across throws, we truncate our time series data, excluding throws where the remaining score was less than c = 180 points.
We thus consider binary time series $\{y_t^{p,l}\}_{t=1,\ldots,T_{p,l}}$, indicating the throwing success of player $p$ within his $l$-th leg in the data set, with

\[
y_t^{p,l} =
\begin{cases}
1 & \text{if the $t$-th throw lands in } H; \\
0 & \text{otherwise.}
\end{cases}
\]

[Display: example throwing success histories of one player (Anderson). Each row corresponds to one leg, truncated when the score fell below 180, and gaps between blocks of three successive dart throws indicate a break in Anderson's play due to the opponent taking his turn.]

Next we formulate a model that enables us to potentially reveal any unusual streakiness in the data, i.e. a possible hot hand effect.
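The success coding and truncation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (remaining score before the throw, field hit), the field labels, and the function name `code_leg` are hypothetical and not taken from the data source, and we assume that truncation stops the series at the first throw taken with fewer than c = 180 points remaining.

```python
# Hypothetical record layout: (remaining_score_before_throw, field_hit).
HIGH_FIELDS = {"T15", "T16", "T17", "T18", "T19", "T20", "Bull"}  # the set H
CUTOFF = 180  # truncation threshold c


def code_leg(throws, cutoff=CUTOFF):
    """Return the binary success series for one leg, truncated at the cutoff."""
    series = []
    for remaining, field in throws:
        if remaining < cutoff:
            # Assumption: from here on the player may no longer be aiming at H,
            # so the rest of the leg is dropped.
            break
        series.append(1 if field in HIGH_FIELDS else 0)
    return series


# Made-up leg: the last throw starts at 179 points and is therefore excluded.
example_leg = [
    (501, "T20"), (441, "S20"), (421, "T19"),
    (364, "T20"), (304, "S5"), (299, "T20"),
    (239, "T20"), (179, "T20"),
]
print(code_leg(example_leg))
```

Stopping at the first throw below the cutoff, rather than filtering throws one by one, keeps each series contiguous, which matters for the time series models that follow.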

State-Space Model of the Hot Hand
We aim at explicitly incorporating any potential hot hand phenomenon into a statistical model for throwing success. Conceptually, a hot hand phenomenon naturally translates into a latent, serially correlated state process, which for any player considered measures his varying underlying ability. For average values of the state process, we would observe normal throwing success, whereas for high (low) values of the state process, we would observe unusually high (low) percentages of successful attempts. Figuratively speaking, the state process serves as a proxy for the player's "hotness"; alternatively, it can simply be regarded as the player's varying ability. The magnitude of the serial correlation in the state process then indicates the strength of any potential hot hand effect.
A similar approach was used by Wetzels et al. (2016) and by Green and Zwiebel (2017), who use discrete-state Markov models to measure the underlying ability. While there is some appeal in a discrete-state model formulation, most notably mathematical convenience and ease of interpretation (with cold vs. normal vs. hot states), we doubt that players traverse through only finitely many ability states, and advocate a continuously varying underlying ability state variable instead. Specifically, dropping the superscripts $p$ and $l$ for notational simplicity, we consider models of the following form:

\[
y_t \sim \text{Bern}(\pi_t), \quad \text{logit}(\pi_t) = \eta_t(s_t), \quad s_t = h_t(s_{t-1}) + \epsilon_t, \tag{1}
\]

where $\{y_t\}_{t=1,\ldots,T}$ is the observed binary sequence indicating throwing success, and $\{s_t\}_{t=1,\ldots,T}$ is the unobserved continuous-valued state process indicating a player's varying ability. We thus model throwing success using a logistic regression model in which the predictor $\eta_t(s_t)$ for the success probability $\pi_t$ depends, among other things, on the current ability as measured by $s_t$.
The unobserved ability process $\{s_t\}$ is modeled using an autoregressive process, which includes as a nested special case the model of independent observations, corresponding to the absence of any hot hand phenomenon.
Model (1) is a special case of a state-space model (SSM). Before we specify the exact forms of $\eta_t(s_t)$ and of $h_t$ in the Results section, we first discuss in the next section how to conduct maximum likelihood estimation within the general formulation given above.

Maximum Likelihood Estimation
The likelihood of a model as in (1) involves analytically intractable integration over the possible realizations of $s_t$, $t = 1, \ldots, T$. We use a combination of numerical integration and recursive computing, as first suggested by Kitagawa (1987), to obtain an arbitrarily fine approximation of this multiple integral. Specifically, we finely discretize the state space, defining a range of possible values $[b_0, b_m]$ and splitting this range into $m$ intervals $B_i = (b_{i-1}, b_i)$, $i = 1, \ldots, m$, of equal length $(b_m - b_0)/m$. The likelihood of a single throwing history can then be approximated as follows:

\[
\mathcal{L}_T \approx \sum_{i_1=1}^{m} \cdots \sum_{i_T=1}^{m} \Pr(s_1 \in B_{i_1}) \Pr(y_1 \mid s_1 = b_{i_1}^*) \prod_{t=2}^{T} \Pr(s_t \in B_{i_t} \mid s_{t-1} = b_{i_{t-1}}^*) \Pr(y_t \mid s_t = b_{i_t}^*), \tag{2}
\]

with $b_i^*$ denoting the midpoint of $B_i$. This is just one of several possible ways in which the multiple integral can be approximated (see, e.g., Zucchini et al., 2016, Chapter 11).
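In practice, we simply require that $m$ be sufficiently large. With the specification as a logistic regression model as in (1), we have that

\[
\Pr(y_t \mid s_t = b_i^*) = \pi_t^{y_t} (1 - \pi_t)^{1 - y_t}, \quad \text{with } \pi_t = \text{logit}^{-1}\bigl(\eta_t(b_i^*)\bigr),
\]

where $b_i^*$ is the midpoint of the $i$-th interval $B_i$. The approximate probability of the state process transitioning from interval $B_i$ to interval $B_j$, $\Pr(s_t \in B_j \mid s_{t-1} = b_i^*)$, follows immediately from the specification of $h_t$ and the distribution of the noise, $\epsilon_t$.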
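The computational cost of evaluating the right hand side of Equation (2) is of order $O(T m^T)$. However, the discretization of the state space effectively transforms the SSM into a hidden Markov model (HMM), with a large but finite number of states, such that we can apply the corresponding efficient machinery. In particular, for this approximating HMM, the forward algorithm can be applied to calculate its likelihood at a cost of order $O(T m^2)$ only (Zucchini et al., 2016, Chapter 11). More specifically, defining the row vector $\delta = (\delta_1, \ldots, \delta_m)$ with $\delta_i = \Pr(s_1 \in B_i)$, $i = 1, \ldots, m$, the transition probability matrix (t.p.m.) $\Gamma = (\gamma_{ij})$ with $\gamma_{ij} = \Pr(s_t \in B_j \mid s_{t-1} = b_i^*)$, $i, j = 1, \ldots, m$, where $b_i^*$ is the midpoint of the interval $B_i$, and the $m \times m$ diagonal matrix $P(y_t)$ with $i$-th diagonal entry equal to $\Pr(y_t \mid s_t = b_i^*)$, the right hand side of Equation (2) can be calculated as

\[
\mathcal{L}_T \approx \delta P(y_1) \Gamma P(y_2) \cdots \Gamma P(y_{T-1}) \Gamma P(y_T) \mathbf{1}, \tag{3}
\]

with column vector $\mathbf{1} = (1, \ldots, 1)' \in \mathbb{R}^m$. Equation (3) applies to a single throwing history, i.e. to one leg played by one player. Assuming independence between legs, the likelihood of the full data set is the product of such terms over all players and legs:

\[
\mathcal{L} = \prod_{p} \prod_{l} \delta P(y_1^{p,l}) \Gamma P(y_2^{p,l}) \cdots \Gamma P(y_{T_{p,l}}^{p,l}) \mathbf{1}. \tag{4}
\]

We estimate the model parameters by numerically maximizing the approximate likelihood, subject to the usual technical issues as detailed in Zucchini et al. (2016).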
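The forward recursion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the analysis: we assume a constant predictor of the form eta0 + s_t, an AR(1) state process with Gaussian noise, and a normal initial distribution; the function name `approx_loglik` and all argument names are hypothetical. The forward vector is rescaled at every step to avoid numerical underflow.

```python
import numpy as np
from scipy.stats import norm


def approx_loglik(y, eta0, phi, sigma, mu_delta, sigma_delta, m=150, b_max=2.5):
    """Approximate log-likelihood of one binary series via the forward algorithm."""
    edges = np.linspace(-b_max, b_max, m + 1)      # interval boundaries b_0, ..., b_m
    mids = 0.5 * (edges[:-1] + edges[1:])          # interval midpoints
    # delta_i = Pr(s_1 in B_i) under the assumed normal initial distribution
    delta = np.diff(norm.cdf(edges, loc=mu_delta, scale=sigma_delta))
    # gamma_ij = Pr(s_t in B_j | s_{t-1} = midpoint of B_i) under the assumed AR(1)
    cdf = norm.cdf(edges[None, :], loc=phi * mids[:, None], scale=sigma)
    gamma = np.diff(cdf, axis=1)
    # Success probability at each midpoint (inverse logit of the predictor)
    pi = 1.0 / (1.0 + np.exp(-(eta0 + mids)))

    loglik = 0.0
    alpha = delta
    for t, y_t in enumerate(y):
        if t > 0:
            alpha = alpha @ gamma                  # propagate the state distribution
        alpha = alpha * (pi if y_t == 1 else 1.0 - pi)  # multiply by diagonal of P(y_t)
        scale = alpha.sum()
        loglik += np.log(scale)                    # accumulate the log-likelihood
        alpha = alpha / scale                      # rescale to avoid underflow
    return loglik


# Illustrative parameter values only (not estimates from the paper).
print(approx_loglik([1, 0, 1, 1, 0, 0], eta0=-0.5, phi=0.5, sigma=0.5,
                    mu_delta=0.0, sigma_delta=0.5))
```

Each step costs one vector-matrix product, which gives the overall cost of order T m^2. Summing the log-likelihoods of all legs yields the log of the product over throwing histories.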

Results
Before presenting the results of the different hot hand models considered, we formulate two models that correspond to the hypothesis of no hot hand effect being present. These will serve as benchmarks for the SSMs to be considered below. Model 1 assumes that each player's probability of success is constant across throws, i.e. the predictor in the logistic regression model for throwing success involves only player-specific intercepts: $\text{logit}(\pi_t) = \beta_{0,p}$.
Note that we again suppress the superscripts $p$ and $l$ for player and leg, respectively, from $\pi_t$. Model 2 additionally accounts for the position of a throw within a player's turn of three darts by means of two dummy variables:

\[
\text{logit}(\pi_t) = \beta_{0,p} + \beta_1 d_{2,t} + \beta_2 d_{3,t},
\]

where $d_{2,t}$ and $d_{3,t}$ indicate whether throw $t$ is the second or the third throw within the turn, respectively.
In Model 3, we now include an underlying ability state variable $\{s_t\}$, which we assume to follow an autoregressive process of order 1:

\[
\text{logit}(\pi_t) = \beta_{0,p} + \beta_1 d_{2,t} + \beta_2 d_{3,t} + s_t, \quad s_t = \phi s_{t-1} + \sigma \epsilon_t,
\]

with $\epsilon_t \overset{\text{iid}}{\sim} N(0, 1)$. Effectively this is a Bernoulli model for throwing success in which the success probability fluctuates around the players' baseline levels ($\beta_{0,p}$, $\beta_{0,p} + \beta_1$ and $\beta_{0,p} + \beta_2$ for within-turn throws one, two and three, respectively) according to the autoregressive process $\{s_t\}$. The process $\{s_t\}$ can be interpreted as varying underlying ability (or "hotness"). For $\phi = 0$, the model collapses to our benchmark Model 2 (i.e. absence of a hot hand), whereas $\phi > 0$ would support the hot hand hypothesis. For the beginning of a leg, we assume $s_1 \sim N(\mu_\delta, \sigma_\delta^2)$, i.e. that a player's underlying ability level starts afresh in every leg according to a normal distribution to be estimated.
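To make the structure of Model 3 concrete, the following sketch simulates one leg from a model of this form: a logistic predictor with a player intercept, two within-turn dummy effects, and an AR(1) ability process with standard normal noise scaled by sigma. The function name `simulate_leg`, the parameter values, and the assumption that the series consists of complete turns of three darts starting with a first dart are ours and purely illustrative.

```python
import numpy as np


def simulate_leg(n_throws, beta0, beta1, beta2, phi, sigma, mu_delta, sigma_delta, rng):
    """Simulate throwing success and latent ability for one leg."""
    s = np.empty(n_throws)
    s[0] = rng.normal(mu_delta, sigma_delta)            # ability starts afresh in each leg
    for t in range(1, n_throws):
        s[t] = phi * s[t - 1] + sigma * rng.normal()    # AR(1) ability process
    pos = np.arange(n_throws) % 3                       # 0, 1, 2: dart one, two, three of a turn
    eta = beta0 + beta1 * (pos == 1) + beta2 * (pos == 2) + s
    pi = 1.0 / (1.0 + np.exp(-eta))                     # success probabilities
    y = rng.binomial(1, pi)                             # binary throwing success
    return y, s


rng = np.random.default_rng(2018)
# Illustrative parameter values only (not estimates from the paper).
y, s = simulate_leg(9, beta0=-0.5, beta1=0.2, beta2=0.2, phi=0.5, sigma=0.5,
                    mu_delta=0.0, sigma_delta=0.5, rng=rng)
print(y)
```

Setting `phi`, `sigma` and `sigma_delta` to zero with `mu_delta = 0` removes the latent process entirely, so that the simulated throws are independent given the dummy effects, as in a benchmark model without a hot hand.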
We fit Model 3 using $m = 150$ and $-b_0 = b_m = 2.5$, monitoring the likely ranges of the process $\{s_t\}$ to ensure the range considered is sufficiently wide given the parameter estimates. Table 1 displays the parameter estimates (except the player-specific intercepts) including 95% confidence intervals based on the observed Fisher information.
Crucially, the estimate $\hat{\phi} = 0.493$ supports the hot hand hypothesis, with the associated confidence interval not containing 0. This result corresponds to considerable serial correlation in the players' underlying ability. The AIC clearly favors the hot hand model formulation, Model 3, over the benchmark given by Model 2 ($\Delta$AIC = 550). However, the estimated mean of the initial distribution, $\hat{\mu}_\delta = -0.060$, indicates that players tend to start a leg with an ability level slightly below average. This suggests that momentum in performance may first need to be built up, or in other words that the hot hand effect could be only short-lived, which is further discussed below.
To improve the realism of the hot hand model, we thus consider Model 4, where we allow the parameters of the AR(1) process to differ between transitions within a player's turn and transitions across turns:

\[
s_t =
\begin{cases}
\phi_w s_{t-1} + \sigma_w \epsilon_t & \text{if throws $t-1$ and $t$ belong to the same turn;} \\
\phi_a s_{t-1} + \sigma_a \epsilon_t & \text{otherwise.}
\end{cases}
\]

In the (approximate) likelihood, which still is of the form specified in (4), the t.p.m. $\Gamma$ is then not constant across time anymore, but equal to either a within-turn t.p.m. $\Gamma^{(w)}$ or an across-turn t.p.m. $\Gamma^{(a)}$. For Model 4, which is clearly favored over Model 3 by the AIC ($\Delta$AIC = 242), the parameter estimates as well as the associated confidence intervals are displayed in Table 2.
The estimate of the persistence parameter of the AR(1) process active within a player's turn, $\hat{\phi}_w = 0.726$, corresponds to quite strong correlation, which provides evidence for a clear hot hand pattern within turns. However, the estimate $\hat{\phi}_a = 0.057$ indicates only minimal persistence in the players' abilities across turns. In fact, when at time $t$ a player begins a new set of three darts within a leg, then the underlying ability variable is drawn from an $N(0.057 s_{t-1}, 0.790^2)$ distribution, which is notably close to the initial distribution of the AR(1) process, an $N(-0.034, 0.690^2)$ distribution, which determines the underlying ability level at the start of a leg. In other words, there is a clear hot hand pattern, but the corresponding momentum is very short-lived and effectively only applies to darts thrown in quick succession. We cannot rule out that there may be a weak carry-over effect also across turns, but our results show no conclusive evidence in this regard. Table 3 provides an overview of the four models fitted, detailing the number of parameters, the AIC values, the type of state process (if any) and a short description.
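The switch between a within-turn and an across-turn t.p.m. can be sketched as follows. The helper names `ar1_tpm` and `tpm_for_throw` are hypothetical, the persistence values and the across-turn standard deviation are the estimates quoted in the text, and the within-turn standard deviation of 0.5 is a placeholder, since its estimate is not quoted here. We also assume that each series consists of complete turns of three darts.

```python
import numpy as np
from scipy.stats import norm


def ar1_tpm(phi, sigma, m=150, b_max=2.5):
    """Discretized AR(1) transition matrix: Pr(s_t in B_j | s_{t-1} = midpoint of B_i)."""
    edges = np.linspace(-b_max, b_max, m + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    cdf = norm.cdf(edges[None, :], loc=phi * mids[:, None], scale=sigma)
    return np.diff(cdf, axis=1)


gamma_w = ar1_tpm(phi=0.726, sigma=0.5)      # within turn; sigma_w = 0.5 is a placeholder
gamma_a = ar1_tpm(phi=0.057, sigma=0.790)    # across turns; values quoted in the text


def tpm_for_throw(t):
    """Return the t.p.m. governing the transition into throw t (0-based index, t >= 1)."""
    # Assumption: throws 0, 1, 2 form the first turn, throws 3, 4, 5 the second, and so on.
    return gamma_a if t % 3 == 0 else gamma_w
```

In a forward recursion, the constant matrix would simply be replaced by `tpm_for_throw(t)` at each step; the cost per step is unchanged.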
To obtain a more detailed picture of the short-term correlation found in the throwing performances, and also to check the goodness of fit of our models, in Table 4 we compare the empirical relative frequencies of the eight possible throwing success sequences within players' turns (000, 001, 010, 011, 100, 101, 110, and 111) to the corresponding frequencies expected under the four different models that were fitted. We restricted this comparison to the first two turns of players within each leg, and used Monte Carlo simulation to obtain the model-based frequencies of the eight sequences.

Table 4: Relative frequencies of the eight possible throwing success histories within a player's turn. The second column gives the proportions found in the data, while columns 3-6 give the proportions as predicted under the various models fitted, for data structured exactly as the real data.

Finally, we decode the underlying ability process by seeking

\[
(s_1^*, \ldots, s_T^*) = \underset{s_1, \ldots, s_T}{\operatorname{argmax}} \; \Pr(s_1, \ldots, s_T \mid y_1, \ldots, y_T),
\]

i.e. the most likely state sequence, given the observations. After discretizing the state space into $m$ intervals, maximizing this probability is equivalent to finding the optimal of $m^T$ possible state sequences. This can be achieved at computational cost $O(T m^2)$ using the Viterbi algorithm. We then calculate the corresponding trajectories $\pi_1^*, \ldots, \pi_T^*$ of the most likely success probabilities to have given rise to the observed throwing success histories, taking into account also the player-specific effects and the dummy variables. Figure 1 displays the decoded sequences for six players from the data set.
Since there are only $2^3 = 8$ different possible sequences of observations within a player's turn, and since players start each turn almost unaffected by previous performances (cf. $\hat{\phi}_a = 0.057$), there is only limited variation in the most likely sequences. The actual sequences may of course differ from these most likely sequences. The probability of hitting $H$ increases after the first throw within a turn due to the two dummy variables.
We also see confirmed that the underlying ability level is not retained across turns.
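The decoding step can be illustrated with a generic Viterbi recursion for a discretized state space. This is a sketch under simplifying assumptions: the function name `viterbi` is hypothetical, the t.p.m. is taken to be constant over time (a model with separate within-turn and across-turn matrices would switch matrices inside the loop), and each state carries a single success probability. The toy example uses only two states to keep the output easy to check by hand.

```python
import numpy as np


def viterbi(y, delta, gamma, pi):
    """Most likely sequence of state indices for a binary series y.

    delta: (m,) initial state probabilities; gamma: (m, m) transition matrix;
    pi: (m,) success probability associated with each state.
    """
    with np.errstate(divide="ignore"):                 # allow log(0) = -inf
        log_delta, log_gamma = np.log(delta), np.log(gamma)
        log_emit = np.stack([np.log(1.0 - pi), np.log(pi)])  # row 0: y = 0, row 1: y = 1
    T, m = len(y), len(delta)
    score = log_delta + log_emit[y[0]]                 # best log-probability ending in each state
    back = np.zeros((T, m), dtype=int)                 # back-pointers
    for t in range(1, T):
        cand = score[:, None] + log_gamma              # cand[i, j]: best path ending with i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(m)] + log_emit[y[t]]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):                      # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path


# Toy example with a "cold" state 0 and a "hot" state 1 (made-up numbers).
toy_delta = np.array([0.5, 0.5])
toy_gamma = np.array([[0.9, 0.1], [0.1, 0.9]])
toy_pi = np.array([0.2, 0.8])
print(viterbi([1, 1, 1, 0, 0, 0], toy_delta, toy_gamma, toy_pi))
```

Working with log-probabilities avoids underflow for long series, and the decoded success probabilities are then obtained by evaluating the inverse logit of the predictor at the decoded states.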

Discussion
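Our analysis of throwing success in darts provides strong evidence for a short-lived serial dependence in performance, which effectively applies only to darts thrown within the same turn. It is questionable, however, whether such within-turn dependence is what is commonly meant by a hot hand. Instead, we believe that the notion of the hot hand is usually supposed to refer to players building up momentum over some period of a match. From that perspective, we would have expected to find (stronger) evidence of serial correlation also across players' turns.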
In other words, while we find strong serial correlation for within-turn throws, we do not actually find conclusive evidence for a hot hand the way it is usually understood.

Figure 1: Decoded most likely sequences of throwing success probabilities according to Model 4, for > 100 legs played by each of six players from the data set. The horizontal dashed lines indicate the player-specific intercepts for the respective player's within-turn throw one, and the vertical dashed lines denote the transitions between players' turns of three darts each.

Further research could focus on explicitly addressing player heterogeneity. In addition to the baseline level of π t , the parameters φ w , φ a , σ w and σ a , and hence the magnitude of the hot hand effect, may vary across players. This could reveal that for some players the hot hand effect lasts longer than for others, and potentially also across turns. Modeling this individual variability could be achieved using covariates or, if no suitable covariates are available to explain the heterogeneity, via random effects.
We also want to reiterate that the results presented in this paper refer to the hot hand as a correlational phenomenon, i.e. a correlation in the underlying ability level.
Some previous studies have instead assumed the hot hand to be a causal phenomenon, where throwing success at time t − 1 directly affects the probability of success at time t. With the binary time series data that we analyzed, we found that corresponding models that incorporate both correlational and causal effects could not be estimated reliably due to high numerical instability. With more detailed data on performances, we envisage approaches that allow both correlational and causal effects, in a single model, to potentially deliver important new insights into the hot hand concept.