Improving the identification of antigenic sites in the H1N1 influenza virus through accounting for the experimental structure in a sparse hierarchical Bayesian model

Summary Understanding how genetic changes allow emerging virus strains to escape the protection afforded by vaccination is vital for the maintenance of effective vaccines. We use structural and phylogenetic differences between pairs of virus strains to identify important antigenic sites on the surface of the influenza A(H1N1) virus through the prediction of haemagglutination inhibition (HI) titres: pairwise measures of the antigenic similarity of virus strains. We propose a sparse hierarchical Bayesian model that can deal with the pairwise structure and inherent experimental variability in the H1N1 data through the introduction of latent variables. The latent variables represent the underlying HI titre measurement of any given pair of virus strains and help to account for the fact that, for any HI titre measurement between the same pair of virus strains, the difference in the viral sequence remains the same. By accurately representing the structure of the H1N1 data, the model can select virus sites which are antigenic, while its latent structure achieves the computational efficiency required to deal with large virus sequence data, as typically available for the influenza virus. In addition to the latent variable model, we also propose a new method, the block-integrated widely applicable information criterion (biWAIC), for selecting between competing models. We show how this enables the random effects to be selected effectively when used with the proposed model, and we apply both methods to an A(H1N1) data set.


Sampling γ
In order to sample γ we use collapsing methods, as detailed in Section 4 of the main paper. Following the method proposed in Davies et al. (2017), we integrate over µ_w, w*_γ, π and σ²_ε; in the case of the eSABRE method, however, we are left with a conditional distribution that includes µ_y but not y, leading to the increased computational efficiency discussed and tested in Sections 4.1 and 8 of the main paper:

where C_π = Γ(||γ|| + α_π)Γ(J − ||γ|| + β_π)/Γ(J + α_π + β_π), m_{γ,0} = (µ_{w_0}, µ_0, . . . , µ_0) with µ_0 repeated to length ||w_γ||, which depends on γ, and V_{γ,0} is a block-diagonal matrix with diagonal blocks (0, σ²_0), where the square blocks have sizes 1 and ||w_γ|| respectively. We can use the Woodbury identity and the extended Sylvester determinant theorem to speed up the computations, giving the following conditional posterior distribution:
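The computational gain from these two identities can be illustrated with a small numerical sketch. This is an illustration of the identities themselves, not of the eSABRE implementation: the dimensions, the design matrix Z, the prior covariance V and the residual variance s2 are all hypothetical stand-ins. The point is that the N × N covariance of the form σ²_ε I + Z V Zᵀ can be inverted, and its log-determinant computed, at the cost of an M × M problem with M ≪ N.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 500, 10            # many observations, few selected effects (hypothetical)
Z = rng.standard_normal((N, M))
V = 2.0 * np.eye(M)       # prior covariance of the low-rank term (hypothetical)
s2 = 0.5                  # residual variance, playing the role of sigma^2_eps

Sigma = s2 * np.eye(N) + Z @ V @ Z.T          # the N x N covariance matrix

# Direct O(N^3) computations, for comparison
direct_inv = np.linalg.inv(Sigma)
direct_logdet = np.linalg.slogdet(Sigma)[1]

# Woodbury identity: only an M x M matrix K needs inverting
K = np.linalg.inv(V) + (Z.T @ Z) / s2
wood_inv = np.eye(N) / s2 - (Z / s2) @ np.linalg.inv(K) @ (Z.T / s2)

# Matrix determinant lemma (Sylvester): log-determinant via an M x M determinant
syl_logdet = N * np.log(s2) + np.linalg.slogdet(np.eye(M) + (Z.T @ Z) @ V / s2)[1]

print(np.allclose(wood_inv, direct_inv), np.isclose(syl_logdet, direct_logdet))
```

Both shortcuts agree with the direct computations to numerical precision, while replacing the O(N³) cost with O(NM²).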

Collapsing Within Conditional Distributions
In order to sample from the eSABRE method via the collapsing scheme suggested in Section 4 of the main paper, we must derive the collapsed conditional distributions for σ²_ε and µ_w. The conditional distribution of γ is derived in Section 1.1, while (3) and (18) give the distributions for π and w*_γ. The conditional distribution for σ²_ε can then be derived as follows:

where the first distribution is taken from the results in Section 1.1, where the definitions of m_{γ,0} and Σ_γ are also given. Finally, we can give the conditional distribution of µ_w as follows:
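The effect of collapsing can be illustrated on a toy analogue rather than the eSABRE conditionals themselves: in a hierarchical normal model, integrating a shared mean out analytically leaves a marginal distribution with a correlated covariance, which hierarchical Monte Carlo sampling recovers. The variances tau2 and s2 below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, draws = 4, 200000
tau2, s2 = 1.5, 0.5   # hypothetical random-effect and noise variances

# Hierarchical sampling: mu ~ N(0, tau2), then y | mu ~ N(mu * 1, s2 * I)
mu = rng.normal(0.0, np.sqrt(tau2), size=draws)
y = mu[:, None] + rng.normal(0.0, np.sqrt(s2), size=(draws, n))

# Collapsing mu analytically gives the marginal y ~ N(0, s2 * I + tau2 * 1 1^T);
# the empirical covariance of the hierarchical draws matches this matrix
emp_cov = np.cov(y, rowvar=False)
theory = s2 * np.eye(n) + tau2 * np.ones((n, n))
print(np.max(np.abs(emp_cov - theory)))   # small Monte Carlo error
```

In the collapsed sampler the analytic marginal is used directly inside the conditional distributions, avoiding the need to sample the integrated-out quantities at all.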

Additional Data Information
The 570 reference and test virus pairs represent only a subset of the possible pairs available for testing. During influenza surveillance, test viruses are characterised by testing against a panel of relatively antigenically similar reference viruses, so there is a systematic missingness pattern (see Harvey et al. (2016), Figure 1, for an illustration of this pattern). This is not expected to be detrimental, however, as titres between pairs of antigenically distant viruses provide no useful information, there generally being no cross-reaction between such viruses. We analysed the subset of the collected data comprising all viruses used as both test and reference viruses. This constituted the largest data set that allowed us to disambiguate between avidity and antigenic effects.

Model Selection Strategies
We have used two model selection strategies. For the feature selection, which is of primary interest for the identification of antigenic sites, we have used a Bayesian approach based on a spike-and-slab prior. For the selection of random effect components, on the other hand, we have used an approach based on information criteria, more specifically the widely applicable information criterion (WAIC). While the latter choice was mainly motivated by computational viability, as discussed at the beginning of Section 5 of the main paper, a referee has suggested that we discuss the implications in the context of Lindley's paradox (Lindley, 1957). In a nutshell, Lindley's paradox states that Bayesian model selection increasingly favours the less complex model as the vagueness of the prior relative to the posterior increases, while methods based on hold-out predictive performance (to which WAIC is asymptotically equivalent) are unaffected by the vagueness of the prior. Deeper methodological insight into the nature of this paradox can be obtained from Robert (2014), and strategies for calibrating Bayesian variable selection with out-of-sample cross-validation have been discussed in Vehtari and Ojanen (2012) and references therein. We here refer to a recent comprehensive evaluation study, Aderhold et al. (2016), where Bayesian variable selection was systematically compared with WAIC for variations of the prior uncertainty over several orders of magnitude. The upshot is that the effect of Lindley's paradox only manifests itself for extreme degrees of prior uncertainty, and that WAIC and Bayesian model selection tend to give consistent results for prior parameter uncertainties that are realistic in practice. The former finding tallies with the conclusion in Robert (2014) that excessively vague and improper priors should be avoided; we accordingly avoided such priors in our study. The latter finding provides a practical justification for our approach of combining both model selection approaches.
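For reference, the standard WAIC of Watanabe, on which biWAIC builds, can be computed from a matrix of pointwise posterior log-likelihoods. The sketch below uses a toy normal-mean example (the data and posterior draws are illustrative, not from our study): the log pointwise predictive density is penalised by the sum of posterior variances of the pointwise log-likelihoods.

```python
import numpy as np

def waic(loglik):
    """Standard WAIC from an (S posterior draws x n observations) matrix
    of pointwise log-likelihoods. Lower values indicate better models."""
    m = loglik.max(axis=0)
    # log pointwise predictive density, computed stably via log-sum-exp
    lppd = np.sum(m + np.log(np.mean(np.exp(loglik - m), axis=0)))
    # effective number of parameters: sum of posterior variances
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

# Toy check: posterior draws for a normal mean with known unit variance
rng = np.random.default_rng(2)
y = rng.normal(size=50)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=1000)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
print(waic(loglik))
```

The biWAIC proposed in the main paper differs in how the pointwise terms are grouped into blocks; the quantity computed per term is the same.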

Additional Results
This section contains additional results for the simulation studies which, for brevity, were not included in the main paper.

Additional results for the influenza inspired simulated datasets
To compare the out-of-sample performance of the eSABRE and conjugate SABRE methods, we used the 2,000 observations from each of the influenza-inspired simulated datasets. We compare the out-of-sample performance of models trained on 500 and 1,000 observations respectively, using the remaining 1,000 samples for testing. Based on the generated parameter samples for each method, we made predictions for the 1,000 out-of-sample points using the (thinned) parameter samples. Table 1 shows the difference in mean squared error (MSE) between the eSABRE and conjugate SABRE methods, where a negative value means a lower MSE for the eSABRE method. The results of Table 1 show that the eSABRE method has a lower MSE in each case, with the corresponding p-values showing that this difference is significantly different from zero. This result, along with that of Table 1 of the main paper, shows that the eSABRE method outperforms the conjugate SABRE method in terms of both out-of-sample performance and variable selection in all scenarios tested here.

Table 2: Differences in performance between biWAIC and nWAIC for mean out-of-sample (OOS) log-likelihood and mean number of variables included in each model. The tables give the difference between (a) the mean out-of-sample log-likelihood and (b) the mean number of variables included in the model when the random effects factors are selected by biWAIC and nWAIC respectively. A positive value indicates (a) a higher mean OOS log-likelihood and (b) a larger number of variables included in the model when the random effects factors are selected by biWAIC. Each table also reports p-values from t-tests of whether the differences are significantly different from zero.
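The MSE comparison and accompanying t-test described above can be sketched as follows. The held-out data and the two sets of predictions are toy stand-ins, not the simulated influenza data, and for n = 1000 the t reference distribution is replaced by its normal approximation.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n_test = 1000
y = rng.normal(size=n_test)                       # held-out observations (toy)
pred_a = y + rng.normal(0.0, 0.30, size=n_test)   # stand-in for eSABRE predictions
pred_b = y + rng.normal(0.0, 0.35, size=n_test)   # stand-in for conjugate SABRE

se_a = (y - pred_a) ** 2                  # per-observation squared errors
se_b = (y - pred_b) ** 2
mse_diff = se_a.mean() - se_b.mean()      # negative favours the first method

# Paired t-test on the squared-error differences; with n = 1000 degrees of
# freedom the t distribution is effectively normal, so erfc gives the p-value
d = se_a - se_b
t_stat = d.mean() / (d.std(ddof=1) / math.sqrt(n_test))
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
print(mse_diff, p_value)
```

Pairing the squared errors observation by observation removes the shared variability of the test set, which is what makes the t-test on the differences appropriate here.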

Additional results for the selection of random effect components
To further compare the performance in selecting random effect components, we have looked at the out-of-sample performance of the selected model when using biWAIC and nWAIC. These results are given in Table 2a, where a positive value indicates a higher mean out-of-sample log-likelihood when the random effect factors are selected by biWAIC. Additionally, we have looked at the number of variables (fixed effects) selected in the model, to see whether this is affected by the choice of information criterion. These results are shown in Table 2b, where a positive value indicates that the models selected by biWAIC contain more variables than the models selected by nWAIC.
The results in Table 2a show no real difference between the out-of-sample log-likelihoods of the models selected by biWAIC and nWAIC respectively. In 3 scenarios the models selected by biWAIC have higher mean out-of-sample log-likelihoods, and in 6 it is higher for nWAIC. In no case was the difference significant at the 5% significance level.
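With nine scenarios tested per table, individual tests at the 5% level raise a multiple-testing concern, which is why a Bonferroni correction is applied below. The adjustment itself is simple: each p-value is multiplied by the number of tests (the p-values in this sketch are hypothetical, not those of Table 2).

```python
import numpy as np

m, alpha = 9, 0.05                    # nine scenarios, 5% significance level
p_values = np.array([0.03, 0.21, 0.47, 0.08, 0.64,
                     0.12, 0.90, 0.35, 0.55])   # hypothetical raw p-values

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_adjusted = np.minimum(1.0, m * p_values)
print((p_values < alpha).sum())       # one raw p-value falls below 0.05
print((p_adjusted < alpha).sum())     # none survive the correction
```

A raw p-value just under 0.05 among nine tests is thus not evidence of a real difference, which is the situation observed for the single nominally significant scenario in Table 2b.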
Similarly, the results of Table 2b show no real difference in the number of variables (fixed effects) selected when biWAIC and nWAIC are used to select the models. There were 4 scenarios where biWAIC had a larger mean number of variables and 5 where it was larger for nWAIC. The only case with a p-value below 0.05 is that with P = 55 and σ²_ε = 0.1; however, after the Bonferroni correction for multiple testing, this result is no longer significant.

Figure 1 shows the observed log HI titre values against the mean predicted log HI titre values. Additionally, Figure 2 shows examples of typical densities of the predictions for 4 observations. In both cases, the mean predicted log HI titre values are calculated based on the mean of