Bayesian econometric modelling of observational data for cost‐effectiveness analysis: establishing the value of negative pressure wound therapy in the healing of open surgical wounds

In the absence of evidence from randomized controlled trials on the relative effectiveness of treatments, cost‐effectiveness analyses increasingly use observational data instead. Treatment assignment is not, however, randomized, and naive estimates of the treatment effect may be biased. To deal with this bias, one may need to adjust for observed and unobserved confounders. In this work we explore and discuss the challenges of these adjustment strategies within a case‐study of negative pressure wound therapy (NPWT) for the treatment of surgical wounds healing by secondary intention. We could not demonstrate that existing uncontrolled confounding affects NPWT effectiveness, and thus there was no evidence that NPWT was cost effective compared with standard dressings for the treatment of surgical wounds healing by secondary intention.


Introduction
System level decisions regarding the use of healthcare technologies are often informed by clinical and cost-effectiveness evidence, relying crucially on evidence from randomized controlled trials that the technology improves health outcomes. Observational studies are, however, increasingly recommended to support such decisions where, after looking at randomized controlled trial evidence, significant clinical uncertainty remains-for example, the Cancer Drugs Fund in the UK (NHS England, 2019) or the 'Surveillance, epidemiology, and end results program' in the USA (National Cancer Institute, 2019)-or to provide crucial intelligence on technologies such as medical devices that face no regulatory requirement to demonstrate efficacy (European Medicines Agency, 2020) and thus often present with limited effectiveness evidence (Akhmetov and Bubnov, 2015;Henshall et al., 2011;Byron et al., 2014).
But non-randomized observational evidence means that there is no control over assignment. In this situation treatment groups may differ in aspects that are related to the outcome (prognostic variables) generating potential for confounding. There is little guidance on the analysis of observational data: it is presented either as a description of methods (Faria et al., 2015) or generic checklists or questionnaires (Berger et al., 2014;Kreif et al., 2013;Motheral et al., 2003). Where there is a good understanding of the process of selection into treatment, and this information has been collected, analyses may attempt to adjust for imbalances, e.g. using regression adjustment, propensity scores or matching. However, these methods cannot rule out the possibility of confounding on prognostic factors that have not been observed, and hence causality cannot be established. Alternative adjustment methods use instrumental variables (IVs) (Cameron and Trivedi, 2005;Greene, 2012;Wooldridge, 2013), variables that are associated with treatment assignment but not directly with health outcomes (only through treatment assignment). The use of instruments enables the identification of the effects of 'pure' changes in treatment assignment (not related to confounders), and in this way to estimate causal effects. Only a few applications have been published to date on the use of IV methods in health technology assessment (Hadley et al., 2003;Brookhart et al., 2006a, b;Schneeweiss et al., 2008;Prentice et al., 2014). This paper explores the use of an existing observational cohort study (Chetter et al., 2019) to establish the clinical and cost effectiveness of a medical device that is widely used in the National Health Service for surgical wound healing by secondary intention (SWHSI) called negative pressure wound therapy (NPWT).
Section 2 gives a description of the case-study and a summary of the observational data, together with the methods of analyses. The (causal) cost-effectiveness model is described, to provide the framework by which potential improvements in time to wound healing by NPWT impact on health-related quality of life (HRQOL) and costs. We also describe the specific regression methods that are used to determine the effectiveness of NPWT on time to wound healing, using adjustments for observed factors and IV estimation. Section 3 gives a description of the results of analyses for effectiveness, costs and HRQOL. The paper finishes with a discussion of the challenges that are encountered (Section 4) that can be generalized to the use of IVs in other health technology assessment contexts.

Description of the case-study and observational data available
NPWT is a medical device that is applied to the wound surface and creates a suction force (or vacuum) removing tissue fluid away from the treated wound area into a canister. NPWT has been claimed (Banwell and Musgrave, 2004) to speed up healing (the outcome that is most valued by patients (Cullum et al., 2015)). However, there is no conclusive randomized control trial evidence of such an effect (Berger et al., 2014) and, in light of this, the National Institute for Health and Care Excellence in 2008 did not reject NPWT but recommended further research (Leaper et al., 2008;National Institute for Health and Care Excellence, 2008). Since, however, there has been no new relevant evidence (National Institute for Health and Care Excellence, 2011a, b) and, in all this time, NPWT has been widely used by health systems with no insight into its value for money.

Summary description of the data
The largest and most comprehensive study to date in this population is a recent observational cohort study that was undertaken in the UK (Chetter et al., 2019). This study was longitudinal, conducted over a 21-month period (from February 18th, 2013, to November 30th, 2014 in eight study sites across Yorkshire, UK. Participants were recruited with incident surgical wounds requiring treatment (the inclusion and exclusion criteria have been reported elsewhere (Chetter et al., 2019)) and were aimed to be followed up for 12 months or more.
The study recruited 393 participants and followed them up for 499 days, on average. Out of the total 393 participants, 22 died (5.6%) and 320 (81%) healed within the study. The mean time to healing of those who healed was 99 days. A full list of summary characteristics can be found in the on-line supplementary material (appendix 1, Table A1.1). 115 participants (29%) were treated with NPWT at some point during follow-up. Of those receiving NPWT only 69% (79 participants) healed within the study compared with 87% healed in those people who never received NPWT. Patients who received NPWT took longer to heal (a median time to healing of 175 days for NPWT versus 63 days without)-as highlighted in the Kaplan-Meier curve in Fig. 1. Given the observational nature of the study, imbalances in important prognostic variables (observed or unobserved) can justify the worse outcomes. The following subsection examines observed imbalances.

Imbalances and overlap between the groups
The majority of NPWT patients had a large (bigger than 25 cm 2 ) wound compared with only 12% of non-NPWT patients and were more likely to have experienced skin and subcutaneous tissue loss whereas patients who did not receive NPWT had wounds involving skin only. These and other identified differences demonstrate that there is the potential for selection bias in the cohort data.
For a set of baseline covariates, Fig. 2(a) shows the mean differences between groups (NPWT minus non-NPWT), standardized (by dividing the differences by the standard deviation of the whole sample). Although there may be differences between the groups across a multitude of factors, none of the covariates analysed shows a difference bigger than 1 standard deviation across groups. The distribution of values for each covariate was also compared across groups to assess whether there were regions of non-overlap (an example plot is shown in Fig. 2(b) for the wound area). We verified that there was always some overlap, which indicates that the mechanism of selection of patients into the treatment groups is not clearly determined.
2.1.3. Participants for whom an event (healing) was not observed (hard to heal) There were 73 patients who did not heal within the study, introducing censoring in the data. These patients were followed up on average for 416 days which, compared with average time to healing in those that healed of 99 days, suggests that these are 'hard-to-heal' patients. Also, these patients differed in their baseline characteristics from the patients who healed; for example, there were 23% more diabetics and 24% more patients with larger wounds, among others. A full list of summary characteristics by healing status can be found in appendix 1, Table A1.2, in the on-line supplementary material. This implies that censoring cannot be assumed uninformative which needs to be acknowledged in the methods of analyses (see Section 2.4).

Causal model for determining cost effectiveness
The outcome of analyses in determining cost effectiveness was the QALY: quality-adjusted life years gained (National Institute for Health and Care Excellence, 2013). QALYs weight lifetime lived for the HRQOL that the years are lived in. A 1-year time horizon was used and the UK National Health Service perspective was adopted for costs. To establish cost effectiveness it is important that the costs and HRQOL are based on the adjusted causal effect of NPWT on time to healing, Δt, and not the observed, unadjusted, effect. The cost-effectiveness model (equations (1)-(4)) computes the cost and HRQOL implications of treatment with NPWT. Equation (1) details that differences in QALY, ΔQ, are evaluated from Δt and from u h and u unh , the HRQOL weights of healed and unhealed patients respectively: The difference in costs, ΔC, is broken down into disease-related costs, ΔC dis , and the costs of the treatments themselves, ΔC t (equation (2)). ΔC dis are determined by applying Δt to the difference in the costs of unhealed and healed patients, c unh and c h respectively (equation (3)). ΔC t (equation (4)) assumes that patients receive some form of treatment up to healing. The treatment costs of the 'non-NPWT group' is simply the average cost of dressings, c 0 , multiplied by Δt. For patients receiving NPWT, the average daily cost of NPWT, c 1 , is applied to the mean time on NPWT treatment, TOT, and in the remaining time that patients are assumed to use dressings (with a daily cost equal to c 0 ; see equation (4)).
The description of the model above identifies evidence requirements, and the subsections ahead detail the estimation of the causal effect of NPWT on time to healing and inferences on  Age, age in years; Tobacco, tobacco use, smoker; Gender, male (versus female); npwt, patients on NPWT; no npwt, patients on treatments other than NPWT the effects of healing on HRQOL weights and costs. The remaining parameters (ToT, c 0 and c 1 ) were informed by the means observed in the data set.

Implementation and software
A Bayesian approach was used throughout to allow for the uncertainty in predicting treatment allocation to be fully considered and thus correctly reflected in decision uncertainty. Analyses were implemented in R (R Development Core Team, 2013) and inferences in WinBugs (Lunn et al., 2000). Model selection was based on the deviance information criterion (DIC) (Spiegelhalter et al., 2002;Gelman, 2014;Lunn, 2013), which reflects a trade-off between model fit and model complexity. DIC differences of 5 were considered meaningful (Spiegelhalter et al., 2002(Spiegelhalter et al., , 2014. Econometric tests of the relevance and validity of instruments are not developed in the Bayesian context, and frequentist methods were thus implemented in Stata (StataCorp, 2013). Decision uncertainty was reported by using the probability that each intervention was cost effective.

Inferences on time to healing
In cost effectiveness, where expected values are of interest, time-to-event outcomes are traditionally analysed by using parametric survival analysis. In this case-study, standard parametric analyses would struggle to provide a good fit because of the asymptote that is observed in Fig. 1, and more complex, and therefore flexible, functions (such as splines) would need to be used. However, these are not readily available in software packages that are used for Bayesian analyses (such as WinBugs (Lunn, 2013;Lunn et al., 2000)). For this reason, we took a different approach. We first generated a complete data set, enabling the use of linear regression in which IV methods are well established. We acknowledge that linear regression assumes that times to event (conditional on the explanatory variables) are normally distributed, which is unlikely because of the right skewness of these data. We therefore extend the analyses to consider more explicitly the hard-to-heal patients, who account for some degree of skewness.

Using a complete data set
To generate a complete data set, censored times to healing were imputed by assuming that patients healed the day after they were censored. This will closely reflect the data but, as it is unlikely that all individuals healed so close to censoring, it will systematically underestimate healing times. Time to healing (the index i for patients is omitted from the text throughout), y, was then regressed on treatment, x, and a set of relevant adjustment factors that have been observed, A (the design matrix). The model that was implemented is described in equation (5): for i = 1, : : : , n: .5/ When IVs are considered, a two-stage regression is needed (equation (6)). The first stage predicts treatment allocation conditional on the instrument(s), z, and a set of relevant covariates, B. The predicted probabilities p Å are then used in a second-stage regression of time to healing that conditions on the same set of covariates, B (the design matrix): stage 1, .6a/ stage 2, We also explored treatment effect interactions (equations (7)). An interaction term will be endogenous (i.e. subject to confounding) in describing health outcome, even if the covariate is not itself endogenous. This means that we need to define two first-stage regressions (stage 1A and stage 1B) and, therefore, two valid instruments, one for each endogenous term, the treatment variable x and the treatment interaction variable x · w. However, if z is a valid (single) instrument for the treatment variable and if the interaction variable w is exogenous then z · w is a valid instrument for the interaction term (Mullahy, 1994(Mullahy, , 1997Bun and Harrison, 2014). The set of associated regressions when evaluating the treatment effect are then as follows (where C is the design matrix): stage 1A, .7a/ stage 1B, .7b/ stage 2, .7c/ Standard non-informative normal priors were used (not shown).

Selection of variables for adjustment
By definition, relevant adjustment covariates are those that are associated with outcomes and are imbalanced across treatment groups, i.e. the intersection of covariates that are associated with outcomes and with treatment allocation. Given the absence of good epidemiological information in this area, and the concern that relevant predictors of healing cannot be identified on our sample because of the potential imbalances in treatment assignment, instead of using the intersection of the covariates that are associated with outcomes and treatment allocation we used their union. Although precision may be lost by the inclusion of potentially irrelevant covariates, accuracy may be gained by not missing important confounding factors. A logistic regression with treatment received as the dependent variable was used to identify factors that are associated with treatment assignment. A thorough model selection process was devised. We started by regressing all available baseline covariates individually and selected those leading to differences of 5 units in the DIC (Lunn, 2013) in relation to the null model. We then evaluated models by using combinations of these covariates (2 by 2, 3 by 3, etc.) and selected the covariate set in the model with the best DIC.
To identify determinants of healing and potential treatment effect modifiers, we looked at the relationship of available covariates with healing by using the covariate selection process that was described above. To avoid potential confounding, we did this on participants who used NPWT and, separately, on those who did not use NPWT. To accommodate potential treatment effect modifiers we fitted treatment-by-covariate models to the full data set by using all adjustment covariates identified.

Identification of instrumental variables z
A critical step in IV regression is the identification of instrument(s) z. On the basis of previous applied examples in healthcare (Brookhart et al., 2006a, b), the project team defined a priori potential instruments which related to clinicians' practice patterns and stated preferences on the use of NPWT. The expectation was that a patient is more likely to be treated with NPWT if the clinician has a preference for it, and it was assumed that preference for NPWT is not associated with time to healing except through the choice of treatment. Specifically, we considered as a potential instrument a covariate describing whether the health professional had prescribed NPWT to their previous patient. We also surveyed the health professionals on the cohort study regarding their stated preferences for NPWT and relevant proxies (see the on-line appendix 3). Health professionals who thought that NPWT was better at managing these wounds, were less expensive or represented better value for money were assumed to be more likely to use NPWT.
Formal tests were conducted to assess the performance of each instrument (appendix 4 in the on-line supplementary material), namely for evidence of endogeneity (the Durbin-Wu-Hausman test (Greene, 2012)), evidence of the instrument set not being valid (the Hansen-Sargan test (Sargan, 1958;Hansen, 1982)), evidence of a weak instrument set (the Stock and Yogo procedure (Andrews et al., 2005)) and finally evidence of model misspecification (the Pesaran-Taylor reset test (Pesaran and Taylor, 1999)).

Extended modelling approach to consider more explicitly the 'hard-to-heal' subpopulation
In this section we broaden the analyses to ignore the imputation and instead model the cohort data in two parts: the first part describes the probability of healing within follow-up, p h , and the second the expected time to healing for those who healed, x Å . The first part used logistic regression and the second part linear regression. Determinants of the probability of healing could differ from determinants of time to healing. We implemented adjustment on observables first (analogously to that described earlier) and then added the adjustment on unobservables by using IV regression. The implementation of the IV regression is more complex as a treatment effect is included in each equation and thus IVs are required for both.
As shown in equations (8), the predicted values for the probability of treatment allocation, p h , are used in the two stage 2 regressions (where D is the design matrix): stage 1, The mean time to healing was evaluated from the stage 2 equations and from expert opinion on the expected time to healing for censored observations. This was elicited according to best practice (Kahneman et al., 1982;O'Hagan, 2006;Soares et al., 2018;Sullivan and Payne, 2011) by asking three health professionals (who were involved in the cohort study) about their beliefs on the additional time to 50% and 90% of patients healed (the wording is reproduced in appendix 5 in the on-line supplementary material). The elicited quantities from each expert were used to derive the parameters of a gamma distribution (a and b) and the logarithm of the gamma parameter values were then pooled by using a standard multivariate normal formulation.

Inferences on health-related quality-of-life weights and costs
Given that the causal cost-effectiveness model assumes that NPWT affects overall HRQOL and costs only via effects on healing, inferences here aim to quantify the difference in HRQOL and costs between the unhealed and healed, i.e. the difference in the group means. Hence, models that are implemented here are not adjusted for further covariates, but these are implicitly averaged over so that the results can be considered to be representative of the population that is included in the cohort study.
Observations of EuroQol 5 dimensions, EQ-5D, that were collected within the cohort study at baseline and every 3 months thereafter up to 18 months follow-up were used. The EQ-5D index score was computed by using a published valuation algorithm (Brooks and De Charro, 1996). A panel data approach (a mixed model with random intercept) to capture both withinand across-patient variation was used to model change in the index score from baseline, using a time-dependent indicator of whether the patient had healed or not as the only covariate. Exchangeability across time within individuals was assumed. Flat uniform priors were used for variance parameters, and others used vague priors (which are not shown here). The estimated effect of healing on HRQOL was used to inform directly the term u h − u unh in the causal costeffectiveness model in equation (1).
As with EQ-5D, resource use was collected within the cohort study at baseline and at 3-monthly intervals, and included general practitioner visits, nurse visits, outpatient visits and hospital admissions. Costs per patient were obtained by multiplying resource use by relevant unit costs (2014 prices). Unit costs were obtained from the Personal Social Services Research Unit, 2014 (Curtis, 2014), and National Health Service reference costs 2013-2014 (Department of Health, 2014). Treatment costs were obtained from the British National Formulary (2014) as well as specific costing information from the two centres (Hull and Leeds) on NPWT. A detailed listing of the specific values of unit costs is shown in appendix 2, Table A2.1, in the on-line supplementary material.
Many observations over the different time points were expected to have zero costs, and hence a Bayesian two-part mixed model was used: the first part captured the probability of zero costs and the second part the costs of those with non-zero values. A mixed gamma model with loglink was used because of data skewness. Note that costs for patients healed are not necessarily expected to be zero as time to healing collected within the cohort study refers to an index wound and the patient may have other wounds.

Using the complete data set
Wounds were more likely to be treated with NPWT if the wound was larger, if there was more tissue involvement and if the patient was an inpatient (the results are not shown here). Within at least one treatment group, wound history ('yes' versus 'no' to having had an SWHSI before) and treatment location (inpatient versus outpatient) were associated with healing. Of these, wound history was identified as a treatment effect modifier. Effectiveness results under the complete, imputed, time to healing data are summarized in Table 1: model 0 is unadjusted (presented for completeness), model 1 adjusts for observables but excludes the interaction term and model 2 includes wound history as an interaction term. The results show that, after adjustment, patients using NPWT are still expected to take longer to heal than those who do not use NPWT. Model 1 estimates an increase in time to healing of 73.2 days: a statistically significant result. Model 2 estimates that patients with a history of SWHSI take 181 days longer to heal with NPWT, whereas patients without a history of SWHSI took 42 days longer to heal with NPWT.
Regarding the IV approach, previous treatment used by the treating health professional was the variable that performed best (see appendix 4 in the on-line supplementary material). Using this instrument, however, still led to higher time to healing estimates (but not statistically significant) for patients using NPWT- Table 1, models 3 and 4.

Extended analysis
The determinants of the probability of healing (stage 2A, equation (8)) identified were the body mass index BMI and wound location. For time to wound healing, determinants identified were surgery type (colorectal versus other surgeries), wound duration and diabetic foot wounds (stage 2B, equation (8c)). No evidence of relevant interaction terms was found. Observations 393 354 †x 1 , x 2 , x 3 , v 1 , v 2 , v 4 and v 5 are adjustment covariates; w is the covariate interacted with treatment; t is the treatment covariate; w · t is the treatment interaction; z is the instrument.
The extended model predicted that NPWT was associated with poorer healing outcomes (46 expected additional days to wound healing compared with the non-NPWT patients), using previous treatment used by the health professional as the instrument (Table 2, model b1). This result was not statistically significant.

Modelling of health-related quality-of-life weights and costs
There were 1484 observations of the EQ-5D index score over all time points and 1166 observations of disease-related costs (614, or 53.7%, had a zero value of costs). A descriptive summary of results is shown in appendix 2, Table A2.2, of the on-line supplementary material. The formal modelling of the EQ-5D index score (Table 3) shows that the effect of healing is small (an increase of 0.055 in the index score for those who healed compared with those who did not) but statistically significant (the credible interval does not include zero).
The modelling of costs shows that, overall, expected costs for unhealed and healed patients were respectively £1124.4 and £258.9 per quarter (Table 3)-a difference of £865 attributed to healing (the standard error over the difference was £111.9). Further details can be found in appendix 2, Tables A2.3 and A2.4, in the on-line supplementary material. The estimated patient time on NPWT at any stage was 36.9 days on average (standard error SE = 6:0). The average per-patient cost of NPWT treatment is thus evaluated at £1180.0 (SE = £189:2/. The responses to the elicitation exercise highlight that experts believe that up to 10% of these hard-to-heal patients may be unhealed 2-10 years after having exited the cohort study (Table 4). The pooled gamma distribution across the three experts determines a median additional time to healing of 1038.5 days. In the cost-effectiveness model, time to healing was replaced by the patient's life expectancy when patients were predicted to heal after their life expectancy. Considering the gender split in our sample (56.4% were male), the life expectancy for the sample is 83.9 years of age (Office for National Statistics, 2014).
Cost-effectiveness results, for the complete data set analyses, indicate that NPWT is less effective and more costly, and thus would not be recommended for use in SWHSI patients- Table 5. The results show no decision uncertainty, indicating that the confidence in the decision, based on the data and assumptions of analyses, is high and that further research is unlikely to change this recommendation. The extended analyses indicated that for average patient characteristics (BMI = 29; diabetes = 26%; cardiovascular or peripheral disease = 43%; smoker = 29%; wound area bigger than 25 cm 2 = 24%; emergency surgery = 64%; location other than abdomen = 66%; open wound planned = 41%; history of SWHSI = 25%) those treated with NPWT would be expected to heal in 251 days whereas those not treated with NPWT would be expected to heal in 137 days. The results indicate that NPWT is not cost effective for the treatment of SWHSI and, again, there is very little uncertainty over this result (Table 5, model b2).

Discussion
Recent policy developments, in the UK and elsewhere, are increasingly establishing the use of observational data to support decisions about the use of healthcare technologies, particularly where significant clinical uncertainties remain. This paper uses an applied example where data from an observational cohort study are used to evaluate the clinical and cost-effectiveness of NPWT in the healing of SWHSI. We explored different ways of adjusting for confounding and implemented adjustment for observables (using regression) and unobservables (using IV regression). But the different modelling approaches produced similar results, all suggesting that NPWT is neither effective nor cost effective compared with standard dressings. The IV approach reduced the mean additional days to heal for NPWT patients compared with other treatments from 112 (95% credible interval [94,141]) (unadjusted estimate) to 57 days with a wide credible interval (95% credible interval [−72, 193]), potentially better reflecting the existing uncertainty surrounding the effectiveness of NPWT.
Our analysis identified many challenges to the analysis of observational data that have not been hitherto considered in guidance. A first relates to core definitions, such as of outcomes that are affected by treatment or the definition of treatments themselves. Our analyses focused on time to wound healing: the outcome that is considered clinically most relevant (Cullum et al., 2015;Dumville et al., 2015). However, treatment may also affect other outcomes-it may reduce the rate of wound infection or facilitate wound management-and failing to identify the relevant outcomes of treatment can misinform recommendations. In what concerns the definition of the treatments, given the limited sample size our study pooled patients who received NPWT at baseline and patients who received NPWT at a later stage (89 and 26 respectively). However, patients receiving NPWT at inception can be argued to differ (and to have a different prognosis) from those who received NPWT during follow-up. There may also be a potential bias from the presence of time-dependent confounding, as patients who start NPWT after baseline are likely to have been switched to NPWT as a result of previous treatment failure. In addition, we have merged the different dressings that were used within the cohort into a single comparator (the 'non-NPWT' group). This intervention 'lumping' may be a source of further heterogeneity. All analyses of observational data are likely to have to make similar decisions on practical grounds. Within observational data sets, heterogeneity is expected to be more pronounced than in randomized controlled trials that define clear inclusion and exclusion criteria, determine the intervention and how it is applied and often determine follow-on care. In our case-study, heterogeneity in the patients who were recruited was extensive, e.g. in what concerns the location of the wound, the surgery that led to the wound, its size at baseline and the setting of care. However, the small sample size .n = 393/ was insufficient to explore the consequences of existing heterogeneity, i.e. to explore the implications of treatment within potentially relevant subgroups of the population (such as open abdominal wounds versus diabetic foot and leg wounds).
A second aspect relates to the potential for the data set to show a lack of overlap, where regions of the covariate space have relatively few treated or control units. If there is selection into treatment, a lack of overlap (to a bigger or lesser extent) is expected. Inferences that are made for such regions rely on extrapolation (Imbens and Rubin, 2015), with results being more dependent on model specification and less on direct support from the data (Gelman and Hill, 2007). This is relevant for any form of adjustment, whether just on observed characteristics and/or on unobserved characteristics through IV regression. Although some overlap is required to enable meaningful analyses (Faria et al., 2015b) there is no clear guidance on how to establish when overlap is a significant problem. Further research is needed to understand how alternative methods (e.g. regression adjustment, matching or inverse probability weighting) could capitalize on the underlying overlap issue (Gelman and Hill, 2007). Our case-study highlighted that, in practice, it may even be difficult to identify the sources of lack of overlap, especially when no single covariate arises as meaningful.
A third aspect relates to the adjustment on observables. The fact that health outcomes and their determinants were not known a priori in our case-study, and that the mechanism for selection was also not known, meant that it was difficult to identify the appropriate adjustment set. In terms of covariate selection, we opted for an information theoretic approach and considered models with different combinations of covariates to identify the best fitting one according to the DIC. Because these methods are computationally intensive (even for a relatively small number of covariates), within this study we developed a thorough and protocol-driven variable-selection process that can be generalized and used in any other application. There are various alternative approaches to variable selection in the literature, and further research could explore these in the context of this case-study. We highlight the Bayesian spike and slab, that defines mixture priors for the covariate effects (Bayarri et al., 2012;O'Hara and Sillanpää, 2009) and hence integrates uncertainty over covariate selection in the model estimates. In terms of methods of adjustment for observables, the use of an adjustment regression approach facilitated the further application of the IV method to address unobserved confounding. Although there is a range of alternative methods of adjustment, such as matching approaches, their relative performance is as yet unclear (Faria et al., 2015;Kreif et al., 2013), particularly with small sample sizes and poor overlap.
Another set of challenges relates to the application of IV techniques to control for unobserved confounding. Such methods depend on finding good and reliable, but difficult to validate, sources of exogenous variation (instruments) (Faria et al., 2015). More research is needed in defining instruments that are relevant in the context of health technology assessment, and on extending analyses to outcomes such as time-to-event data.
Also, in our case-study we used a Bayesian context for inferences, which we believe has clear advantages. It enables flexibility in specifying the model according to what is believed to be plausible without being limited to linearity or normality assumptions. Although in this work we used linearity and normality to describe time to healing, an important extension for future work is either to consider a log-normal transformation or to use time-to-event distributions and to explore potential non-linearity. Another important advantage is where the sample is small or the instruments weakly related to treatment assignment, in which case Bayesian methods are thought to return more accurate estimates of causal effects (Crespo-Tenorio and Montgomery, 2013). This is crucial, as in this area of research it is likely that the sample size is limited. However, Bayesian research on IV models is limited and recent, relative to the broad literature from the classical (i.e. frequentist) statistical perspective. An informal search (without date restriction) identified only 24 methodological papers focusing on aspects of Bayesian IV regression-one provides an overview (Kleibergen and Zivot, 2003), 16 develop Bayesian estimators for particular model specifications (Hollenbach et al., 2019;Kato and Hoshino, 2018;Shi and Tong, 2017;Li and Lu, 2015;Lopes and Polson, 2014;Crespo-Tenorio and Montgomery, 2013;Kato, 2013;Zellner et al., 2014;Imbens, 1996, 2003;Wiesenfarth et al., 2014;Kobayashi and Ogasawara, 2016;National Cancer Registration and Analysis Service, 2016;Conley et al., 2008Conley et al., , 2012Burgess et al., 2010), one focuses on variable selection (Sabnis et al., 2019) and six consider aspects of the validity of instruments (Kraay, 2012;Koop et al., 2012;Eicher et al., 2009;Lenkoski et al., 2014;Karl and Lenkoski, 2018;Henry et al., 2018). It is important to highlight that none provided generic software code of the model or for testing the validity of instruments.
Future methodological research is needed to help to guide the many practical challenges that are faced. This is important for analysts, but crucial for policy as it may improve the confidence that decision makers place in the results of such analyses. Even with our case-study indicating that NPWT should not be recommended for the healing of SWHSI and that further research is unlikely to change this recommendation, the National Institute for Health Research in the UK funded a follow-on pilot trial (which is likely to report in 2022). Such a lack of confidence in analyses of observational data may become a problem when these types of analyses are to be used more widely to inform system level decisions such as those required within the Cancer Drugs Fund (Grieve et al., 2016).

Ethics approval and consent to participate
The cohort study received research ethics (Research Ethics Committee reference: 11/YH/0313) and written informed consent was obtained from all participants before participation.
Before beginning the cohort study, approval was obtained from Yorkshire and Humber-Humber Bridge Research Ethics Committee (reference: 12/YH/0350) and from the associated National Health Service Trusts.
Before study activity began, approval was obtained from Yorkshire and Humber-Leeds East Research Ethics Committee (reference: 15-YH-0307) and from the associated National Health Service Trusts.

Availability of data and materials
The data sets that were used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
Professor Claxton reports personal fees from F. Hoffmann-La Roche Ltd, personal fees from Heron (PAREXEL), outside the submitted work. Dr Welton reports grants from the National Institute for Health Research during the conduct of the study; in particular, she was supported by the National Institute for Health Research Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol; she is also Principal Investigator for a research grant jointly funded by the Medical Research Council and Pfizer.
The views expressed in this publication are those of the author(s) and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health and Social Care.
All the other authors have no conflicts of interest to declare.

Authors' contributions
Dr Pedro Saramago (Research Fellow, Health Economics) undertook all the analyses and contributed to the design, conduct, interpretation and reporting. Professor Karl Claxton (Professor, Economics) oversaw all aspects of work, including design, conduct, analysis, interpretation and reporting. Professor Nicky Welton (Professor, Statistics and Health Economic Modelling) oversaw the design, interpretation and reporting of the analyses. Dr Marta O. Soares (Senior Research Fellow, Health Economics) supervised all aspects of the work, including design, conduct, analysis, interpretation and reporting.