The reproducibility crisis in science: A statistical counterattack
Abstract
More people have more access to data than ever before. But a comparative lack of analytical skills has resulted in scientific findings that are neither replicable nor reproducible. It is time to invest in statistics education, says Roger Peng
Over the last two decades, the price of collecting a unit of data has dropped dramatically. New technologies touching every aspect of our lives – from our finances, to our health, to our social interactions – have made data collection cheap and easy. In 1967 Stanley Milgram ran an experiment (bit.ly/1PWzLDy) to determine the number of degrees of separation between two people in the USA. He sent 296 letters to people in Omaha, Nebraska, and Wichita, Kansas, with the goal of getting each letter to a specific person in Boston, Massachusetts. His experiment gave us the notion of “six degrees of separation”. A 2007 study (bit.ly/1PWA2q8) updated that number to “seven degrees of separation” – except the newer study was based on 30 billion instant messaging conversations collected over 30 days.
This example illustrates a growing problem in science today: collecting data is becoming too much fun for everyone. Developing instruments, devices, and machines for generating data is fascinating, particularly in areas where little or no data previously existed. Our phones, watches, and eyeglasses all collect data. Because collecting data has become so cheap and easy, almost anyone can do it. As a result, we are all statisticians now, whether we like it or not (and judging by the looks of some of my students, many do not). All of us are regularly confronted with the problem of how to make sense of the deluge of data. Data follow us everywhere and analysing them has become essential for all kinds of decision‐making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate.
Making research reproducible
There are two major components to a reproducible study: that the raw data from the experiment are available; and that the statistical code and documentation to reproduce the analysis are also available. These requirements point to some of the problems at the heart of the reproducibility crisis.
First, there has been a shortage of software to reproducibly perform and communicate data analyses. Recently, there have been significant efforts to address this problem, and tools such as knitr, IPython notebooks, LONI, and Galaxy have made serious progress.
Second, data from publications have not always been available for inspection and reanalysis. Substantial efforts are under way to encourage the disclosure of data in publications and to build infrastructure to support such disclosure. Recent cultural shifts in genomics and other areas have led to journals requiring data availability as a condition for publication and to centralised databases such as the US National Center for Biotechnology Information's Gene Expression Omnibus (GEO) being created for depositing data generated by publicly funded scientific experiments.
One might question whether reproducibility is a useful standard. Indeed, one can program gibberish and have it be perfectly reproducible. However, in investigations where computation plays a large part in deriving the findings, reproducibility is important because it is essentially the only thing an investigator can guarantee about a study. Replicability cannot be guaranteed – that question will ultimately be settled by other independent investigators who conduct their own studies and arrive at similar findings. Furthermore, many computational investigations are difficult to describe fully in traditional journal papers, and the only way to uncover what an investigator did is to look at the computer code and apply it to the data. At a time when data sets and computational analyses are growing in complexity, the need for reproducibility is growing with them.
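In practice these two components can be quite simple. The sketch below – written in Python purely for illustration, with hypothetical file and column names – shows raw data read exactly as collected, a documented analysis step, and an output file recording both the result and the software version used to produce it. Tools such as knitr and IPython notebooks wrap the same idea in a more convenient, report-oriented form.

# Minimal sketch of a reproducible analysis (illustrative only). The file
# "raw/trial_data.csv" and its columns "group" and "outcome" are hypothetical;
# the point is that the raw data, the code, and its documentation travel
# together so the reported result can be recomputed exactly.
import csv
import platform
import statistics

# 1. Read the raw data exactly as collected (never edited by hand).
with open("raw/trial_data.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# 2. Apply the documented analysis pipeline.
treated = [float(r["outcome"]) for r in rows if r["group"] == "treated"]
control = [float(r["outcome"]) for r in rows if r["group"] == "control"]
effect = statistics.mean(treated) - statistics.mean(control)

# 3. Record the result alongside enough context to rerun the analysis.
with open("output/results.txt", "w") as out:
    out.write(f"estimated effect: {effect:.3f}\n")
    out.write(f"n treated = {len(treated)}, n control = {len(control)}\n")
    out.write(f"python {platform.python_version()}\n")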
One result of this imbalance – between our capacity to generate data and our capacity to make sense of them – is an epidemic of poor data analysis, which is contributing to a crisis of replicability and reproducibility of scientific results. Replication is the cornerstone of scientific research, with consistent findings from independent investigators being the primary means by which scientific evidence accumulates for or against a hypothesis. The replicability of a study is related to the chance that an independent experiment targeting the same scientific question will produce a result consistent with the original study. Recently, a variation of this concept, referred to as reproducibility, has emerged as a key minimum acceptable standard, especially for heavily computational research. Reproducibility is defined as the ability to recompute data analytic results, given an observed data set and knowledge of the data analysis pipeline. Replicability and reproducibility are two foundational characteristics of a successful scientific research enterprise.
Public failings
Yet there is increasing concern in the scientific community about how few published studies are either reproducible or replicable. This concern gained significant traction with a statistical argument suggesting that most published scientific results may be false positives (bit.ly/1PWAhBx). Concurrently, there have been some very public failings of reproducibility across a range of disciplines, from cancer genomics (bit.ly/1PWAC7a) to clinical medicine (bit.ly/1KNc4u6) and economics (bit.ly/1PWBngz), and the data for many publications have not been made publicly available, raising doubts about the quality of the underlying analyses. Compounding these problems is the lack of widely available and user-friendly tools for conducting reproducible research.
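The force of that statistical argument is easy to see with a back-of-envelope calculation. The inputs below are assumptions chosen for illustration – a low prior probability that a tested hypothesis is true, modest statistical power, and the conventional 5% significance threshold – rather than figures from the article.

# Rough sketch of the "most published findings may be false" arithmetic.
# All three inputs are illustrative assumptions.
prior_true = 0.10  # fraction of tested hypotheses that are actually true
power = 0.50       # chance a real effect yields a "significant" result
alpha = 0.05       # chance a null effect yields a "significant" result

true_positives = prior_true * power          # 0.050
false_positives = (1 - prior_true) * alpha   # 0.045

# Of all "significant" findings, what share reflect a real effect?
ppv = true_positives / (true_positives + false_positives)
print(f"share of significant findings that are true: {ppv:.2f}")  # about 0.53

Under these assumptions, nearly half of all “significant” results are false positives – before selective reporting or analytical flexibility is even taken into account.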
Perhaps the most infamous recent example of a lack of replicability comes from Duke University, where in 2006 a group of researchers led by Anil Potti published a paper claiming that they had built an algorithm using genomic microarray data that predicted which cancer patients would respond to chemotherapy.[1] This paper drew immediate attention, with many independent investigators attempting to reproduce its results. Because the data were publicly available, two statisticians at MD Anderson Cancer Center, Keith Baggerly and Kevin Coombes, obtained the data and attempted to apply Potti et al.'s algorithms.[2] What they found instead was a morass of poorly conducted data analyses, with errors ranging from trivial and strange to devastating. Ultimately, Baggerly and Coombes were able to reproduce the (erroneous) analysis conducted by Potti et al., but by then the damage was done. It was not until 2011 that the original study was retracted from Nature Medicine.
Another recent example comes from the world of economics, where an influential paper published by Carmen Reinhart and Kenneth Rogoff suggested that countries with very high debt–GDP ratios suffer from low growth.[3] In fact, they suggested that there was a “threshold” at a 90% debt–GDP ratio above which economic growth drops off. Thomas Herndon, a graduate student in economics, obtained the data from Reinhart and Rogoff and eventually reproduced their analysis.[4] In the process of reproducing the analysis, however, he found numerous errors. One often-quoted error was a mistake in a Microsoft Excel spreadsheet that led to a few countries accidentally being left out of the analysis. However, a much more serious issue was an unusual form of data weighting that produced the “threshold” effect. Herndon et al. found that using a more standard weighting led to a smoother relationship between debt–GDP ratio and growth. Ultimately, the research on which much economic policy was based – most notably arguments in favour of economic austerity – suffered from serious but easily identifiable flaws in the data analysis.
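The weighting issue is easy to illustrate with a toy calculation. The growth figures below are invented, but they mimic the structure of the critique: one country contributes many high-debt years of ordinary growth, another contributes a single terrible year, and whether each country or each country-year receives equal weight flips the sign of the summary.

# Toy example (invented numbers) of how a weighting choice can move a summary.
growth = {
    "Country A": [2.5] * 19,  # 19 high-debt years at 2.5% growth
    "Country B": [-7.9],      # a single high-debt year at -7.9% growth
}

# Equal weight per country: average within each country, then across countries.
per_country = [sum(years) / len(years) for years in growth.values()]
country_weighted = sum(per_country) / len(per_country)

# Equal weight per country-year: pool all 20 observations.
all_years = [g for years in growth.values() for g in years]
year_weighted = sum(all_years) / len(all_years)

print(f"equal weight per country:      {country_weighted:.2f}%")  # -2.70%
print(f"equal weight per country-year: {year_weighted:.2f}%")     #  1.98%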
So what went wrong with each of these studies? Clearly, many things – but reproducibility was arguably not the problem in either case. It was precisely because the analyses were reproducible that Baggerly and Coombes and Herndon et al. were able to identify so many errors (see box, “Making research reproducible”). Ultimately, the problem was the poor or questionable quality of the original analysis. The errors that were made showed a lack of judgement, training, or quality control. One then has to ask how these disasters could have been prevented.
Building trust
In order to improve the quality of science I believe we need to go beyond calling for mere reproducibility. The key question we want to answer when seeing the results of any scientific study is whether we can trust the data analysis. If we think of problematic data analysis as a disease, reproducibility speeds diagnosis and treatment in the form of screening and rejection of poor data analyses by journal referees, editors, and other scientists in the community. Once a poor data analysis is discovered, it can be “treated” in various ways.
This current “medication” approach to maintaining research quality relies on peer reviewers and editors to make a diagnosis consistently. This is a tall order. Editors and peer reviewers at medical and scientific journals often lack the training and time to perform a proper evaluation of a data analysis. This problem is compounded by the fact that data sets and data analyses are becoming increasingly complex, the rate of submission to journals continues to increase (bit.ly/1PWBxVm), and the demands on statisticians to referee are increasing (bit.ly/1PWC5um). These pressures have reduced the efficacy of peer review in identifying and correcting potential false discoveries in the medical literature. And, crucially, the medication approach only addresses the problem of poor data analysis after the work has been done.
If we could prevent problematic data analyses from being conducted, we could substantially reduce the burden on the community of having to evaluate an increasingly heterogeneous and complex population of studies and research findings. To prevent poor data analysis in the scientific literature we need to increase the number of trained data analysts in the scientific community, and to identify statistical software and tools that can be demonstrated to improve the reproducibility and replicability of studies and that are moderately robust to user error. The US National Institutes of Health has identified data science education as a priority by issuing requests for applications for training materials, courses, and other educational initiatives focused on reproducibility. Increasing data analytic literacy raises the chance that any given scientific data analysis will be sensible and correct. If successful, this will reduce the burden of detecting poor data analyses through the overtaxed peer review system and will increase the pool of trained editors and referees in the peer review process.
Education at scale
How can we dramatically scale up data science education in the short term? One example is the approach we have taken at the Johns Hopkins Bloomberg School of Public Health, where we were one of the earliest participants in the massive open online course phenomenon. Inspired by the huge demand for statistical and data science knowledge, my colleagues Jeffrey Leek, Brian Caffo, and I built the Johns Hopkins Data Science Specialization (bit.ly/1PWBZms), a sequence of nine courses covering the full spectrum of data science skills, from formulating quantitative questions, to cleaning data, to statistical analysis and producing reproducible reports.
But simply increasing data analytic literacy comes at a cost. Most scientists in programmes like ours will receive basic to moderate training in data analysis, creating the potential for individuals with enough skill to perform a data analysis but without enough knowledge to prevent data analysis mistakes.
Therefore, to improve the overall robustness of scientific data analysis, we must take a two-pronged approach and couple massive-scale education efforts with the identification of data analytic strategies that remain reproducible and replicable in the hands of basic or intermediate data analysts. This requires a coordinated effort to identify statistical software and standardised data analysis protocols that are shown to increase reproducibility and replicability when used by people with only basic training.
It is also critical that statisticians bring their history of developing rigorous methods to bear on the field of data science. One fundamental component of scaling up data science education is performing empirical studies to identify statistical methods, analysis plans, and software that lead to increased replicability and reproducibility in the hands of users with basic knowledge. We call this approach “evidence-based data analysis”. Just as evidence-based medicine applies the scientific method to the practice of medicine, evidence-based data analysis applies the scientific method to the practice of data analysis. Combining massive-scale education with evidence-based data analysis can allow us to quickly test data analytic practices (bit.ly/1PWCdtQ) in the population most at risk of data analytic mistakes.
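As a sketch of what such an empirical study might look like – not the specific methodology proposed here – one can simulate many analysts working on data in which there is no real effect, and compare a pre-specified protocol against a flexible one in which several versions of the analysis are tried and the most favourable result is reported.

# Simulated comparison of two analysis protocols on null data (a sketch only;
# sample sizes and the particular "flexible" choices are assumptions).
import random
import statistics

random.seed(1)
N_STUDIES = 5000
N_PER_GROUP = 30
CRIT = 1.96  # two-sided 5% critical value for an approximate z statistic


def z_stat(a, b):
    """Approximate two-sample z statistic for a difference in means."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se


fixed_hits = flexible_hits = 0
for _ in range(N_STUDIES):
    # Null world: both groups are drawn from the same distribution.
    x = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    y = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]

    # Protocol 1: a single pre-specified test on all of the data.
    full = z_stat(x, y)
    fixed_hits += full > CRIT

    # Protocol 2: also try trimming the most extreme values and testing a
    # half-sample "subgroup", then report a finding if any look succeeds.
    trimmed = z_stat(sorted(x)[2:-2], sorted(y)[2:-2])
    subgroup = z_stat(x[:15], y[:15])
    flexible_hits += max(full, trimmed, subgroup) > CRIT

print(f"false-positive rate, pre-specified protocol: {fixed_hits / N_STUDIES:.3f}")
print(f"false-positive rate, flexible protocol:      {flexible_hits / N_STUDIES:.3f}")

Run over thousands of simulated studies, the flexible protocol produces noticeably more false discoveries at the same nominal threshold – exactly the kind of difference that an evidence-based comparison of analysis protocols is designed to surface before a method is recommended to analysts with only basic training.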
In much the same way that the epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of a contaminated water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source. Dramatic increases in data science education, coupled with robust evidence-based data analysis practices, have the potential to prevent problems with reproducibility and replication before they can cause permanent damage to the credibility of science.
References
1. Potti, A., Dressman, H. K., Bild, A., et al. (2006) Genomic signatures to guide the use of chemotherapeutics. Nature Medicine, 12, 1294–1300. (Retracted, 2011.)
2. Baggerly, K. A. and Coombes, K. R. (2009) Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. Annals of Applied Statistics, 3(4), 1309–1334.
3. Reinhart, C. M. and Rogoff, K. S. (2010) Growth in a time of debt. American Economic Review, 100(2), 573–578.
4. Herndon, T., Ash, M. and Pollin, R. (2014) Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257–279.