The ‘Why’ behind including ‘Y’ in your imputation model

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true.

Power and sample size calculations for testing the ratio of reproductive values in phylogenetic samples

The quality of the inferences we make from pathogen sequence data is determined by the number and composition of pathogen sequences that make up the sample used to drive that inference. However, there remains limited guidance on how to best structure and power studies when the end goal is phylogenetic inference. One question that we can attempt to answer with molecular data is whether some people are more likely to transmit a pathogen than others.

The Study of the Epidemiology of Pediatric Hypertension Registry (SUPERHERO): Rationale and Methods

[Peer Reviewed Article]

By Andrew M. South, Victoria C. Giammattei, Kiri W. Bagley, Christine Y. Bakhoum, William H. Beasley, Morgan B. Bily, Shupti Biswas, Aaron M. Bridges, Rushelle L. Byfield, Jessica Fallon Campbell, Rahul Chanchlani, Ashton Chen, Lucy D’Agostino McGowan, Stephen M. Downs, Gina M. Fergeson, Jason H. Greenberg, Taylor A. Hill-Horowitz, Elizabeth T. Jensen, Mahmoud Kallash, Margret Kamel, Stefan G. Kiessling, David M. Kline, John R. Laisure, Gang Liu, Jackson Londeree, Caroline B. Lucas, Sai Sudha Mannemuddhu, Kuo-Rei Mao, Jason M. Misurac, Margaret O. Murphy, James T. Nugent, Elizabeth A. Onugha, Ashna Pudpuakkam, Kathy M. Redmond, Sandeep Riar, Christine B. Sethna, Sahar Siddiqui, Ashley L. Thumann, Stephen R. Uss, Carol L. Vincent, Irina V. Viviano, Michael J. Walsh, Blanche D. White, Robert P. Woroniecki, Michael Wu, Ikuyo Yamaguchi, Emily Yun, Donald J. Weaver, Jr.

February 20, 2024

Causal Inference is not just a statistics problem

This paper introduces a collection of four data sets, similar to Anscombe’s Quartet, that aim to highlight the challenges involved when estimating causal effects. Each of the four data sets is generated based on a distinct causal mechanism: the first involves a collider, the second involves a confounder, the third involves a mediator, and the fourth involves the induction of M-Bias by an included factor. The paper includes a mathematical summary of each data set, as well as directed acyclic graphs that depict the relationships between the variables.

Design Principles for Data Analysis

The data revolution has led to an increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design thinking – the problem-solving process to understand the people for whom a solution is being designed. For a given problem, there can be significant or subtle differences in how a data analyst (or producer of a data analysis) constructs, creates, or designs a data analysis, including differences in the choice of methods, tooling, and workflow.

Randomized controlled trial: Quantifying the impact of disclosing uncertainty on adherence to hypothetical health recommendations

We conducted a randomized controlled trial to assess whether disclosing elements of uncertainty in an initial public health statement changes the likelihood that participants will accept new, different advice that arises as more evidence is uncovered. Proportional odds models were fit, stratified by the baseline likelihood to agree with the final advice. 298 participants were randomized to the treatment arm and 298 to the control arm. Among participants who were more likely to agree with the final recommendation at baseline, those who were initially shown uncertainty had 46% lower odds of being more likely to agree with the final recommendation compared to those who were not (OR: 0.

Sensitivity Analyses for Unmeasured Confounders

This review expands on techniques for sensitivity analysis for unmeasured confounding, demonstrating state-of-the-art methods as well as specifying which should be used under various scenarios, depending on the information about a potential unmeasured confounder available to the researcher. Recent Findings: Methods to assess how sensitive an observed estimate is to unmeasured confounding have been developed for decades. Recent advancements have allowed for the incorporation of measured confounders in these assessments and have updated the methods used to quantify the impact of an unmeasured confounder, whether specified in terms of the magnitude of the effect from a regression standpoint, for example, as a risk ratio, or with respect to the percent of variation in the outcome or exposure explained by the unmeasured confounder.

A Visual Diagnostic Tool for Causal Inference

Rosenbaum and Rubin (1983) suggested a visual representation that can be used as a diagnostic tool for examining whether the relationship between the confounders and the outcome is sufficiently controlled, or whether there is a more complex relationship that requires further adjustment. This short commentary highlights this simple tool, providing an example of its utility along with relevant R code.


By D’Agostino McGowan L, D’Agostino Sr RB, D’Agostino Jr RB

August 1, 2022

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond.

[Peer Reviewed Article]

By Rando HM, MacLean AL, Lee AJ, Lordan R, Ray S, Bansal V, Skelly AN, Sell E, Dziak JJ, Shinholster L, D’Agostino McGowan L, Ben Guebila M, Wellhausen N, Knyazev S, Boca SM, Capone S, Qi Y, Park Y, Mai D, Sun Y, Boerckel JD, Brueffer C, Byrd JB, Kamil JP, Wang J, Velazquez R, Szeto GL, Barton JP, Goel RR, Mangul S, Lubiana T; COVID-19 Review Consortium Vikas Bansal, John P. Barton, Simina M. Boca, Joel D. Boerckel, Christian Brueffer, James Brian Byrd, Stephen Capone, Shikta Das, Anna Ada Dattoli, John J. Dziak, Jeffrey M. Field, Soumita Ghosh, Anthony Gitter, Rishi Raj Goel, Casey S. Greene, Marouen Ben Guebila, Daniel S. Himmelstein, Fengling Hu, Nafisa M. Jadavji, Jeremy P. Kamil, Sergey Knyazev, Likhitha Kolla, Alexandra J. Lee, Ronan Lordan, Tiago Lubiana, Temitayo Lukan, Adam L. MacLean, David Mai, Serghei Mangul, David Manheim, Lucy D’Agostino McGowan, Amruta Naik, YoSon Park, Dimitri Perrin, Yanjun Qi, Diane N. Rafizadeh, Bharath Ramsundar, Halie M. Rando, Sandipan Ray, Michael P. Robson, Vincent Rubinetti, Elizabeth Sell, Lamonica Shinholster, Ashwin N. Skelly, Yuchen Sun, Yusha Sun, Gregory L. Szeto, Ryan Velazquez, Jinhui Wang, Nils Wellhausen, Gitter A, Greene CS

October 25, 2021

The Expert Next Door: A Commentary on Interactions with Friends and Family During the SARS-CoV-2 Pandemic

The coronavirus disease 2019 (COVID-19) pandemic thrust the field of public health into the spotlight. For many epidemiologists, biostatisticians, and other public health professionals, this caused the professional aspects of our lives to collide with the personal, as friends and family reached out with concerns and questions. Learning how to navigate this space was new for many and required refining our communication depending on context, setting, and audience. Some of us took to social media, utilizing our existing personal accounts to share information after sorting through and summarizing the rapidly emerging literature to keep loved ones safe.


By Molino, Andrea R and Andersen, Kathleen M and Sawyer, Simone B and Ðoàn, Lan N and Rivera, Yonaira M and James, Bryan D and Fox, Matthew P and Murray, Eleanor J and D’Agostino McGowan, Lucy and Jarrett, Brooke A

October 7, 2021

Welcome to the Tidyverse

[Peer Reviewed Article]

By H Wickham, M Averick, J Bryan, W Chang, L D’Agostino McGowan, R François, G Grolemund, A Hayes, L Henry, J Hester, M Kuhn, T Lin Pedersen, E Miller, S Milton Bache, K Müller, J Ooms, D Robinson, D Paige Seidel, V Spinu, K Takahashi, D Vaughan, C Wilke, K Woo, and H Yutani

November 1, 2019

Meta-analysis Comparing Outcomes of Two Different Negative Pressure Therapy Systems in Closed Incision Management.

Background: Closed incision negative pressure therapy (ciNPT) is an emerging approach to managing closed incisions of patients at risk of postoperative complications. There are primarily 2 different commercially available ciNPT systems. Both systems consist of a single-use, battery-powered device and foam- or gauze-based peel-and-place dressing designed for closed incisions. These systems vary in design, and there are no data comparing outcomes between the 2 systems. Methods: We performed 2 separate meta-analyses to compare surgical site infection (SSI) rates postuse of (1) ciNPT with foam dressing (FOAM) versus conventional dressings and (2) ciNPT with multilayer absorbent dressing (MLA) versus conventional dressings.

[Peer Reviewed Article]

By DP Singh, A Gabriel, RP Silverman, LP Griffin, L D’Agostino McGowan, RB D’Agostino Jr

June 1, 2019

Caring for Critically Ill Patients with the ABCDEF Bundle: Results of the ICU Liberation Collaborative in Over 15,000 Adults

Objective: Decades-old, common ICU practices including deep sedation, immobilization, and limited family access are being challenged. We endeavoured to evaluate the relationship between ABCDEF bundle performance and patient-centered outcomes in critical care. Design: Prospective, multicenter, cohort study from a national quality improvement collaborative. Setting: 68 academic, community, and federal ICUs collected data during a 20-month period. Patients: 15,226 adults with at least one ICU day. Interventions: We defined ABCDEF bundle performance (our main exposure) in two ways: 1) complete performance (patient received every eligible bundle element on any given day) and 2) proportional performance (percentage of eligible bundle elements performed on any given day).

[Peer Reviewed Article]

By BT Pun, MC Balas, MA Barnes-Daly, JL Thompson, JM Aldrich, J Barr, D Byrum, SS Carson, JW Devlin, HJ Engel, CL Esbrook, KD Hargett, LRRT Harmon, C Hielsberg, JC Jackson, TL Kelly, V Kumar, L Millner, A Morse, CS Perme, PJ Posa, KA Puntillo, WD Schweickert, JL Stollings, A Tan, L D’Agostino McGowan, EW Ely

January 1, 2019

Metformin use and incidence cancer risk: evidence for a selective protective effect against liver cancer

Purpose Several observational studies suggest that metformin reduces incidence cancer risk; however, many of these studies suffer from time-related biases and several cancer outcomes have not been investigated due to small sample sizes. Methods We constructed a propensity score-matched retrospective cohort of 84,434 veterans newly prescribed metformin or a sulfonylurea as monotherapy. We used Cox proportional hazard regression to assess the association between metformin use compared to sulfonylurea use and incidence cancer risk for 10 solid tumors.

[Peer Reviewed Article]

By HJ Murff, CL Roumie, RA Greevy, AJ Hackstadt, L D’Agostino McGowan, AM Hung, CG Grijalva, MR Griffin

July 18, 2018

Sulfonylureas as second line treatment for type 2 diabetes

New evidence helps individualise treatment decisions and minimise harm. Sulfonylureas and insulin were the cornerstone of diabetes management until 1998 when metformin became recommended as initial treatment by the American Diabetes Association and the European Association for the Study of Diabetes. Before 2008, treatments for diabetes were often approved on the basis of their ability to lower glycated haemoglobin (HbA1c) by 0.5% (5.5 mmol/mol) or other surrogate outcomes. After the controversy surrounding cardiovascular risk associated with thiazolidinediones, regulatory agencies such as the US Food and Drug Administration and the European Medicines Agency issued guidance for industries to evaluate the cardiovascular safety of antidiabetes drugs.

Improving Modern Techniques of Causal Inference: Finite Sample Performance of ATM and ATO Doubly Robust Estimators, Variance Estimation for ATO Estimators, and Contextualized Tipping Point Sensitivity Analyses for Unmeasured Confounding

While estimators that incorporate both direct covariate adjustment and inverse probability weighting have drawn considerable interest, their finite sample properties have been challenged in seminal papers, such as Freedman and Berk (2008). We derive a doubly robust ATO estimator and demonstrate excellent finite sample performance for ATO and ATM doubly robust estimators in the setting of Freedman and Berk (2008). The methods and performance of variance estimators for IPW and IPW doubly robust estimators incorporating the recently defined ATO weights are an important open question in the field.

Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses

Verifying that a statistically significant result is scientifically meaningful is not only good scientific practice, it is a natural way to control the Type I error rate. Here we introduce a novel extension of the p-value, a second-generation p-value ($p_δ$), that formally accounts for scientific relevance and leverages this natural Type I error control. The approach relies on a pre-specified interval null hypothesis that represents the collection of effect sizes that are scientifically uninteresting or are practically null.

Comparative Safety of Sulfonylurea and Metformin Monotherapy on the Risk of Heart Failure: A Cohort Study

Background Medications that impact insulin sensitivity or cause weight gain may increase heart failure risk. Our aim was to compare heart failure and cardiovascular death outcomes among patients initiating sulfonylureas for diabetes mellitus treatment versus metformin. Methods and Results National Veterans Health Administration databases were linked to Medicare, Medicaid, and National Death Index data. Veterans aged ≥18 years who initiated metformin or sulfonylureas between 2001 and 2011 and whose creatinine was <1.

[Peer Reviewed Article]

By CL Roumie, JY Min, L D’Agostino McGowan, C Presley, CG Grijalva, AJ Hackstadt, AM Hung, RA Greevy, T Elasy, MR Griffin

April 1, 2017


Location Bias in ROC Studies

Location bias occurs when a reader detects a false lesion in a subject with disease and the falsely detected lesion is considered a true positive. In this study, we examine the effect of location bias in two large MRMC ROC studies, comparing three ROC scoring methods. We compare one method that only uses the maximum confidence score and does not take location bias into account (maxROC), and two methods that take location bias into account: the region of interest ROC (ROI–ROC) and the free-response ROC (FROC).

Secondary consent to biospecimen use in a prostate cancer biorepository

Background Biorepository research has substantial societal benefits. This is one of the few studies to focus on male willingness to allow future research use of biospecimens. Methods This study analyzed the future research consent questions from a prostate cancer biorepository study (N = 1931). The consent form asked two questions regarding use of samples in future studies (1) without and (2) with protected health information (PHI). Yes to both questions of use of samples was categorized as Yes-Always; Yes to without and No to with PHI was categorized as Yes-Conditional; No to without PHI was categorized as Never.

Quantitative evaluation of the community research fellows training program

Context The community research fellows training (CRFT) program is a community-based participatory research (CBPR) initiative for the St. Louis area. This 15-week program, based on a Master in Public Health curriculum, was implemented by the Division of Public Health Sciences at Washington University School of Medicine and the Siteman Cancer Center. Objectives We measure the knowledge gained by participants and evaluate participant and faculty satisfaction of the CRFT program both in terms of meeting learning objectives and actively engaging the community in the research process.

[Peer Reviewed Article]

By L D’Agostino McGowan, JD Stafford, VL Thompson, B Johnson-Javois, MS Goodman

July 27, 2015


Mouse low-grade gliomas contain cancer stem cells with unique molecular and functional properties

The availability of adult malignant glioma stem cells (GSCs) has provided unprecedented opportunities to identify the mechanisms underlying treatment resistance. Unfortunately, there is a lack of comparable reagents for the study of pediatric low-grade glioma (LGG). Leveraging a neurofibromatosis-1 (Nf1) genetically-engineered mouse LGG model, we report the isolation of CD133+ multi-potent low-grade glioma stem cells (LG-GSCs), which generate glioma-like lesions histologically similar to the parent tumor following injection into immunocompetent hosts.

[Peer Reviewed Article]

By Y Chen, L D’Agostino McGowan, PJ Cimino, S Dahiya, JR Leonard, DA Lee, DH Gutmann

March 24, 2015


Effects of racial and ethnic group and health literacy on responses to genomic risk information in a medically underserved population

Objective Few studies have examined how individuals respond to genomic risk information for common, chronic diseases. This randomized study examined differences in responses by type of genomic information [genetic test/family history] and disease condition [diabetes/heart disease] and by race/ethnicity in a medically underserved population. Methods 1057 English-speaking adults completed a survey containing one of four vignettes (two-by-two randomized design). Differences in dependent variables (i.e., interest in receiving genomic assessment, discussing with doctor or family, changing health habits) by experimental condition and race/ethnicity were examined using chi-squared tests and multivariable regression analysis.

[Peer Reviewed Article]

By KA Kaphingst, JD Stafford, L D’Agostino McGowan, J Seo, CR Lachance, MS Goodman

February 1, 2015


Is low health literacy associated with increased emergency department utilization and recidivism?

Objectives The objective was to determine whether patients with low health literacy have higher emergency department (ED) utilization and higher ED recidivism than patients with adequate health literacy. Methods The study was conducted at an urban academic ED with more than 95,000 annual visits that is part of a 13-hospital health system, using electronic records that are captured in a central data repository. As part of a larger, cross-sectional, convenience sample study, health literacy testing was performed using the short test of functional health literacy in adults (S-TOFHLA) and standard test thresholds identifying those with inadequate, marginal, and adequate health literacy.

[Peer Reviewed Article]

By RT Griffey, SK Kennedy, L D’Agostino McGowan, MS Goodman, KA Kaphingst

October 1, 2014


Screening for colorectal cancer: Using data to set prevention priorities

Introduction Adherence to colorectal cancer screening recommendations is known to vary by state, but less information is available about within-state variability. In the current study, we assess county-level screening rates for Missouri, with the goal of better targeting public health efforts to increase screening. Methods Prevalence of colorectal cancer screening among Missouri adults between the ages of 50 and 74 was obtained from 2008 and 2010 Behavioral Risk Factor Surveillance System data.

Using small-area analysis to estimate county-level racial disparities in obesity demonstrating the necessity of targeted interventions

Data on the national and state levels is often used to inform policy decisions and strategies designed to reduce racial disparities in obesity. Obesity-related health outcomes are realized on the individual level, and policies based on state and national-level data may be inappropriate due to the variations in health outcomes within and between states. To examine county-level variation of obesity within states, we use a small-area analysis technique to fill the void for county-level obesity data by race.