These days I like to discuss
- Analytic Design Theory
- Statistical Communication
- The Casual Inference Podcast
- Large-scale medical data
- Italian
- Co-founding R-Ladies Nashville
- Disney World
over coffee
Lucy D’Agostino McGowan
Lucy D’Agostino McGowan is an assistant professor in the Department of Statistical Sciences at Wake Forest University. She received her PhD in Biostatistics from Vanderbilt University and completed her postdoctoral training at Johns Hopkins University Bloomberg School of Public Health. Her research focuses on analytic design theory, statistical communication, causal inference, and data science pedagogy. Dr. D’Agostino McGowan is the 2023 chair of the American Statistical Association’s Section on Statistical Graphics and can be found blogging at livefreeordichotomize.com, on Twitter @LucyStats, and podcasting on the American Journal of Epidemiology partner podcast, Casual Inference.
Awards
Lucy was selected for the Teaching in the Health Sciences Young Investigator Award for her paper “Design Principles for Data Analysis”. She was also selected as an ASA StatsForward Fellow.
Listen to the Casual Inference Podcast
Recent & Upcoming Talks
Bridging the Gap Between Theory and Practice: When to Include the Outcome in Your Imputation Model
Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. This talk will investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e., single imputation with a fixed value) and stochastic imputation (i.e., single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This talk aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations.
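The contrast between deterministic and stochastic imputation described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the talk: a single covariate `x` missing completely at random, a linear outcome model with true slope 1, standard normal noise, and single (not multiple) stochastic imputation with fixed imputation-model parameters. The function names `slope` and `simulate` are hypothetical.

```python
import numpy as np


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def simulate(n=100_000, beta=1.0, p_missing=0.5, seed=0):
    """Compare four ways of imputing a covariate that is missing at random.

    Assumed data-generating model (illustrative only):
    x ~ N(0, 1), y = beta * x + N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    miss = rng.random(n) < p_missing
    obs = ~miss
    n_miss = miss.sum()

    # Imputation model WITHOUT the outcome: intercept only.
    mu = x[obs].mean()
    sd = x[obs].std(ddof=1)

    # Imputation model WITH the outcome: regress x on y in the observed rows.
    b = slope(y[obs], x[obs])
    a = x[obs].mean() - b * y[obs].mean()
    resid_sd = np.std(x[obs] - (a + b * y[obs]), ddof=2)

    fills = {
        "deterministic, no outcome": np.full(n_miss, mu),
        "deterministic, with outcome": a + b * y[miss],
        "stochastic, no outcome": rng.normal(mu, sd, n_miss),
        "stochastic, with outcome": a + b * y[miss]
        + rng.normal(0, resid_sd, n_miss),
    }

    # Fit the analysis model (y on imputed x) for each imputation strategy.
    results = {}
    for name, fill in fills.items():
        x_imp = x.copy()
        x_imp[miss] = fill
        results[name] = slope(x_imp, y)
    return results


results = simulate()
for name, est in results.items():
    print(f"{name}: estimated slope {est:.2f} (truth 1.00)")
```

Under these assumptions, the estimated slope should land near the true value of 1 for deterministic imputation without the outcome and for stochastic imputation with the outcome. It should be inflated (about 4/3) for deterministic imputation with the outcome and attenuated (about 0.5) for stochastic imputation without it.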
Bridging the gap between imputation theory and practice
Handling missing data presents a significant challenge in epidemiological data analysis, with imputation frequently employed to address it. It is often advised to use the outcome variable in the imputation model for missing covariates, though the rationale for this advice is not always clear. This presentation will explore both deterministic imputation (i.e., single imputation using fixed values) and stochastic imputation (i.e., single or multiple imputation using random values) approaches and their effects on estimating the association between an imputed covariate and outcome. We will show that the inclusion of the outcome variable in imputation models is not merely a suggestion but a necessity for obtaining unbiased estimates in stochastic imputation approaches. Furthermore, we will clarify misconceptions regarding deterministic imputation models and explain why the outcome variable should be excluded from these models. The goal of this presentation is to connect the theory behind imputation and its practical application, offering mathematical proofs to elucidate common statistical guidelines.
Causal Inference is Not Just a Statistics Problem
In this talk we will discuss four datasets, similar to Anscombe’s quartet, that aim to highlight the challenges involved when estimating causal effects. Each of the four datasets is generated based on a distinct causal mechanism: the first involves a collider, the second involves a confounder, the third involves a mediator, and the fourth involves the induction of M-Bias by an included factor. Despite the fact that the statistical summaries and visualizations for each dataset are identical, the true causal effect differs, and estimating it correctly requires knowledge of the data-generating mechanism. These example datasets can help practitioners gain a better understanding of the assumptions underlying causal inference methods and emphasize the importance of gathering more information beyond what can be obtained from statistical tools alone.
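The point that the right analysis depends on the data-generating mechanism can be sketched with a toy simulation. This is not the quartet from the talk: the coefficients, sample size, and the helper names `simulate` and `coef_on_x` are assumptions chosen for illustration, the datasets are not constructed to have identical summaries, and the M-bias case is omitted for brevity. In each mechanism below the true total effect of `x` on `y` is 1.

```python
import numpy as np


def simulate(mechanism, n=100_000, seed=0):
    """Generate (x, y, z) where z plays a different causal role each time."""
    rng = np.random.default_rng(seed)
    e1, e2, e3 = rng.normal(size=(3, n))
    if mechanism == "collider":
        # x -> y, and both x and y cause z
        x = e1
        y = x + e2
        z = x + y + e3
    elif mechanism == "confounder":
        # z causes both x and y, and x -> y
        z = e1
        x = z + e2
        y = x + z + e3
    elif mechanism == "mediator":
        # x -> z -> y (the whole effect flows through z)
        x = e1
        z = x + e2
        y = z + e3
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return x, y, z


def coef_on_x(y, x, z=None):
    """Least-squares coefficient on x, optionally adjusting for z."""
    cols = [np.ones_like(x), x] + ([] if z is None else [z])
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


estimates = {}
for mechanism in ("collider", "confounder", "mediator"):
    x, y, z = simulate(mechanism)
    estimates[mechanism] = (coef_on_x(y, x), coef_on_x(y, x, z))
    unadjusted, adjusted = estimates[mechanism]
    print(f"{mechanism}: unadjusted {unadjusted:.2f}, adjusted for z {adjusted:.2f}")
```

In this sketch, adjusting for `z` recovers the effect of 1 only when `z` is a confounder. When `z` is a collider or a mediator, the unadjusted estimate is the one close to 1, and adjustment drives the coefficient toward 0. Nothing in the fitted models alone signals which case applies.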
Teaching
STA 363 -- WFU Spring 2023
Statistical learning. Learn the theory behind cutting-edge statistical and machine learning techniques. Gain hands-on experience with real data from a variety of disciplines. The course will focus on the statistical computing language R.
STA 112 -- WFU Fall 2022
Statistical models. Learn to explore, visualize, model, evaluate, and communicate data in a reproducible manner. Gain hands-on experience with real data from a variety of disciplines. The course will focus on the statistical computing language R.
STA 379/679 -- WFU Spring 2022
Causal Inference. From Correlation to Causation. The goal of this course is to give students the skills needed to conduct analyses and communicate results when causality is the goal. Students will learn how to implement causal inference techniques including matching and weighting, evaluate assumptions, and conduct sensitivity analyses.
Writing
The 'Why' behind including 'Y' in your imputation model
Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true.
Power and sample size calculations for testing the ratio of reproductive values in phylogenetic samples
The quality of the inferences we make from pathogen sequence data is determined by the number and composition of pathogen sequences that make up the sample used to drive that inference. However, there remains limited guidance on how to best structure and power studies when the end goal is phylogenetic inference. One question that we can attempt to answer with molecular data is whether some people are more likely to transmit a pathogen than others.