Why You Must Include the Outcome in Your Imputation Model (and Why It's Not Double Dipping)

By Lucy D'Agostino McGowan in Invited Oral Presentation

May 28, 2024


Handling missing data is a frequent challenge in analyses of health data, and imputation techniques are often employed to address it. This talk focuses on scenarios where a covariate with missing values is to be imputed and examines the prevailing recommendation to include the outcome variable in the imputation model. Specifically, we examine stochastic imputation methods and their effect on estimating the relationship between the imputed covariate and the outcome. Through mathematical proofs and a series of simulations, we demonstrate that incorporating the outcome variable in the imputation model is essential for obtaining unbiased estimates of that relationship with stochastic imputation. Furthermore, we address the concern that this practice constitutes "double dipping" or data dredging. By providing both theoretical and empirical evidence, we show why including the outcome variable is a legitimate and necessary approach rather than a source of bias.
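To make the central claim concrete, here is a minimal simulation sketch. It is not the simulation from the talk. It assumes a single standard-normal covariate `x`, a linear outcome `y = beta * x + noise`, values of `x` missing completely at random, and a single stochastic imputation with plug-in parameter estimates (not full multiple imputation). The function name `simulate` and all of its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def simulate(n=100_000, beta=1.0, p_missing=0.5, seed=0):
    """Compare stochastic imputation of x with and without the outcome y.

    Illustrative assumptions: x ~ N(0, 1), y = beta * x + N(0, 1) noise,
    and x missing completely at random with probability p_missing.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)

    missing = rng.random(n) < p_missing
    observed = ~missing
    n_missing = int(missing.sum())

    # Stochastic imputation WITHOUT the outcome: draw from the marginal
    # distribution of x estimated from the observed values.
    x_without_y = x.copy()
    x_without_y[missing] = rng.normal(
        x[observed].mean(), x[observed].std(ddof=1), size=n_missing
    )

    # Stochastic imputation WITH the outcome: regress x on y among the
    # observed rows, then add residual noise to each prediction.
    slope_xy, intercept_xy = np.polyfit(y[observed], x[observed], 1)
    residuals = x[observed] - (intercept_xy + slope_xy * y[observed])
    residual_sd = residuals.std(ddof=2)
    x_with_y = x.copy()
    x_with_y[missing] = (
        intercept_xy
        + slope_xy * y[missing]
        + rng.normal(0.0, residual_sd, size=n_missing)
    )

    # Estimated slope of y on the (partly imputed) covariate in each case.
    est_without_y = np.polyfit(x_without_y, y, 1)[0]
    est_with_y = np.polyfit(x_with_y, y, 1)[0]
    return est_without_y, est_with_y


if __name__ == "__main__":
    est_without_y, est_with_y = simulate()
    print(f"true slope:                     1.0")
    print(f"imputation without the outcome: {est_without_y:.3f}")
    print(f"imputation with the outcome:    {est_with_y:.3f}")
```

Under these assumptions, the imputed values drawn without the outcome are independent of `y`, so the estimated slope is attenuated toward roughly `(1 - p_missing) * beta`. Drawing the imputed values conditional on `y` preserves the joint distribution of the covariate and the outcome, and the estimated slope lands close to the true `beta`.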



12:00 PM – 1:00 PM


Wake Forest University School of Medicine Division of Public Health Sciences Grand Rounds 2024