The Case for Deterministic Imputation in Predictive Modeling

By Lucy D'Agostino McGowan in Invited Oral Presentation

May 13, 2025

Abstract

While multiple imputation is widely accepted for handling missing data in clinical research, its default use in predictive modeling may be inappropriate. To avoid bias, multiple imputation requires the outcome variable in the imputation model, a requirement that cannot be met in real-world deployment, where the outcome is unknown. This talk argues that deterministic imputation methods, which do not depend on the outcome and are computationally efficient, are better suited for building predictive models intended for deployment. We present theoretical results and simulation evidence demonstrating that deterministic imputation maintains model validity and performance without introducing information leakage. We conclude that for predictive tasks, particularly in clinical settings where transparency, reproducibility, and alignment with deployment conditions are essential, deterministic imputation should be the standard.
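
As a rough illustration of the idea, the sketch below fits an imputation rule on training predictors only and reuses the same rule at deployment, so the outcome is never consulted. Column-mean imputation is used purely as an assumed stand-in for a deterministic method; the abstract does not say which method the talk uses, and the names fit_imputer and apply_imputer are hypothetical.

```python
import numpy as np

def fit_imputer(X_train):
    """Learn one fill value per predictor column from training predictors only.

    Assumption: column means stand in for a deterministic imputation rule.
    The outcome variable is never passed in.
    """
    return np.nanmean(np.asarray(X_train, dtype=float), axis=0)

def apply_imputer(X, fill_values):
    """Replace each missing entry with the stored fill value for its column."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), fill_values, X)

# Toy training predictors with missing entries (illustrative values only).
X_train = np.array([[1.0, 10.0],
                    [np.nan, 20.0],
                    [3.0, np.nan]])
fill_values = fit_imputer(X_train)

# At deployment the outcome is unknown; the same stored rule is applied.
X_new = np.array([[np.nan, 12.0]])
X_new_imputed = apply_imputer(X_new, fill_values)
print(X_new_imputed)
```

Because the fill values are fixed once they are learned, the same new record is always imputed the same way, and the imputation step at deployment matches the one used during model development.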

Date

May 13, 2025

Time

9:00 AM – 10:00 AM

Event

St. Jude Biostatistics and Data Science Research Forum