Keynote

Mind the Gap: Causal Inference is Not Just a Statistics Problem

In this talk we will discuss some of the major challenges in causal inference and why statistical tools alone cannot uncover the data-generating mechanism needed to answer causal questions. We will showcase the Causal Quartet, four datasets that share the same statistical properties but have different true causal effects because they were generated by different mechanisms. These examples illustrate the limits of relying solely on statistical tools in data analysis and highlight the crucial role of domain-specific knowledge.
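The idea can be illustrated with a small simulation. The sketch below is not the Causal Quartet data itself. It is a hypothetical two-scenario example (a confounder and a collider) with made-up coefficients, chosen so that both datasets show the same unadjusted slope of y on x even though the true effects differ. The function names `slope_on_x` and `simulate` are assumptions introduced for illustration.

```python
import numpy as np

def slope_on_x(y, x, *adjust_for):
    """OLS coefficient on x when regressing y on x and any adjustment variables."""
    design = np.column_stack([np.ones(len(y)), x, *adjust_for])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

def simulate(n=100_000, seed=0):
    """Two illustrative data-generating mechanisms (assumed, not from the talk)."""
    rng = np.random.default_rng(seed)

    # Scenario 1, confounder: z causes both x and y; true effect of x on y is 0.5.
    z_c = rng.normal(size=n)
    x_c = z_c + rng.normal(size=n)
    y_c = 0.5 * x_c + z_c + rng.normal(size=n)

    # Scenario 2, collider: x and y both cause z; true effect of x on y is 1.0.
    x_k = rng.normal(size=n)
    y_k = 1.0 * x_k + rng.normal(size=n)
    z_k = x_k + y_k + rng.normal(size=n)

    return {
        "confounder": {
            "true": 0.5,
            "unadjusted": slope_on_x(y_c, x_c),
            "adjusted": slope_on_x(y_c, x_c, z_c),
        },
        "collider": {
            "true": 1.0,
            "unadjusted": slope_on_x(y_k, x_k),
            "adjusted": slope_on_x(y_k, x_k, z_k),
        },
    }

if __name__ == "__main__":
    for name, result in simulate().items():
        print(name, {k: round(v, 2) for k, v in result.items()})
```

In this setup the unadjusted slope is close to 1.0 in both scenarios. Adjusting for z recovers the true effect under confounding but pushes the estimate toward zero under the collider structure, so the data alone cannot say which analysis is right.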

The Role of Congeniality in Multiple Imputation for Doubly Robust Causal Estimation

This talk provides clear and practical guidance on the specification of imputation models when multiple imputation is used in conjunction with doubly robust estimation methods for causal inference. Through theoretical arguments and targeted simulations, we demonstrate that if a confounder has missing data, the corresponding imputation model must include all variables appearing in either the propensity score model or the outcome model, in addition to both the exposure and the outcome, and that these variables must enter the imputation model in the same functional form as in the final analysis. Violating these conditions can lead to biased treatment effect estimates, even when both components of the doubly robust estimator are correctly specified. We present a mathematical framework for doubly robust estimation combined with multiple imputation, establish the theoretical requirements for proper imputation in this setting, and demonstrate the consequences of misspecification through simulation. Based on these findings, we offer concrete recommendations to ensure valid inference when using multiple imputation with doubly robust methods in applied causal analyses.
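The role of the outcome in the imputation model can be sketched in a deliberately simplified setting. The code below is not the framework or the simulations from the talk. It assumes a linear-Gaussian model with a continuous exposure, data missing completely at random, and plain outcome regression standing in for the doubly robust estimator. The imputation step is stochastic regression imputation that ignores parameter uncertainty, so only point estimates are pooled. All names and coefficients are hypothetical.

```python
import numpy as np

def fit_ols(y, X):
    """Return OLS coefficients (intercept first) and the residual standard deviation."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, resid.std(ddof=design.shape[1])

def exposure_coef(seed=0, n=50_000, m=5, include_outcome=True):
    """Average exposure coefficient over m imputed datasets (true value is 1.0)."""
    rng = np.random.default_rng(seed)

    # Assumed data-generating model: confounder c, exposure a, outcome y.
    c = rng.normal(size=n)
    a = c + rng.normal(size=n)
    y = 1.0 * a + c + rng.normal(size=n)

    # Half of the confounder values are missing completely at random.
    missing = rng.random(n) < 0.5

    # Imputation model for c: exposure only, or exposure plus outcome.
    predictors = np.column_stack([a, y]) if include_outcome else a[:, None]
    coef, sd = fit_ols(c[~missing], predictors[~missing])

    estimates = []
    for _ in range(m):
        c_imp = c.copy()
        mean = coef[0] + predictors[missing] @ coef[1:]
        c_imp[missing] = mean + rng.normal(scale=sd, size=missing.sum())
        # Analysis model: regress the outcome on the exposure and the confounder.
        beta, _ = fit_ols(y, np.column_stack([a, c_imp]))
        estimates.append(beta[1])
    return float(np.mean(estimates))

if __name__ == "__main__":
    print("outcome in imputation model:", round(exposure_coef(include_outcome=True), 2))
    print("outcome omitted:            ", round(exposure_coef(include_outcome=False), 2))
```

Under these assumptions, including the outcome in the imputation model gives an exposure coefficient near the true value of 1.0, while omitting it leaves residual confounding and inflates the estimate.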

There and Back Again, a Data Scientist’s Tale

We are in an exciting new age with access to an overwhelming amount of data and information. This talk will focus on three areas that have become increasingly important as a result. First, we will discuss the importance of reproducibility in this age of information overload. As quantitatively minded people, we are being pushed to innovate and develop best practices for reproducibility. We will talk a bit about the tools that make this possible and the next steps in this important area. We will then discuss new opportunities for developing innovative methods, particularly in the observational research space. This portion will include a brief introduction to causal inference for the data scientist. Finally, we will examine the importance of well-developed communication skills for quantitatively savvy people. These topics will be discussed in the context of my winding path to data science, sprinkled with some advice and lessons learned.