The Role of Congeniality in Multiple Imputation for Doubly Robust Causal Estimation

This talk provides clear and practical guidance on the specification of imputation models when multiple imputation is used in conjunction with doubly robust estimation methods for causal inference. Through theoretical arguments and targeted simulations, we show that when a confounder has missing data the corresponding imputation model must include all variables used in either the propensity score model or the outcome model, and that these variables must appear in the same functional form as in the final analysis. Violating these conditions can lead to biased treatment eﬀect estimates, even when both components of the doubly robust estimator are correctly specified. We present a mathematical framework for doubly robust estimation combined with multiple imputation, establish the theoretical requirements for proper imputation in this setting, and demonstrate the consequences of misspecification through simulation. Based on these findings, we oﬀer concrete recommendations to ensure valid inference when using multiple imputation with doubly robust methods in applied causal analyses.

The Art of Data Refinement: Severance Analyses

This talk demonstrates data extraction from multiple sources using the popular television series Severance as an example. For example, we collected and analyzed elevator sounds predict narrative events, voice recordings underwent cepstral analysis to estimate fundamental frequencies and characterize speaker- specific distributions, with k-nearest neighbors used for classification, and text mining was performed on episode scripts to quantify dialogue patterns. These analyses illustrate how statistical methods can be applied to unconventional data sources from entertainment media.

Exploring the Potential of Large Language Models in Generating Saturated DAGs for Causal Inference

This talk investigates whether large language models (LLMs) could potentially assist in the creation of “saturated DAGs”, graphical representations that exhaustively map all possible causal pathways in a system. We’ll critically examine if and how LLMs might help identify the full space of plausible causal relationships that traditional approaches may overlook. The presentation will assess the strengths and limitations of prompting LLMs to generate comprehensive causal structures, identify backdoor paths, and navigate complex causal systems.

Causal Inference Is Not Just A Statistics Problem

In this talk we will discuss four datasets, similar to Anscombe’s quartet, that aim to highlight the challenges involved when estimating causal effects. Each of the four datasets is generated based on a distinct causal mechanism: the first involves a collider, the second involves a confounder, the third involves a mediator, and the fourth involves the induction of M-Bias by an included factor. Despite the fact that the statistical summaries and visualizations for each dataset are identical, the true causal effect differs, and estimating it correctly requires knowledge of the data-generating mechanism. These example datasets can help practitioners gain a better understanding of the assumptions underlying causal inference methods and emphasize the importance of gathering more information beyond what can be obtained from statistical tools alone.

Causal Inference in R

In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams, and causal modeling techniques such as propensity scores and inverse probability weighting. In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams, and causal modeling techniques such as propensity scores and inverse probability weighting. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know–the tidyverse, regression models, and more–to answer the questions that are important to your work. This workshop is for you if you: know how to fit a linear regression model in R, have a basic understanding of data manipulation and visualization using tidyverse tools, and are interested in understanding the fundamentals behind how to move from estimating correlations to causal relationships.

Immaculate vibes, questionable code

Inspired by the practical problem of chairing a useR conference session, this talk walks through building a Slido replacement in pure “vibe coding” mode — no specs, no tests, just a vision and a stubborn refusal to stop until it worked (sort of). We’ll explore the chaotic joy, the accidental brilliance, and the inevitable questionable code that got us there.

Building Strong ASA Student Chapters: Strategies for Growth, Inclusivity, and Maximizing Value

This invited panel session offers guidance for statistics and data science faculty and students seeking to establish or enhance their ASA student chapter. The panel will feature a diverse group of faculty advisors and chapter officers who will share their experiences starting and running a successful student chapter. The panelists will discuss the benefits of ASA affiliation, strategies for maintaining an active and inclusive chapter, and resources available to support students and faculty. Attendees will gain practical tips and strategies for transitioning from a statistics club to an ASA student chapter, overcoming challenges, and fostering a strong community of statistics and data science learners.

Understanding Statistics in Medical Literature

In today’s fast-paced healthcare landscape, understanding data and statistics is essential for making informed decisions. Whether you’re a medical student navigating your first journal article or a healthcare professional hoping to apply the latest research to patient care, the ability to critically evaluate medical literature is a vital skill. This course is designed to introduce you to the core concepts of data and statistics, equipping you with the tools to extract meaningful insights from research without becoming bogged down in complex mathematical notation.

The Case for Deterministic Imputation in Predictive Modeling

While multiple imputation is widely accepted for handling missing data in clinical research, its default use in predictive modeling may be inappropriate. Multiple imputation relies on access to the outcome variable to avoid bias, an assumption that breaks down in real-world deployment where the outcome is unknown. This talk argues that deterministic imputation methods, which do not depend on the outcome and are computationally efficient, are better suited for building predictive models intended for deployment. We present theoretical results and simulation evidence demonstrating that deterministic imputation maintains model validity and performance without introducing information leakage. We conclude that for predictive tasks, particularly in clinical settings where transparency, reproducibility, and alignment with deployment conditions are essential, deterministic imputation should be the standard.

The Why Behind Including Y in your Imputation Model

Handling missing data is a frequent challenge in analyses of health data, and imputation techniques are often employed to address this issue. This talk focuses on scenarios where a covariate with missing values is to be imputed and examines the prevailing recommendation to include the outcome variable in the imputation model. Specifically, we delve into stochastic imputation methods and their effects on accurately estimating the relationship between the imputed covariate and the outcome. Through mathematical proofs and a series of simulations, we demonstrate that incorporating the outcome variable in imputation models is essential for achieving unbiased results with stochastic imputation. Furthermore, we address the concern that this practice constitutes “double dipping” or data dredging. By providing both theoretical and empirical evidence, we show why including the outcome variable is a legitimate and necessary approach rather than a source of bias.

Recent & Upcoming Talks

2025