Recent & Upcoming Talks

2025

Causal Inference in R

In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know (the tidyverse, regression models, and more) to answer the questions that are important to your work. This workshop is for you if you: know how to fit a linear regression model in R, have a basic understanding of data manipulation and visualization using tidyverse tools, and are interested in understanding the fundamentals behind how to move from estimating correlations to causal relationships.
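To give a flavor of inverse probability weighting, here is a minimal sketch in plain Python. It is not taken from the workshop materials, which use R. Everything in it is an assumption made for illustration: the simulated data, the single binary confounder `z`, the treatment probabilities, and the true treatment effect of 1. The propensity score is estimated as the proportion treated within each level of `z`.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate (z, x, y) rows where z confounds the effect of x on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0        # binary confounder
        p_treat = 0.8 if z == 1 else 0.2          # z makes treatment more likely
        x = 1 if rng.random() < p_treat else 0    # binary treatment
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)   # assumed true effect of x is 1
        rows.append((z, x, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_difference(rows):
    """Weighted difference in means using inverse probability weights."""
    # Propensity score: proportion treated within each level of z.
    ps = {}
    for level in (0, 1):
        stratum = [x for z, x, _ in rows if z == level]
        ps[level] = sum(stratum) / len(stratum)

    num1 = den1 = num0 = den0 = 0.0
    for z, x, y in rows:
        if x == 1:
            w = 1 / ps[z]          # weight for treated rows
            num1 += w * y
            den1 += w
        else:
            w = 1 / (1 - ps[z])    # weight for untreated rows
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate()
naive = naive_difference(rows)
ipw = ipw_difference(rows)
print(f"naive difference: {naive:.2f}")
print(f"IPW difference:   {ipw:.2f}")
```

Under these assumptions the unadjusted difference mixes the effect of the treatment with the effect of the confounder, while the weighted difference should land near the simulated effect of 1.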

Understanding Statistics in Medical Literature

In today’s fast-paced healthcare landscape, understanding data and statistics is essential for making informed decisions. Whether you’re a medical student navigating your first journal article or a healthcare professional hoping to apply the latest research to patient care, the ability to critically evaluate medical literature is a vital skill. This course is designed to introduce you to the core concepts of data and statistics, equipping you with the tools to extract meaningful insights from research without becoming bogged down in complex mathematical notation.

The Case for Deterministic Imputation in Predictive Modeling

While multiple imputation is widely accepted for handling missing data in clinical research, its default use in predictive modeling may be inappropriate. Multiple imputation relies on access to the outcome variable to avoid bias, an assumption that breaks down in real-world deployment where the outcome is unknown. This talk argues that deterministic imputation methods, which do not depend on the outcome and are computationally efficient, are better suited for building predictive models intended for deployment. We present theoretical results and simulation evidence demonstrating that deterministic imputation maintains model validity and performance without introducing information leakage. We conclude that for predictive tasks, particularly in clinical settings where transparency, reproducibility, and alignment with deployment conditions are essential, deterministic imputation should be the standard.

The Why Behind Including Y in your Imputation Model

Handling missing data is a frequent challenge in analyses of health data, and imputation techniques are often employed to address this issue. This talk focuses on scenarios where a covariate with missing values is to be imputed and examines the prevailing recommendation to include the outcome variable in the imputation model. Specifically, we delve into stochastic imputation methods and their effects on accurately estimating the relationship between the imputed covariate and the outcome. Through mathematical proofs and a series of simulations, we demonstrate that incorporating the outcome variable in imputation models is essential for achieving unbiased results with stochastic imputation. Furthermore, we address the concern that this practice constitutes “double dipping” or data dredging. By providing both theoretical and empirical evidence, we show why including the outcome variable is a legitimate and necessary approach rather than a source of bias.
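The central claim can be illustrated with a small simulation. The sketch below is in plain Python and is not the talk's own code. Its setup is assumed for illustration: a standard normal covariate, a true slope of 1, half of the covariate values missing completely at random, and a single stochastic imputation rather than full multiple imputation. It imputes the covariate once from a model that ignores the outcome and once from a regression of the covariate on the outcome, then refits the outcome model.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / (len(xs) - 2))


rng = random.Random(2024)
n, true_slope = 20000, 1.0                      # assumed simulation settings
x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
missing = [rng.random() < 0.5 for _ in range(n)]  # missing completely at random

x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]

# Stochastic imputation WITHOUT the outcome: draw from the observed
# marginal distribution of the covariate.
mu, sd = statistics.fmean(x_obs), statistics.stdev(x_obs)
x_without_y = [rng.gauss(mu, sd) if m else xi for xi, m in zip(x, missing)]

# Stochastic imputation WITH the outcome: regress the covariate on the
# outcome among complete cases, then add residual noise to each prediction.
a, b, resid_sd = fit_line(y_obs, x_obs)
x_with_y = [
    a + b * yi + rng.gauss(0, resid_sd) if m else xi
    for xi, yi, m in zip(x, y, missing)
]

slope_without_y = fit_line(x_without_y, y)[1]
slope_with_y = fit_line(x_with_y, y)[1]
print(f"slope, imputation without outcome: {slope_without_y:.2f}")
print(f"slope, imputation with outcome:    {slope_with_y:.2f}")
```

In this setup, imputed values drawn without the outcome are unrelated to it, so the estimated slope is pulled toward zero; including the outcome preserves the covariate-outcome relationship and the estimate stays close to the simulated slope.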

Untangling Causal Effects: Understanding the Limits of Statistics

This talk will delve into two major causal inference obstacles: (1) identifying which variables to account for and (2) assessing the impact of unmeasured variables. The first half of the talk will showcase a Causal Quartet. In the spirit of Anscombe’s Quartet, this is a set of four datasets with identical statistical properties, yet different true causal effects due to differing data generating mechanisms. These simple datasets provide a straightforward example for statisticians to point to when explaining these concepts to collaborators and students. The second half of the talk will focus on how statistical techniques can be leveraged to examine the impact of a potential unmeasured confounder. We will examine sensitivity analyses under several scenarios with varying levels of information about potential unmeasured confounders, introducing the tipr R package, which provides tools for conducting sensitivity analyses in a flexible and accessible manner.

2024

It’s ME hi, I’m the collider it’s ME

This talk will focus on framing measurement error as a collider from a causal inference perspective. We will begin by demonstrating how to visually display measurement error in directed acyclic graphs (DAGs). We will then show how these graphs can be used to help communicate when corrections for measurement error are needed and how to implement these corrections in order to estimate unbiased effects. Finally, we will demonstrate how sensitivity analyses traditionally used to address omitted variable bias can be used to quantify the potential impact of measurement error.

Including the outcome in your imputation model – why isn’t this ‘double dipping’?

An often-repeated question is whether including the outcome in an imputation model is ‘double dipping’ or ‘peeking’ at the outcome in a way that can inflate the Type 1 error in studies. This talk will examine this myth and help dispel these concerns. We mathematically demonstrate that including the outcome variable in imputation models when using stochastic methods is required to avoid biased results. A discussion of these results along with practical advice will follow.

Power and sample size calculations for testing the ratio of reproductive values in phylogenetic samples

The quality of the inferences we make from pathogen sequence data is determined by the number and composition of pathogen sequences that make up the sample used to drive that inference. However, there remains limited guidance on how to best structure and power studies when the end goal is phylogenetic inference. One question that we can attempt to answer with molecular data is whether some people are more likely to transmit a pathogen than others. In this talk we will present an estimator to quantify differential transmission, as measured by the ratio of reproductive numbers between people with different characteristics, using transmission pairs linked by molecular data, along with a sample size calculation for this estimator. We will also provide extensions to our method to correct for imperfect identification of transmission linked pairs, overdispersion in the transmission process, and group imbalance. We validate this method via simulation and provide tools to implement it in an R package, phylosamp.

The Art of the Invite: Crafting Successful Invited Session Proposals

Invited sessions at conferences provide important opportunities for the exchange of ideas. But how do we get invited? And how can we do the inviting? In this panel, we will bring together experienced women in statistics from all career stages to share their tips on organizing invited sessions. Our panelists have planned and participated in numerous successful invited sessions at statistical conferences and have served on program committees to plan and select these sessions on a large scale. This panel is intended to demystify the invited session proposal process and to empower researchers to submit their ideas in the future.

Evaluating the Alignment of a Data Analysis between Analyst and Audience

A challenge that all data analysts face is building a data analysis that is useful for a given audience. In this talk, we will begin by proposing a set of principles for describing data analyses. We will then introduce a concept that we call the alignment of a data analysis between the data analyst and audience. We define a successfully aligned data analysis as the matching of principles between the analyst and the audience for whom the analysis is developed. We will propose a statistical model and general framework for evaluating the alignment of a data analysis. This framework can be used as a guide for practicing data scientists and students in data science courses for how to build better data analyses.
