Recent & Upcoming Talks

2024

Power and sample size calculations for testing the ratio of reproductive values in phylogenetic samples

The quality of the inferences we make from pathogen sequence data is determined by the number and composition of pathogen sequences that make up the sample used to drive that inference. However, there remains limited guidance on how to best structure and power studies when the end goal is phylogenetic inference. One question that we can attempt to answer with molecular data is whether some people are more likely to transmit a pathogen than others. In this talk we will present an estimator to quantify differential transmission, as measured by the ratio of reproductive numbers between people with different characteristics, using transmission pairs linked by molecular data, along with a sample size calculation for this estimator. We will also provide extensions to our method to correct for imperfect identification of transmission linked pairs, overdispersion in the transmission process, and group imbalance. We validate this method via simulation and provide tools to implement it in an R package, phylosamp.

Evaluating the Alignment of a Data Analysis between Analyst and Audience

A challenge that all data analysts face is building a data analysis that is useful for a given audience. In this talk, we will begin by proposing a set of principles for describing data analyses. We will then introduce a concept that we call the alignment of a data analysis between the data analyst and audience. We define a successfully aligned data analysis as the matching of principles between the analyst and the audience for whom the analysis is developed. We will propose a statistical model and general framework for evaluating the alignment of a data analysis. This framework can be used as a guide for practicing data scientists and students in data science courses for how to build better data analyses.

Causal Inference in R

This workshop will use the NHANES Epidemiologic Follow-up Study (NHEFS) data. We’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting.

Why You Must Include the Outcome in Your Imputation Model (and Why It’s Not Double Dipping)

Handling missing data is a frequent challenge in analyses of health data, and imputation techniques are often employed to address this issue. This talk focuses on scenarios where a covariate with missing values is to be imputed and examines the prevailing recommendation to include the outcome variable in the imputation model. Specifically, we delve into stochastic imputation methods and their effects on accurately estimating the relationship between the imputed covariate and the outcome. Through mathematical proofs and a series of simulations, we demonstrate that incorporating the outcome variable in imputation models is essential for achieving unbiased results with stochastic imputation. Furthermore, we address the concern that this practice constitutes “double dipping” or data dredging. By providing both theoretical and empirical evidence, we show why including the outcome variable is a legitimate and necessary approach rather than a source of bias.

May 28, 2024

12:00 PM – 1:00 PM

Wake Forest University School of Medicine Division of Public Health Sciences Grand Rounds 2024


By Lucy D'Agostino McGowan in Invited Oral Presentation

slides

Causal Inference in R

In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting.

May 15, 2024

9:00 AM – 5:00 PM

New York R Conference 2024


By Lucy D'Agostino McGowan and Malcolm Barrett in Invited Workshop

details

When to Include the Outcome in Your Imputation Model: A Mathematical Demonstration and Practical Advice

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. This talk will investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e., single imputation with a fixed value) and stochastic imputation (i.e., single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Likewise, we mathematically demonstrate that including the outcome variable in imputation models when using deterministic methods is not recommended, and doing so will induce biased results. A discussion of these results along with practical advice will follow.
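The contrast described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the talk: a single standard normal covariate, a linear outcome with true slope 1 and unit-variance noise, covariate values missing completely at random half the time, and single imputation using point estimates from the complete cases. The function names (`ols_slope`, `simulate`) and all parameter values are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=50_000, beta=1.0, p_missing=0.5, seed=2024):
    """Estimate the slope of y on x under four single-imputation strategies."""
    rng = random.Random(seed)
    # Assumed data-generating process: x ~ N(0, 1), y = beta * x + N(0, 1).
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    # Assumed missingness: x is missing completely at random.
    missing = [rng.random() < p_missing for _ in range(n)]

    # Fit both imputation models on the complete cases only.
    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]
    k = len(x_obs)
    mean_x = sum(x_obs) / k
    mean_y = sum(y_obs) / k
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x_obs) / k)
    g1 = ols_slope(y_obs, x_obs)  # imputation model: regress x on y
    g0 = mean_x - g1 * mean_y
    sd_resid = math.sqrt(
        sum((xi - g0 - g1 * yi) ** 2 for xi, yi in zip(x_obs, y_obs)) / k
    )

    imputed = {
        "deterministic_without_outcome": [],
        "deterministic_with_outcome": [],
        "stochastic_without_outcome": [],
        "stochastic_with_outcome": [],
    }
    for xi, yi, m in zip(x, y, missing):
        if not m:
            # Observed values are kept as-is under every strategy.
            for values in imputed.values():
                values.append(xi)
            continue
        pred = g0 + g1 * yi  # fitted value from the model that uses y
        imputed["deterministic_without_outcome"].append(mean_x)
        imputed["deterministic_with_outcome"].append(pred)
        imputed["stochastic_without_outcome"].append(rng.gauss(mean_x, sd_x))
        imputed["stochastic_with_outcome"].append(rng.gauss(pred, sd_resid))

    # Analysis model: regress the outcome on the (partly imputed) covariate.
    return {name: ols_slope(values, y) for name, values in imputed.items()}


for name, estimate in simulate().items():
    print(f"{name}: {estimate:.3f}")
```

Under these assumed settings the large-sample slopes work out to about 1 for deterministic imputation without the outcome, 4/3 for deterministic imputation with the outcome, 1/2 for stochastic imputation without the outcome, and 1 for stochastic imputation with the outcome, matching the pattern in the abstract. The sketch uses single imputation with fixed parameter estimates, so it illustrates bias in the point estimate only and says nothing about variance estimation or multiple imputation.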

Bridging the Gap Between Theory and Practice: When to Include the Outcome in Your Imputation Model

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. This talk will investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e., single imputation with a fixed value) and stochastic imputation (i.e., single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This talk aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations.

Bridging the gap between imputation theory and practice

Handling missing data presents a significant challenge in epidemiological data analysis, and imputation is frequently employed to address it. It is often advised to use the outcome variable in the imputation model for missing covariates, though the rationale for this advice is not always clear. This presentation will explore both deterministic imputation (i.e., single imputation using fixed values) and stochastic imputation (i.e., single or multiple imputation using random values) approaches and their effects on estimating the association between an imputed covariate and the outcome. We will show that the inclusion of the outcome variable in imputation models is not merely a suggestion but a necessity for obtaining unbiased estimates in stochastic imputation approaches. Furthermore, we will clarify misconceptions regarding deterministic imputation models and explain why the outcome variable should be excluded from these models. The goal of this presentation is to connect the theory behind imputation with its practical application, offering mathematical proofs to elucidate common statistical guidelines.

March 5, 2024

10:00 AM – 11:00 AM

National Institute for Research in Digital Science and Technology 2024


By Lucy D'Agostino McGowan in Invited Oral Presentation

slides