Recent & Upcoming Talks


ConTESSA: A Shiny App to Help Quantify Contact Tracing Efficacy

This talk will focus on an application, ConTESSA, along with the accompanying R package, tti, designed to help quantify the efficacy of contact tracing programs. The talk will walk through the technical aspects of the underlying model and highlight how R, and shiny in particular, was used to create this product.

Equipping and empowering future data scientists with confidence, intuition, and communication skills

This talk will focus on bringing pedagogical best practices into the data science classroom. We will begin by focusing on building confident coders, followed by an exploration of developing quantitative intuition, with a particular focus on understanding uncertainty. Finally, we will wrap up with tips for empowering strong data science communicators.

October 15, 2020

9:00 AM – 10:00 AM

JupyterCon 2020

By Lucy D'Agostino McGowan in Invited Keynote


The Ups and Downs of Communicating Complex Statistics

In the age of “big data” there is an information overload. It is increasingly important for people to be able to sift through what is important and what is noise, what is evidence and what is an anecdote. Accordingly, the effective communication of statistical concepts to diverse audiences is currently an education and public health priority. This talk focuses on techniques to strike an appropriate balance, with specifics on how to communicate complex statistical concepts in an engaging manner without sacrificing truth and content, specifically addressing how to help the general public read past headlines to the actual evidence, or lack thereof. We will discuss engaging with the public via organizations such as TED Ed, focusing on both best practices and lessons learned.

Causal Inference in R

In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know–the tidyverse, regression models, and more–to answer the questions that are important to your work.

July 29 – 20, 2020

10:00 AM – 12:00 PM


By Lucy D'Agostino McGowan and Malcolm Barrett in Invited Workshop


Best Practices for Teaching R: A Randomized Controlled Trial

We are interested in studying best practices for introducing students in statistics or data science to the programming language R. The “tidyverse” is a suite of R packages, created to help with common statistics and data science tasks, that follow a consistent philosophy. We have created two sets of online learning modules: one introduces tidyverse concepts first and then dives into the idiosyncrasies of R as a programming language; the other takes a more traditional approach, first introducing R broadly and then following with an introduction to a particular suite of packages, the tidyverse. We have designed a randomized study to examine whether the order in which certain concepts are introduced affects whether learning objectives are met and/or how engaged students are with the material. This talk will focus on the mechanics of this study: how it was designed, how we enrolled participants, and how we evaluated outcomes.

Panel: Engaging Students during the COVID-19 Health Crisis

In this session we will discuss three different aspects of engaging students during the COVID-19 health crisis:

How do we engage students with a sensitive topic like COVID-19?
How do we engage students with COVID-19 data?
How do we engage students in a virtual environment?

May 20, 2020

11:00 AM – 12:00 PM

eCOTS 2020

By Laura Le, Kari Lock Morgan, Lucy D'Agostino McGowan in Invited Panel


Tools for analyzing R code the tidy way

With the current emphasis on reproducibility and replicability, there is an increasing need to examine how data analyses are conducted. To analyze the between-researcher variability in data analysis choices, as well as the aspects of the data analysis pipeline that contribute to variability in results, we have created two R packages: matahari and tidycode. These packages build on methods created for natural language processing; rather than processing natural language, we focus on R code as the substrate of interest. The matahari package logs everything that is typed in the R console or in an R script in a tidy data frame. The tidycode package contains tools for analyzing R calls in a tidy manner. We demonstrate the utility of these packages and walk through two examples.

Using RStudio Cloud in the Classroom

This workshop covers set up, implementation, and tips and tricks for integrating RStudio Cloud in your classroom. RStudio Cloud is a great way to incorporate R in the classroom without the hassle of installation and complex set up.

January 29, 2020

4:30 PM – 5:30 PM

ASA K-12 Virtual Workshops 2020

By Lucy D'Agostino McGowan and Shannon Ellis in Invited Workshop



Challenges in Augmenting Randomized Trials with Observational Health Records

This talk addresses challenges with making health record data and clinical trial data compatible. Trial data is collected regularly and in an organized way, while data from health records is messier and more haphazard. A clinical trial has a clear start and end point, while health record data is collected continuously. Additionally, clinical trial participants may be healthier than the patients we see in health records. Covariates are defined in advance for a trial, but must be predicted or imputed from the health record. In this talk I will discuss some of the challenges we have encountered in trying to integrate trial data with observational health records to improve power and design new trials.