# Statistical Myths and Misuse: Abandoning Outdated Statistical Practices

This talk will focus on best practices for using modern statistics in health sciences.


This six-week series will cover causal inference model building and evaluation techniques. In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know (the tidyverse, regression models, and more) to answer the questions that are important to your work.

This talk will focus on an application, ConTESSA, along with the accompanying R package, tti, designed to help quantify the impact of contact tracing programs. The talk will walk through the technical aspects of the underlying model as well as highlight how R, and in particular shiny, were used to create this product.


The Wake Forest Conference on Analytics Impact is focused on the impactful use of analytics to solve problems in business, non-profits, government agencies, and society. During the pandemic, government officials and healthcare professionals have had to communicate with the public using healthcare data more than ever before. Communicating these data statistically and visually in ways that influence people’s behavior has proven very challenging. What have we learned about communicating with data during this crisis? What did we get right, and what failed? This year’s Conference on Analytics Impact is focused on communicating with healthcare data and lessons learned from the pandemic.


Without strong communication skills, all the advanced analysis we have performed may go unheard. At this event, our expert panelists will share tips and advice on how to clearly and effectively communicate statistics, particularly on social media, and answer questions from the audience.


The debate over the value and interpretation of the p-value has endured since its inception nearly 100 years ago. The use and interpretation of p-values vary by a host of factors, especially by discipline. These differences have proven to be a barrier when developing and implementing boundary-crossing clinical and translational science. The purpose of this panel is to discuss misconceptions, debates, and alternatives to the p-value.

This talk will focus on leveraging social media to communicate statistical concepts. From summarizing others’ content to promoting your own work, we will discuss best practices for effective statistical communication that is simultaneously clear, engaging, and understandable while remaining rigorous and mathematically correct. It is increasingly important for people to be able to sift through what is important and what is noise, what is evidence and what is an anecdote. This talk focuses on techniques to strike an appropriate balance, with specifics on how to communicate complex statistical concepts in an engaging manner without sacrificing truth and content.


Clear statistical communication is both an educational and public health priority. This talk will focus on best practices for effective statistical communication that is simultaneously clear, engaging, and understandable while remaining rigorous and mathematically correct. It is increasingly important for people to be able to sift through what is important and what is noise, what is evidence and what is an anecdote. This talk focuses on techniques to strike an appropriate balance, with specifics on how to communicate complex statistical concepts in an engaging manner without sacrificing truth and content.


We are thrilled to host a series of experts to discuss their experiences working with different types of COVID-19 data, insights they’ve gleaned, and challenges they’ve encountered with these complex and rapidly evolving data.

In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know (the tidyverse, regression models, and more) to answer the questions that are important to your work.
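The propensity score and inverse probability weighting workflow mentioned above can be sketched in a few lines of base R. This is an illustrative toy example on simulated data (all variable names are my own), not the workshop’s materials:

```r
# Illustrative sketch: inverse probability weighting in base R.
set.seed(1)
n <- 1000
confounder <- rnorm(n)
exposure   <- rbinom(n, 1, plogis(0.5 * confounder))
outcome    <- 1 + 2 * exposure + confounder + rnorm(n)  # true effect: 2

# Step 1: model the propensity score (probability of exposure given confounders)
ps_model <- glm(exposure ~ confounder, family = binomial())
ps <- predict(ps_model, type = "response")

# Step 2: construct inverse probability weights
w <- ifelse(exposure == 1, 1 / ps, 1 / (1 - ps))

# Step 3: the weighted outcome model's exposure coefficient estimates the
# average treatment effect
fit <- lm(outcome ~ exposure, weights = w)
coef(fit)["exposure"]
```

In practice you would also check covariate balance in the weighted sample before interpreting the effect estimate.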

We are interested in studying best practices for introducing students in statistics or data science to the programming language R. The “tidyverse” is a suite of R packages that follow a consistent philosophy, created to help with common statistics and data science tasks. We have created two sets of online learning modules: one introduces tidyverse concepts first and then dives into the idiosyncrasies of R as a programming language; the second takes a more traditional approach, first introducing R broadly and then following with an introduction to a particular suite of packages, the tidyverse. We have designed a randomized study to examine whether the order in which these concepts are introduced affects whether learning objectives are met and/or how engaged students are with the material. This talk will focus on the mechanics of this study: how it was designed, how we enrolled participants, and how we evaluated outcomes.

Clear statistical communication is both an educational and public health priority. This session will focus on best practices for effective statistical communication that is simultaneously clear, engaging, and understandable while remaining rigorous and mathematically correct. The panelists have a range of experience with communicating complex statistical concepts to both technical and lay audiences via multiple communication mechanisms including podcasting, Twitter, engaging with journalists in print, and television correspondence on networks such as CNN and BBC. The session will begin with moderated questions posed by the organizer and then open the discussion to audience members.

This talk will cover two R packages: matahari (https://github.com/jhudsl/matahari) and tidycode (https://lucymcgowan.github.io/tidycode/). The matahari package is a simple package for tidy logging of everything you type into the R console. The tidycode package allows users to analyze R expressions in a tidy way (i.e., take the code captured from matahari and put it in a tidy table for downstream analysis with the tidyverse).

This talk will walk through Sir Austin Bradford Hill’s viewpoints for causality, using XKCD comics along the way.


This talk will focus on the tipr R package.


This talk will focus on bringing pedagogical best practices into the data science classroom. We will begin by focusing on building confident coders, followed by an exploration of developing quantitative intuition, with a particular focus on understanding uncertainty. Finally, we will wrap up with tips for empowering strong data science communicators.

In the age of “big data” there is an information overload. It is increasingly important for people to be able to sift through what is important and what is noise, what is evidence and what is an anecdote. Accordingly, the effective communication of statistical concepts to diverse audiences is currently an educational and public health priority. This talk focuses on techniques to strike an appropriate balance, with specifics on how to communicate complex statistical concepts in an engaging manner without sacrificing truth and content, specifically addressing how to help the general public read past headlines to the actual evidence, or lack thereof. We will discuss engaging with the public via organizations such as TED-Ed, focusing on both best practices and lessons learned.



In this session we will discuss three different aspects of engaging students during the COVID-19 health crisis:

How do we engage students with a sensitive topic like COVID-19?

How do we engage students with COVID-19 data?

How do we engage students in a virtual environment?

With the current emphasis on reproducibility and replicability, there is an increasing need to examine how data analyses are conducted. In order to analyze the between-researcher variability in data analysis choices, as well as the aspects of the data analysis pipeline that contribute to variability in results, we have created two R packages: matahari and tidycode. These packages build on methods created for natural language processing; rather than processing natural language, we focus on R code as the substrate of interest. The matahari package facilitates the logging of everything that is typed in the R console or in an R script in a tidy data frame. The tidycode package contains tools for analyzing R calls in a tidy manner. We demonstrate the utility of these packages as well as walk through two examples.
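The intended workflow might look like the sketch below. This is an assumption-laden illustration (function names are taken from the packages’ documentation; matahari’s logging is designed around interactive console use), not the authors’ demo:

```r
# Hedged sketch of the matahari -> tidycode pipeline.
library(matahari)
library(tidycode)

dance_start()            # begin logging what is typed in the console
x <- rnorm(10)
mean(x)
dance_stop()             # stop logging

typing_log <- dance_tbl()              # retrieve the log as a tidy data frame
# Expand the captured expressions into one row per R call for analysis
calls <- unnest_calls(typing_log, expr)
```

From here, the per-call table can be joined against tidycode’s function classifications and summarized with ordinary tidyverse verbs.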

This workshop covers set up, implementation, and tips and tricks for integrating RStudio Cloud in your classroom. RStudio Cloud is a great way to incorporate R in the classroom without the hassle of installation and complex set up.

This talk addresses challenges with making health record data and clinical trial data compatible. Trial data are collected regularly and in an organized way, while health record data are messier and more haphazard. A clinical trial has a clear start and endpoint, while health record data are collected continuously. Additionally, clinical trial participants may be healthier than the patients we see in health records. Covariates are defined in advance for a trial but must be predicted or imputed from the health record. In this talk I will discuss some of the challenges we have encountered in trying to integrate trial data with observational health records to improve power and design new trials.

We are in an exciting new age with access to an overwhelming amount of data and information. This talk will focus on three areas that have become increasingly important as a result. First, we will discuss the importance of reproducibility during this age of information overload. As quantitatively minded people, we are being pushed to innovate and develop best practices for reproducibility. We will talk a bit about tools that make this possible and the next steps in this important area. We will then discuss new opportunities for developing innovative methods, particularly in the observational research space. This portion will include a brief introduction to causal inference for the data scientist. Finally, we will examine the importance of well-developed communication skills for quantitatively savvy people. These aspects will be discussed in the context of my winding path to data science, speckled with some advice and lessons learned.

“If you’re navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it’s a relief. It’s like coming across a clearing in the jungle.” – David McCandless.

The ability to create polished, factual, and easily understood data visualizations is a crucial skill for the modern statistician. Visualizations aid with all steps of the data analysis pipeline, from exploratory data analysis to effectively communicating results to a broad audience. This tutorial will first cover best practices in data visualization. We will then dive into a hands-on experience building intuitive and elegant graphics using R with the ggplot2 package, a system for creating visualizations based on The Grammar of Graphics.
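A minimal example in the spirit of the tutorial, using the built-in `mtcars` data (the aesthetic choices here are mine, not necessarily the tutorial’s):

```r
# Map data variables to aesthetics, then layer geoms and labels:
# the core Grammar of Graphics workflow in ggplot2.
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    title = "Heavier cars tend to get fewer miles per gallon"
  ) +
  theme_minimal()

p  # display the plot
```

Because the plot is an ordinary R object, layers and themes can be added incrementally, which is what makes the grammar composable.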

The principal limitation of all observational studies is the potential for unmeasured confounding. Various study designs may perform similarly in controlling for bias due to measured confounders while differing in their sensitivity to unmeasured confounding. Design sensitivity (Rosenbaum, 2004) quantifies the strength of an unmeasured confounder needed to nullify an observed finding. In this presentation, we explore how robust certain study designs are to various unmeasured confounding scenarios. We focus particularly on two newer study designs: ATM and ATO weights. We illustrate the performance in a large electronic health records based study and provide recommendations for sensitivity to unmeasured confounding analyses in ATM and ATO weighted studies, focusing primarily on the potential reduction in finite-sample bias.
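Given estimated propensity scores, ATM (matching) and ATO (overlap) weights can be computed directly from standard formulas. A sketch on simulated data (this is the textbook construction, not the study’s code):

```r
# ATM and ATO weights from estimated propensity scores.
set.seed(2)
n  <- 500
x  <- rnorm(n)
z  <- rbinom(n, 1, plogis(x))                               # binary exposure
ps <- predict(glm(z ~ x, family = binomial()), type = "response")

# ATO (overlap) weights: treated units get 1 - ps, controls get ps,
# emphasizing the region of covariate overlap
w_ato <- ifelse(z == 1, 1 - ps, ps)

# ATM (matching) weights: min(ps, 1 - ps) divided by the probability of
# the exposure actually received
w_atm <- pmin(ps, 1 - ps) / ifelse(z == 1, ps, 1 - ps)
```

Both weighting schemes bound the weights (no unit can dominate the way extreme inverse probability weights can), which is one source of the finite-sample bias reduction discussed above.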

Making believable causal claims can be difficult, especially given the much-repeated adage that “correlation is not causation.” This talk will walk through some tools often used to practice safe causation, such as propensity scores and sensitivity analyses. In addition, we will cover principles that suggest causation, such as an understanding of counterfactuals and the application of Hill’s criteria in a data science setting. We will walk through specific examples, as well as provide R code for all methods discussed.
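One widely used sensitivity analysis in this spirit (shown here as an example; the talk may use other methods) is the E-value of VanderWeele and Ding, which gives the minimum strength of unmeasured confounding, on the risk-ratio scale, needed to explain away an observed association:

```r
# E-value for an observed risk ratio rr:
#   E = rr + sqrt(rr * (rr - 1))   for rr >= 1
# (risk ratios below 1 are inverted first).
e_value <- function(rr) {
  if (rr < 1) rr <- 1 / rr
  rr + sqrt(rr * (rr - 1))
}

e_value(2)  # an observed RR of 2 requires confounding of RR >= ~3.41
```

A large E-value suggests a finding is robust: only a very strong unmeasured confounder could nullify it.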

Join us for a GitHub journey, guided by Lucy D’Agostino McGowan! We’ll answer questions like:

What is so great about GitHub?

How can I make it work for me and my workflow?

How can I show the world some of the cool things I’m working on?

This will be a hands-on workshop that will give you all the tools to have a delightful time incorporating version control & R (and blogdown, https://github.com/rstudio/blogdown, if you are so inclined). All levels are welcome!

There is an industry-wide push toward making workflows seamless and reproducible. Incorporating reproducibility into the workflow has many benefits; among them are increased transparency, time savings, and accuracy. We walk through how to seamlessly integrate SAS®, LaTeX, and R into a single reproducible document. We also discuss best practices for general principles such as literate programming and version control.

In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern. Propensity score matching (PSM) is a popular method to minimize this potential bias, matching individuals who received treatment to those who did not in order to reduce the imbalance in pre-treatment covariate distributions. PSM methods continue to advance as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between matched pairs, has been shown to outperform less computationally intensive methods. However, many find the implementation daunting. SAS/IML® software allows the integration of optimal matching routines that execute in R, e.g., the R optmatch package. This presentation walks through performing optimal PSM in SAS® by implementing R functions, assessing whether covariate trimming is necessary prior to PSM. It covers the propensity score analysis in SAS, the matching procedure, and the post-matching assessment of covariate balance using SAS/STAT® 13.2 and SAS/IML procedures.
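The R side of that integration might look like the following sketch, which uses the optmatch package on simulated data (this mirrors, rather than reproduces, the SAS/IML workflow in the paper):

```r
# Optimal 1:1 propensity score matching with optmatch.
library(optmatch)

set.seed(3)
n <- 200
dat <- data.frame(
  x1    = rnorm(n),
  x2    = rnorm(n),
  treat = rbinom(n, 1, 0.3)
)

# Fit the propensity model, then pass it to pairmatch(), which solves
# the optimal matching problem on the propensity score distance
ps_model <- glm(treat ~ x1 + x2, family = binomial(), data = dat)
dat$match_id <- pairmatch(ps_model, data = dat)

# Matched pairs share a match_id; unmatched controls are NA
table(!is.na(dat$match_id))
```

After matching, covariate balance between the matched groups should be assessed (e.g., via standardized mean differences) before estimating treatment effects.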

The Behavioral Risk Factor Surveillance System (BRFSS) collects data on health practices and risk behaviors via telephone survey. This study focuses on the question, “On average, how many hours of sleep do you get in a 24-hour period?” Recall bias is a potential concern in interviews and questionnaires such as BRFSS. The 2013 BRFSS data are used to illustrate the proper methods for implementing PROC SURVEYREG and PROC SURVEYLOGISTIC, using the complex weighting scheme that BRFSS provides.
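The same design-weighted idea has an R analogue in the survey package; a hedged sketch on simulated data (the variable names are illustrative, not BRFSS’s actual column names):

```r
# Survey-weighted regression (cf. PROC SURVEYREG) with the survey package.
library(survey)

set.seed(4)
brfss_sim <- data.frame(
  sleep_hours   = rnorm(200, mean = 7, sd = 1.5),
  age           = sample(18:80, 200, replace = TRUE),
  sex           = sample(c("M", "F"), 200, replace = TRUE),
  survey_weight = runif(200, 0.5, 2)   # stand-in for the BRFSS weight
)

# Declare the design so standard errors account for the weighting
design <- svydesign(ids = ~1, weights = ~survey_weight, data = brfss_sim)

# Weighted linear regression of sleep hours on demographics
fit <- svyglm(sleep_hours ~ age + sex, design = design)
summary(fit)
```

Ignoring the weights and fitting an ordinary `lm()` would generally give both different point estimates and incorrect standard errors, which is the motivation for the survey procedures in the first place.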

Existing health literacy assessment tools developed for research purposes have constraints that limit their utility for clinical practice. The measurement of health literacy in clinical practice can be impractical due to the time requirements of existing assessment tools. Single Item Literacy Screener (SILS) items, which are self-administered brief screening questions, have been developed to address this constraint. We developed a model to predict limited health literacy that consists of two SILS and demographic information (for example, age, race, and education status) using a sample of patients in a St. Louis emergency department. In this paper, we validate this prediction model in a separate sample of patients visiting a primary care clinic in St. Louis. Using the prediction model developed in the previous study, we use SAS/STAT® software to validate this model based on three goodness of fit criteria: rescaled R-squared, AIC, and BIC. We compare models using two different measures of health literacy, Newest Vital Sign (NVS) and Rapid Assessment of Health Literacy in Medicine Revised (REALM-R). We evaluate the prediction model by examining the concordance, area under the ROC curve, sensitivity, specificity, kappa, and gamma statistics. Preliminary results show 69% concordance when comparing the model results to the REALM-R and 66% concordance when comparing to the NVS. Our conclusion is that validating a prediction model for inadequate health literacy would provide a feasible way to assess health literacy in fast-paced clinical settings. This would allow us to reach patients with limited health literacy with educational interventions and better meet their information needs.
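The concordance statistic reported above can be computed directly: among all (case, non-case) pairs, it is the proportion in which the case receives the higher predicted probability. An illustrative base-R sketch on simulated data (the paper’s models use SILS items and demographics; these variables are stand-ins):

```r
# Fit a logistic prediction model and compute concordance (AUC) by hand.
set.seed(5)
n       <- 300
age     <- rnorm(n, 50, 15)
sils    <- rbinom(n, 1, 0.4)                       # stand-in screener item
limited <- rbinom(n, 1, plogis(-2 + 0.03 * age + 1.2 * sils))

fit  <- glm(limited ~ age + sils, family = binomial())
pred <- predict(fit, type = "response")

# Concordance: fraction of case/non-case pairs ranked correctly
# (ties count as half)
concordance <- function(y, p) {
  pos <- p[y == 1]
  neg <- p[y == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

concordance(limited, pred)
```

Validation on a separate sample, as done in the paper, guards against the optimism of computing such statistics on the data used to fit the model.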