Partitioned Local Depth (PaLD) Community Analyses in R

Partitioned Local Depth (PaLD) is a framework for holistic consideration of community structure for distance-based data. This paper describes an R package, pald, for calculating Partitioned Local Depth (PaLD) probabilities, implementing community analyses, determining community clusters, and creating data visualizations to display community structure. We present essentials of the PaLD approach, describe how to use the pald package, walk through several examples, and discuss the method in relation to commonly used techniques.

Quantifying the Alignment of a Data Analysis between Analyst and Audience

Using Mathlink Cubes to Introduce Data Wrangling with Examples in R

This article explores an innovative approach to teaching data wrangling skills to students through hands-on activities before transitioning to coding. Data wrangling, a critical aspect of data analysis, involves cleaning, transforming, and restructuring data. We introduce the use of a physical tool, mathlink cubes, to facilitate a tangible understanding of datasets. This approach helps students grasp the concepts of data wrangling before implementing them in coding languages such as R. We detail a classroom activity that includes hands-on tasks paralleling common data wrangling processes such as filtering, selecting, and mutating, followed by their coding equivalents using R’s dplyr package.

Combining Straight-Line and Map-Based Distances to Investigate the Connection Between Proximity to Healthy Foods and Disease

Healthy foods are essential for a healthy life, but accessing healthy food can be more challenging for some people than others. This disparity in food access may lead to disparities in well-being, potentially with disproportionate rates of diseases in communities that face more challenges in accessing healthy food (i.e., low-access communities). Identifying low-access, high-risk communities for targeted interventions is a public health priority, but current methods to quantify food access rely on distance measures that are either computationally simple (like the length of the shortest straight-line route) or accurate (like the length of the shortest map-based driving route), but not both.

Data Jamboree: A Party of Open-Source Software Solving Real-World Data Science Problems

The evolving focus in statistics and data science education highlights the growing importance of computing. This paper presents the Data Jamboree, a live event that combines computational methods with traditional statistical techniques to address real-world data science problems. Participants, ranging from novices to experienced users, followed workshop leaders in using open-source tools like Julia, Python, and R to perform tasks such as data cleaning, manipulation, and predictive modeling. The Jamboree showcased the educational benefits of working with open data, providing participants with practical, hands-on experience.

The ‘Why’ behind including ‘Y’ in your imputation model

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true.

Power and sample size calculations for testing the ratio of reproductive values in phylogenetic samples

The quality of the inferences we make from pathogen sequence data is determined by the number and composition of pathogen sequences that make up the sample used to drive that inference. However, there remains limited guidance on how to best structure and power studies when the end goal is phylogenetic inference. One question that we can attempt to answer with molecular data is whether some people are more likely to transmit a pathogen than others.

The Study of the Epidemiology of Pediatric Hypertension Registry (SUPERHERO): Rationale and Methods

Causal Inference is not just a statistics problem

This paper introduces a collection of four data sets, similar to Anscombe’s Quartet, that aim to highlight the challenges involved when estimating causal effects. Each of the four data sets is generated based on a distinct causal mechanism: the first involves a collider, the second involves a confounder, the third involves a mediator, and the fourth involves the induction of M-Bias by an included factor. The paper includes a mathematical summary of each data set, as well as directed acyclic graphs that depict the relationships between the variables.

Design Principles for Data Analysis

The data revolution has led to an increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design thinking – the problem-solving process to understand the people for whom a solution is being designed. For a given problem, there can be significant or subtle differences in how a data analyst (or producer of a data analysis) constructs, creates, or designs a data analysis, including differences in the choice of methods, tooling, and workflow.

Randomized controlled trial: Quantifying the impact of disclosing uncertainty on adherence to hypothetical health recommendations

We conducted a randomized controlled trial to assess whether disclosing elements of uncertainty in an initial public health statement will change the likelihood that participants will accept new, different advice that arises as more evidence is uncovered. Proportional odds models were fit, stratified by the baseline likelihood to agree with the final advice. 298 participants were randomized to the treatment arm and 298 to the control arm. Among participants who were more likely to agree with the final recommendation at baseline, those who were initially shown uncertainty had 46% lower odds of being more likely to agree with the final recommendation compared to those who were not (OR: 0.

Sensitivity Analyses for Unmeasured Confounders

This review expands on sensitivity analysis techniques for unmeasured confounding, demonstrating state-of-the-art methods as well as specifying which should be used under various scenarios, depending on the information about a potential unmeasured confounder available to the researcher. Recent Findings: Methods to assess how sensitive an observed estimate is to unmeasured confounding have been developed for decades. Recent advancements have allowed for the incorporation of measured confounders in these assessments and have updated the methods used to quantify the impact of an unmeasured confounder, whether specified in terms of the magnitude of the effect from a regression standpoint (for example, as a risk ratio) or with respect to the percent of variation in the outcome or exposure explained by the unmeasured confounder.

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond.

Will Podcasting and Social Media Replace Journals and Traditional Science Communication? No, but…

Maximizing and evaluating the impact of test-trace-isolate programs: A modeling study

Evaluating Sources of Bias in Observational Studies of ACE Inhibitor/ARB Use During COVID-19: Beyond Confounding

Quantifying uncertainty in infectious disease mechanistic models

Tools for Analyzing R Code the Tidy Way

Mental health conditions and the risk of chronic opioid therapy among patients with rheumatoid arthritis: a retrospective veterans affairs cohort study

Welcome to the Tidyverse

Comparative Effectiveness of Two Collagen-containing Dressings: Oxidized Regenerated Cellulose (ORC)/Collagen/Silver-ORC Dressing Versus Ovine Collagen Extracellular Matrix

Meta-analysis Comparing Outcomes of Two Different Negative Pressure Therapy Systems in Closed Incision Management

Background: Closed incision negative pressure therapy (ciNPT) is an emerging approach to managing closed incisions of patients at risk of postoperative complications. There are primarily 2 different commercially available ciNPT systems. Both systems consist of a single-use, battery-powered device and foam- or gauze-based peel-and-place dressing designed for closed incisions. These systems vary in design, and there are no data comparing outcomes between the 2 systems. Methods: We performed 2 separate meta-analyses to compare surgical site infection (SSI) rates after use of (1) ciNPT with foam dressing (FOAM) versus conventional dressings and (2) ciNPT with multilayer absorbent dressing (MLA) versus conventional dressings.

Caring for Critically Ill Patients with the ABCDEF Bundle: Results of the ICU Liberation Collaborative in Over 15,000 Adults

Objective: Decades-old, common ICU practices including deep sedation, immobilization, and limited family access are being challenged. We endeavoured to evaluate the relationship between ABCDEF bundle performance and patient-centered outcomes in critical care. Design: Prospective, multicenter, cohort study from a national quality improvement collaborative. Setting: 68 academic, community, and federal ICUs collected data during a 20-month period. Patients: 15,226 adults with at least one ICU day. Interventions: We defined ABCDEF bundle performance (our main exposure) in two ways: 1) complete performance (patient received every eligible bundle element on any given day) and 2) proportional performance (percentage of eligible bundle elements performed on any given day).

Metformin use and incidence cancer risk: evidence for a selective protective effect against liver cancer

Purpose: Several observational studies suggest that metformin reduces incidence cancer risk; however, many of these studies suffer from time-related biases and several cancer outcomes have not been investigated due to small sample sizes. Methods: We constructed a propensity score-matched retrospective cohort of 84,434 veterans newly prescribed metformin or a sulfonylurea as monotherapy. We used Cox proportional hazard regression to assess the association between metformin use compared to sulfonylurea use and incidence cancer risk for 10 solid tumors.

Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses

Verifying that a statistically significant result is scientifically meaningful is not only good scientific practice, it is a natural way to control the Type I error rate. Here we introduce a novel extension of the p-value, a second-generation p-value ($p_δ$), that formally accounts for scientific relevance and leverages this natural Type I error control. The approach relies on a pre-specified interval null hypothesis that represents the collection of effect sizes that are scientifically uninteresting or are practically null.
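As an illustration of how an interval null enters the calculation, the sketch below computes a second-generation p-value from an interval estimate and an interval null. It assumes the commonly cited definition, $p_δ = \frac{|I \cap H_0|}{|I|} \times \max\{\frac{|I|}{2|H_0|}, 1\}$, where $I$ is the interval estimate and $H_0$ the interval null; the abstract above does not state this formula, and the function name `sgpv` and its arguments are hypothetical names chosen for this example.

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """Sketch of a second-generation p-value (assumed definition, see lead-in).

    est_lo, est_hi   : bounds of the interval estimate I (e.g. a confidence interval)
    null_lo, null_hi : bounds of the pre-specified interval null H0
    """
    if est_lo > est_hi or null_lo >= null_hi:
        raise ValueError("intervals must be ordered and the null must have positive length")

    est_len = est_hi - est_lo
    null_len = null_hi - null_lo

    # Length of the overlap between the interval estimate and the interval null.
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))

    if est_len == 0:
        # Degenerate point estimate: fully inside or fully outside the null.
        return 1.0 if null_lo <= est_lo <= null_hi else 0.0

    # Fraction of the estimate inside the null, with a correction factor that
    # caps the result at 1/2 when the estimate is more than twice as wide as
    # the null (the data are then treated as inconclusive).
    return (overlap / est_len) * max(est_len / (2 * null_len), 1.0)


# Interval estimate entirely outside the null: only meaningful effects are supported.
print(sgpv(0.2, 0.8, -0.1, 0.1))
# Interval estimate entirely inside the null: only null effects are supported.
print(sgpv(-0.05, 0.05, -0.1, 0.1))
# Very wide interval estimate covering the null: inconclusive.
print(sgpv(-10, 10, -1, 1))
```

Under this assumed definition, a value of 0 means the interval estimate contains no practically null effect sizes, a value of 1 means it contains only practically null effect sizes, and values in between indicate how inconclusive the data are.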

Comparative Safety of Sulfonylurea and Metformin Monotherapy on the Risk of Heart Failure: A Cohort Study

Background: Medications that impact insulin sensitivity or cause weight gain may increase heart failure risk. Our aim was to compare heart failure and cardiovascular death outcomes among patients initiating sulfonylureas for diabetes mellitus treatment versus metformin. Methods and Results: National Veterans Health Administration databases were linked to Medicare, Medicaid, and National Death Index data. Veterans aged ≥18 years who initiated metformin or sulfonylureas between 2001 and 2011 and whose creatinine was <1.

Location Bias in ROC Studies

Location bias occurs when a reader detects a false lesion in a subject with disease and the falsely detected lesion is considered a true positive. In this study, we examine the effect of location bias in two large MRMC ROC studies, comparing three ROC scoring methods. We compare one method that only uses the maximum confidence score and does not take location bias into account (maxROC), and two methods that take location bias into account: the region of interest ROC (ROI–ROC) and the free-response ROC (FROC).

Secondary consent to biospecimen use in a prostate cancer biorepository

Background: Biorepository research has substantial societal benefits. This is one of the few studies to focus on male willingness to allow future research use of biospecimens. Methods: This study analyzed the future research consent questions from a prostate cancer biorepository study (N = 1931). The consent form asked two questions regarding use of samples in future studies (1) without and (2) with protected health information (PHI). Yes to both questions of use of samples was categorized as Yes-Always; Yes to without and No to with PHI was categorized as Yes-Conditional; No to without PHI was categorized as Never.

Quantitative evaluation of the community research fellows training program

Context: The community research fellows training (CRFT) program is a community-based participatory research (CBPR) initiative for the St. Louis area. This 15-week program, based on a Master in Public Health curriculum, was implemented by the Division of Public Health Sciences at Washington University School of Medicine and the Siteman Cancer Center. Objectives: We measure the knowledge gained by participants and evaluate participant and faculty satisfaction with the CRFT program both in terms of meeting learning objectives and actively engaging the community in the research process.

Mouse low-grade gliomas contain cancer stem cells with unique molecular and functional properties

The availability of adult malignant glioma stem cells (GSCs) has provided unprecedented opportunities to identify the mechanisms underlying treatment resistance. Unfortunately, there is a lack of comparable reagents for the study of pediatric low-grade glioma (LGG). Leveraging a neurofibromatosis-1 (Nf1) genetically-engineered mouse LGG model, we report the isolation of CD133+ multi-potent low-grade glioma stem cells (LG-GSCs), which generate glioma-like lesions histologically similar to the parent tumor following injection into immunocompetent hosts.

Effects of racial and ethnic group and health literacy on responses to genomic risk information in a medically underserved population

Objective: Few studies have examined how individuals respond to genomic risk information for common, chronic diseases. This randomized study examined differences in responses by type of genomic information [genetic test/family history] and disease condition [diabetes/heart disease] and by race/ethnicity in a medically underserved population. Methods: 1057 English-speaking adults completed a survey containing one of four vignettes (two-by-two randomized design). Differences in dependent variables (i.e., interest in receiving genomic assessment, discussing with doctor or family, changing health habits) by experimental condition and race/ethnicity were examined using chi-squared tests and multivariable regression analysis.

Is low health literacy associated with increased emergency department utilization and recidivism?

Objectives: The objective was to determine whether patients with low health literacy have higher emergency department (ED) utilization and higher ED recidivism than patients with adequate health literacy. Methods: The study was conducted at an urban academic ED with more than 95,000 annual visits that is part of a 13-hospital health system, using electronic records that are captured in a central data repository. As part of a larger, cross-sectional, convenience sample study, health literacy testing was performed using the short test of functional health literacy in adults (S-TOFHLA) and standard test thresholds identifying those with inadequate, marginal, and adequate health literacy.

Screening for colorectal cancer: Using data to set prevention priorities

Introduction: Adherence to colorectal cancer screening recommendations is known to vary by state, but less information is available about within-state variability. In the current study, we assess county-level screening rates for Missouri, with the goal of better targeting public health efforts to increase screening. Methods: Prevalence of colorectal cancer screening among Missouri adults between the ages of 50 and 74 was obtained from 2008 and 2010 Behavioral Risk Factor Surveillance System data.

Using small-area analysis to estimate county-level racial disparities in obesity demonstrating the necessity of targeted interventions

Data on the national and state levels is often used to inform policy decisions and strategies designed to reduce racial disparities in obesity. Obesity-related health outcomes are realized on the individual level, and policies based on state and national-level data may be inappropriate due to the variations in health outcomes within and between states. To examine county-level variation of obesity within states, we use a small-area analysis technique to fill the void for county-level obesity data by race.

Peer Reviewed Article