Statistical Theory and Methods
Bayesian Methods
In contrast to frequentist approaches, Bayesian methods provide a principled framework for combining data with prior information when making inferences. By incorporating prior information, Bayesian methods can yield more precise estimates in small samples. In large samples, Bayesian nonparametric/machine learning methods can capture complex, nonlinear relationships in the data to produce accurate predictions and uncertainty quantification. Bayesian methods are widely used to solve complex inference problems in microsimulations, genomics, causal inference, missing data problems, and more.
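As a minimal illustration of combining data with prior information, the sketch below uses the textbook Beta-Binomial conjugate update. The function names and the numbers are hypothetical and chosen only for illustration; they do not come from any Center project.

```python
def beta_binomial_posterior(prior_a: float, prior_b: float,
                            successes: int, trials: int) -> tuple[float, float]:
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    The posterior is Beta(prior_a + successes, prior_b + failures).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical small sample: 3 responders out of 4 patients,
# with a weakly informative Beta(2, 2) prior centered at 0.5.
post_a, post_b = beta_binomial_posterior(2.0, 2.0, successes=3, trials=4)
posterior_mean = beta_mean(post_a, post_b)
sample_proportion = 3 / 4

print(f"sample proportion: {sample_proportion:.3f}")
print(f"posterior mean:    {posterior_mean:.3f}")
```

With so few observations, the posterior mean is pulled from the raw proportion toward the prior mean, which is the mechanism by which prior information stabilizes small-sample estimates.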
-
Roberta DeVito
Thomas J. and Alice M. Tisch Assistant Professor of Biostatistics and Data Science -
Arman Oganisian
Assistant Professor of Biostatistics
Bioinformatics
Bioinformatics research includes the development and application of novel statistical methodology for analyzing complex biological data, typically at a molecular level (nucleic acids, proteins and metabolites), often referred to as -omics data. The methods development includes: data preprocessing for obtaining more accurate and precise measurements from new technologies; identifying biomarkers associated with various phenotypes of interest; identifying the interaction between genetic components; identifying the interaction between genetic and environmental factors in development, disease etiology and evolution; and discovering biological networks and their dynamics in biological systems ranging from single cells to large human populations. Examples of bioinformatics research at the Center include novel methods for analyzing -omics data including genome, epigenome, proteome, and transcriptome, as well as collaborative projects involving bioinformatics in cancer, evolution, aging and development.
-
Zhijin (Jean) Wu
Professor of Biostatistics, Director of the Doctoral Graduate Program in Biostatistics -
Ying Ma
Assistant Professor of Biostatistics
Biomarker Evaluation
Biomarkers are a broad subcategory of medical or imaging characteristics that can be measured and evaluated as indicators of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. Biomarkers serve numerous purposes, including the assessment of risk, the screening and identification of diseases, diagnostic processes, prognostic evaluations, determining the likelihood of therapeutic benefit, and monitoring the progression or remission of diseases. Studies of prognostic biomarkers aim to ascertain or investigate the ability of biomarkers to distinguish between risks of specific clinical outcomes. Studies of predictive biomarkers focus on evaluating the utility of a biomarker in forecasting the likelihood of a patient experiencing benefit from a specified therapeutic intervention.
Statistical methods are pivotal in the evaluation of biomarkers to ensure that the assessments are reliable, valid, and applicable to clinical practice. Receiver Operating Characteristic (ROC) Analysis is a common tool used to assess the diagnostic accuracy of biomarkers. Logistic regression models can estimate the probability of a disease or condition as a function of a biomarker's level, while controlling for other variables, which can help in understanding the independent effect of a biomarker on disease risk. Survival Analysis allows researchers to relate biomarker levels to time-to-event data, which is particularly important in prognostic biomarker evaluation. Advanced statistical methods, including machine learning algorithms like random forests and neural networks, can handle complex interactions between multiple biomarkers and other variables, potentially identifying patterns that might not be evident with more traditional statistical methods.
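To make the ROC idea concrete, the sketch below computes the empirical area under the ROC curve (AUC) as the proportion of case-control pairs in which the case has the higher biomarker value, with ties counted as one half. The function name and the biomarker values are hypothetical, and this is only one simple way to summarize diagnostic accuracy.

```python
from typing import Sequence


def empirical_auc(case_values: Sequence[float],
                  control_values: Sequence[float]) -> float:
    """Empirical AUC: P(case marker > control marker) + 0.5 * P(tie)."""
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(case_values) * len(control_values))


# Hypothetical biomarker levels for diseased and non-diseased subjects.
cases = [2.1, 3.4, 1.9, 4.0]
controls = [1.0, 2.0, 1.5, 2.5]
auc = empirical_auc(cases, controls)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to a biomarker that does no better than chance at separating the two groups, and an AUC of 1 corresponds to perfect separation.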
-
Fenghai Duan
Professor of Biostatistics
Causal Inference
Randomized clinical trials are the gold standard for estimating the effects of interventions. However, in many studies in medicine, epidemiology and public health, randomized trials may suffer from unintended complications or be infeasible because of financial, ethical or logistical considerations. Faculty member Roee Gutman has been developing statistical methods to address these issues. In observational studies, where interventions are not randomized, causal estimates can easily be biased due to confounding. Several faculty members in the Center have been working on addressing issues related to confounding and dependencies in causal inference problems in observational studies.
-
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics -
Youjin Lee
Manning Assistant Professorship of Biostatistics -
Arman Oganisian
Assistant Professor of Biostatistics -
Roee Gutman
Professor of Biostatistics -
Joseph W. Hogan
Carole and Lawrence Sivorich Professor of Public Health, Professor of Biostatistics, Department Chair of Biostatistics -
Tao Liu
Professor of Biostatistics -
Christopher Schmid
Professor of Biostatistics
Clinical Trials Methodology
Clinical trials prospectively assign participants to two or more interventions and prospectively measure outcomes on the participants. Clinical trials are the gold standard for studying the effectiveness of interventions and are usually required for regulatory approval. Several members of the department are involved in developing methods for the design, analysis, and interpretation of clinical trials. For example, Dr. Schmid and others have published extensively on meta-analysis for clinical trials, and Dr. Schmid is a leader in the development of methods for, and applications of, N-of-1 trials, which are single-person multiple-crossover studies. Dr. Steingrimsson works on developing machine learning methods for data-driven discovery of subgroups with enhanced treatment effects. ECOG-ACRIN has carried out large trials comparing different screening modalities in cancer.
-
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics -
Christopher Schmid
Professor of Biostatistics
Data Fusion
In many applications in public health, medicine and social science, patient characteristics are dispersed over multiple files, platforms, and/or studies. Analysis that links two or more separate data sources is increasingly important as researchers seek to integrate administrative and clinical datasets while adapting to privacy regulations that limit access to unique identifiers. Dr. Gutman has developed novel Bayesian procedures to link units that appear in two datasets by treating the unknown linking as missing data. He is collaborating with health services researchers and clinicians to estimate the effects of policies and interventions as well as predict health outcomes from clinical and demographic variables. In addition, Dr. De Vito has developed novel statistical techniques that integrate multiple studies in a single analysis, concurrently estimating the characteristics shared among all the studies and the study-specific components.
-
Roee Gutman
Professor of Biostatistics -
Roberta DeVito
Thomas J. and Alice M. Tisch Assistant Professor of Biostatistics and Data Science
Data Science
Data science lives at the intersection of statistics, computational sciences, and domain knowledge. The Center for Statistical Sciences is heavily invested in health data science through a variety of projects in areas such as computational biology, machine learning, Bayesian statistics, network analysis, causal inference with big data, and analysis of neuroimaging data. The Department of Biostatistics is one of four core departments in Brown's Data Science Initiative (DSI). Dr. Hogan serves as the Deputy Director of the Initiative, Dr. De Vito is currently a member of the Data Science Executive Committee, and Dr. Eloyan is a member of the DSI Campus Advisory Board.
-
Joseph W. Hogan
Carole and Lawrence Sivorich Professor of Public Health, Professor of Biostatistics, Department Chair of Biostatistics -
Roberta DeVito
Thomas J. and Alice M. Tisch Assistant Professor of Biostatistics and Data Science -
Ani Eloyan
Vice Chair of the Department of Biostatistics, Associate Professor of Biostatistics -
Youjin Lee
Manning Assistant Professorship of Biostatistics -
Arman Oganisian
Assistant Professor of Biostatistics
Deep Learning
Loosely speaking, deep learning is a branch of machine learning that uses multi-layer neural networks to build models for tasks such as prediction or diagnosis. The unknown parameters, commonly referred to as weights, are estimated by minimizing a loss function, often subject to some form of regularization. Deep learning has shown promise in many domains, with image-based analysis being a common application area. Several Center members are working on deep learning related research, including analyzing medical images, uncertainty quantification, and interpretability of deep learning models.
-
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics -
Fenghai Duan
Professor of Biostatistics
High Dimensional Data Methods
With the rapid advancement of technology, data collection procedures have continuously improved. This has created a crucial need for statistical methods that can handle massive (and often noisy) data sets in many application areas. Center faculty have been at the forefront of developing theory and software that address key challenges when working in such high-dimensional settings. This broadly includes, but is not limited to, dealing with missing data, finding scalable solutions for estimating model parameters, overcoming combinatorial issues when trying to identify nonlinear interactions, effectively modeling non-continuous outcomes (e.g., categorical data), and quantifying uncertainty with novel model validation/calibration techniques.
-
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics -
Lorin Crawford
Distinguished Senior Fellow in Biostatistics -
Roberta DeVito
Thomas J. and Alice M. Tisch Assistant Professor of Biostatistics and Data Science -
Fenghai Duan
Professor of Biostatistics -
Zhijin (Jean) Wu
Professor of Biostatistics, Director of the Doctoral Graduate Program in Biostatistics
Latent Variable Modeling
Latent variable models link observed (or manifest) variables to unobserved (or latent) constructs. They comprise two parts: a measurement model specifying the relationship between manifest and latent variables, and a structural model delineating the relationships among the latent variables themselves. Both the manifest and the latent variables can be either discrete or continuous in nature. When both are continuous, one obtains the factor analytic models used widely in psychology, e.g., to measure latent constructs such as human intelligence. When both are discrete, one obtains the latent class models used to categorize observations into distinct groups, e.g., to classify individuals into diseased vs. non-diseased according to their constellation of symptoms. Widely used in educational testing are Item Response Theory models (also known as Latent Trait models) that relate a group of categorical manifest variables to a continuous latent variable, e.g., using answers to a multiple choice test to measure mastery of a particular academic subject. Finally, finite mixture models (also known as Latent Profile Analysis) relate a set of continuous manifest variables to underlying categorical constructs, e.g., by partitioning clinical trial participants into homogeneous groups across behavioral and cognitive dimensions of engagement with physical activity interventions. Originally developed for cross-sectional data, latent variable models have recently been generalized to longitudinal data. For example, Latent Transition Analysis has been used to model movement across stages of change in studies of smoking cessation. An example of latent variable modeling by our faculty is given by the 2-parameter logistic IRT models fit to the DSM-IV criteria for nicotine dependence by Dr. Papandonatos and his students. They uncovered a 2-dimensional structure with two positively correlated latent factors, thus contradicting conventional wisdom that DSM-IV symptoms measure a single dimension of liability to nicotine dependence.
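For readers unfamiliar with the 2-parameter logistic (2PL) IRT model, the sketch below evaluates its standard unidimensional item response function, in which the probability of endorsing an item depends on a latent trait through an item discrimination and an item difficulty. The function name and parameter values are hypothetical; this is not the two-dimensional model fit by the faculty, only the basic building block.

```python
import math


def two_pl_probability(theta: float, discrimination: float,
                       difficulty: float) -> float:
    """Standard 2PL item response function.

    P(endorse item | theta) = 1 / (1 + exp(-a * (theta - b))),
    where a is the discrimination and b is the difficulty.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical item: discrimination a = 1.5, difficulty b = 0.5.
for theta in (-1.0, 0.5, 2.0):
    p = two_pl_probability(theta, discrimination=1.5, difficulty=0.5)
    print(f"theta = {theta:+.1f} -> P(endorse) = {p:.3f}")
```

When the latent trait equals the item difficulty, the endorsement probability is exactly one half, and a larger discrimination makes the curve steeper around that point.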
-
George Papandonatos
Professor of Biostatistics (R)
Longitudinal & Multivariate Data
Data from public health and medical research are often clustered, either because of the way they are collected, e.g., multiple observations on the same subject over the duration of the observation period (longitudinal data), or because of some other inherent heterogeneity between groups (strata) of the sampling units. Advanced multivariate statistical methods (e.g., Generalized Estimating Equations (GEE) and Mixed-Effects models) have been developed to correctly account for and describe the sources of heterogeneity and the variability/correlation structure between and within groups of study subjects. Multivariate statistical methodology involves detecting, analyzing, and characterizing associations among multidimensional data. Related supervised or unsupervised techniques are mainly concerned with the dimension reduction of a system. Center faculty conduct extensive research on novel statistical techniques for analyzing longitudinal and multivariate data, including methods for analyzing individual and aggregated results from personalized (N-of-1) trials of treatment interventions and methods for developing and assessing predictive models for ordinal health outcomes.
-
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics -
Christopher Schmid
Professor of Biostatistics
Meta-Analysis
Center faculty are leaders in developing and applying methods for meta-analysis, the quantitative combination of results from different studies. Prof. Gatsonis has pioneered the use of hierarchical summary ROC curves for assessing sensitivity and specificity and is developing methods for summarizing the predictive accuracy of diagnostic tests. Prof. Trikalinos heads the Center for Evidence Synthesis in Health which he co-founded with Prof. Schmid. They have developed a variety of different methods and software tools for synthesizing different types of data and studies including meta-analysis of diagnostic tests, multivariate outcomes and networks of treatments. Prof. Schmid also heads the Evidence Synthesis Academy, which aims to promote the wider use and understanding of meta-analysis among decision-makers.
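As a simple illustration of the quantitative combination of study results, the sketch below pools study estimates with fixed-effect inverse-variance weighting, one of the most basic meta-analytic estimators. The function name and the study values are hypothetical, and the hierarchical, multivariate, and network methods described above are considerably more elaborate.

```python
import math
from typing import Sequence


def fixed_effect_pool(estimates: Sequence[float],
                      std_errors: Sequence[float]) -> tuple[float, float]:
    """Inverse-variance weighted pooled estimate and its standard error."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates (e.g., log odds ratios) from three studies.
study_estimates = [0.20, 0.40, 0.30]
study_std_errors = [0.10, 0.10, 0.20]
pooled_estimate, pooled_se = fixed_effect_pool(study_estimates, study_std_errors)
print(f"pooled estimate = {pooled_estimate:.3f} (SE {pooled_se:.3f})")
```

More precise studies receive larger weights, and the pooled standard error is smaller than that of any single study.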
-
Constantine Gatsonis
Henry Ledyard Goddard Professor of Biostatistics, Director of the Center for Biostatistics and Health Data Science -
Thomas Trikalinos
Professor of Health Services, Policy & Practice and Professor of Biostatistics, Director of the Center for Evidence Synthesis in Health -
Christopher Schmid
Professor of Biostatistics -
Roberta DeVito
Thomas J. and Alice M. Tisch Assistant Professor of Biostatistics and Data Science -
George Papandonatos
Professor of Biostatistics (R)
Methods for Health Services & Outcomes Research
Health Services Research (HSR) is a field in public health that investigates how social factors, policies, insurance systems, organizational structures and processes, health technologies, and personal behaviors influence access to, quality of, and cost of health care. When HSR studies involve comparisons of interventions, they are sometimes referred to as comparative effectiveness studies. Studies in HSR commonly involve data sources that are not collected for research purposes (e.g., claims and electronic health records (EHR)). Thus, they may suffer from complexities that are mitigated in well-designed prospective studies. For example, EHR may suffer from missing values because patients change their providers. Similarly, a patient's chronic conditions may be recorded inaccurately in claims data when a condition does not affect payments. Statistical methods that address these complexities attempt to obtain accurate and precise estimates. Dr. Hogan, Dr. Gatsonis, Dr. Liu, Dr. Steingrimsson and Dr. Gutman develop various methods to address different complexities that arise in such studies.
-
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics -
Ilana F. Gareen
Associate Professor of Epidemiology (Research) -
Roee Gutman
Professor of Biostatistics -
Christopher Schmid
Professor of Biostatistics
Methods for HIV/AIDS
Statistical methodology research on HIV/AIDS spans a broad spectrum and includes statistical causal inference (e.g., causal pathway analysis of HIV interventions involving behavioral changes); statistical/machine learning methods (e.g., super-learning for risk modeling of treatment failure and prediction); Bayesian statistical modeling of the treatment continuum; clinical decision making for optimizing HIV treatment in resource-limited settings; micro-simulation modeling; etc. Professors Hogan, Liu, and Chrysanthopoulou’s collaborative and methodological research has secured substantial research funding from NIAID, NIAAA, NHLBI, NICHD, USAID, etc.
-
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics -
Joseph W. Hogan
Carole and Lawrence Sivorich Professor of Public Health, Professor of Biostatistics, Department Chair of Biostatistics -
Tao Liu
Professor of Biostatistics -
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics
Missing Data Methods
Missing data are unavoidable in many studies, especially those that collect information on humans. Failure to address missing data may result in misleading conclusions. A dataset may contain missing values for a variety of reasons. For example, survey respondents may refuse to answer questions of a sensitive nature, or patients participating in a longitudinal study may drop out before its conclusion. Center faculty have been at the forefront of developing statistical methods to handle missing data. Specifically, Joseph Hogan has done significant work on missing data in longitudinal studies and sensitivity analysis. Roee Gutman has developed various imputation methods for application in health services research.
-
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics -
Roee Gutman
Professor of Biostatistics -
Joseph W. Hogan
Carole and Lawrence Sivorich Professor of Public Health, Professor of Biostatistics, Department Chair of Biostatistics -
Tao Liu
Professor of Biostatistics -
Christopher Schmid
Professor of Biostatistics -
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics
Modeling and Microsimulation
Simulation models have been broadly used as a valuable tool in cost-effectiveness analyses, comparative effectiveness research, etc., for evidence-informed public health decision making. Recent advancements in computing technology have facilitated the development of increasingly intricate predictive models aimed at describing complex health processes and systems using Monte Carlo simulation techniques. These models come in a large variety depending on their specific characteristics, including, but not limited to, state transition, discrete event simulation, dynamic transmission, compartmental, microsimulation, and agent-based models. Microsimulation models, in particular, synthesize information from multiple sources and use computer technology to combine mathematical and statistical models for simulating individual trajectories related to the course of a disease, usually in conjunction with some treatment or other interventions.
Center faculty have extensive expertise in this area, working on statistical approaches for developing, evaluating, and implementing these types of simulation models with applications to cancer, sexually transmitted diseases, opioid use disorder, COVID-19, dementia, etc. Dr. Trikalinos is the PI of the NCI CISNET bladder cancer incubator site and has been core PI of several projects involving the development and application of simulation models for public health decision making. Dr. Chrysanthopoulou specializes in statistical techniques for calibration, validation, and predictive accuracy assessment of microsimulation models, has developed the open-source MIcrosimulation Lung Cancer (MILC) model of the natural history of lung cancer, and is involved in collaborative projects at Brown University and other institutions for building complex simulation models used in decision analysis.
-
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics -
Thomas Trikalinos
Professor of Health Services, Policy & Practice and Professor of Biostatistics, Director of the Center for Evidence Synthesis in Health
N-of-1 Studies
N-of-1 trials are randomized multi-crossover experiments conducted on a single individual in order to determine the personalized relative efficacy of two or more treatments measured repeatedly over time. Prof. Schmid and a team of graduate students are developing time series and multilevel methods and software for the design and analysis of single trials as well as the meta-analysis of a series of N-of-1 trials that can estimate both individual and population level effects. The group has served as the analytic hub for several large federally and non-federally funded studies using the N-of-1 framework. These include studies of alternative treatments for chronic pain, diets for inflammatory bowel disease, triggers of atrial fibrillation, and behavioral interventions for anxiety and stress. The group is collaborating with other Brown scientists to develop a mobile app that can flexibly set up and run one or more N-of-1 trials and analyze and interpret their data.
-
Christopher Schmid
Professor of Biostatistics
Social Network Analysis
Statistical and causal inference problems routinely assume that subjects in the data are independent of one another. However, this assumption is easily violated when subjects interact with others through network ties in a large, high-dimensional dataset. Groups in the Center for Statistical Sciences have been developing new approaches that remain valid even when subjects are interconnected. Applications of the new methods span diverse fields including HIV, alcohol and substance use research, and neuroimaging networks. Furthermore, we are working on how to utilize network interactions from diverse data sources to improve overall public health outcomes.
-
Joseph W. Hogan
Carole and Lawrence Sivorich Professor of Public Health, Professor of Biostatistics, Department Chair of Biostatistics -
Youjin Lee
Manning Assistant Professorship of Biostatistics
Spatio-Temporal Statistical Methods
We develop novel statistical models and methods for analyzing spatio-temporal data. Our research is primarily motivated by interdisciplinary collaborations with researchers in neuroscience, psychiatry, epidemiology and public health. The common statistical themes in our research are spectral methods using localized waveforms, dimension reduction, spatio-temporal covariance modeling and Bayesian hierarchical models.
-
Matthew Harrison
Associate Professor of Applied Mathematics
Statistical Learning
Statistical learning is a framework under the broad umbrella of machine learning that uses techniques from functional analysis to understand data. Statistical learning is often divided into two common categories: (i) supervised learning and (ii) unsupervised learning. Briefly, supervised learning involves building a predictive model based on some response or outcome of interest, while unsupervised learning learns about relationships and data structures without any supervising outcome variable. Many faculty members in the Center are developing novel statistical learning approaches to tackle specific public health related problems. Some of these areas include: artificial neural networks for medical imaging, anomaly detection methods for clinical trials, online learning techniques for real-time clinical prognostics, and dimensionality reduction and structured prediction models in genome-wide association studies.
-
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics -
Arman Oganisian
Assistant Professor of Biostatistics
Survival Analysis
Survival analysis is the branch of statistics that deals with analyzing data when the outcome of interest is the time to some event, such as time to death or disease progression. Such outcomes are often only partially observed due to participants dropping out of the study or not having experienced the event of interest before the end of the study period (referred to as censoring). This partial missingness creates statistical challenges, and several faculty members work on developing methods to address these challenges. Dr. Steingrimsson works on adapting machine learning algorithms for censored data, and Dr. Chrysanthopoulou works on approaches for simulating time-to-event data (accounting for censoring) in the context of complex simulation models (e.g., microsimulation models) used in public health decision making. In addition, several faculty members are involved in interdisciplinary collaborations that involve analysis of time-to-event outcomes.
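To show how censoring enters a basic analysis, the sketch below implements the classical Kaplan-Meier estimator of the survival function. It is a textbook method rather than one of the faculty's own techniques, and the function name and follow-up data are hypothetical.

```python
from typing import Sequence


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimates at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    observation was censored at that time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        observed_events = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the current time.
        while i < len(data) and data[i][0] == current_time:
            observed_events += data[i][1]
            leaving += 1
            i += 1
        if observed_events > 0:
            survival *= 1.0 - observed_events / at_risk
            curve.append((current_time, survival))
        # Censored subjects leave the risk set without changing survival.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in months; 0 marks a censored observation.
follow_up = [2, 3, 3, 5, 8]
event_observed = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(follow_up, event_observed)
for time, surv in km_curve:
    print(f"t = {time}: S(t) = {surv:.2f}")
```

Censored participants contribute to the number at risk up to their censoring time, so their partial follow-up is used rather than discarded.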
-
Jon Steingrimsson
Associate Professor of Biostatistics, Director of the NextGen Graduate Program in Biostatistics -
Stavroula Chrysanthopoulou
Assistant Professor of Biostatistics, Director of the Master's Graduate Program in Biostatistics
Topological Data Analysis
Topological data analysis (TDA) visualizes the “shape” of data from the spatial connectivity between discrete points. Prof. Crawford and his lab group use TDA to summarize complex patterns that underlie high-dimensional biological data. They are particularly interested in the “sub-image” selection problem where the goal is to identify the physical features of a collection of 3D shapes (e.g., tumors and single cell formations) that best explain the variation in a given trait or phenotype. Actively collaborating with faculty in the Center for Computational Molecular Biology, the School of Engineering, and the Robert J. & Nancy D. Carney Institute for Brain Science, the Crawford Lab works to develop unified statistical and machine learning frameworks that generalize the use of topological summary statistics in 3D shape analyses. Current application areas include: radiomics with clinical imaging of brain-based diseases, molecular biology with 3D microscopy of cells, biophysics with molecular dynamics simulations, and anthropology with computed tomography (CT) scans of bones.
-
Lorin Crawford
Distinguished Senior Fellow in Biostatistics