Statistical Theory and Methods
In contrast to frequentist approaches, Bayesian methods provide a principled framework for combining data with prior information when making inferences. By borrowing strength from prior information, Bayesian methods can yield more precise estimates in small samples. In large samples, Bayesian nonparametric and machine learning methods can capture complex, nonlinear relationships in the data to produce accurate predictions and uncertainty quantification. Bayesian methods are widely used to solve complex inference problems in microsimulation, genomics, causal inference, missing data problems, and more.
Roberta De Vito  Arman Oganisian 
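As a minimal illustration of how a prior sharpens small-sample inference, the sketch below performs a conjugate Beta-Binomial update; the prior and the counts are hypothetical, chosen only for illustration:

```python
from math import sqrt

def beta_binomial_posterior(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior.
    Returns posterior parameters, posterior mean, and posterior SD."""
    a_post = a + successes
    b_post = b + trials - successes
    total = a_post + b_post
    mean = a_post / total
    sd = sqrt(a_post * b_post / (total ** 2 * (total + 1)))
    return a_post, b_post, mean, sd

# Small sample (3 successes in 5 trials) with an informative Beta(10, 10) prior
a_post, b_post, mean, sd = beta_binomial_posterior(10, 10, 3, 5)
```

With only five observations, the posterior mean (0.52) is pulled toward the prior mean of 0.5, and its standard deviation (about 0.098) is markedly smaller than the frequentist standard error based on the data alone.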
Clinical trials prospectively assign participants to two or more interventions and prospectively measure outcomes on the participants. They are the gold standard for studying the effectiveness of interventions and are usually required for regulatory approval. Several members of the department develop methods for the design, analysis, and interpretation of clinical trials. For example, Dr. Schmid and others have published extensively on meta-analysis for clinical trials, and Dr. Schmid is a leader in developing methods for, and applications of, N-of-1 trials, which are single-person multiple-crossover studies. Dr. Steingrimsson develops machine learning methods for data-driven discovery of subgroups with enhanced treatment effects. ECOG-ACRIN has carried out large trials comparing different screening modalities in cancer.
Jon Steingrimsson  Christopher Schmid 
Data science lives at the intersection of statistics, the computational sciences, and subject-matter knowledge. The Center for Statistical Sciences is heavily invested in health data science through a variety of projects in areas such as computational biology, machine learning, Bayesian statistics, network analysis, causal inference with big data, and analysis of neuroimaging data. The Department of Biostatistics is one of four core departments in Brown's Data Science Initiative. Dr. Hogan serves as the Deputy Director of the Initiative, Dr. De Vito is currently a member of the Data Science Executive Committee, and Dr. Eloyan is a member of the DSI Campus Advisory Board.
Joseph Hogan  Roberta De Vito  Ani Eloyan 
Youjin Lee  Arman Oganisian 
Data Fusion
In many applications in public health, medicine, and social science, patient characteristics are dispersed over multiple files, platforms, and/or studies. Analysis that links two or more separate data sources is increasingly important as researchers seek to integrate administrative and clinical datasets while adapting to privacy regulations that limit access to unique identifiers. Dr. Gutman has developed novel Bayesian procedures to link units that appear in two datasets by treating the unknown linkage as missing data. He is collaborating with health services researchers and clinicians to estimate the effects of policies and interventions as well as to predict health outcomes from clinical and demographic variables. Dr. De Vito has also developed novel statistical techniques that integrate multiple studies into a single analysis, concurrently estimating the common characteristics shared across all studies and the study-specific components.
Roee Gutman  Roberta De Vito 
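As a rough sketch of the record linkage idea (shown here in the classical Fellegi-Sunter form rather than the Bayesian missing-data treatment described above), candidate record pairs can be scored by summed log-likelihood-ratio weights. All field names and m/u probabilities below are hypothetical:

```python
from math import log

# Hypothetical agreement probabilities among true matches (M) and non-matches (U)
M = {"surname": 0.95, "birth_year": 0.98, "zip": 0.90}
U = {"surname": 0.01, "birth_year": 0.05, "zip": 0.10}

def match_weight(rec_a, rec_b):
    """Sum log(m/u) for agreeing fields and log((1-m)/(1-u)) for disagreeing
    fields; larger weights favor classifying the pair as a true link."""
    w = 0.0
    for field in M:
        if rec_a[field] == rec_b[field]:
            w += log(M[field] / U[field])
        else:
            w += log((1 - M[field]) / (1 - U[field]))
    return w

a = {"surname": "smith", "birth_year": 1980, "zip": "02912"}
b = {"surname": "smith", "birth_year": 1980, "zip": "02906"}
w = match_weight(a, b)   # agrees on two of three fields
```

Agreement on rare, discriminating fields contributes large positive weight; disagreement contributes negative weight. A Bayesian formulation goes further by placing a prior over the entire linkage structure and propagating linkage uncertainty into downstream analyses.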
Social network analysis
Statistical and causal inference methods routinely assume that subjects in a dataset are independent of one another. However, this assumption is easily violated when subjects interact with one another through network ties in a large, high-dimensional dataset. Groups in the Center for Statistical Sciences have developed new approaches that remain valid even when subjects are interconnected. Applications of the new methods span diverse fields, including HIV, alcohol and substance use research, and neuroimaging networks. Furthermore, we are working on how to leverage network interactions from diverse data sources to improve overall public health outcomes.
Joseph Hogan  Youjin Lee  Ashley Buchanan 
Causal inference and big data
Causal inference problems are often challenged by complexities in data from different sources, such as massive online experiments or electronic medical records. To unravel the causal relationships buried in a large data set, we

establish identification conditions needed for causal identification,

develop nonparametric methods to estimate meaningful causal quantities flexibly, and

deliver impactful causal implications for public health from big data.
Jon Steingrimsson  Youjin Lee  Arman Oganisian 
Roee Gutman  Joseph Hogan 
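As a minimal sketch of the estimation step, the inverse-probability-weighting (IPW) estimator below recovers an average treatment effect from a hypothetical stratified table with one binary confounder x; all counts and outcomes are invented for illustration:

```python
# Each row: (confounder x, treatment a, mean outcome y, number of subjects n)
table = [
    (0, 1, 1.0, 30), (0, 0, 0.5, 70),
    (1, 1, 2.0, 60), (1, 0, 1.0, 40),
]

def ipw_ate(table):
    """IPW estimate of E[Y(1)] - E[Y(0)], with propensity scores
    P(A=1 | X=x) estimated from the table itself."""
    n_x, n_treated_x = {}, {}
    for x, a, y, n in table:
        n_x[x] = n_x.get(x, 0) + n
        n_treated_x[x] = n_treated_x.get(x, 0) + (n if a == 1 else 0)
    e = {x: n_treated_x[x] / n_x[x] for x in n_x}   # propensity by stratum
    N = sum(n for _, _, _, n in table)
    mu1 = sum(n * y / e[x] for x, a, y, n in table if a == 1) / N
    mu0 = sum(n * y / (1 - e[x]) for x, a, y, n in table if a == 0) / N
    return mu1 - mu0

ate = ipw_ate(table)
```

Here the weighted estimate (0.75) matches direct standardization over the confounder, as it should when the propensity model is saturated; nonparametric methods generalize this idea when the confounders are high-dimensional.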
Loosely speaking, deep learning is a branch of machine learning that uses multilayer neural networks to build models for tasks such as prediction or diagnosis. The unknown parameters, commonly referred to as weights, are estimated by minimizing a loss function, often subject to some form of regularization. Deep learning has shown promise in many domains, with imaging-based analysis being a common application area where deep learning models have performed particularly well. Several center members work on deep-learning-related research, including analyzing medical images, uncertainty quantification, and interpretability of deep learning models.
Jon Steingrimsson  Fenghai Duan 
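To make the "weights estimated by minimizing a regularized loss" idea concrete, here is a deliberately tiny two-layer network trained by gradient descent on the XOR toy problem; the architecture, learning rate, and penalty are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)      # hidden layer weights
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)      # output layer weights
lam, lr = 1e-3, 0.5                                      # L2 penalty, step size

def forward(X):
    h = np.tanh(X @ W1 + b1)                             # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))             # sigmoid output
    return h, p

losses = []
for _ in range(2000):
    h, p = forward(X)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(ce + lam * (np.sum(W1**2) + np.sum(W2**2)))
    dz2 = (p - y) / len(X)                               # backpropagation
    dW2 = h.T @ dz2 + 2 * lam * W2; db2 = dz2.sum(0)
    dh = (dz2 @ W2.T) * (1 - h**2)
    dW1 = X.T @ dh + 2 * lam * W1; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

With a fixed seed the loss trajectory is reproducible; the point is the mechanics (forward pass, regularized loss, gradient step), not state-of-the-art optimization.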
Latent variable models link observed (or manifest) variables to unobserved (or latent) constructs. They comprise two parts: a measurement model specifying the relationship between manifest and latent variables, and a structural model delineating the relationships among the latent variables themselves. Both the manifest and the latent variables can be either discrete or continuous. When both are continuous, one obtains the factor analytic models used widely in psychology, e.g., to measure latent constructs such as human intelligence. When both are discrete, one obtains the latent class models used to categorize observations into distinct groups, e.g., to classify individuals as diseased vs. non-diseased according to their constellation of symptoms. Widely used in educational testing are Item Response Theory (IRT) models (also known as latent trait models), which relate a group of categorical manifest variables to a continuous latent variable, e.g., using answers to a multiple-choice test to measure mastery of a particular academic subject. Finally, finite mixture models (also known as Latent Profile Analysis) relate a set of continuous manifest variables to underlying categorical constructs, e.g., by partitioning clinical trial participants into homogeneous groups across behavioral and cognitive dimensions of engagement with physical activity interventions.

Originally developed for cross-sectional data, latent variable models have recently been generalized to longitudinal data. For example, Latent Transition Analysis has been used to model movement across stages of change in studies of smoking cessation. An example of latent variable modeling by our faculty is given by the 2-parameter logistic (2PL) IRT models fit to the DSM-IV criteria for nicotine dependence by Dr. Papandonatos and his students. They uncovered a 2-dimensional structure with two positively correlated latent factors, thus contradicting the conventional wisdom that DSM-IV symptoms measure a single dimension of liability to nicotine dependence.
George Papandonatos 
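The 2PL item response function has a simple closed form: the probability of endorsing an item rises with the latent trait theta, at a rate governed by the item's discrimination a and a location set by its difficulty b. The parameter values below are purely illustrative:

```python
from math import exp

def p_endorse(theta, a, b):
    """2PL IRT item response function:
    P(endorse) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

# At theta == b the endorsement probability is exactly 0.5 ...
p_mid = p_endorse(0.0, a=1.5, b=0.0)
# ... and it increases monotonically in the latent trait.
p_hi = p_endorse(1.0, a=1.5, b=0.0)
```

Fitting such a model to symptom data amounts to estimating each item's (a, b) pair and each respondent's theta, typically by marginal maximum likelihood or Bayesian methods.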
Data from public health and medical research are often subject to clustering, either due to the way they are collected, e.g., multiple observations on the same subject over the observation period (longitudinal data), or due to some other inherent heterogeneity between groups (strata) of the sampling units. Advanced multivariate statistical methods, e.g., Generalized Estimating Equations (GEE) and mixed-effects models, have been developed to correctly account for and describe the sources of heterogeneity and the variability/correlation structure between and within groups of study subjects. Multivariate statistical methodology involves detecting, analyzing, and characterizing associations among multidimensional data; related supervised and unsupervised techniques are mainly concerned with dimension reduction. Center faculty conduct extensive research on novel statistical techniques for analyzing longitudinal and multivariate data, including methods for analyzing individual and aggregated results from personalized (N-of-1) trials of treatment interventions and methods for developing and assessing predictive models for ordinal health outcomes.
Stavroula Chrysanthopoulou  Christopher Schmid 
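A quick simulation shows why clustered data need methods like GEE or mixed-effects models: under a random-intercept model, a naive standard error that treats all observations as independent understates the uncertainty of the overall mean. The cluster sizes and variance components below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
n_clusters, m = 200, 5
sigma_b, sigma_e = 1.0, 1.0              # between- and within-cluster SDs

# Random-intercept data: y_ij = b_i + e_ij
b = rng.normal(0.0, sigma_b, n_clusters)
y = b[:, None] + rng.normal(0.0, sigma_e, (n_clusters, m))

# Naive SE pretends all n_clusters * m observations are independent
se_naive = y.std(ddof=1) / np.sqrt(n_clusters * m)
# Cluster-level SE uses the n_clusters genuinely independent cluster means
se_cluster = np.std(y.mean(axis=1), ddof=1) / np.sqrt(n_clusters)
```

With these variance components the cluster-aware standard error is roughly 1.7 times the naive one in expectation; GEE and mixed-effects models formalize this correction while also accommodating covariates and richer correlation structures.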
Center faculty are leaders in developing and applying methods for meta-analysis, the quantitative combination of results from different studies. Prof. Gatsonis has pioneered the use of hierarchical summary ROC curves for assessing sensitivity and specificity and is developing methods for summarizing the predictive accuracy of diagnostic tests. Prof. Trikalinos heads the Center for Evidence Synthesis in Health, which he co-founded with Prof. Schmid. They have developed a variety of methods and software tools for synthesizing different types of data and studies, including meta-analysis of diagnostic tests, multivariate outcomes, and networks of treatments. Prof. Schmid also heads the Evidence Synthesis Academy, which aims to promote the wider use and understanding of meta-analysis among decision-makers.
Constantine Gatsonis  Thomas A. Trikalinos  Christopher Schmid  Roberta De Vito  George Papandonatos 
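At its simplest, fixed-effect meta-analysis pools study estimates with inverse-variance weights; the effect sizes and variances below are made up for illustration:

```python
def inverse_variance_pool(estimates, variances):
    """Fixed-effect meta-analysis: weight each study by 1/variance;
    the pooled variance is the reciprocal of the total weight."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)

# Three hypothetical studies (e.g., log odds ratios with sampling variances)
pooled, pooled_var = inverse_variance_pool([0.2, 0.4, 0.3], [0.01, 0.04, 0.02])
```

The most precise study dominates the pool, and the pooled variance is smaller than any single study's. Random-effects models extend this by adding a between-study variance component to each weight, which is where much of the methodological work described above begins.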
Statistical methodology research on HIV/AIDS spans a broad spectrum, including statistical causal inference (e.g., causal pathway analysis of HIV interventions involving behavioral changes); statistical/machine learning methods (e.g., super learning for risk modeling of treatment failure and prediction); Bayesian statistical modeling of the treatment continuum; clinical decision making for optimizing HIV treatment in resource-limited settings; and microsimulation modeling. The collaborative and methodological research of Professors Hogan, Liu, and Chrysanthopoulou has secured substantial research funding from NIAID, NIAAA, NHLBI, NICHD, USAID, and other agencies.
Simulation models are widely used as a valuable tool in cost-effectiveness analyses, comparative effectiveness research, and other areas of evidence-informed public health decision making. Recent advances in computing technology have facilitated the development of increasingly intricate predictive models aimed at describing complex health processes and systems using Monte Carlo simulation techniques. Depending on their specific characteristics, these models come in many varieties, including, but not limited to, state-transition, discrete event simulation, dynamic transmission, compartmental, microsimulation, and agent-based models. Microsimulation models, in particular, synthesize information from multiple sources and use computer technology to combine mathematical and statistical models for simulating individual trajectories related to the course of a disease, usually in conjunction with a treatment or other intervention.
Center faculty have extensive expertise in this area, working on statistical approaches for developing, evaluating, and implementing these types of simulation models, with applications to cancer, sexually transmitted diseases, opioid use disorder, COVID-19, dementia, and more. Dr. Trikalinos is the PI of the NCI CISNET bladder cancer incubator site and has been core PI of several projects involving the development and application of simulation models for public health decision making. Dr. Chrysanthopoulou specializes in statistical techniques for calibration, validation, and predictive accuracy assessment of microsimulation models; has developed the open-source MIcrosimulation Lung Cancer (MILC) model of the natural history of lung cancer; and is involved in collaborative projects at Brown University and other institutions building complex simulation models used in decision analysis.
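A microsimulation in miniature: the sketch below pushes a hypothetical cohort through a three-state (healthy/sick/dead) Markov model one individual transition at a time; the transition probabilities are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical annual transition probabilities (rows sum to 1)
P = np.array([
    [0.90, 0.08, 0.02],   # from healthy
    [0.10, 0.75, 0.15],   # from sick
    [0.00, 0.00, 1.00],   # dead is absorbing
])

def simulate(n_people=1000, n_cycles=30):
    """Simulate individual state trajectories; returns per-cycle state counts
    (columns: healthy, sick, dead)."""
    state = np.zeros(n_people, dtype=int)        # everyone starts healthy
    counts = []
    for _ in range(n_cycles):
        new_state = state.copy()
        for s in range(3):
            idx = np.flatnonzero(state == s)     # transition from old state
            if idx.size:
                new_state[idx] = rng.choice(3, size=idx.size, p=P[s])
        state = new_state
        counts.append(np.bincount(state, minlength=3))
    return np.array(counts)

history = simulate()
```

Real microsimulation models add individual covariates, time-varying hazards, and calibration to external targets; this sketch shows only the simulation core that generates individual trajectories.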


N-of-1 trials are randomized multi-crossover experiments conducted on a single individual to determine the personalized relative efficacy of two or more treatments measured repeatedly over time. Prof. Schmid and a team of graduate students are developing time series and multilevel methods and software for the design and analysis of single N-of-1 trials, as well as for the meta-analysis of a series of N-of-1 trials, which can estimate both individual- and population-level effects. The group has served as the analytic hub for several large federally and non-federally funded studies using the N-of-1 framework, including studies of alternative treatments for chronic pain, diets for inflammatory bowel disease, triggers of atrial fibrillation, and behavioral interventions for anxiety and stress. The group is collaborating with other Brown scientists to develop a mobile app that can flexibly set up, run, analyze, and interpret data from one or more N-of-1 trials.
Christopher Schmid 
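In its simplest form, an individual N-of-1 analysis reduces to within-block treatment contrasts. The pain scores below are hypothetical, and this naive summary deliberately ignores the carryover and time-trend effects that time series methods model explicitly:

```python
from math import sqrt
from statistics import mean, stdev

# One hypothetical N-of-1 trial: four crossover blocks, each with one
# period on treatment A and one on treatment B (lower score = less pain)
blocks = [(6.0, 4.5), (7.0, 5.0), (5.5, 4.0), (6.5, 5.5)]

diffs = [a - b for a, b in blocks]          # within-block A-minus-B contrasts
effect = mean(diffs)                        # this individual's A-vs-B effect
se = stdev(diffs) / sqrt(len(diffs))        # naive standard error
```

Meta-analysis of many such trials then borrows strength across individuals, shrinking each person's estimate toward a population-level effect.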
Statistical learning is a framework under the broad umbrella of machine learning that uses techniques from functional analysis to understand data. It is often divided into two common categories: (i) supervised and (ii) unsupervised learning. Briefly, supervised learning involves building a predictive model based on some response or outcome of interest, while unsupervised learning uncovers relationships and data structures without any supervising outcome variable. Many faculty members in the Center are developing novel statistical learning approaches to tackle specific public health problems, in areas including artificial neural networks for medical imaging, anomaly detection methods for clinical trials, online learning techniques for real-time clinical prognostics, and dimensionality reduction and structured prediction models in genome-wide association studies.
Jon Steingrimsson  Arman Oganisian 
Survival analysis is the branch of statistics that deals with data in which the outcome of interest is the time to some event, such as death or disease progression. Such outcomes are often only partially observed because participants drop out of the study or have not experienced the event of interest by the end of the study period (referred to as censoring). This partial missingness creates statistical challenges, and several faculty members develop methods to address them. Dr. Steingrimsson works on adapting machine learning algorithms to censored data, and Dr. Chrysanthopoulou works on approaches for simulating time-to-event data (accounting for censoring) in the context of complex simulation models (e.g., microsimulation models) used in public health decision making. In addition, several faculty members are involved in interdisciplinary collaborations that involve analysis of time-to-event outcomes.
Jon Steingrimsson  Stavroula Chrysanthopoulou 
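The canonical tool for censored outcomes is the Kaplan-Meier estimator, which updates the survival curve only at observed event times while censored subjects simply leave the risk set. The follow-up times below are invented:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate from (time, event) pairs,
    where event=1 is an observed event and event=0 is right-censoring.
    Returns a list of (event_time, S(t)) steps."""
    data = sorted(zip(times, events))
    n = len(data)
    surv, curve, i = 1.0, [], 0
    while i < n:
        t, j, d = data[i][0], i, 0
        while j < n and data[j][0] == t:
            d += data[j][1]               # events observed at this time
            j += 1
        if d:
            surv *= 1 - d / (n - i)       # n - i subjects still at risk
            curve.append((t, surv))
        i = j                             # censoring-only times shrink risk set
    return curve

# Five subjects; follow-up at times 2 and 5 ends in censoring
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Note how the censored subject at time 2 reduces the risk set at time 3 without contributing an event; handling this partial information correctly is exactly the challenge that motivates the methods described above.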
Topological data analysis (TDA) visualizes the "shape" of data through the spatial connectivity between discrete points. Prof. Crawford and his lab group use TDA to summarize complex patterns that underlie high-dimensional biological data. They are particularly interested in the "sub-image" selection problem, where the goal is to identify the physical features of a collection of 3D shapes (e.g., tumors and single-cell formations) that best explain the variation in a given trait or phenotype. Actively collaborating with faculty in the Center for Computational Molecular Biology, the School of Engineering, and the Robert J. & Nancy D. Carney Institute for Brain Science, the Crawford Lab develops unified statistical and machine learning frameworks that generalize the use of topological summary statistics in 3D shape analyses. Current application areas include radiomics with clinical imaging of brain-based diseases, molecular biology with 3D microscopy of cells, biophysics with molecular dynamics simulations, and anthropology with computed tomography (CT) scans of bones.
Lorin Crawford 