Health Data Science: State of the Art and a Look Into the Future
A Symposium celebrating 30 Years of Biostatistics at Brown University
October 18th, 2024 | 8:30am - Welcome and refreshments
Agenda
8:30am
Welcome and refreshments
9:00am
Introduction by Dean of the School of Public Health Ashish K. Jha
9:10am
Remarks by Biostatistics Vice Chair Ani Eloyan
9:15am
Introduction by School of Public Health Biostatistics Chair Joseph Hogan
9:30am
Session 1: Clinical Evaluation of AI and the Future of Clinical Trial Design
Session Chair: Ani Eloyan
- Jon Steingrimsson, Brown University
Title: Use of AI and machine learning in clinical trials
- Sumithra Mandrekar, Mayo Clinic
Title: Design and Conduct of Cancer Clinical Trials: 2024 and Beyond
- Jean Feng, University of California, San Francisco
Title: Towards a post-market monitoring framework for machine learning-based medical devices
- Discussant: Constantine Gatsonis, Brown University
10:45am
Coffee break
11:00am
Session 2: Analysis of large observational data
Session Chair: Arman Oganisian
- Liz Stuart, Johns Hopkins Bloomberg School of Public Health
Title: Integrating Data for Causal Inference
- Mike Daniels, University of Florida
Title: A Bayesian Nonparametric Approach For Causal Inference In EHR Data In The Presence Of Nonignorable Missingness
- Youjin Lee, Brown University
Title: Replicable causal inference research using several treatment assignment mechanisms
- Discussant: Rebecca Hubbard, Brown University
12:15pm
Lunch break
1:30pm
Session 3: Statistical Inference for Massive Data
Session Chair: Anarina Murillo
- Yi Zhao, Indiana University School of Medicine
Title: Beyond Massive Univariate Tests: Covariance Regression Reveals Complex Patterns of Brain Functional Connectivity
- Ying Ma, Brown University
Title: Statistical and AI powered methods in Spatial Transcriptomics
- Lorin Crawford, Microsoft Research and Brown University
Title: Statistical opportunities in defining, modeling, and targeting cell state in cancer
- Discussant: Jean Wu, Brown University
2:55pm
Break
3:10pm
Panel Discussion: Biostatistics research and education as essential academic units in the era of health data science (HDS) and AI
Moderated by Joseph Hogan:
- Kiros Berhane, Columbia University
- Alice Paul, Brown University
- Xihong Lin, Harvard University
4:10pm
Closing Remarks
Poster session highlighting the work of our faculty, alumni and students
Drinks and appetizers served
Abstracts and Bios
Sumithra J. Mandrekar, Ph.D., is Professor of Biostatistics and Oncology at the Mayo Clinic, Rochester, MN, and the Group Statistician and Program Director for the Statistics and Data Management Center of the Alliance for Clinical Trials in Oncology. The Alliance is one of the four NCI-funded national clinical trials networks for the conduct of phase II and III clinical trials in adult cancer. She is widely recognized for significant contributions to statistical methodology for the design, conduct, and analysis of clinical trials, particularly in oncology; for leadership in clinical trials and data management coordination at Mayo Clinic and the Alliance for Clinical Trials in Oncology; and for leadership on national and international steering committees and advisory panels related to cancer, including the National Cancer Institute Clinical and Translational Advisory Committee (CTAC). She is a fellow and past president of the Society for Clinical Trials. Dr. Mandrekar’s primary research interests include adaptive dose-finding designs for early-phase trials, designs for late-phase trials and marker validation, and general clinical trial methodology aimed at streamlining the conduct of clinical trials and identifying alternative cancer clinical trial endpoints.
Title: Design and Conduct of Cancer Clinical Trials: 2024 and Beyond
Elizabeth A. Stuart, Ph.D., is the Frank Hurley and Catharine Dorrier Chair and Bloomberg Professor of American Health in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, with joint appointments in the Department of Mental Health and the Department of Health Policy and Management. She was previously Executive Vice Dean for Academic Affairs at the School. She received her Ph.D. in Statistics from Harvard University in 2004. Her research interests are in design and analysis approaches for estimating causal effects in experimental and non-experimental studies, including questions around the external validity of randomized trials and the internal validity of non-experimental studies, as well as methods for combining data sources to assess treatment effect heterogeneity and methods for evidence synthesis. She has published over 350 papers, has received research funding for her work from the National Science Foundation, the Institute of Education Sciences, the WT Grant Foundation, and the National Institutes of Health, and has served on advisory panels for the US Department of Education and the Patient-Centered Outcomes Research Institute. She is a fellow of the American Statistical Association and the American Association for the Advancement of Science, and she has received the mid-career award from the Health Policy Statistics Section of the ASA, the Gertrude Cox Award for applied statistics, Harvard University’s Myrto Lefkopoulou Award for excellence in Biostatistics, and the Society for Epidemiologic Research Marshall Joffe Epidemiologic Methods Award. She currently serves on the National Academies of Sciences, Engineering, and Medicine (NASEM) Committee on National Statistics and co-chairs NASEM's Committee on Applied and Theoretical Statistics.
Title: Integrating Data for Causal Inference
Abstract
Many causal questions of interest cannot be answered through analysis of a single dataset, and as data becomes increasingly available, there is growing interest in leveraging it to answer nuanced questions. Such questions range from examining the generalizability of randomized trial results to target populations to better understanding effect heterogeneity by combining small (unbiased) randomized trials with large (but confounded) non-experimental data sources. This talk will discuss methods for causal inference in such integrated datasets, including both the promise of doing so and implementation challenges, such as when the measures in the different data sources are discordant. Motivating examples will come from medicine and public health, with lessons for a range of fields, and the talk will close with comments on the broader field of evidence synthesis for causal inference.
Yi Zhao, Ph.D., is an Associate Professor and Siu L. Hui Scholar of Biostatistics in the Department of Biostatistics and Health Data Science at Indiana University School of Medicine. Her research interests include causal mediation analysis, decomposition methods, multiview data integration, density regression, and neuroimaging applications.
Title: Beyond Massive Univariate Tests: Covariance Regression Reveals Complex Patterns of Brain Functional Connectivity
Abstract
Studies of brain functional connectivity typically involve massive univariate tests, performing a separate statistical analysis on each individual connection. In this study, we consider the problem of regressing covariance matrices on associated covariates, with the goal of using covariates to explain variation in covariance matrices across units. To this end, we introduce Covariate Assisted Principal (CAP) regression, an optimization-based method that identifies components associated with the covariates using a generalized linear model approach. For high-dimensional data, we introduce a well-conditioned linear shrinkage estimator of the covariance matrix; with multiple covariance matrices, we propose shrinkage coefficients that are common across matrices. Theoretical studies demonstrate that the proposed covariance matrix estimator is optimal, asymptotically achieving the uniformly minimum quadratic loss among all linear combinations of the identity matrix and the sample covariance matrix. Under regularity conditions, the proposed estimator of the model parameters is consistent. We develop computationally efficient algorithms to jointly search for common linear projections of the covariance matrices and the regression coefficients. Simulation studies illustrate the superior performance of the proposed approach over existing methods. Applied to resting-state functional magnetic resonance imaging (fMRI) studies, the proposed approach regresses whole-brain functional connectivity on covariates and enables the identification of relevant brain subnetworks.
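For readers less familiar with linear shrinkage: an estimator of this kind is a weighted average of a scaled identity matrix and the sample covariance matrix, with the weight chosen to minimize expected quadratic loss. The Python sketch below implements the classic single-matrix Ledoit-Wolf estimator purely as an illustration; it is not the speaker's estimator, which additionally shares shrinkage coefficients across multiple covariance matrices and is embedded in CAP regression. The function name `ledoit_wolf_shrinkage` and the simulated data are hypothetical.

```python
import numpy as np


def ledoit_wolf_shrinkage(X):
    """Classic Ledoit-Wolf linear shrinkage of the sample covariance toward m * I.

    X: (n, p) array with observations in rows and variables in columns.
    Returns (Sigma_hat, rho), where rho in [0, 1] is the weight on the target.
    (Illustrative sketch; hypothetical helper, not the CAP regression estimator.)
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # center each variable
    S = Xc.T @ Xc / n                  # sample covariance with 1/n scaling

    def sq_norm(A):
        # Normalized squared Frobenius norm: tr(A A^T) / p
        return np.sum(A * A) / p

    m = np.trace(S) / p                # scale of the identity target m * I
    d2 = sq_norm(S - m * np.eye(p))    # how far S is from the target
    if d2 == 0.0:                      # S already proportional to I: nothing to shrink
        return S, 0.0
    # Average squared distance of each rank-one term x_k x_k^T from S,
    # an estimate of the sampling noise in S
    b2_bar = sum(sq_norm(np.outer(x, x) - S) for x in Xc) / n**2
    b2 = min(b2_bar, d2)               # truncate so that rho <= 1
    rho = b2 / d2                      # estimated optimal weight on the target
    Sigma_hat = rho * m * np.eye(p) + (1.0 - rho) * S
    return Sigma_hat, rho


# Usage: more variables (p = 50) than observations (n = 20),
# so the sample covariance itself is singular.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
Sigma_hat, rho = ledoit_wolf_shrinkage(X)
print(f"shrinkage weight rho = {rho:.3f}")
print("smallest eigenvalue:", np.linalg.eigvalsh(Sigma_hat).min())
```

Because the identity target always receives positive weight here, the estimate stays positive definite even when there are more variables than observations, which is the practical meaning of "well-conditioned" in this setting.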
Mike Daniels, Ph.D., received his undergraduate degree in Applied Math from Brown University and his doctoral degree in Biostatistics from Harvard University. He has been on the faculty at Iowa State University and the University of Texas at Austin. Currently, Daniels is Professor, Andrew Banks Family Endowed Chair, and Chair of the Department of Statistics at the University of Florida. He is a past president of ENAR. He is a fellow of the American Statistical Association (ASA), former chair of the Statistics in Epidemiology Section of the ASA, former chair of the Biometrics Section of the ASA, and former editor of Biometrics. He has received the Lagakos Distinguished Alumni Award from Harvard Biostatistics and the L. Adrienne Cupples Award from Boston University. He has published extensively on Bayesian methods for missing data, longitudinal data, and causal inference, and he has been funded by NIH R01 grants as PI and/or MPI since 2001. He also has a strong and productive record of collaborative research, with a focus on behavioral trials in smoking cessation and weight management, muscular dystrophy, and HIV.
Title: A Bayesian Nonparametric Approach For Causal Inference In EHR Data In The Presence Of Nonignorable Missingness
Abstract
We propose an approach for handling missingness in electronic health records (EHRs) using Bayesian nonparametric (BNP) models. We show how to introduce sensitivity parameters corresponding to nonignorable missingness in the outcome and confounders by extracting unidentified distributions from the BNP model and reconstructing the distribution of interest. We also flexibly include auxiliary covariates to move closer to missing at random (MAR). We use G-computation based on the reconstructed distribution to compute causal estimands of interest. We apply our approach to assess the comparative effectiveness of two bariatric surgeries on BMI 18 months after surgery.
Joint work with David Lindberg (University of Florida) and Sebastien Haneuse (Harvard University)
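G-computation can be illustrated in its simplest, fully observed form: fit a model for the outcome given treatment and confounders, predict every unit's outcome with treatment set to each level, and average the difference over the observed confounder distribution. The Python sketch below assumes no missing data and a linear outcome model with treatment-by-confounder interactions fitted by least squares; it stands in for, and does not reproduce, the BNP model and sensitivity-parameter reconstruction described in the abstract. The function `g_computation_ate` and the simulated data (true average effect 2.0) are hypothetical.

```python
import numpy as np


def g_computation_ate(y, a, L):
    """Plug-in G-computation estimate of the average treatment effect E[Y(1)] - E[Y(0)].

    y: (n,) outcomes; a: (n,) binary treatment (0/1); L: (n, k) confounders.
    Assumes (hypothetically) a linear outcome model with treatment-by-confounder
    interactions and fully observed data.
    """
    n = len(y)

    def design(treat):
        # Columns: intercept, treatment, confounders, treatment x confounders
        return np.column_stack([np.ones(n), treat, L, treat[:, None] * L])

    coef, *_ = np.linalg.lstsq(design(a.astype(float)), y, rcond=None)
    mu1 = design(np.ones(n)) @ coef    # predicted outcomes if everyone is treated
    mu0 = design(np.zeros(n)) @ coef   # predicted outcomes if no one is treated
    # Average over the empirical confounder distribution (standardization)
    return mu1.mean() - mu0.mean()


# Simulated example: one confounder that raises both treatment and outcome.
rng = np.random.default_rng(1)
n = 5000
L = rng.normal(size=(n, 1))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * L[:, 0])))
# E[L] = 0, so the true average treatment effect is 2.0
y = 2.0 * a + 3.0 * L[:, 0] + 1.5 * a * L[:, 0] + rng.normal(size=n)

naive = y[a == 1].mean() - y[a == 0].mean()
ate = g_computation_ate(y, a, L)
print(f"naive difference in means: {naive:.2f}")
print(f"G-computation estimate:    {ate:.2f}")
```

The naive difference in means is biased upward because the confounder raises both the chance of treatment and the outcome; standardizing predictions over the confounder distribution removes that bias when the outcome model is correctly specified.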
Jean Feng, Ph.D., is an Assistant Professor in the Department of Epidemiology and Biostatistics at the University of California, San Francisco, and in the UCSF-UC Berkeley Joint Program in Computational Precision Health. She is also a principal investigator at the UCSF-Stanford Center of Excellence in Regulatory Science and Innovation (CERSI) and the data science lead on the predictive analytics team for the Zuckerberg San Francisco General Hospital. Her research interests span the interpretability, reliability, and regulation of machine learning (ML) algorithms in healthcare.
Title: Towards a post-market monitoring framework for machine learning-based medical devices
Abstract
After a machine learning (ML)-based system is deployed in clinical practice, performance monitoring is widely recognized as a crucial component of ensuring the safety and effectiveness of the algorithm over time. Nevertheless, designing an effective monitoring strategy is highly complex given the multitude of design decisions, including the data source (e.g., observational versus interventional data), the performance criteria tracked, the assumptions required by the procedure, and more. After reviewing existing approaches to designing clinical AI monitoring systems, we discuss the need for a systematic framework for designing post-market monitoring systems, the unique considerations in this setting, and the importance of causal thinking.
Kiros Berhane, Ph.D., is the Cynthia and Robert Citrone-Roslyn and Leslie Goldstein Professor and Chair of the Department of Biostatistics at Columbia University. His expertise is in the development of methods for complex data structures in studies of multifactorial health effects. He is Contact PI of the U2R component of the GEOHealth Hub for Eastern Africa, which focuses on the health impacts of environmental hazards and climate change. His well-funded research spans statistical methodology development and its application to a wide array of public health domains, as well as training programs such as “Advancing Public Health Research in Eastern Africa through Data Science Training (APHREA-DST)”, which develops new graduate training programs in public health data science at the University of Nairobi (Kenya) and Addis Ababa University (Ethiopia) as part of NIH’s DS-I Africa initiative. He recently served as a member of the National Academies of Sciences, Engineering, and Medicine (NASEM) Committee on Assessing Causality from a Multidisciplinary Evidence Base for National Ambient Air Quality Standards and as a member of the core panel of the Lancet Commission on the Future of Health and Economic Resilience of Africa (FHERA). He serves on the editorial boards of several scientific journals and is currently a member of Science magazine’s Board of Reviewing Editors. He was a Fulbright Scholar in 2016-2017. He is an elected fellow of the American Statistical Association.
Xihong Lin
Youjin Lee, Ph.D., is the Manning Assistant Professor of Biostatistics in the Department of Biostatistics at Brown University. Her research interests include developing novel statistical and causal inference methods for complex observational studies, including large network data, clustered data, and brain network data. She is particularly interested in settings where standard assumptions, such as independent observations or the assumption of no unmeasured confounding, are violated. She is currently working on causal inference methods with applications to policy effect evaluation and effective connectivity research. She is an Associate Editor for Reproducibility for the Journal of the American Statistical Association and the Secretary for the Society for Causal Inference.
Jon Steingrimsson, Ph.D., is a faculty member in biostatistics at Brown University. Steingrimsson obtained his Ph.D. in statistics from Cornell University and prior to joining Brown University he was a postdoctoral fellow in the Department of Biostatistics at Johns Hopkins University.
Title: Use of AI and machine learning in clinical trials
Abstract
Conducting a clinical trial is expensive and time-consuming, and the success rate of late-stage clinical trials is low. How artificial intelligence (AI) and machine learning (ML) can be used to improve the conduct of clinical trials has received considerable attention. In this talk, we discuss how AI/ML are currently used to improve the conduct of clinical trials, their future potential, and the associated challenges.
Alice Paul, Ph.D., is an Assistant Professor of Biostatistics at Brown University, where she is the Director of the Undergraduate Statistics Concentration and the Associate Director of the Biostatistics Masters Program. She received her Ph.D. in Operations Research from Cornell University in 2017 and then completed a postdoctoral fellowship at Brown’s Data Science Institute. Before her tenure at Brown, she served as an Assistant Professor of Applied Mathematics and Computer Science at Olin College of Engineering. Alice’s research interests span algorithms, optimization, data science, and education, with a focus on the design and analysis of optimization algorithms for machine learning and applications in clustering, variable selection, risk models, and shared mobility systems. At Brown, she received the Dean’s Award for Excellence in Classroom Teaching in the School of Public Health in 2022. She also authored the online book "Mastering Health Data Science Using R," with plans for an extended print version. Alice serves on the editorial board of the INFORMS journal Transactions on Education.
Lorin Crawford, Ph.D., is a Principal Researcher at Microsoft Research, and he holds a faculty affiliate position as a Distinguished Senior Fellow of Biostatistics at Brown University. His research program involves developing interpretable machine learning algorithms to understand how non-additive variation plays a role in complex traits and contributes to disease in diverse human populations. His recent work has earned him a place on the Forbes 30 Under 30 list and recognition as a member of The Root 100 Most Influential African Americans in 2019. He has also been awarded an Alfred P. Sloan Research Fellowship, a David & Lucile Packard Foundation Fellowship for Science and Engineering, and a COPSS Emerging Leader Award. Prior to joining MSR and Brown, Dr. Crawford received his Ph.D. from the Department of Statistical Science at Duke University and a Bachelor of Science degree in Mathematics from Clark Atlanta University.
Title: Statistical opportunities in defining, modeling, and targeting cell state in cancer
Abstract
Project Ex Vivo is a joint cancer research collaboration between Microsoft and the Broad Institute of MIT and Harvard. Our group views cancers as complex (eco)systems, shaped by more than mutational variation alone, that require systems-level understanding and intervention. In this talk, I will discuss a series of multimodal statistical and deep learning approaches for learning accurate representations of tumors by integrating genetic markers, expression state, and microenvironmental interactions. These representations help us precisely define and quantify the trajectory of each tumor in each patient. Our ultimate objective is to more effectively model cancer ex vivo (outside the body) in a patient-specific manner. In doing so, we aim to better stratify patient populations and identify therapies that target diverse aspects of human cancers.
Ying Ma, Ph.D., is an Assistant Professor in the Department of Biostatistics and a core faculty member of the Center for Computational Molecular Biology at Brown University. Her research focuses on developing efficient statistical learning methods to address a variety of biological problems and computational challenges in genomics and genetics, particularly in single-cell RNA sequencing and spatially resolved transcriptomics. In addition to her genomics research, she works on genetic risk prediction for common health exposure traits in large biobanks such as the UK Biobank and the Michigan Genomics Initiative (MGI).
Title: Statistical and AI powered methods in Spatial Transcriptomics
Abstract
Spatially resolved transcriptomics (SRT) studies are becoming increasingly common and large, offering unprecedented opportunities to characterize the spatial and functional organization of complex tissues. In this talk, I will present our recent method IRIS, which is designed to detect spatial domains in multi-sample SRT studies. IRIS leverages scRNA-seq data to improve spatial domain detection, integrates multiple SRT tissue slices, and incorporates spatial correlation within and across slices for biologically meaningful domain identification. IRIS achieves unprecedented accuracy gains across various tissue types and spatial resolutions, revealing fine-scale tissue architecture, tumor microenvironment diversity, and structural changes in disease states. If time permits, I will also discuss my other work, such as CARD for cell-type deconvolution, and ongoing work on gene-gene network analysis and the integration of spatial multimodal data.