Robust and reliable machine learning

Apr 27, 2017

Probabilistic models, a foundation of modern data analysis, rely on simplifying assumptions about complex real-life phenomena. Crucially, these methods operate on the assumption that the model is correct. Complex models are often necessary to accurately learn about sophisticated real-world phenomena, but even with careful model checking, some amount of model misspecification is inevitable. In some cases, this misspecification may lead to undesirable behavior, such as uninterpretable or even misleading results.

My research studies misspecification in probabilistic models, with the aim of understanding when our model assumptions lead to desirable behavior and when they lead to misleading or uninterpretable inferences. More broadly, I aim to develop and understand statistical machine learning models under misspecification and distribution shift, and to study the impact on downstream tasks such as decision making.

A few current research directions for understanding and developing more robust machine learning methods include studying:

Model misspecification in latent variables, e.g., mixture models, and methods for Bayesian robustness
Misspecification in network models, including of sparsity and power-law assumptions
Online changepoint detection for expensive models
Multi-source meta-learning and transfer learning
Flexible likelihood approximations via kernels in inverse reinforcement learning

Selected publications

Kernel density Bayesian inverse reinforcement learning
Aishwarya Mandyam, Didong Li, Diana Cai, Andrew Jones, Barbara E. Engelhardt
Transactions on Machine Learning Research, 2024

Finite mixture models do not reliably learn the number of components
Diana Cai*, Trevor Campbell*, Tamara Broderick
Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
Oral presentation (short)

Active multi-fidelity Bayesian online changepoint detection
Gregory G. Gundersen, Diana Cai, Chuteng Zhou, Barbara E. Engelhardt, Ryan P. Adams
Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI), 2021

Power posteriors do not reliably learn the number of components in a finite mixture
Diana Cai*, Trevor Campbell*, Tamara Broderick
NeurIPS Workshop: I Can’t Believe It’s Not Better, 2020
Best Paper Award (Didactic Track) & spotlight presentation

Weighted meta-learning
Diana Cai, Rishit Sheth, Lester Mackey, Nicolo Fusi
arXiv e-print 2003.09465, 2020

Finite mixture models are typically inconsistent for the number of components
Diana Cai, Trevor Campbell, Tamara Broderick
NeurIPS Workshop on Machine Learning With Guarantees, 2019

Diana Cai

Center for Computational Mathematics

I am broadly interested in machine learning and statistics, and in particular, developing robust and reliable methods for modeling and inference.