Robust and reliable machine learning

Probabilistic models, a foundation of modern data analysis, rely on simplifying assumptions about complex real-life phenomena. Crucially, inference with these models proceeds as if the model were correct. Complex models are often necessary to learn accurately about sophisticated real-world phenomena, but even with careful model checking, some amount of model misspecification is inevitable. In some cases, this misspecification leads to undesirable behavior, such as uninterpretable or even misleading results.

My research studies misspecification in probabilistic models, with the goal of understanding when model assumptions lead to desirable behavior and when they lead to misleading or uninterpretable inferences. I aim to develop and analyze statistical machine learning methods under misspecification and distribution shift, and to study their impact on downstream tasks such as decision making.

A few current research directions for understanding and developing more robust machine learning methods include studying:

See also our workshop at NeurIPS 2021: “Your Model is Wrong: Robustness and misspecification in probabilistic machine learning”.


Kernel density Bayesian inverse reinforcement learning

Submitted (preliminary version appeared in AABI), 2023

Finite mixture models do not reliably learn the number of components

Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
Oral presentation (short)

Active multi-fidelity Bayesian online changepoint detection

Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI), 2021

Power posteriors do not reliably learn the number of components in a finite mixture

NeurIPS Workshop: I Can’t Believe It’s Not Better, 2020
Best Paper Award (Didactic Track) & spotlight presentation

Weighted meta-learning

arXiv e-print 2003.09465, 2020

Finite mixture models are typically inconsistent for the number of components

NeurIPS Workshop on Machine Learning With Guarantees, 2019


Diana Cai
Center for Computational Mathematics

I am broadly interested in machine learning and statistics, and in particular, developing robust and reliable methods for modeling and inference.