DIANA CAI

Email: dcai at cs.princeton.edu  

Research interests: I am broadly interested in probabilistic modeling and, in particular, in theory and methods for robust, scalable, and nonparametric Bayesian modeling.

Recent News

  • I'll be giving a talk at the 12th Conference on Bayesian Nonparametrics in Oxford in June 2019.
  • I'll be giving a talk at the AMS Special Session on Advances in Regularity Lemmas at JMM in January 2019.
  • Our paper "A Bayesian nonparametric view on count-min sketch" was accepted to NIPS 2018.
  • I'm co-organizing the NIPS 2018 Workshop: All of Bayesian Nonparametrics (Especially the Useful Bits).
  • I'll be giving a talk at the Tufts CS Rising Star Colloquium in September 2018.
  • Our paper "Exchangeable Trait Allocations" has been accepted to the Electronic Journal of Statistics.




About Me

I am a graduate student at Princeton University in computer science, working on problems in Bayesian statistics and machine learning. I am advised by Ryan P. Adams and Barbara Engelhardt, and I also work closely with Tamara Broderick at MIT. At Princeton, I am a member of the Laboratory for Intelligent Probabilistic Systems Group and the Biological and Evolutionary Explorations using Hierarchical Integrative Statistical Models Group. Previously, I received an M.S. in statistics at the University of Chicago, and an A.B. in computer science and statistics from Harvard University.

I am an organizer for the NIPS 2018 Workshop "All of Bayesian Nonparametrics (Especially the Useful Bits)." I was an organizer for the 2016 Women in Machine Learning Workshop in Barcelona, Spain.




Projects

Projects, by topic:

  1. Probabilistic modeling
  2. Graphs and networks
  3. Other projects



Probabilistic modeling and scalable inference


Probabilistic modeling is a powerful tool for understanding data in a wide variety of applications. My research focuses on developing and analyzing the properties of flexible probabilistic models for unsupervised learning, such as clustering, feature modeling, topic modeling, and network modeling.





A Bayesian nonparametric view on count-min sketch

(with Michael Mitzenmacher and Ryan P. Adams).

The count-min sketch is a computationally efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. We present a Bayesian nonparametric view on the count-min sketch: using the same data structure, we provide a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. We take a nonparametric approach and consider tokens generated from a Dirichlet process random measure, which allows for an unbounded number of unique tokens.
Advances in Neural Information Processing Systems (NIPS), 2018.
pdf spotlight video
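For readers unfamiliar with the underlying data structure, the classical count-min sketch is simple to write down. The following is a minimal illustrative sketch (the salted-hash family, widths, and depths are arbitrary choices, and this is the point-estimate version, not our Bayesian variant):

```python
import random

class CountMinSketch:
    """Classical count-min sketch: depth hash rows of a fixed width.
    Estimates stream counts; the min over rows is a point estimate
    that never underestimates the true count."""

    def __init__(self, width=1000, depth=5, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # One random salt per row stands in for a pairwise hash family.
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def _index(self, item, salt):
        return hash((salt, item)) % self.width

    def add(self, item, count=1):
        for row, salt in zip(self.table, self.salts):
            row[self._index(item, salt)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the min over rows
        # upper-bounds the true count.
        return min(row[self._index(item, salt)]
                   for row, salt in zip(self.table, self.salts))
```

The min over rows is the classical point estimate; the paper replaces it with a posterior over frequencies computed from the same counters.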






Finite mixture models are typically inconsistent for the number of components

(with Trevor Campbell and Tamara Broderick).

A generative model is a vast simplification of the complex real-world phenomena that govern any observed data set, and for any model of a real-world data set, model misspecification is necessary for tractable data analysis. But model misspecification can also lead to fundamentally inaccurate, misleading, or uninterpretable inferences. We study finite mixture models with a prior on the number of components under misspecification of the mixture family, focusing on the (in)consistency and finite-sample properties of the posterior on the number of components, and we discuss the implications of misspecification for cluster analysis.

Extended abstract in the NIPS Workshop on Advances in Approximate Bayesian Inference, 2017. pdf




Exchangeable trait allocations

(with Trevor Campbell and Tamara Broderick).

We study exchangeable trait allocations, a class of combinatorial models that generalizes partitions, feature allocations, and more. In this work, we characterize the class of exchangeable trait allocations, which we call the trait paintbox. We also characterize a subclass of models particularly amenable to MCMC and variational inference algorithms. We show how constrained trait allocations can be applied to graphs as a generalization of edge-exchangeable graphs and hypergraphs; additional details can be found here.
Electronic Journal of Statistics, 2018.
pdf arxiv

Preliminary version in the NIPS 2016 Workshop on Practical Bayesian Nonparametrics, 2016. pdf




Efficient variational approximations for online Bayesian changepoint detection

(with Ryan P. Adams).

We develop a scalable method for online changepoint detection in conditionally-conjugate latent variable models, detecting global changes in the latent components. Our online variational inference algorithm adaptively decreases the number of sufficient statistics, and we demonstrate inference on mixture models and topic modeling applications.
github
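For intuition, exact online Bayesian changepoint detection in a simple conjugate model can be written directly. The sketch below is a plain run-length filter in the style of Adams and MacKay for a Bernoulli stream with a Beta prior, shown only as a baseline illustration, not the variational algorithm above; the hazard rate and prior parameters are illustrative:

```python
def bocpd_bernoulli(data, hazard=0.1, a0=1.0, b0=1.0):
    """Exact Bayesian online changepoint detection for a Bernoulli stream.
    Maintains a posterior over the current run length; a Beta(a0, b0) prior
    on the coin bias gives a closed-form conjugate predictive per run."""
    run_probs = [1.0]      # run_probs[r] = P(run length == r | data so far)
    stats = [(0.0, 0.0)]   # (heads, tails) observed within each run
    map_runs = []
    for x in data:
        growth, cp_mass = [], 0.0
        new_stats = [(0.0, 0.0)]
        for p, (h, t) in zip(run_probs, stats):
            # Beta-Bernoulli posterior predictive for this run length.
            pred = (a0 + h) / (a0 + b0 + h + t)
            like = pred if x == 1 else 1.0 - pred
            growth.append(p * like * (1.0 - hazard))   # run continues
            cp_mass += p * like * hazard               # run ends here
            new_stats.append((h + x, t + 1 - x))
        run_probs = [cp_mass] + growth
        norm = sum(run_probs)
        run_probs = [p / norm for p in run_probs]
        stats = new_stats
        map_runs.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_runs
```

The filter's cost grows with the number of run lengths tracked; the variational approach above instead adaptively bounds the number of sufficient statistics kept.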



Graphs and networks


Many modern data sources are generated by complex interactions between entities, such as online social and communication networks, biological networks including gene and protein interaction networks, and databases. My research focuses on developing nonparametric methods and theory for exchangeable random graphs and relational data.



Edge-exchangeable graphs: sparsity, power laws, paintboxes, and probability functions

(with Trevor Campbell and Tamara Broderick).

We study edge exchangeability, under which the order of the edges does not affect the distribution of the graph. We show that, unlike many popular graph models that are traditionally vertex exchangeable, edge-exchangeable models admit sparsity and power laws. We also characterize the class of edge-exchangeable graphs and a subclass that is particularly amenable to posterior inference.

Edge-exchangeable graphs and sparsity.
Advances in Neural Information Processing Systems (NIPS), 2016.
pdf arXiv spotlight video poster

    Preliminary versions appeared as:
  • Completely random measures for modeling power laws in sparse graphs.
    NIPS workshop on Networks in the Social and Information Sciences, 2015. pdf
  • Edge-exchangeable graphs and sparsity.
    NIPS workshop on Networks in the Social and Information Sciences, 2015. pdf
  • Edge-exchangeable graphs, sparsity, and power laws.
    NIPS Workshop on Bayesian Nonparametrics: The Next Generation, 2015. pdf

Paintboxes and probability functions for edge-exchangeable graphs.
NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning, 2016. pdf slides poster
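As a concrete illustration of this construction, an edge-exchangeable multigraph can be simulated by drawing each edge's endpoints from a shared Chinese restaurant process urn, the marginal of sampling endpoints i.i.d. from a Dirichlet process random measure. This is a minimal sketch; the concentration parameter below is arbitrary:

```python
import random

def sample_edge_exchangeable_graph(num_edges, alpha=2.0, seed=0):
    """Sample a multigraph whose edge sequence is exchangeable: every
    endpoint is drawn from one shared CRP urn, so popular vertices
    accumulate edges and the vertex set grows with the number of edges."""
    rng = random.Random(seed)
    counts = []  # counts[v] = times vertex v has appeared as an endpoint

    def draw_vertex():
        total = sum(counts)
        if not counts or rng.random() < alpha / (alpha + total):
            counts.append(0)  # a previously unseen vertex
            v = len(counts) - 1
        else:
            # Existing vertices are chosen proportionally to their counts.
            v = rng.choices(range(len(counts)), weights=counts)[0]
        counts[v] += 1
        return v

    edges = [(draw_vertex(), draw_vertex()) for _ in range(num_edges)]
    return edges, len(counts)
```

Because the vertex set grows sublinearly in the number of edges under such urns, samplers of this kind can produce the sparse, heavy-tailed graphs that the paper characterizes.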






Priors on exchangeable directed graphs

(with Nate Ackerman and Cameron Freer).

Exchangeable directed graphs are characterized by a sampling procedure given by the Aldous-Hoover theorem, determined by specifying a distribution on measurable objects known as digraphons. We present a new Bayesian nonparametric model for exchangeable directed random graphs.
Electronic Journal of Statistics (EJS), 2016.
pdf arXiv slides poster

Preliminary version in the NIPS Workshop on Bayesian Nonparametrics, 2015.
Contributed talk in the 10th Conference on Bayesian Nonparametrics, 2015.




Iterative step-function estimation for graphons

(with Nate Ackerman and Cameron Freer).

We present a method for estimating graphons (symmetric, measurable functions from which we can sample exchangeable random graphs) by iteratively refining a partition of the vertices. Here we compute the similarity of vertices based on their respective neighborhoods with respect to the previous partition's edge densities. A step-function estimator is then obtained by taking the average edge density across each pair of classes in the partition.
pdf arXiv poster
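A simplified two-class version of this refinement idea can be prototyped directly. The sketch below is illustrative only: the anchor-vertex initialization and plain nearest-centroid reassignment are simplifying assumptions, not the paper's estimator:

```python
import numpy as np

def graphon_step_estimate(A, num_iters=5):
    """Two-class step-function estimate of a graphon from one adjacency
    matrix, by iteratively refining a vertex partition: vertices with
    similar edge-density profiles into the current classes are grouped."""
    n = A.shape[0]
    # Crude initial partition: neighbors vs. non-neighbors of vertex 0.
    labels = A[0].astype(int)
    for _ in range(num_iters):
        # Each vertex's profile: its edge density into each current class.
        profiles = np.stack(
            [A[:, labels == k].mean(axis=1) if (labels == k).any()
             else np.zeros(n) for k in range(2)], axis=1)
        centroids = np.stack(
            [profiles[labels == k].mean(axis=0) if (labels == k).any()
             else profiles[0] for k in range(2)])
        # Reassign each vertex to the class with the nearest centroid.
        dists = ((profiles[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
    # Step-function heights: average edge density between class pairs.
    B = np.array([[A[np.ix_(labels == i, labels == j)].mean()
                   if (labels == i).any() and (labels == j).any() else 0.0
                   for j in range(2)] for i in range(2)])
    return labels, B
```

On a strongly separated two-block graph, the refinement recovers the planted classes, and B approximates the block densities.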





Other projects



The Ratio Project: Analyzing Online Recipes

We analyzed online recipes using computational methods and exploratory visualizations.
"A food pyramid made of cookies." The Boston Globe, Dec 2011. link
(with Elaine Angelino and Michael Brenner)
Cocktails Visualization, Jun 2013.
(with Elaine Angelino, Gabrielle Ehrlich, Brent Heeringa, Michael Mitzenmacher, Naveen Sinha).


Papers


Preprints and working papers

  1. An iterative step-function estimator for graphons. arXiv:1412.2129.
    Diana Cai, Nate Ackerman, Cameron Freer.
    In submission, 2018.

Journal and conference papers

  1. A Bayesian nonparametric view on count-min sketch.
    Diana Cai, Michael Mitzenmacher, Ryan P. Adams.
    Advances in Neural Information Processing Systems (NIPS), 2018.
  2. Exchangeable trait allocations. arXiv:1609.09147.
    Trevor Campbell, Diana Cai, Tamara Broderick.
    Electronic Journal of Statistics (EJS), 2018.
  3. Edge-exchangeable graphs and sparsity. arXiv:1612.05519.
    Diana Cai, Trevor Campbell, Tamara Broderick.
    Advances in Neural Information Processing Systems (NIPS), 2016.
  4. Priors on exchangeable directed graphs. arXiv:1510.08440.
    Diana Cai, Nate Ackerman, Cameron Freer.
    Electronic Journal of Statistics (EJS), 2016.

Workshop papers

  1. Finite mixture models are typically inconsistent for the number of components.
    Diana Cai, Trevor Campbell, Tamara Broderick.
    NIPS Workshop on Advances in Approximate Bayesian Inference, 2017. [pdf]
  2. Paintboxes and probability functions for edge-exchangeable graphs.
    Diana Cai, Trevor Campbell, Tamara Broderick.
    NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning, 2016. [pdf]
  3. A paintbox representation for exchangeable trait allocations.
    Trevor Campbell, Diana Cai, Tamara Broderick.
    NIPS Workshop on Practical Bayesian Nonparametrics, 2016. [pdf]
  4. Priors on exchangeable directed graphs.
    Diana Cai, Nate Ackerman, Cameron Freer.
    NIPS Workshop on Bayesian Nonparametrics: The Next Generation, 2015. [pdf]
    ISBA@NIPS Special Travel Award for Contributed Paper, 2015.
  5. Completely random measures for modeling power laws in sparse graphs.
    Diana Cai, Tamara Broderick.
    NIPS workshop on Networks in the Social and Information Sciences, 2015. [pdf]
    arXiv:1603.06915 [stat.ML, math.ST, stat.ME].
  6. Edge-exchangeable graphs, sparsity, and power laws.
    Tamara Broderick, Diana Cai.
    NIPS Workshop on Bayesian Nonparametrics: The Next Generation, 2015. [pdf]
    ISBA@NIPS Special Travel Award for Contributed Paper, 2015.
  7. Edge-exchangeable graphs and sparsity.
    Tamara Broderick, Diana Cai.
    NIPS workshop on Networks in the Social and Information Sciences, 2015. [pdf]
    arXiv:1603.06898 [math.ST, stat.ME, stat.ML].


Selected Talks

  1. Invited talk in the Joint Statistics Meeting (JSM) Topic Contributed Session on Bayesian Nonparametrics, July 2019.
  2. Invited talk in the 12th Conference on Bayesian Nonparametrics, June 2019.
  3. Invited talk in the Joint Mathematics Meetings (JMM) AMS Special Session on Advances in Regularity Lemmas, Jan 2019.
  4. Invited talk in the Tufts CS Rising Star Colloquium, Sept 2018.
  5. Edge-exchangeable graphs, sparsity, and power laws.
    Contributed talk in the 11th Conference on Bayesian Nonparametrics, June 2017.
  6. Edge-exchangeable graphs, sparsity, and power laws.
    Contributed talk in the Conference on Network Science (Netsci), June 2017.
  7. Paintboxes and probability functions for edge-exchangeable graphs.
    Contributed talk in the NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning, 2016.
  8. Edge-exchangeable graphs, sparsity, and power laws.
    Invited talk in the Isaac Newton Institute (INI) workshop on Bayesian methods for networks, July 2016. [video link]
  9. Edge-exchangeable graphs, sparsity, and power laws.
    Invited talk at the Massachusetts Institute of Technology, Machine Learning Tea seminar, July 2016.
  10. Edge-exchangeable graphs, sparsity, and power laws.
    Contributed talk in the NIPS workshop on Bayesian Nonparametrics: the Next Generation, December 2015. [pdf]
  11. Priors on exchangeable directed graphs.
    Contributed talk in The 10th Conference on Bayesian Nonparametrics, June 2015. [pdf]
  12. Efficient online variational changepoint detection.
    Machine Learning Tea Seminar, Harvard University, Feb 2013.

Poster presentations

  1. Paintboxes and probability functions for edge-exchangeable graphs.
    NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning, 2016.
  2. Completely random measures for modeling power laws in sparse graphs.
    NIPS workshop on Networks in the Social and Information Sciences, Dec 2015.
  3. Edge-exchangeable graphs, sparsity, and power laws.
    NIPS workshop on Bayesian Nonparametrics: The Next Generation, Dec 2015.
    NIPS workshop on Networks in the Social and Information Sciences, Dec 2015.
  4. Priors on exchangeable directed graphs.
    NIPS workshop on Bayesian Nonparametrics: The Next Generation, Dec 2015.
    Women in Machine Learning Workshop, Dec 2015.
  5. An iterative step-function estimator for graphons.
    Women in Machine Learning Workshop, Dec 2014.
  6. Efficient variational approximations for online Bayesian changepoint detection. New England Machine Learning Day Workshop, May 2014.
  7. Efficient variational approximations for online Bayesian changepoint detection. Women in Machine Learning Workshop, Dec 2013.

Teaching

Princeton University

COS 597C: Machine Learning for Healthcare. Teaching Assistant: Fall 2018.

University of Chicago

  • STAT 20000: Elementary Statistics. Teaching Assistant: Fall 2016.
  • STAT 22000: Statistical Methods and Applications. Teaching Assistant: Winter 2016, Spring 2016, Spring 2017.

Harvard University

  • CS181: Machine Learning. Teaching Fellow, Fall 2013--Spring 2014.