Variational inference (VI) approximates an intractable target distribution with the closest member of a tractable family. A central challenge in black-box VI (BBVI) is that standard implementations, which minimize the KL divergence by stochastic gradient descent (SGD), converge slowly due to gradient noise and sensitivity to hyperparameters such as the learning rate. These issues become more pronounced with expressive variational families, which are necessary for accurate inference in complex scientific problems.
My work shows that score matching, which aligns the gradients of the log densities of the target model and the variational approximation, enables faster and more reliable optimization, often yielding closed-form or convex subproblems. We introduced Batch and Match (BaM), an iterative score-matching method for fitting full-covariance Gaussians. BaM minimizes an objective based on a novel score matching divergence between the variational density $q$ and the target density $p$:
$$\mathscr{D}(q,p) = \mathbb{E}_q\left[\big\|\nabla \log q - \nabla \log p\big\|^2_{\operatorname{Cov}(q)}\right].$$
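As a concrete illustration, this divergence can be estimated by simple Monte Carlo when $q$ is a full-covariance Gaussian. The sketch below is not the BaM implementation; it assumes a hypothetical user-supplied callable `score_p` for $\nabla \log p$, and it plugs in a standard normal target only as a sanity check:

```python
# Minimal sketch (not the BaM implementation): Monte Carlo estimate of the
# score-matching divergence D(q, p) for a Gaussian q = N(mu, Sigma).
# `score_p` is an assumed user-supplied callable returning grad log p(x).
import numpy as np

def divergence_estimate(mu, Sigma, score_p, n_samples=1000, seed=0):
    """Estimate E_q[ ||grad log q - grad log p||^2_{Cov(q)} ] by sampling."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, Sigma, size=n_samples)   # x_i ~ q
    # Gaussian score: grad log q(x) = -Sigma^{-1} (x - mu)
    score_q = -np.linalg.solve(Sigma, (x - mu).T).T
    diff = score_q - score_p(x)                              # shape (n, d)
    # Weighted norm ||v||^2_{Cov(q)} = v^T Sigma v, averaged over samples
    return np.mean(np.einsum('ni,ij,nj->n', diff, Sigma, diff))

# Sanity check with target p = N(0, I), whose score is -x.
mu, Sigma = np.zeros(2), np.eye(2)
print(divergence_estimate(mu, Sigma, lambda x: -x))  # 0: q already matches p
```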
The BaM objective admits closed-form updates that reach the global minimum of each subproblem. Empirically, BaM yields 10–100x speedups over stochastic gradient-based BBVI in applications to hierarchical Bayes and deep generative modeling. Theoretically, we prove that for Gaussian targets, BaM converges exponentially fast to the true parameters. To scale BaM to high-dimensional settings, we augmented its update with a "Patch" operation, a structured step that projects the covariance to low-rank-plus-diagonal form in linear time and memory. We then demonstrated its effectiveness on high-dimensional latent Gaussian processes and other examples in dimensions up to $8000$.
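For intuition on why the low-rank-plus-diagonal structure scales, the sketch below (with illustrative names, not the released code) shows that with $\Sigma = \operatorname{diag}(d) + UU^\top$ and $k \ll \text{dim}$, both matrix-vector products and sampling cost only $O(\text{dim} \cdot k)$ time and memory:

```python
# Minimal sketch of the low-rank-plus-diagonal covariance structure used by
# the Patch step: Sigma = diag(d) + U U^T with U of shape (dim, k), k << dim.
# This illustrates the linear-time/memory arithmetic, not the projection.
import numpy as np

dim, k = 8000, 10
rng = np.random.default_rng(0)
diag = rng.uniform(0.5, 2.0, size=dim)        # diagonal part, O(dim) memory
U = rng.normal(size=(dim, k)) / np.sqrt(dim)  # low-rank factor, O(dim * k)

def matvec(v):
    """Sigma @ v in O(dim * k) time: diag(d) v + U (U^T v)."""
    return diag * v + U @ (U.T @ v)

def sample(mu):
    """Draw x ~ N(mu, Sigma) in O(dim * k), since
    Cov(sqrt(d) * z1 + U z2) = diag(d) + U U^T."""
    z1 = rng.normal(size=dim)
    z2 = rng.normal(size=k)
    return mu + np.sqrt(diag) * z1 + U @ z2
```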
In many scientific domains, target distributions exhibit skewness, heavy tails, and multimodality, all beyond the reach of Gaussian approximations. To address these settings, we have developed richer variational families that, via score matching, avoid the need for SGD. The first, EigenVI, uses orthonormal basis expansions; score matching then reduces inference to solving an eigenvalue problem, from which EigenVI derives its name. The second, based on a product of experts (PoE), combines multivariate-$t$ experts whose contributions are determined by a geometric weighting. We show that this family becomes tractable using a Feynman parameterization identity from physics. We then developed an efficient score matching algorithm that reduces the fit to a sequence of nonnegative least-squares problems. We characterized the convergence rate of this algorithm and demonstrated its effectiveness empirically.
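The final step of the EigenVI reduction can be sketched generically: score matching over a unit-norm coefficient vector $\alpha$ of the basis expansion amounts to minimizing a quadratic form $\alpha^\top M \alpha$ subject to $\|\alpha\| = 1$, whose solution is the eigenvector of $M$ with smallest eigenvalue. The sketch below uses a placeholder matrix; assembling $M$ from basis and score evaluations is omitted:

```python
# Minimal sketch of the eigenvalue problem at the core of EigenVI-style
# fitting: minimize alpha^T M alpha over unit vectors alpha. The matrix M
# here is a stand-in; in practice it would be built from evaluations of the
# orthonormal basis and the target score.
import numpy as np

def fit_coefficients(M):
    """Minimize alpha^T M alpha over unit vectors: smallest eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, 0]

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
M = A @ A.T                               # placeholder symmetric PSD matrix
alpha = fit_coefficients(M)
print(np.linalg.norm(alpha))              # 1.0: unit-norm coefficient vector
```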
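The computational core of the PoE fit is likewise easy to sketch: each iteration solves a nonnegative least-squares problem for the expert weights. How the design matrix and targets are assembled from the expert scores is omitted; the example below solves one generic subproblem of this kind with SciPy:

```python
# Minimal sketch of one nonnegative least-squares subproblem of the kind the
# PoE score matching algorithm solves repeatedly. A and b are stand-ins; in
# practice they would be assembled from the scores of the t-experts.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 4))             # stand-in design matrix
b = A @ np.array([0.7, 0.0, 1.2, 0.3])    # target built from known weights
w, residual = nnls(A, b)                  # min ||A w - b|| subject to w >= 0
print(w)                                  # recovers the nonnegative weights
```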