Variational inference (VI) approximates an intractable target distribution with the closest member of a tractable family. A central challenge in black-box VI (BBVI) is that standard implementations, which minimize the KL divergence by stochastic gradient descent (SGD), converge slowly due to gradient noise and sensitivity to hyperparameters such as the learning rate. These issues become more pronounced with expressive variational families, which are necessary for accurate inference in complex scientific problems.
My work shows that score matching, which aligns the gradients of the log densities of the target model and the variational approximation, enables faster and more reliable optimization, often yielding closed-form or convex subproblems. We introduced Batch and Match (BaM), an iterative score-matching method for fitting full-covariance Gaussians. BaM minimizes an objective based on a novel score matching divergence between the variational density $q$ and the target density $p$:
$$\mathscr{D}(q,p) = \mathbb{E}_q\left[\big\|\nabla \log q - \nabla \log p\big\|^2_{\operatorname{Cov}(q)}\right].$$
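As a concrete illustration, this divergence can be estimated by simple Monte Carlo when $q$ is a full-covariance Gaussian. The sketch below is not the BaM implementation; it assumes a hypothetical user-supplied callable `score_p` for $\nabla \log p$, and it plugs in a standard normal target only as a sanity check:

```python
# Minimal sketch (not the BaM implementation): Monte Carlo estimate of the
# score-matching divergence D(q, p) for a Gaussian q = N(mu, Sigma).
# `score_p` is an assumed user-supplied callable returning grad log p(x).
import numpy as np

def divergence_estimate(mu, Sigma, score_p, n_samples=1000, seed=0):
    """Estimate E_q[ ||grad log q - grad log p||^2_{Cov(q)} ] by sampling."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, Sigma, size=n_samples)   # x_i ~ q
    # Gaussian score: grad log q(x) = -Sigma^{-1} (x - mu)
    score_q = -np.linalg.solve(Sigma, (x - mu).T).T
    diff = score_q - score_p(x)                              # shape (n, d)
    # Weighted norm ||v||^2_{Cov(q)} = v^T Sigma v, averaged over samples
    return np.mean(np.einsum('ni,ij,nj->n', diff, Sigma, diff))

# Sanity check with target p = N(0, I), whose score is -x.
mu, Sigma = np.zeros(2), np.eye(2)
print(divergence_estimate(mu, Sigma, lambda x: -x))  # 0: q already matches p
```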
The BaM objective admits closed-form updates that reach the global minimum of each subproblem. Empirically, BaM yields 10–100x speedups over stochastic gradient-based BBVI in applications to hierarchical Bayes and deep generative modeling. Theoretically, we prove that for Gaussian targets, BaM converges exponentially fast to the true parameters. To scale BaM to high-dimensional settings, we augmented its update with a "Patch" operation, a structured step that projects the covariance to low-rank-plus-diagonal form in linear time and memory. We then demonstrated its effectiveness on high-dimensional latent Gaussian processes and other examples in dimensions up to $8000$.
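For intuition on why the low-rank-plus-diagonal structure scales, the sketch below (with illustrative names, not the released code) shows that with $\Sigma = \operatorname{diag}(d) + UU^\top$ and $k \ll \text{dim}$, both matrix-vector products and sampling cost only $O(\text{dim} \cdot k)$ time and memory:

```python
# Minimal sketch of the low-rank-plus-diagonal covariance structure used by
# the Patch step: Sigma = diag(d) + U U^T with U of shape (dim, k), k << dim.
# This illustrates the linear-time/memory arithmetic, not the projection.
import numpy as np

dim, k = 8000, 10
rng = np.random.default_rng(0)
diag = rng.uniform(0.5, 2.0, size=dim)        # diagonal part, O(dim) memory
U = rng.normal(size=(dim, k)) / np.sqrt(dim)  # low-rank factor, O(dim * k)

def matvec(v):
    """Sigma @ v in O(dim * k) time: diag(d) v + U (U^T v)."""
    return diag * v + U @ (U.T @ v)

def sample(mu):
    """Draw x ~ N(mu, Sigma) in O(dim * k), since
    Cov(sqrt(d) * z1 + U z2) = diag(d) + U U^T."""
    z1 = rng.normal(size=dim)
    z2 = rng.normal(size=k)
    return mu + np.sqrt(diag) * z1 + U @ z2
```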
In many scientific domains, target distributions exhibit skewness, heavy tails, and multimodality, all beyond the reach of Gaussian approximations. To address these settings, we have developed richer variational families that, via score matching, avoid the need for SGD. The first, EigenVI, uses orthonormal basis expansions; score matching then reduces inference to solving an eigenvalue problem, from which EigenVI derives its name. The second, based on a product of experts (PoE), combines multivariate-$t$ experts whose contributions are determined by a geometric weighting. We show that this family becomes tractable using a Feynman parameterization identity from physics. We then developed an efficient score matching algorithm that reduces the fit to a sequence of nonnegative least-squares problems. We characterized the convergence rate of this algorithm and demonstrated its effectiveness empirically.
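The final step of the EigenVI reduction can be sketched generically: score matching over a unit-norm coefficient vector $\alpha$ of the basis expansion amounts to minimizing a quadratic form $\alpha^\top M \alpha$ subject to $\|\alpha\| = 1$, whose solution is the eigenvector of $M$ with smallest eigenvalue. The sketch below uses a placeholder matrix; assembling $M$ from basis and score evaluations is omitted:

```python
# Minimal sketch of the eigenvalue problem at the core of EigenVI-style
# fitting: minimize alpha^T M alpha over unit vectors alpha. The matrix M
# here is a stand-in; in practice it would be built from evaluations of the
# orthonormal basis and the target score.
import numpy as np

def fit_coefficients(M):
    """Minimize alpha^T M alpha over unit vectors: smallest eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, 0]

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
M = A @ A.T                               # placeholder symmetric PSD matrix
alpha = fit_coefficients(M)
print(np.linalg.norm(alpha))              # 1.0: unit-norm coefficient vector
```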
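The computational core of the PoE fit is likewise easy to sketch: each iteration solves a nonnegative least-squares problem for the expert weights. How the design matrix and targets are assembled from the expert scores is omitted; the example below solves one generic subproblem of this kind with SciPy:

```python
# Minimal sketch of one nonnegative least-squares subproblem of the kind the
# PoE score matching algorithm solves repeatedly. A and b are stand-ins; in
# practice they would be assembled from the scores of the t-experts.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 4))             # stand-in design matrix
b = A @ np.array([0.7, 0.0, 1.2, 0.3])    # target built from known weights
w, residual = nnls(A, b)                  # min ||A w - b|| subject to w >= 0
print(w)                                  # recovers the nonnegative weights
```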