Hi there! I’m an MS-Ph.D. student in Computer Science at the University of Massachusetts Amherst, advised by Prof. Justin Domke. I am broadly interested in Probabilistic Machine Learning. These days, I focus on scaling and automating variational inference methods for large datasets.
Prior to this, I spent four wonderful years at IIT Kanpur acquiring a background in machine learning and electrical engineering.
Ph.D. in Computer Science, 2018-Present
University of Massachusetts Amherst
B.Tech, 2018
Indian Institute of Technology Kanpur
It is difficult to use subsampling with variational inference in hierarchical models since the number of local latent variables scales with the dataset size. Thus, inference in hierarchical models remains a challenge at large scale. It is helpful to use a variational family with a structure matching the posterior, but optimization is still slow due to the huge number of local distributions. Instead, this paper suggests an amortized approach where shared parameters simultaneously represent all local distributions, and a “feature pooling” network is learned to represent conditionally i.i.d. observations. This approach is similarly accurate to using a given joint distribution (e.g., a full-rank Gaussian) but is feasible on datasets that are several orders of magnitude larger. It is also dramatically faster than using a structured variational distribution.
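To make the amortization idea concrete, here is a minimal sketch (in PyTorch, and not the paper’s actual code): a single shared network embeds each group’s conditionally i.i.d. observations, mean-pools the embeddings, and maps the pooled feature to the parameters of a diagonal Gaussian over that group’s local latent variable. The network architecture, layer sizes, and diagonal-Gaussian family are illustrative assumptions.

```python
# Illustrative sketch: amortized per-group posteriors via feature pooling.
# Each group's conditionally i.i.d. observations are embedded, mean-pooled,
# and mapped to the mean/log-std of a diagonal Gaussian q(z_i | x_i).
# All groups share the same weights, so the number of variational
# parameters does not grow with the number of groups in the dataset.
import torch
import torch.nn as nn

class PooledAmortizedGaussian(nn.Module):
    def __init__(self, obs_dim, latent_dim, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 2 * latent_dim)  # mean and log-std

    def forward(self, x_group):
        # x_group: (n_i, obs_dim) observations for one group; n_i may vary.
        pooled = self.embed(x_group).mean(dim=0)  # permutation-invariant pooling
        mean, log_std = self.head(pooled).chunk(2, dim=-1)
        return mean, log_std

# Reparameterized sample of one group's local latent for a stochastic ELBO term.
net = PooledAmortizedGaussian(obs_dim=5, latent_dim=3)
x_i = torch.randn(17, 5)  # one subsampled group
mean, log_std = net(x_i)
z_i = mean + log_std.exp() * torch.randn_like(mean)
```

Because every group reuses the same weights, the variational parameter count stays fixed as the number of groups grows, which is what makes subsampling over groups practical.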
Recent research has seen several advances relevant to black-box variational inference (VI), but the current state of automatic posterior inference is unclear. One such advance is the use of normalizing flows to define flexible posterior densities for deep latent variable models. Another direction is the integration of Monte Carlo methods to serve two purposes: first, to obtain tighter variational objectives for optimization, and second, to define enriched variational families through sampling. However, both flows and variational Monte Carlo methods remain relatively unexplored for black-box VI. Moreover, on a pragmatic front, there are several optimization considerations, such as the step-size scheme, parameter initialization, and choice of gradient estimator, for which there is no clear guidance in the literature. In this paper, we postulate that black-box VI is best addressed through a careful combination of numerous algorithmic components. We evaluate components relating to optimization, flows, and Monte Carlo methods on a benchmark of 30 models from the Stan model library. The combination of these algorithmic components significantly advances the state of the art in “out of the box” variational inference.
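As one concrete instance of the Monte Carlo ingredient mentioned above, the sketch below (illustrative, not the paper’s benchmark code) computes a reparameterized importance-weighted objective for a diagonal Gaussian variational approximation. The `log_p` function is a hypothetical stand-in for a model’s joint log-density, such as one exposed by a Stan model.

```python
# Illustrative sketch: importance-weighted objective for black-box VI
# with a diagonal Gaussian q parameterized by (mean, log_std).
import torch

def iw_elbo(log_p, mean, log_std, num_samples=16):
    # Reparameterized samples z = mean + sigma * eps, with eps ~ N(0, I).
    eps = torch.randn(num_samples, mean.shape[-1])
    z = mean + log_std.exp() * eps
    # log q(z) for the diagonal Gaussian.
    log_q = (-0.5 * eps.pow(2) - log_std
             - 0.5 * torch.log(torch.tensor(2 * torch.pi))).sum(-1)
    log_w = log_p(z) - log_q  # per-sample importance weights
    # Tighter-than-ELBO objective: log of the averaged importance weights.
    return torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(num_samples)))

# Toy target (placeholder for a real model): standard normal in 2D.
log_p = lambda z: (-0.5 * z.pow(2)).sum(-1)
mean = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)
loss = -iw_elbo(log_p, mean, log_std)
loss.backward()  # gradients for a black-box optimizer step
```

With a single sample this reduces to the standard ELBO; increasing `num_samples` tightens the bound, which is one of the two roles of Monte Carlo methods described above.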