VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference

Agarwal, Sakshi; Hope, Gabriel; Heo, Jimin; Sudderth, Erik B.

Sakshi Agarwal¹, Gabriel Hope², Jimin Heo³, Erik B. Sudderth³

¹Accenture ²Swarthmore College ³University of California, Irvine
AISTATS 2026

VIPaint produces diverse, high-quality inpaintings for large masked regions using state-of-the-art text-conditioned latent diffusion models.

Abstract

Diffusion probabilistic models learn to remove noise added during training, generating novel data (e.g., images) from Gaussian noise through sequential denoising. However, conditioning the generative process on corrupted or masked images is challenging. While various methods have been proposed for inpainting masked images with diffusion priors, they often fail to produce samples from the true conditional distribution, especially for large masked regions. Many baselines also cannot be applied to latent diffusion models, which generate high-quality images at much lower computational cost. We propose a hierarchical variational inference algorithm that optimizes a non-Gaussian Markov approximation of the true diffusion posterior. Our VIPaint method outperforms existing approaches to inpainting, producing diverse, high-quality imputations even for state-of-the-art text-conditioned latent diffusion models, and is also effective for other inverse problems such as deblurring and super-resolution.

Method

Optimization (top): The hierarchical approximate posterior of VIPaint is defined over a coarse sequence of intermediate latent steps, or keypoints, between h(K) and h(1). During optimization, the variational parameters λ defining the posterior at these sparse times are fit via a prior loss on times above h(K), a hierarchical loss defined across K keypoints, and a reconstruction loss estimated using a sample-based one-step approximation of p_θ(x | z_h(1)). After a single variational optimization, multiple samples may be drawn via gradient-based stochastic refinement.
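
To make the reconstruction term more concrete, the toy sketch below shows one common way a one-step estimate of the clean signal can be scored against a masked observation. It is an illustration under stated assumptions, not the VIPaint implementation: it uses the standard DDPM noise-prediction parameterization on a 4-element vector rather than a latent diffusion model with a decoder, the noise predictor is replaced by an oracle that returns the true noise, and all function and variable names (forward_noise, one_step_x0, masked_recon_loss, alpha_bar) are hypothetical.

```python
import math
import random


def forward_noise(x, alpha_bar, rng):
    """Sample z ~ N(sqrt(alpha_bar) * x, (1 - alpha_bar) * I); also return the noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    z = [math.sqrt(alpha_bar) * xi + math.sqrt(1.0 - alpha_bar) * ei
         for xi, ei in zip(x, eps)]
    return z, eps


def one_step_x0(z, eps_hat, alpha_bar):
    """One-step estimate of the clean signal from a noisy state and a predicted noise."""
    return [(zi - math.sqrt(1.0 - alpha_bar) * ei) / math.sqrt(alpha_bar)
            for zi, ei in zip(z, eps_hat)]


def masked_recon_loss(x_hat, y, mask):
    """Mean squared error over observed entries only (mask value 1 means observed)."""
    n_obs = sum(mask)
    return sum(m * (a - b) ** 2 for a, b, m in zip(x_hat, y, mask)) / n_obs


rng = random.Random(0)
x = [0.5, -1.0, 0.25, 2.0]               # toy "clean image"
mask = [1, 1, 0, 0]                      # last two entries are masked out
y = [xi * m for xi, m in zip(x, mask)]   # masked observation
alpha_bar = 0.9                          # assumed cumulative signal level at the lowest keypoint

z, eps = forward_noise(x, alpha_bar, rng)

# Oracle stand-in for a pre-trained noise predictor (assumption: it returns the true noise).
x_hat = one_step_x0(z, eps, alpha_bar)
loss_oracle = masked_recon_loss(x_hat, y, mask)

# A predictor that always outputs zero noise gives a worse estimate on observed entries.
x_hat_zero = one_step_x0(z, [0.0] * len(z), alpha_bar)
loss_zero = masked_recon_loss(x_hat_zero, y, mask)

print(loss_oracle, loss_zero)
```

With the oracle predictor the one-step estimate recovers the clean vector up to floating-point error, so the masked loss is essentially zero; the zero-noise predictor leaves residual noise in the observed entries and incurs a positive loss. Only observed entries contribute, so the masked region is constrained by the prior rather than by this term.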

BibTeX

@inproceedings{agarwal2026vipaint,
  title={{VIP}aint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference},
  author={Sakshi Agarwal and Gabriel Hope and Jimin Heo and Erik B. Sudderth},
  booktitle={The 29th International Conference on Artificial Intelligence and Statistics},
  year={2026},
  url={https://openreview.net/forum?id=0ehuNXBslr}
}