BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning
Zhijie Deng Xiao Yang Hao Zhang Yinpeng Dong Jun Zhu
Tsinghua
Paper | PyTorch code
Abstract
Despite their theoretical appeal, Bayesian neural networks (BNNs) lag far behind ordinary NNs in real-world adoption, mainly due to their limited scalability in training and the low fidelity of their uncertainty estimates. In this work, we develop a new framework, named BayesAdapter, to address these issues and bring Bayesian deep learning to the masses. The core notion of BayesAdapter is to adapt pre-trained deterministic NNs into BNNs via Bayesian fine-tuning. We implement Bayesian fine-tuning with a plug-and-play instantiation of stochastic variational inference, and propose exemplar reparameterization to reduce gradient variance and stabilize the fine-tuning. Together, they enable training BNNs as if one were training deterministic NNs, with minimal added overhead. During Bayesian fine-tuning, we further propose an uncertainty regularization to supervise and calibrate the uncertainty quantification of the learned BNNs at low cost. To empirically evaluate BayesAdapter, we conduct extensive experiments on a diverse set of challenging benchmarks, and observe significantly higher training efficiency, better predictive performance, and better-calibrated, more faithful uncertainty estimates than existing BNNs.
Core Idea
Unfold the learning of a BNN into two steps: deterministic pre-training of the deep neural network (DNN) counterpart of the BNN, followed by a few rounds of Bayesian fine-tuning.

Advantages
- We can learn a principled BNN with only slightly more effort than training a regular DNN.
- We can build on high-quality off-the-shelf pre-trained DNNs (e.g., those on PyTorch Hub).
- We can avoid the poor local optima that plague training a BNN from scratch.
Deterministic Pre-training
This stage trains a regular DNN via maximum a posteriori (MAP) estimation.
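As a point of reference (written in our notation; a sketch rather than the paper's exact formula), the MAP objective is

$$\max_{\mathbf{w}} \; \sum_{i=1}^{N} \log p(y_i \mid x_i, \mathbf{w}) + \log p(\mathbf{w}),$$

which, under an isotropic Gaussian prior $p(\mathbf{w}) = \mathcal{N}(\mathbf{0}, \sigma_0^2\mathbf{I})$, amounts to ordinary cross-entropy training with weight decay.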
Bayesian Fine-tuning
To cast the fine-tuning in the style of training normal NNs, we resort to stochastic variational inference (VI) to update the approximate posterior. Typically, we maximize the evidence lower bound (ELBO); a sketch of the ELBO is given after the list below.

Two features distinguish our approach from existing variational BNNs and make the fine-tuning user-friendly and robust:
- Optimizers with built-in weight decay
- Exemplar reparameterization
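As referenced above, a standard form of the ELBO for a variational posterior $q(\mathbf{w})$ (our notation; the paper's exact parameterization may differ) is

$$\mathcal{L}(q) = \mathbb{E}_{q(\mathbf{w})}\Big[\sum_{i=1}^{N} \log p(y_i \mid x_i, \mathbf{w})\Big] - \mathrm{KL}\big(q(\mathbf{w}) \,\|\, p(\mathbf{w})\big).$$

The expected log-likelihood is estimated on mini-batches with reparameterized weight samples, while the KL term acts as the regularizer; the two features above concern how these two terms are handled in practice.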
Optimizers with built-in weight decay
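A minimal PyTorch sketch (ours, not the released BayesAdapter code) of the idea: for a Gaussian prior and a factorized Gaussian posterior, the KL's gradient with respect to the variational mean is exactly a weight-decay gradient, so it can be handed to the optimizer's built-in weight_decay instead of being computed by hand. The prior std, dataset size, and parameter shapes below are illustrative assumptions.

```python
import math
import torch

# Sketch: delegate the mean-part of the Gaussian KL to built-in weight decay.
# With prior p(w) = N(0, sigma0^2 I) and posterior q(w) = N(mu, diag(sigma^2)):
#   d KL / d mu = mu / sigma0^2,
# i.e. exactly an L2 (weight-decay) gradient on mu.

N = 50_000        # number of training examples (assumed; scales the per-example KL)
sigma0 = 0.1      # prior standard deviation (illustrative value)

mu = torch.nn.Parameter(0.01 * torch.randn(256, 128))           # variational means
log_sigma = torch.nn.Parameter(torch.full((256, 128), -3.0))    # variational log-stds

optimizer = torch.optim.SGD(
    [
        # mean: the mu-term of KL/N becomes weight decay with coefficient 1/(N * sigma0^2)
        {"params": [mu], "weight_decay": 1.0 / (N * sigma0 ** 2)},
        # log-std: no weight decay; its KL contribution is added to the loss explicitly
        {"params": [log_sigma], "weight_decay": 0.0},
    ],
    lr=1e-3,
    momentum=0.9,
)

def kl_sigma_term():
    """Remaining (sigma-only) part of KL(q || p), averaged per data point."""
    sigma2 = torch.exp(2.0 * log_sigma)
    return (math.log(sigma0) - log_sigma + sigma2 / (2.0 * sigma0 ** 2) - 0.5).sum() / N
```

With this split, the fine-tuning loop looks like ordinary training: mini-batch loss plus `kl_sigma_term()`, while the mean regularization rides along inside the optimizer step.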
Exemplar reparameterization
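A conceptual sketch (not the official implementation) of exemplar reparameterization for a Bayesian linear layer: instead of sharing one weight sample across the whole mini-batch, an independent sample is drawn for every exemplar, which reduces the variance of the stochastic gradients. All names and shapes below are illustrative assumptions, and this naive version materializes per-example weights, so it trades memory for clarity.

```python
import torch

class ExemplarBayesLinear(torch.nn.Module):
    """Bayesian linear layer with one weight sample per exemplar (sketch)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.log_sigma = torch.nn.Parameter(torch.full((out_features, in_features), -5.0))

    def forward(self, x):                                   # x: [B, in_features]
        B = x.shape[0]
        sigma = self.log_sigma.exp()
        # One independent weight sample per exemplar: eps carries a leading batch dim.
        eps = torch.randn(B, *self.mu.shape, device=x.device)   # [B, out, in]
        w = self.mu + sigma * eps                                # [B, out, in]
        # Batched matrix-vector product: y[b] = w[b] @ x[b]
        return torch.einsum('boi,bi->bo', w, x)

# usage
layer = ExemplarBayesLinear(128, 10)
logits = layer(torch.randn(32, 128))                        # [32, 10]
```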
Uncertainty regularization
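An illustrative sketch of one way to supervise uncertainty at low cost: a margin penalty that asks the predictive entropy on out-of-distribution (OOD) inputs to exceed the entropy on in-distribution inputs. This is our reading of the idea; the exact regularizer, margin, and OOD source used by the paper may differ.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    """Entropy of the predictive distribution, per example."""
    p = F.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-8)).sum(dim=-1)           # [B]

def uncertainty_regularizer(logits_in, logits_ood, margin=0.5):
    """Penalize the model when OOD uncertainty does not exceed in-distribution
    uncertainty by at least `margin` (margin value is an arbitrary choice here)."""
    h_in = predictive_entropy(logits_in).mean()
    h_ood = predictive_entropy(logits_ood).mean()
    return F.relu(h_in + margin - h_ood)

# usage: total_loss = nll + kl_term + lam * uncertainty_regularizer(logits_in, logits_ood)
```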
Results (predictive performance)
Results (quality of uncertainty estimates)
Some out-of-distribution samples used in the validation phase
Citation
Zhijie Deng, Xiao Yang, Hao Zhang, Yinpeng Dong, and Jun Zhu. "BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning".
Bibtex