BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning
Zhijie Deng Xiao Yang Hao Zhang Yinpeng Dong Jun Zhu
Tsinghua
Paper | PyTorch code
Abstract
Despite their theoretical appeal, Bayesian neural networks (BNNs) lag far behind ordinary NNs in real-world adoption, mainly due to their limited scalability in training and the low fidelity of their uncertainty estimates. In this work, we develop a new framework, named BayesAdapter, to address these issues and bring Bayesian deep learning to the masses. The core notion of BayesAdapter is to adapt pre-trained deterministic NNs into BNNs via Bayesian fine-tuning. We implement Bayesian fine-tuning with a plug-and-play instantiation of stochastic variational inference, and propose exemplar reparameterization to reduce gradient variance and stabilize the fine-tuning. Together, they enable training BNNs as if one were training deterministic NNs, with minimal added overhead. During Bayesian fine-tuning, we further propose an uncertainty regularization to supervise and calibrate the uncertainty quantification of the learned BNNs at low cost. To empirically evaluate BayesAdapter, we conduct extensive experiments on a diverse set of challenging benchmarks, and observe significantly higher training efficiency, better predictive performance, and better-calibrated, more faithful uncertainty estimates than existing BNNs.
Core Idea
Unfold the learning of a BNN into two steps: deterministic pre-training of the deep neural network (DNN) counterpart of the BNN, followed by a few rounds of Bayesian fine-tuning.

Advantages
- We can learn a principled BNN with only slightly more effort than training a regular DNN.
- We can build on high-quality off-the-shelf pre-trained DNNs (e.g., those on PyTorch Hub).
- We can avoid the poor local optima that plague training a BNN from scratch.
Deterministic Pre-training
This stage trains a regular DNN via maximum a posteriori (MAP) estimation.
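As a point of reference (written in our notation; a sketch rather than the paper's exact formula), the MAP objective is

$$\max_{\mathbf{w}} \; \sum_{i=1}^{N} \log p(y_i \mid x_i, \mathbf{w}) + \log p(\mathbf{w}),$$

which, under an isotropic Gaussian prior $p(\mathbf{w}) = \mathcal{N}(\mathbf{0}, \sigma_0^2\mathbf{I})$, amounts to ordinary cross-entropy training with weight decay.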
Bayesian Fine-tuning
To cast the fine-tuning in the style of training normal NNs, we resort to stochastic variational inference (VI) to update the approximate posterior. Typically, we maximize the evidence lower bound (ELBO); a sketch of the ELBO is given after the list below.

Two features distinguish our approach from existing variational BNNs and make the fine-tuning user-friendly and robust:
- Optimizers with built-in weight decay
- Exemplar reparameterization
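As referenced above, a standard form of the ELBO for a variational posterior $q(\mathbf{w})$ (our notation; the paper's exact parameterization may differ) is

$$\mathcal{L}(q) = \mathbb{E}_{q(\mathbf{w})}\Big[\sum_{i=1}^{N} \log p(y_i \mid x_i, \mathbf{w})\Big] - \mathrm{KL}\big(q(\mathbf{w}) \,\|\, p(\mathbf{w})\big).$$

The expected log-likelihood is estimated on mini-batches with reparameterized weight samples, while the KL term acts as the regularizer; the two features above concern how these two terms are handled in practice.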
Optimizers with built-in weight decay
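A minimal PyTorch sketch (ours, not the released BayesAdapter code) of the idea: for a Gaussian prior and a factorized Gaussian posterior, the KL's gradient with respect to the variational mean is exactly a weight-decay gradient, so it can be handed to the optimizer's built-in weight_decay instead of being computed by hand. The prior std, dataset size, and parameter shapes below are illustrative assumptions.

```python
import math
import torch

# Sketch: delegate the mean-part of the Gaussian KL to built-in weight decay.
# With prior p(w) = N(0, sigma0^2 I) and posterior q(w) = N(mu, diag(sigma^2)):
#   d KL / d mu = mu / sigma0^2,
# i.e. exactly an L2 (weight-decay) gradient on mu.

N = 50_000        # number of training examples (assumed; scales the per-example KL)
sigma0 = 0.1      # prior standard deviation (illustrative value)

mu = torch.nn.Parameter(0.01 * torch.randn(256, 128))           # variational means
log_sigma = torch.nn.Parameter(torch.full((256, 128), -3.0))    # variational log-stds

optimizer = torch.optim.SGD(
    [
        # mean: the mu-term of KL/N becomes weight decay with coefficient 1/(N * sigma0^2)
        {"params": [mu], "weight_decay": 1.0 / (N * sigma0 ** 2)},
        # log-std: no weight decay; its KL contribution is added to the loss explicitly
        {"params": [log_sigma], "weight_decay": 0.0},
    ],
    lr=1e-3,
    momentum=0.9,
)

def kl_sigma_term():
    """Remaining (sigma-only) part of KL(q || p), averaged per data point."""
    sigma2 = torch.exp(2.0 * log_sigma)
    return (math.log(sigma0) - log_sigma + sigma2 / (2.0 * sigma0 ** 2) - 0.5).sum() / N
```

With this split, the fine-tuning loop looks like ordinary training: mini-batch loss plus `kl_sigma_term()`, while the mean regularization rides along inside the optimizer step.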
Exemplar reparameterization
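A conceptual sketch (not the official implementation) of exemplar reparameterization for a Bayesian linear layer: instead of sharing one weight sample across the whole mini-batch, an independent sample is drawn for every exemplar, which reduces the variance of the stochastic gradients. All names and shapes below are illustrative assumptions, and this naive version materializes per-example weights, so it trades memory for clarity.

```python
import torch

class ExemplarBayesLinear(torch.nn.Module):
    """Bayesian linear layer with one weight sample per exemplar (sketch)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.log_sigma = torch.nn.Parameter(torch.full((out_features, in_features), -5.0))

    def forward(self, x):                                   # x: [B, in_features]
        B = x.shape[0]
        sigma = self.log_sigma.exp()
        # One independent weight sample per exemplar: eps carries a leading batch dim.
        eps = torch.randn(B, *self.mu.shape, device=x.device)   # [B, out, in]
        w = self.mu + sigma * eps                                # [B, out, in]
        # Batched matrix-vector product: y[b] = w[b] @ x[b]
        return torch.einsum('boi,bi->bo', w, x)

# usage
layer = ExemplarBayesLinear(128, 10)
logits = layer(torch.randn(32, 128))                        # [32, 10]
```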
Uncertainty regularization
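An illustrative sketch of one way to supervise uncertainty at low cost: a margin penalty that asks the predictive entropy on out-of-distribution (OOD) inputs to exceed the entropy on in-distribution inputs. This is our reading of the idea; the exact regularizer, margin, and OOD source used by the paper may differ.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    """Entropy of the predictive distribution, per example."""
    p = F.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-8)).sum(dim=-1)           # [B]

def uncertainty_regularizer(logits_in, logits_ood, margin=0.5):
    """Penalize the model when OOD uncertainty does not exceed in-distribution
    uncertainty by at least `margin` (margin value is an arbitrary choice here)."""
    h_in = predictive_entropy(logits_in).mean()
    h_ood = predictive_entropy(logits_ood).mean()
    return F.relu(h_in + margin - h_ood)

# usage: total_loss = nll + kl_term + lam * uncertainty_regularizer(logits_in, logits_ood)
```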
Results (predictive performance)
Results (quality of uncertainty estimates)
Some out-of-distribution samples used in the validation phase
Citation
Zhijie Deng, Xiao Yang, Hao Zhang, Yinpeng Dong, and Jun Zhu. "BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning".
Bibtex