Denoised Smoothing: A Provable Defense for Pretrained Classifiers

  • Hadi Salman
  • Mingjie Sun
  • Greg Yang
  • Ashish Kapoor
  • J. Zico Kolter

NeurIPS 2020

Organized by ACM

We present a method for provably defending any pretrained image classifier against ℓp adversarial attacks. This method, for instance, allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be ℓp-robust to adversarial examples, without modifying the pretrained classifier. Our approach applies to both the white-box and the black-box settings of the pretrained classifier. We refer to this defense as denoised smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our approach to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found on GitHub.
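As a rough illustration of the pipeline described above (add Gaussian noise, denoise, pass the result to the unmodified classifier, and take a majority vote over many noise draws), the following Python sketch may help. It is not the authors' implementation. `toy_denoiser` and `toy_classifier` are hypothetical stand-ins for a trained denoiser and a pretrained classifier, and the names `noisy_votes` and `certify` are likewise illustrative. The radius is computed with the ℓ2 formula σ·Φ⁻¹(p) from the Gaussian randomized smoothing literature, which is an assumption here, and a simple Hoeffding lower bound is swapped in for the tighter binomial confidence interval typically used in that literature.

```python
import math
import random
from statistics import NormalDist


def toy_denoiser(values):
    # Hypothetical stand-in for a trained denoiser: clip each value to [-1, 1].
    return [max(-1.0, min(1.0, v)) for v in values]


def toy_classifier(values):
    # Hypothetical stand-in for a pretrained classifier: class 1 if the sum is positive.
    return 1 if sum(values) > 0 else 0


def noisy_votes(classifier, denoiser, x, sigma, n, rng):
    # Count the labels predicted on n Gaussian-perturbed, then denoised, copies of x.
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = classifier(denoiser(noisy))
        counts[label] = counts.get(label, 0) + 1
    return counts


def certify(classifier, denoiser, x, sigma, n_select=100, n_estimate=1000,
            alpha=0.001, seed=0):
    """Return (label, l2_radius), or (None, 0.0) when abstaining."""
    rng = random.Random(seed)
    # Step 1: guess the top class from a small batch of noisy samples.
    select = noisy_votes(classifier, denoiser, x, sigma, n_select, rng)
    guess = max(select, key=select.get)
    # Step 2: estimate the guessed class's probability on a fresh, larger batch.
    estimate = noisy_votes(classifier, denoiser, x, sigma, n_estimate, rng)
    p_hat = estimate.get(guess, 0) / n_estimate
    # Hoeffding one-sided lower bound, valid with probability at least 1 - alpha.
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n_estimate))
    if p_lower <= 0.5:
        return None, 0.0
    # Assumed l2 radius for Gaussian smoothing: sigma * Phi^{-1}(p_lower).
    return guess, sigma * NormalDist().inv_cdf(p_lower)
```

With these toy stand-ins, `certify(toy_classifier, toy_denoiser, [1.0] * 16, sigma=0.25)` returns class 1 with a positive radius, while an all-zeros input sits on the decision boundary and should lead to abstention. The classifier is only ever called as a function, which is why the same wrapper shape would suit a black-box API.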

Publication Downloads

Denoised Smoothing

September 21, 2020

This repository contains the code and models necessary to replicate the results of our recent paper, "Denoised Smoothing: A Provable Defense for Pretrained Classifiers," by Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J. Zico Kolter. Our paper presents a method for provably defending any pretrained image classifier against ℓp adversarial attacks.