Exploring Gaps in DeepFool in Search of More Effective Adversarial Perturbations
Abstract
Adversarial examples are inputs subtly perturbed to produce a wrong prediction in machine learning models while remaining perceptually similar to the original input. To find adversarial examples, some attack strategies rely on linear approximations of different properties of the models. This raises a number of questions about the accuracy of such approximations. In this paper we focus on DeepFool, a state-of-the-art attack algorithm based on efficiently approximating the decision space of the target classifier to find the minimal perturbation needed to fool the model. The objective of this paper is to analyze the feasibility of finding inaccuracies in the linear approximation of DeepFool, with the aim of studying whether they can be exploited to increase the effectiveness of the attack. We introduce two strategies to efficiently explore gaps in the approximation of the decision boundaries, and evaluate our approach on a speech command classification task.
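To make the linear-approximation idea behind DeepFool concrete, the following is a minimal sketch (not the paper's implementation) of a single DeepFool-style step for a binary classifier. The toy decision function f, its weights w and b, the point x, and the helper deepfool_step are all illustrative assumptions: under the linearization of the boundary f(x) = 0, the minimal-norm perturbation is the projection of x onto that hyperplane.

```python
import numpy as np

# Toy binary classifier with decision function f(x) = w.x + b (illustrative);
# the decision boundary is the hyperplane f(x) = 0.
w = np.array([1.5, -2.0])
b = 0.3

def f(x):
    return w @ x + b

def grad_f(x):
    # Gradient of a linear decision function is constant.
    return w

def deepfool_step(x):
    """One DeepFool-style step for a binary classifier: solve the linearized
    condition f(x) + grad_f(x) . r = 0 for the minimal-norm perturbation r,
    i.e. project x onto the (linearized) decision boundary."""
    g = grad_f(x)
    return -(f(x) / (np.linalg.norm(g) ** 2)) * g

x = np.array([1.0, 1.0])
r = deepfool_step(x)
print("original prediction sign:", np.sign(f(x)))
print("minimal perturbation:", r)
# A slight overshoot past the boundary flips the predicted class.
print("perturbed prediction sign:", np.sign(f(x + 1.001 * r)))
```

For a nonlinear classifier the boundary is only locally linear, so DeepFool iterates this step until the label changes; the gap between the linearized boundary and the true one is exactly the kind of inaccuracy this paper sets out to explore.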