Adversarial examples are small perturbations of an input that are negligible to humans but change the decision of a computer system. They were first discovered in object recognition (Szegedy et al. 2014)[1] but were later found in natural language systems as well (Jia and Liang, 2017)[2]. In terms of models, neural networks, linear models (e.g. SVMs) and decision trees are all known to suffer from adversarial examples (Zhou et al. 2021[3], among others). The phenomenon was broadly popularized by news stories about autonomous cars misinterpreting stop signs as speed limit signs and state-of-the-art computer vision systems misinterpreting cats as desktop computers, mistaking faces for non-faces, gibberish patterns for faces, and one face for another. The phenomenon reveals a fundamental flaw in a large class of classifiers (Goodfellow et al. 2014)[4].
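Stated in the standard ℓp-bounded threat model (the notation here is the conventional one, not tied to any single cited paper):

```latex
% An adversarial example for a classifier $f$ at an input $x$ with
% true label $y$ is a perturbed point $x + \delta$ such that
\[
  \|\delta\|_p \le \varepsilon
  \qquad\text{and}\qquad
  f(x + \delta) \ne y,
\]
% typically found by maximizing the training loss $L$ within the budget:
\[
  \delta^\star \in \arg\max_{\|\delta\|_p \le \varepsilon}
  L\bigl(f(x + \delta),\, y\bigr).
\]
```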
TODO: https://pdfs.semanticscholar.org/7330/0838d524d062e8341b242765fb6efaf48f43.pdf
https://www.cs.uoregon.edu/Reports/AREA-201406-Torkamani.pdf
https://arxiv.org/pdf/1207.0245.pdf
Subspaces of transferable adversarial examples: Tramèr et al. (2017)[5]
Universal adversarial perturbation: https://arxiv.org/pdf/1610.08401.pdf
History
From Goodfellow (2017):
- “Adversarial Classification” Dalvi et al 2004: fool spam filter
- “Evasion Attacks Against Machine Learning at Test Time”, Biggio et al. 2013: fool neural nets
- Szegedy et al 2013: fool ImageNet classifiers imperceptibly
- Goodfellow et al 2014: cheap, closed form attack
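The "cheap, closed form attack" is the fast gradient sign method (FGSM): add ε times the sign of the loss gradient with respect to the input. A minimal numpy sketch on a toy logistic-regression model (the weights and inputs below are made up for illustration):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """FGSM for a logistic-regression classifier p(y=1|x) = sigmoid(w.x + b):
    the adversarial example is x + eps * sign(grad_x loss)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                    # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

# toy example: a point confidently classified as class 1
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([2.0, -1.0, 0.0])              # w.x + b = 4.0 -> class 1
x_adv = fgsm_perturb(x, w, b, y=1, eps=0.2)
# each coordinate moves by at most eps (an L-infinity perturbation),
# and the classifier's score for the true class drops
```

One FGSM step is cheap because it needs a single gradient evaluation, which is exactly why it became the default baseline attack.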
Explanations
TODO: a survey with a list of hypotheses: Serban et al. (2020)[6]
- Excessive non-linearity and "blind spots": the first explanation, proposed by Szegedy et al. themselves[1]
- Local linearity: Goodfellow et al. (2014)[4]
- Data complexity of robust generalization (with no prior at all? what about robust generalization with the right prior?): Schmidt et al. (2018)[7]
- "Identifying a robust classifier from limited training data is information theoretically possible but computationally intractable" (at least for a family of models called "statistical query"): Bubeck et al. (2018)[8]
- "high dimensional geometry of data manifold" (but hey, people can do it...): Gilmer et al. (2018)[9]
- inevitable consequence of "concentration of measure" in metric measure spaces (but does our problem have it?): Mahloujifar et al. (2019)[10]
- non-robust features (of the input) that are useful for normal classification but not for robust classification: Ilyas et al. (2019)[11], extended by Springer et al. (2021)[12]
- "We define a feature to be a function mapping from the input space X to the real numbers, ... Note that this formal definition also captures what we abstractly think of as features (e.g., we can construct an f that captures how “furry” an image is)"
Some claim that adversarial examples are inevitable (but humans seem robust against them?):
- Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018. URL https://arxiv.org/abs/1802.08686.
- Justin Gilmer, Luke Metz, Fartash Faghri, Sam Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In International Conference on Learning Representations Workshop, 2018. URL https://arxiv.org/pdf/1801.02774.pdf.
Adversarial examples in computer vision
Tasks
- Object recognition: see the survey Serban et al. (2020)[6]
- Edge detection: Cosgrove and Yuille (2020)[13]
- Semantic segmentation: Xie et al. (2017)[14]
- Facial recognition: Sharif et al. (2016)[15]
- Video classification: Li et al. (2018)[16]
TODO: Simen Thys, Wiebe Van Ranst, and Toon Goedemé. 2019. Fooling automated surveillance cameras: Adversarial patches to attack person detection. arXiv:1904.08653 (2019).
TODO: Xingxing Wei, Siyuan Liang, Xiaochun Cao, and Jun Zhu. 2018. Transferable adversarial attacks for image and video object detection. arXiv:1811.12641 (2018).
Attacks
TODO: find refs
- Small perturbation/imperceptible
- Color attacks
- Negative images[19]
- Random color substitution[20]
- ColorFool[21]
- Small recoloring (combined with perturbation)[22]
- contrast, brightness, grayscale conversion, intensity, solarize: Volpi & Murino (2019)[23]
- lots of filters: FilterFool (Shamsabadi et al. 2020)[24]
- Adversarial color enhancement (Zhao et al. 2020)[25][26]
- more color filters: Kantipudi et al. (2020)[27]
- yet more color perturbation: Bhattad et al. (2020)[28]
- structure-preserving
- camera shake???[33]
- Few-pixel attack
- One pixel
- k pixels
- Semantic attacks
- Shadow attacks[34]
- Juxtaposition/occlusion attacks
- Feature-space attacks (white box, using the internal features to craft images)
- Generative attacks (not using internal features)
- Structure-preserving attack? Peng et al. (2020)[44]
- move across time (in a video): Shankar et al. (2019)[45]
AutoAttack
Developed by Croce & Hein (2020)[46]
C&W
A black box attack?
Transfer attack?
Defences
Adversarial training: so far the most successful defense.
TODO: lots and lots of defences
- SVD (Jere et al. 2020)[47]
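Adversarial training replaces clean training examples with adversarially perturbed ones at every step (inner maximization inside the outer minimization). A minimal numpy sketch, assuming a logistic-regression model and a one-step signed-gradient inner attack for brevity (the data and hyperparameters are made up for illustration):

```python
import numpy as np

def grads(w, x, y):
    """Gradients of the logistic loss w.r.t. the weights and the input."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return (p - y) * x, (p - y) * w        # dL/dw, dL/dx

def adversarial_train(X, Y, eps=0.1, lr=0.1, epochs=50):
    """Sketch of adversarial training: at each step, craft an adversarial
    version of the example and take the gradient step on that instead of
    the clean input."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            _, gx = grads(w, x, y)
            x_adv = x + eps * np.sign(gx)   # inner maximization (one step)
            gw, _ = grads(w, x_adv, y)
            w -= lr * gw                    # outer minimization
    return w

# linearly separable toy data
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
Y = np.array([1, 1, 0, 0])
w = adversarial_train(X, Y)
```

Full-strength adversarial training (Madry et al.) uses a multi-step PGD inner attack rather than the single step shown here; the structure of the loop is the same.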
Confirmed fails
TODO: a lot fall into this category
- Shown to perform gradient obfuscation and broken by Athalye et al. (2018)[48]: distillation (Papernot et al. 2016), thermometer encoding (Buckman et al. 2018[49])
- Ensembling: https://arxiv.org/abs/1706.04701
- Reported by Croce & Hein (2020)[46]: Mixture of RBF (Taghanaki et al., 2019[50]), restricting hidden space (Mustafa et al. 2019[51]), among around 50 models
Demonstrated on simple datasets only
- Tested on MNIST only: convex outer polytope (Wong & Kolter, 2018[52])?
Adversarial examples in natural language processing
Tasks
Classifying text into categories (e.g. Sports, Business) and reviews into good/bad (Soll et al. 2019)[53]
Attacks
From (Soll et al. 2019)[53]: "algorithm by Samanta and Mehta [22], where the candidate pool P, from which possible words for insertion and replacement are drawn, was created from the following sources:
- Synonyms gathered from the WordNet dataset [5],
- Typos from a dataset [16] to ensure that the typos inserted are not recognized as artificial since they occur in normal texts written by humans, and
- Keywords specific for one input class which were found by looking at all training sentences and extracting words only found in one class."
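The candidate-pool idea above can be sketched as a greedy black-box substitution attack: try each candidate replacement and keep the one that hurts the classifier's confidence most. The synonym table and the scoring function below are illustrative stand-ins, not the paper's actual resources:

```python
def greedy_substitute(tokens, candidates, score):
    """score(tokens) -> classifier confidence in the true class.
    Greedily replace one word at a time, keeping the candidate that
    lowers the confidence the most."""
    tokens = list(tokens)
    for i, tok in enumerate(tokens):
        best, best_score = tok, score(tokens)
        for cand in candidates.get(tok, []):
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            s = score(trial)
            if s < best_score:
                best, best_score = cand, s
        tokens[i] = best
    return tokens

# toy bag-of-words "sentiment classifier" as the black-box target
weights = {"great": 1.0, "good": 0.4, "fine": 0.1, "movie": 0.0}
score = lambda toks: sum(weights.get(t, 0.0) for t in toks)
candidates = {"great": ["good", "fine"]}   # hypothetical candidate pool

adv = greedy_substitute(["great", "movie"], candidates, score)
# "great" gets swapped for the confidence-minimizing candidate "fine"
```

In the actual attack the pool would come from WordNet synonyms, a typo corpus, and class-specific keywords, as listed above.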
Defenses
Distillation is shown to be ineffective (again) by Soll et al. (2019)[53]
TODO: Jia and Liang (2017)[54]: data augmentation not effective?
Adversarial examples in other machine learning areas
TODO: in reinforcement learning
TODO: from Serban et al. (2020)[6]: "is malware detection [68, 78, 94, 101, 179], because it implies direct consequences on security. Other tasks such as reinforcement learning [10, 80, 106], speech recognition [23, 27], facial recognition [150], semantic segmentation [178] [...] are also explored"
TODO: Yefet, N., Alon, U., & Yahav, E. (2020). Adversarial examples for models of code. Proceedings of the ACM on Programming Languages, 4(OOPSLA), 1-30.
Evaluation
Methodology
- Careful to avoid gradient obfuscation: Athalye et al.[48]
- Check whether random/transfer/black-box attacks are more effective than white-box attacks (a red flag if so)
- Always hand-design adaptive attacks for evaluation (unless simpler attacks suffice), and be careful: many things can go wrong there (Tramer et al. 2020[55]). The authors identified 6 "themes" for creating effective adaptive attacks:
- T0: Strive for simplicity
- T1: Attack (a function close to) the full defense
- T2: Identify and target important defense parts
- T3: Adapt the objective to simplify the attack
- T4: Ensure the loss function is consistent
- T5: Optimize the loss function with different methods
- T6: Use a strong adaptive attack for adversarial training
Choice of attacks
Strong attacks that are recommended:
- Custom-made adaptive attack (see previous section)
- AutoAttack (see section #Attacks)
- C&W (see section #Attacks)
- A transfer attack
- A blackbox attack
Weak attacks to avoid:
- PGD and FGSM: relying on them alone has been criticized by Croce & Hein (2020)[46]
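For reference, PGD (projected gradient descent) iterates a signed gradient step and projects back into the ε-ball around the clean input; evaluations are criticized when this is the *only* attack used, not because the attack itself is wrong. A minimal numpy sketch against a toy logistic model (all values are illustrative):

```python
import numpy as np

def pgd_attack(x, w, b, y, eps, alpha, steps):
    """PGD attack on a logistic model sigmoid(w.x + b): repeat a signed
    gradient step of size alpha, then clip back into the L-infinity
    ball of radius eps around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad = (p - y) * w                         # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection step
    return x_adv

w = np.array([1.0, -1.0])
x = np.array([1.0, -1.0])                          # score w.x = 2.0 -> class 1
x_adv = pgd_attack(x, w, b=0.0, y=1, eps=0.5, alpha=0.2, steps=10)
# the perturbation stays inside the eps-ball while the score decreases
```

On this linear toy model PGD just walks to the corner of the ε-ball; on deep networks the multi-step iteration is what makes it much stronger than a single FGSM step.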
Datasets
CIFAR-10
TODO: Qin et al. (2019)[56]
Datasets with built-in perturbation
These are easier to use but are weaker.
Software
References
- ↑ 1.0 1.1 Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, 2014. URL http://arxiv.org/abs/1312.6199.
- ↑ Jia, Robin, and Percy Liang. "Adversarial Examples for Evaluating Reading Comprehension Systems." arXiv preprint arXiv:1707.07328 (2017).
- ↑ Zhou, D., Liu, T., Han, B., Wang, N., Peng, C., & Gao, X. (2021). Towards Defending against Adversarial Examples via Attack-Invariant Features. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning (Vol. 139, pp. 12835–12845). PMLR. Retrieved from http://proceedings.mlr.press/v139/zhou21e.html
- ↑ 4.0 4.1 Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
- ↑ Tramèr, Florian, et al. "The Space of Transferable Adversarial Examples." arXiv preprint arXiv:1704.03453 (2017).
- ↑ 6.0 6.1 6.2 Serban, A., Poll, E., & Visser, J. (2020). Adversarial Examples on Object Recognition. ACM Computing Surveys, 53(3), 1–38. https://doi.org/10.1145/3398394
- ↑ Schmidt, L., Talwar, K., Santurkar, S., Tsipras, D., & Madry, A. (2018). Adversarially robust generalization requires more data. Advances in Neural Information Processing Systems (NeurIPS 2018), 5014–5026.
- ↑ Bubeck, S., Price, E., & Razenshteyn, I. (2018). Adversarial examples from computational constraints, 1–19. Retrieved from http://arxiv.org/abs/1805.10204
- ↑ Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., & Goodfellow, I. (2018). The Relationship Between High-Dimensional Geometry and Adversarial Examples. Retrieved from http://arxiv.org/abs/1801.02774
- ↑ Mahloujifar, S., Diochnos, D. I., & Mahmoody, M. (2019). The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 4536–4543. https://doi.org/10.1609/aaai.v33i01.33014536
- ↑ Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). Adversarial Examples Are Not Bugs, They Are Features. Retrieved from http://arxiv.org/abs/1905.02175
- ↑ Springer, J. M., Mitchell, M., & Kenyon, G. T. (2021). Adversarial Perturbations Are Not So Weird: Entanglement of Robust and Non-Robust Features in Neural Network Classifiers. Retrieved from http://arxiv.org/abs/2102.05110
- ↑ Cosgrove, C., & Yuille, A. L. (2020). Adversarial examples for edge detection: They exist, and they transfer. Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 1059–1068. https://doi.org/10.1109/WACV45572.2020.9093304
- ↑ Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., & Yuille, A. (2017). Adversarial Examples for Semantic Segmentation and Object Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), 1378–1387. https://doi.org/10.1109/ICCV.2017.153
- ↑ Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016). Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. Proceedings of the ACM Conference on Computer and Communications Security, 1528–1540. https://doi.org/10.1145/2976749.2978392
- ↑ Li, S., Neupane, A., Paul, S., Song, C., Krishnamurthy, S. V., Chowdhury, A. K. R., & Swami, A. (2018). Adversarial Perturbations Against Real-Time Video Classification Systems. ArXiv. https://doi.org/10.14722/ndss.2019.23202
- ↑ Quan, P., Guo, R., & Srivastava, M. (n.d.). Towards Imperceptible Query-limited Adversarial Attacks with Perceptual Feature Fidelity Loss, 1–11.
- ↑ Dabouei, A., Soleymani, S., Taherkhani, F., Dawson, J., & Nasrabadi, N. M. (2020). SmoothFool: An efficient framework for computing smooth adversarial perturbations. Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 2654–2663. https://doi.org/10.1109/WACV45572.2020.9093429
- ↑ Hosseini, H., Xiao, B., Jaiswal, M., & Poovendran, R. (2017). On the limitation of convolutional neural networks in recognizing negative images. Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, 352–358. https://doi.org/10.1109/ICMLA.2017.0-136
- ↑ Hosseini, H., & Poovendran, R. (2018). Semantic Adversarial Examples. CVPR 2018, 1727–1732. Retrieved from http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w32/Hosseini_Semantic_Adversarial_Examples_CVPR_2018_paper.pdf
- ↑ Shahin Shamsabadi, A., Sanchez-Matilla, R., & Cavallaro, A. (2020). ColorFool: Semantic Adversarial Colorization. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1148–1157. https://doi.org/10.1109/CVPR42600.2020.00123
- ↑ Laidlaw, C., & Feizi, S. (2019). Functional Adversarial Attacks. ArXiv, (NeurIPS), 1–16.
- ↑ 23.0 23.1 Volpi, R., & Murino, V. (2019). Addressing model vulnerability to distributional shifts over image transformation sets. Proceedings of the IEEE/CVF International Conference on Computer Vision, 7980–7989.
- ↑ Peng, D., Zheng, Z., Luo, L., & Zhang, X. (2020). Structure matters: Towards generating transferable adversarial images. Frontiers in Artificial Intelligence and Applications, 325, 1419–1426. https://doi.org/10.3233/FAIA200247
- ↑ Zhao, Z., Liu, Z., & Larson, M. (2020). Adversarial color enhancement: Generating unrestricted adversarial images by optimizing a color filter. ArXiv, 1–14.
- ↑ Zhao, Z., Liu, Z., & Larson, M. (2020). Adversarial robustness against image color transformation within parametric filter space. ArXiv, 1–20.
- ↑ Kantipudi, J., Dubey, S. R., & Chakraborty, S. (2020). Color Channel Perturbation Attacks for Fooling Convolutional Neural Networks and A Defense Against Such Attacks. IEEE Transactions on Artificial Intelligence. Retrieved from http://arxiv.org/abs/2012.14456
- ↑ 28.0 28.1 Bhattad, A., Chong, M. J., Liang, K., Li, B., & Forsyth, D. A. (2019). Unrestricted adversarial examples via semantic manipulation. ArXiv, (2018), 1–19.
- ↑ Peng, D., Zheng, Z., & Zhang, X. (2018). Structure-preserving transformation: Generating diverse and transferable adversarial examples. ArXiv.
- ↑ Shamsabadi, A. S., Oh, C., & Cavallaro, A. (2020). Edgefool: An Adversarial Image Enhancement Filter. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020-May(2), 1898–1902. https://doi.org/10.1109/ICASSP40776.2020.9054368
- ↑ Naderi, H., Goli, L., & Kasaei, S. (2021). Generating Unrestricted Adversarial Examples via Three Parameters. Retrieved from http://arxiv.org/abs/2103.07640
- ↑ Li, L., Weber, M., Xu, X., Rimanic, L., Xie, T., Zhang, C., & Li, B. (2020). Provable robust learning based on transformation-specific smoothing. In ICML Workshop on Uncertainty & Robustness in Deep Learning (UDL) 2020.
- ↑ 33.0 33.1 Ho, C. H., Leung, B., Sandstrom, E., Chang, Y., & Vasconcelos, N. (2019). Catastrophic child’s play: Easy to perform, hard to defend adversarial attacks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019-June, 9221–9229. https://doi.org/10.1109/CVPR.2019.00945
- ↑ Ghiasi, A., Shafahi, A., & Goldstein, T. (2020). Breaking certified defenses: semantic adversarial examples with spoofed robustness certificates. In International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=HJxdTxHYvB
- ↑ Duan, R., Mao, X., Qin, A. K., Yang, Y., Chen, Y., Ye, S., & He, Y. (2021). Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink. Retrieved from http://arxiv.org/abs/2103.06504
- ↑ Duan, R., Ma, X., Wang, Y., Bailey, J., Qin, A. K., & Yang, Y. (2020). Adversarial camouflage: Hiding physical-world attacks with natural styles. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 997–1005. https://doi.org/10.1109/CVPR42600.2020.00108
- ↑ Xu, Q., Tao, G., & Zhang, X. (2020). D2B: Deep distribution bound for natural-looking adversarial attack. ArXiv, 1–26.
- ↑ Xu, Q., Tao, G., Cheng, S., Tan, L., & Zhang, X. (2020). Towards feature space adversarial attack. ArXiv.
- ↑ Song, Y., Kushman, N., Shu, R., & Ermon, S. (2018). Constructing unrestricted adversarial examples with generative models. Advances in Neural Information Processing Systems, 2018-December(NeurIPS), 8312–8323.
- ↑ Jain, L. (2020). Generating Semantic Adversarial Examples through Differentiable Rendering.
- ↑ Wang, D., Li, C., Wen, S., Nepal, S., & Xiang, Y. (2019). Man-in-the-middle attacks against machine learning classifiers via malicious generative models. ArXiv, (October), 1–12.
- ↑ Dunn, I., Hanu, L., Pouget, H., Kroening, D., & Melham, T. (2020). Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities. Retrieved from http://arxiv.org/abs/2001.11055
- ↑ Song, Y., Kushman, N., Shu, R., & Ermon, S. (2018). Generative Adversarial Examples, 8312–8323.
- ↑ Peng, D., Zheng, Z., Luo, L., & Zhang, X. (n.d.). Structure Matters: Towards Generating Transferable Adversarial Images.
- ↑ Shankar, V., Dave, A., Roelofs, R., Ramanan, D., Recht, B., & Schmidt, L. (2019). Do Image Classifiers Generalize Across Time? Retrieved from http://arxiv.org/abs/1906.02168
- ↑ 46.0 46.1 46.2 Croce, F., & Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. 37th International Conference on Machine Learning, ICML 2020, PartF168147-3, 2184–2194.
- ↑ Jere, M., Kumar, M., & Koushanfar, F. (2020). A singular value perspective on model robustness. ArXiv.
- ↑ 48.0 48.1 Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. 35th International Conference on Machine Learning, ICML 2018, 1, 436–448.
- ↑ Buckman, J., Roy, A., Raffel, C., & Goodfellow, I. (2018). Thermometer encoding: One hot way to resist adversarial examples. 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, (2016), 1–22.
- ↑ Taghanaki, S. A., Abhishek, K., Azizi, S., and Hamarneh, G. A kernelized manifold mapping to diminish the effect of adversarial perturbations. In CVPR, 2019.
- ↑ Mustafa, A., Khan, S., Hayat, M., Goecke, R., Shen, J., & Shao, L. (2019). Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks.
- ↑ Wong, E., & Kolter, J. Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. 35th International Conference on Machine Learning, ICML 2018, 12, 8405–8423.
- ↑ 53.0 53.1 53.2 Soll, M., Hinz, T., Magg, S., & Wermter, S. (2019). Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples. International Conference on Artificial Neural Networks (ICANN), 685–696. https://doi.org/10.1007/978-3-030-30508-6_54
- ↑ Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 2021–2031 (2017). DOI: 10.18653/v1/D17-1215
- ↑ Tramer, F., Carlini, N., Brendel, W., & Madry, A. (2020). On adaptive attacks to adversarial example defenses. ArXiv Preprint ArXiv:2002.08347.
- ↑ Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., … Kohli, P. (2019). Adversarial Robustness through Local Linearization. (NeurIPS), 1–17. Retrieved from http://arxiv.org/abs/1907.02610