
Adversarial examples are small perturbations of an input that are negligible to humans but change the decision of a computer system. They were first described in object recognition (Szegedy et al. 2014)[1] and later found in natural language systems as well (Jia and Liang, 2017)[2]. In terms of models, neural networks, linear models (e.g. SVM) and decision trees are all known to suffer from adversarial examples (Zhou et al. 2021[3], among others). The phenomenon was broadly popularized via news about autonomous cars misinterpreting stop signs as speed limit signs, state-of-the-art computer vision systems misinterpreting cats as desktop computers, mistaking faces for non-faces, gibberish patterns for faces, and one face for another. It reveals a fundamental flaw in a large class of classifiers (Goodfellow et al. 2014)[4].
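
A common formalization (one of several in the literature; the notation below is ours, not taken from any single paper): given a classifier f and an input x that f classifies correctly as y, an adversarial example is a perturbed input

  x' = x + \delta \quad \text{with} \quad \|\delta\|_p \le \epsilon \quad \text{and} \quad f(x') \ne f(x) = y,

where the budget \epsilon (typically measured in an l_2 or l_\infty norm) is chosen small enough that x and x' look the same to a human.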

TODO: https://pdfs.semanticscholar.org/7330/0838d524d062e8341b242765fb6efaf48f43.pdf

https://www.cs.uoregon.edu/Reports/AREA-201406-Torkamani.pdf

https://arxiv.org/pdf/1207.0245.pdf

Subspaces of transferable adversarial examples: Tramèr et al. (2017)[5]

Universal adversarial perturbation: https://arxiv.org/pdf/1610.08401.pdf

History[]

From Goodfellow (2017):

  1. “Adversarial Classification”, Dalvi et al. 2004: fool a spam filter
  2. “Evasion Attacks Against Machine Learning at Test Time”, Biggio et al. 2013: fool neural nets
  3. Szegedy et al. 2013: fool ImageNet classifiers imperceptibly
  4. Goodfellow et al. 2014: cheap, closed-form attack (FGSM; see the sketch below)
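
The "cheap, closed-form attack" of Goodfellow et al. (2014)[4] is the fast gradient sign method (FGSM). A minimal PyTorch-style sketch, assuming a differentiable classifier with a cross-entropy loss (the model, inputs and epsilon are placeholders, not code from the paper):

  import torch
  import torch.nn.functional as F

  def fgsm(model, x, y, epsilon):
      """One-step l-inf attack: x_adv = x + epsilon * sign(grad_x loss)."""
      x = x.clone().detach().requires_grad_(True)
      loss = F.cross_entropy(model(x), y)
      loss.backward()
      x_adv = x + epsilon * x.grad.sign()
      return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range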

Explanations[]

TODO: a survey with a list of hypotheses: Serban et al. (2020)[6]

  • Excessive non-linearity and "blind spots": the first explanation, proposed by Szegedy et al. themselves[1]
  • Local linearity: Goodfellow et al. (2014)[4]
  • Data complexity of robust generalization (with no prior at all? what about robust generalization with the right prior?): Schmidt et al. (2018)[7]
  • "Identifying a robust classifier from limited training data is information theoretically possible but computationally intractable" (at least for a family of models called "statistical query"): Bubeck et al. (2018)[8]
  • "high dimensional geometry of data manifold" (but hey, people can do it...): Gilmer et al. (2018)[9]
  • inevitable consequence of "concentration of measure" in metric measure spaces (but does our problem have it?): Mahloujifar et al. (2019)[10]
  • non-robust features (of the input) that are useful for normal classification but not for robust classification: Ilyas et al. (2019)[11], extended by Springer et al. (2021)[12]
    • "We define a feature to be a function mapping from the input space X to the real numbers, ... Note that this formal definition also captures what we abstractly think of as features (e.g., we can construct an f that captures how “furry” an image is)"

Some claim that adversarial examples are inevitable (hey, humans seem to be robust against them?):

  • Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018. URL https://arxiv.org/pdf/1802.08686.pdf.
  • Justin Gilmer, Luke Metz, Fartash Faghri, Sam Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In International Conference on Learning Representations Workshop, 2018. URL https://arxiv.org/pdf/1801.02774.pdf.


Adversarial examples in computer vision[]

Tasks[]

  • Object recognition: see the survey Serban et al. (2020)[6]
  • Edge detection: Cosgrove and Yuille (2020)[13]
  • Semantic segmentation: Xie et al. (2017)[14]
  • Facial recognition: Sharif et al. (2016)[15]
  • Video classification: Li et al. (2018)[16]

TODO: Simen Thys, Wiebe Van Ranst, and Toon Goedemé. 2019. Fooling automated surveillance cameras: Adversarial patches to attack person detection. arXiv:1904.08653 (2019).

TODO: Xingxing Wei, Siyuan Liang, Xiaochun Cao, and Jun Zhu. 2018. Transferable adversarial attacks for image and video object detection. arXiv:1811.12641 (2018).

Attacks[]

TODO: find refs

  • Small perturbation/imperceptible
    • norm constrained (l2, l-inf, etc.): lots of papers (a PGD sketch follows this list)
    • "perceptual feature fidelity" constraint? [17]
    • SmoothFool[18]
  • Color attacks
    • Negative images[19]
    • Random color substitution[20]
    • ColorFool[21]
    • Small recoloring (combined with perturbation)[22]
    • contrast, brightness, grayscale conversion, intensity, solarize: Volpi & Murino (2019)[23]
    • lots of filters: FilterFool (Shamsabadi et al. 2020)[24]
    • Adversarial color enhancement (Zhao et al. 2020)[25][26]
    • more color filters: Kantipudi et al. (2020)[27]
    • yet more color perturbation: Bhattad et al. (2020)[28]
  • structure-preserving
    • Peng et al. (2018)???[29]
    • sharpness: Volpi & Murino (2019)[23]
    • EdgeFool[30]
    • Shifting/deforming: [31]
    • Rotation & translation: Li et al. 2020[32]
  • camera shake???[33]
  • Few-pixel attack
    • One pixel
    • k pixels
  • Semantic attacks
  • Shadow attacks[34]
  • Juxtaposition/occlusion attacks
    • "Adversarial turtles"
    • "Invisible cloaks"
    • banners
    • Adversarial Laser Beam: Duan et al. (2021)[35]
    • Adversarial Camouflage[36]
  • Feature-space attacks (white box, using the internal features to craft images)
    • D2B (Xu et al. 2020)[37]
    • Xu et al. (2020)[38]
  • Generative attacks (not using internal features)
    • using GAN: Song et al. (2018)[39]
    • differentiable rendering: Jain (2020)[40]
    • VAE??[41]
    • pose???[33]
    • more GAN??[42]
    • yet more GAN?? [43]
    • style transfer (texture)?[28]
  • Structure-preserving attack? Peng et al. (2020)[44]
  • move across time (in a video): Shankar et al. (2019)[45]
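
Many of the norm-constrained attacks above are built on projected gradient descent (PGD). A minimal l-inf PGD sketch (in the style popularized by Madry et al., 2018; model, inputs and hyperparameters are placeholders):

  import torch
  import torch.nn.functional as F

  def pgd_linf(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
      """Iterative l-inf attack: take small signed-gradient steps and project
      back into the epsilon-ball around the clean input after every step."""
      x_adv = x.clone().detach()
      for _ in range(steps):
          x_adv.requires_grad_(True)
          loss = F.cross_entropy(model(x_adv), y)
          grad = torch.autograd.grad(loss, x_adv)[0]
          x_adv = x_adv.detach() + alpha * grad.sign()
          x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # l-inf projection
          x_adv = x_adv.clamp(0, 1)                                      # valid pixel range
      return x_adv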

AutoAttack[]

Developed by Croce & Hein (2020)[46]. AutoAttack is a parameter-free ensemble of complementary attacks (two Auto-PGD variants, FAB and the black-box Square Attack), intended as a reliable, hyperparameter-free baseline for robustness evaluation.
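
A usage sketch, assuming the authors' reference implementation (https://github.com/fra31/auto-attack) is installed; the model, data and epsilon are placeholders, and exact argument names may differ across versions:

  from autoattack import AutoAttack  # reference implementation by Croce & Hein

  # model: a trained classifier mapping images in [0, 1] to logits
  model.eval()
  adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
  # x_test, y_test: a batch of clean images and their labels
  x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)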

C&W[]

A black box attack?[]

Transfer attack?[]

Defences[]

Adversarial training, i.e. training on adversarial examples generated on the fly, is so far the most successful defense.
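
A minimal sketch of PGD adversarial training (Madry-style; all names and hyperparameters are placeholders, not a definitive implementation):

  import torch
  import torch.nn.functional as F

  def train_epoch_adversarial(model, loader, optimizer,
                              epsilon=8 / 255, alpha=2 / 255, steps=7):
      """One epoch of adversarial training: train on adversarial examples
      generated on the fly inside an l-inf ball of radius epsilon."""
      model.train()
      for x, y in loader:
          # inner maximization: a few PGD steps starting from the clean input
          x_adv = x.clone().detach()
          for _ in range(steps):
              x_adv.requires_grad_(True)
              grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
              x_adv = x_adv.detach() + alpha * grad.sign()
              x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project
              x_adv = x_adv.clamp(0, 1)                                      # valid pixels
          # outer minimization: a standard training step on the adversarial batch
          optimizer.zero_grad()
          F.cross_entropy(model(x_adv), y).backward()
          optimizer.step()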

TODO: lots and lots of defences

  • SVD (Jere et al. 2020)[47]

Confirmed fails[]

TODO: a lot fall into this category

  • Shown to rely on gradient obfuscation and broken by Athalye et al. (2018)[48]: Distillation (Papernot et al. 2016), Thermometer encoding (Buckman et al. 2018[49])
  • Ensembling: https://arxiv.org/abs/1706.04701
  • Reported broken by Croce & Hein (2020)[46]: Mixture of RBF (Taghanaki et al., 2019[50]), restricting the hidden space (Mustafa et al. 2019[51]), among around 50 evaluated models

Demonstrated on simple datasets only[]

  • Tested on MNIST only: convex outer polytope (Wong & Kolter, 2018[52])?

Adversarial examples in natural language processing[]

Tasks[]

Classifying text into categories (e.g. Sports, Business) and reviews into good/bad (Soll et al. 2019)[53]

Attacks[]

From Soll et al. (2019)[53], who use the "algorithm by Samanta and Mehta [22], where the candidate pool P, from which possible words for insertion and replacement are drawn, was created from the following sources:

  • Synonyms gathered from the WordNet dataset [5],
  • Typos from a dataset [16] to ensure that the typos inserted are not recognized as artificial since they occur in normal texts written by humans, and
  • Keywords specific for one input class which were found by looking at all training sentences and extracting words only found in one class."
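
A minimal sketch of assembling such a candidate pool for a single word, using NLTK's WordNet interface for the synonym source; the typo corpus and the per-class keyword lists are stubbed out as hypothetical inputs, not taken from the paper:

  from nltk.corpus import wordnet as wn  # requires nltk and the 'wordnet' corpus

  def wordnet_synonyms(word):
      """Source 1: synonym candidates gathered from WordNet."""
      return {lemma.name().replace('_', ' ')
              for synset in wn.synsets(word)
              for lemma in synset.lemmas()
              if lemma.name().lower() != word.lower()}

  def candidate_pool(word, typo_dict, class_keywords):
      """Candidate pool P = synonyms + human-like typos + class-specific keywords.
      typo_dict and class_keywords stand in for the typo dataset and the
      per-class keyword lists described above."""
      return wordnet_synonyms(word) | set(typo_dict.get(word, ())) | set(class_keywords)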

Defenses[]

Distillation is shown to be ineffective (again) (Soll et al. 2019) [53]

TODO: Jia and Liang (2017)[54]: data augmentation not effective?

Adversarial examples in other machine learning areas[]

TODO: in reinforcement learning

TODO: from Serban et al. (2020)[6]: malware detection [68, 78, 94, 101, 179], because it implies direct consequences on security. Other tasks such as reinforcement learning [10, 80, 106], speech recognition [23, 27], facial recognition [150], semantic segmentation [178] [...] are also explored.

TODO: Yefet, N., Alon, U., & Yahav, E. (2020). Adversarial examples for models of code. Proceedings of the ACM on Programming Languages, 4(OOPSLA), 1-30.

Evaluation[]

Methodology[]

  • Careful to avoid gradient obfuscation: Athalye et al.[48]
    • Check if random/transfer/blackbox attacks are more effective than whitebox attacks (red flag; a minimal sanity-check sketch follows this list)
  • Always hand-design adaptive attacks for evaluation (unless simpler attacks suffice), and be careful, as many things can go wrong there: Tramer et al. (2020)[55]. The authors identified the following "themes" for creating effective adaptive attacks:
    • T0: Strive for simplicity
    • T1: Attack (a function close to) the full defense
    • T2: Identify and target important defense parts
    • T3: Adapt the objective to simplify the attack
    • T4: Ensure the loss function is consistent
    • T5: Optimize the loss function with different methods
    • T6: Use a strong adaptive attack for adversarial training
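
A minimal sketch of the red-flag check in the first bullet above: if a gradient-free (black-box or transfer) attack beats a well-tuned white-box attack, suspect gradient obfuscation. The attack functions are placeholders (e.g. the PGD sketch above and any black-box attack):

  import torch

  def accuracy_under_attack(model, attack, x, y):
      """Accuracy of `model` on adversarial examples produced by `attack(model, x, y)`."""
      x_adv = attack(model, x, y)
      with torch.no_grad():
          return (model(x_adv).argmax(dim=1) == y).float().mean().item()

  def obfuscation_red_flag(model, whitebox_attack, blackbox_attack, x, y):
      """Red flag from Athalye et al. (2018)[48]: a gradient-free black-box
      attack should not outperform a well-tuned white-box attack."""
      acc_whitebox = accuracy_under_attack(model, whitebox_attack, x, y)
      acc_blackbox = accuracy_under_attack(model, blackbox_attack, x, y)
      return acc_blackbox < acc_whitebox  # True => investigate gradient obfuscation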

Choice of attacks[]

Strong attacks that are recommended:

  • Custom-made adaptive attack (see previous section)
  • AutoAttack (see section #Attacks)
  • C&W (see section #Attacks)
  • A transfer attack
  • A blackbox attack

Weak attacks to avoid:

  • The use of PGD and FGSM has been criticized by Croce & Hein (2020)[46]

Datasets[]

CIFAR-10[]

TODO: Qin et al. (2019)[56]

Datasets with built-in perturbation[]

These are easier to use but provide a weaker test of robustness, since the perturbations are fixed in advance rather than adapted to the model under evaluation.

Software[]

References[]

  1. 1.0 1.1 Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014. URL http://arxiv.org/abs/1312.6199.
  2. Jia, Robin, and Percy Liang. "Adversarial Examples for Evaluating Reading Comprehension Systems." arXiv preprint arXiv:1707.07328 (2017).
  3. Zhou, D., Liu, T., Han, B., Wang, N., Peng, C., & Gao, X. (2021). Towards Defending against Adversarial Examples via Attack-Invariant Features. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning (Vol. 139, pp. 12835–12845). PMLR. Retrieved from http://proceedings.mlr.press/v139/zhou21e.html
  4. 4.0 4.1 Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
  5. Tramèr, Florian, et al. "The Space of Transferable Adversarial Examples." arXiv preprint arXiv:1704.03453 (2017).
  6. 6.0 6.1 6.2 Serban, A., Poll, E., & Visser, J. (2020). Adversarial Examples on Object Recognition. ACM Computing Surveys, 53(3), 1–38. https://doi.org/10.1145/3398394
  7. Schmidt, L., Talwar, K., Santurkar, S., Tsipras, D., & Madry, A. (2018). Adversarially robust generalization requires more data. Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 5014–5026.
  8. Bubeck, S., Price, E., & Razenshteyn, I. (2018). Adversarial examples from computational constraints, 1–19. Retrieved from http://arxiv.org/abs/1805.10204
  9. Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., & Goodfellow, I. (2018). The Relationship Between High-Dimensional Geometry and Adversarial Examples. Retrieved from http://arxiv.org/abs/1801.02774
  10. Mahloujifar, S., Diochnos, D. I., & Mahmoody, M. (2019). The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 4536–4543. https://doi.org/10.1609/aaai.v33i01.33014536
  11. Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). Adversarial Examples Are Not Bugs, They Are Features. Retrieved from http://arxiv.org/abs/1905.02175
  12. Springer, J. M., Mitchell, M., & Kenyon, G. T. (2021). Adversarial Perturbations Are Not So Weird: Entanglement of Robust and Non-Robust Features in Neural Network Classifiers. Retrieved from http://arxiv.org/abs/2102.05110
  13. Cosgrove, C., & Yuille, A. L. (2020). Adversarial examples for edge detection: They exist, and they transfer. Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 1059–1068. https://doi.org/10.1109/WACV45572.2020.9093304
  14. Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., & Yuille, A. (2017). Adversarial Examples for Semantic Segmentation and Object Detection. Proceedings of the IEEE International Conference on Computer Vision, 2017-Octob, 1378–1387. https://doi.org/10.1109/ICCV.2017.153
  15. Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016). Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. Proceedings of the ACM Conference on Computer and Communications Security, 1528–1540. https://doi.org/10.1145/2976749.2978392
  16. Li, S., Neupane, A., Paul, S., Song, C., Krishnamurthy, S. V., Chowdhury, A. K. R., & Swami, A. (2018). Adversarial Perturbations Against Real-Time Video Classification Systems. ArXiv. https://doi.org/10.14722/ndss.2019.23202
  17. Quan, P., Guo, R., & Srivastava, M. (n.d.). Towards Imperceptible Query-limited Adversarial Attacks with Perceptual Feature Fidelity Loss, 1–11.
  18. Dabouei, A., Soleymani, S., Taherkhani, F., Dawson, J., & Nasrabadi, N. M. (2020). SmoothFool: An efficient framework for computing smooth adversarial perturbations. Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 2654–2663. https://doi.org/10.1109/WACV45572.2020.9093429
  19. Hosseini, H., Xiao, B., Jaiswal, M., & Poovendran, R. (2017). On the limitation of convolutional neural networks in recognizing negative images. Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, 352–358. https://doi.org/10.1109/ICMLA.2017.0-136
  20. Hosseini, H., & Poovendran, R. (2018). Semantic Adversarial Examples. CVPR 2018, 1727–1732. Retrieved from http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w32/Hosseini_Semantic_Adversarial_Examples_CVPR_2018_paper.pdf
  21. Shahin Shamsabadi, A., Sanchez-Matilla, R., & Cavallaro, A. (2020). ColorFool: Semantic Adversarial Colorization. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1148–1157. https://doi.org/10.1109/CVPR42600.2020.00123
  22. Laidlaw, C., & Feizi, S. (2019). Functional Adversarial Attacks. ArXiv, (NeurIPS), 1–16.
  23. 23.0 23.1 Volpi, R., & Murino, V. (2019). Addressing model vulnerability to distributional shifts over image transformation sets. Proceedings of the IEEE/CVF International Conference on Computer Vision, 7980–7989.
  24. Peng, D., Zheng, Z., Luo, L., & Zhang, X. (2020). Structure matters: Towards generating transferable adversarial images. Frontiers in Artificial Intelligence and Applications, 325, 1419–1426. https://doi.org/10.3233/FAIA200247
  25. Zhao, Z., Liu, Z., & Larson, M. (2020). Adversarial color enhancement: Generating unrestricted adversarial images by optimizing a color filter. ArXiv, 1–14.
  26. Zhao, Z., Liu, Z., & Larson, M. (2020). Adversarial robustness against image color transformation within parametric filter space. ArXiv, 1–20.
  27. Kantipudi, J., Dubey, S. R., & Chakraborty, S. (2020). Color Channel Perturbation Attacks for Fooling Convolutional Neural Networks and A Defense Against Such Attacks. IEEE Transactions on Artificial Intelligence. Retrieved from http://arxiv.org/abs/2012.14456
  28. 28.0 28.1 Bhattad, A., Chong, M. J., Liang, K., Li, B., & Forsyth, D. A. (2019). Unrestricted adversarial examples via semantic manipulation. ArXiv, (2018), 1–19.
  29. Peng, D., Zheng, Z., & Zhang, X. (2018). Structure-preserving transformation: Generating diverse and transferable adversarial examples. ArXiv.
  30. Shamsabadi, A. S., Oh, C., & Cavallaro, A. (2020). Edgefool: An Adversarial Image Enhancement Filter. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020-May(2), 1898–1902. https://doi.org/10.1109/ICASSP40776.2020.9054368
  31. Naderi, H., Goli, L., & Kasaei, S. (2021). Generating Unrestricted Adversarial Examples via Three Parameters. Retrieved from http://arxiv.org/abs/2103.07640
  32. Li, L., Weber, M., Xu, X., Rimanic, L., Xie, T., Zhang, C., & Li, B. (2020). Provable robust learning based on transformation-specific smoothing. In ICML Workshop on Uncertainty & Robustness in Deep Learning (UDL) 2020.
  33. 33.0 33.1 Ho, C. H., Leung, B., Sandstrom, E., Chang, Y., & Vasconcelos, N. (2019). Catastrophic child’s play: Easy to perform, hard to defend adversarial attacks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019-June, 9221–9229. https://doi.org/10.1109/CVPR.2019.00945
  34. Ghiasi, A., Shafahi, A., & Goldstein, T. (2020). Breaking certified defenses: semantic adversarial examples with spoofed robustness certificates. In International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=HJxdTxHYvB
  35. Duan, R., Mao, X., Qin, A. K., Yang, Y., Chen, Y., Ye, S., & He, Y. (2021). Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink. Retrieved from http://arxiv.org/abs/2103.06504
  36. Duan, R., Ma, X., Wang, Y., Bailey, J., Qin, A. K., & Yang, Y. (2020). Adversarial camouflage: Hiding physical-world attacks with natural styles. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 997–1005. https://doi.org/10.1109/CVPR42600.2020.00108
  37. Xu, Q., Tao, G., & Zhang, X. (2020). D2B: Deep distribution bound for natural-looking adversarial attack. ArXiv, 1–26.
  38. Xu, Q., Tao, G., Cheng, S., Tan, L., & Zhang, X. (2020). Towards feature space adversarial attack. ArXiv.
  39. Song, Y., Kushman, N., Shu, R., & Ermon, S. (2018). Constructing unrestricted adversarial examples with generative models. Advances in Neural Information Processing Systems, 2018-December(NeurIPS), 8312–8323.
  40. Jain, L. (2020). Generating Semantic Adversarial Examples through Differentiable Rendering.
  41. Wang, D., Li, C., Wen, S., Nepal, S., & Xiang, Y. (2019). Man-in-the-middle attacks against machine learning classifiers via malicious generative models. ArXiv, (October), 1–12.
  42. Dunn, I., Hanu, L., Pouget, H., Kroening, D., & Melham, T. (2020). Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities. Retrieved from http://arxiv.org/abs/2001.11055
  43. Song, Y., Kushman, N., Shu, R., & Ermon, S. (2018). Generative Adversarial Examples, 8312–8323.
  44. Peng, D., Zheng, Z., Luo, L., & Zhang, X. (n.d.). Structure Matters : Towards Generating Transferable Adversarial Images.
  45. Shankar, V., Dave, A., Roelofs, R., Ramanan, D., Recht, B., & Schmidt, L. (2019). Do Image Classifiers Generalize Across Time? Retrieved from http://arxiv.org/abs/1906.02168
  46. 46.0 46.1 46.2 Croce, F., & Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. 37th International Conference on Machine Learning, ICML 2020, 2184–2194.
  47. Jere, M., Kumar, M., & Koushanfar, F. (2020). A singular value perspective on model robustness. ArXiv.
  48. 48.0 48.1 Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. 35th International Conference on Machine Learning, ICML 2018, 1, 436–448.
  49. Buckman, J., Roy, A., Raffel, C., & Goodfellow, I. (2018). Thermometer encoding: One hot way to resist adversarial examples. 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, (2016), 1–22.
  50. Taghanaki, S. A., Abhishek, K., Azizi, S., and Hamarneh, G. A kernelized manifold mapping to diminish the effect of adversarial perturbations. In CVPR, 2019.
  51. Mustafa, A., Khan, S., Hayat, M., Goecke, R., Shen, J., & Shao, L. (2019). Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks.
  52. Wong, E., & Kolter, J. Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. 35th International Conference on Machine Learning, ICML 2018, 12, 8405–8423.
  53. 53.0 53.1 53.2 Soll, M., Hinz, T., Magg, S., & Wermter, S. (2019). Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples. International Conference on Artificial Neural Networks (ICANN), 685–696. https://doi.org/10.1007/978-3-030-30508-6_54
  54. Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 2021–2031 (2017). DOI: 10.18653/v1/D17-1215
  55. Tramer, F., Carlini, N., Brendel, W., & Madry, A. (2020). On adaptive attacks to adversarial example defenses. ArXiv Preprint ArXiv:2002.08347.
  56. Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., … Kohli, P. (2019). Adversarial Robustness through Local Linearization, (NeurIPS), 1–17. Retrieved from http://arxiv.org/abs/1907.02610