
Adversarial examples are small perturbations of an input that are negligible to humans but change the decision of a computer system. They were first discovered in object recognition (Szegedy et al. 2014)[1] and later found in natural language systems as well (Jia and Liang, 2017)[2]. In terms of models, neural networks, linear models (e.g. SVMs) and decision trees are all known to suffer from adversarial examples (Zhou et al. 2021[3], among others). The phenomenon was broadly popularized via news about autonomous cars misinterpreting stop signs as speed-limit signs, and state-of-the-art computer vision systems misinterpreting cats as desktop computers, mistaking faces for non-faces, gibberish patterns for faces, and one face for another. It reveals a fundamental flaw in a large class of classifiers (Goodfellow et al. 2014)[4].
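
As a concrete illustration, here is a minimal sketch of the fast gradient sign method (FGSM) of Goodfellow et al. (2014)[4], which perturbs an input by a single signed gradient step of size epsilon. `model`, `x`, and `y` are placeholders for any differentiable classifier, an input batch with values in [0, 1], and its true labels; epsilon = 0.03 is just an example budget.

  # Minimal FGSM sketch: one signed gradient step of size epsilon.
  import torch
  import torch.nn.functional as F

  def fgsm(model, x, y, epsilon=0.03):
      x = x.clone().detach().requires_grad_(True)
      loss = F.cross_entropy(model(x), y)
      loss.backward()
      x_adv = x + epsilon * x.grad.sign()    # move in the direction that increases the loss
      return x_adv.clamp(0.0, 1.0).detach()  # keep pixel values in a valid range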

TODO: https://pdfs.semanticscholar.org/7330/0838d524d062e8341b242765fb6efaf48f43.pdf

https://www.cs.uoregon.edu/Reports/AREA-201406-Torkamani.pdf

https://arxiv.org/pdf/1207.0245.pdf

Subspaces of transferable adversarial examples: Tramèr et al. (2017)[5]

Universal adversarial perturbation: https://arxiv.org/pdf/1610.08401.pdf

History

From Goodfellow (2017):

  1. “Adversarial Classification” Dalvi et al 2004: fool spam filter
  2. “Evasion Attacks Against Machine Learning at Test Time”, Biggio et al. 2013: fool neural nets
  3. Szegedy et al. 2013: fool ImageNet classifiers imperceptibly
  4. Goodfellow et al. 2014: cheap, closed-form attack

Explanation

TODO: a survey with a list of hypotheses: Serban et al. (2020)[6]

  • Linearity: Goodfellow et al. (2014)[4] (a numeric sketch follows this list)
  • Data complexity of robust generalization (with no prior at all? what about robust generalization with the right prior?): Schmidt et al. (2018)[7]
  • "Identifying a robust classifier from limited training data is information theoretically possible but computationally intractable" (at least for a family of models called "statistical query"): Bubeck et al. (2018)[8]
  • "high dimensional geometry of data manifold" (but hey, people can do it...): Gilmer et al. (2018)[9]
  • inevitable consequence of "concentration of measure" in metric measure space (but does our problem have it?): Mahloujifar et al. (2019)[10]
  • non-robust features (of the input) that are useful for normal classification but not for robust classification: Ilyas et al. (2019)[11]
    • "We define a feature to be a function mapping from the input space X to the real numbers, ... Note that this formal definition also captures what we abstractly think of as features (e.g., we can construct an f that captures how “furry” an image is)"
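
A quick way to see the linearity hypothesis referenced above: for a linear score w·x, the worst-case l-infinity perturbation of size epsilon is epsilon·sign(w), which shifts the score by epsilon·||w||_1: tiny per coordinate, but growing with the input dimension. The sketch below merely checks this numerically with hypothetical random weights and inputs.

  # For a linear score w.x, a perturbation epsilon*sign(w) shifts the score by
  # epsilon * ||w||_1, which grows with the input dimension even though each
  # coordinate changes imperceptibly.
  import numpy as np

  rng = np.random.default_rng(0)
  epsilon = 0.01
  for dim in (100, 10_000, 1_000_000):
      w = rng.normal(size=dim)   # weights of a hypothetical linear classifier
      x = rng.normal(size=dim)   # an arbitrary input
      shift = w @ (x + epsilon * np.sign(w)) - w @ x   # equals epsilon * ||w||_1
      print(dim, round(float(shift), 2))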

Some claim that adversarial examples are inevitable (hey, humans seem to be robust against them?):

  • Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018. URL https://arxiv.org/pdf/arXiv:1802.08686.pdf.
  • Justin Gilmer, Luke Metz, Fartash Faghri, Sam Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In International Conference on Learning Representations Workshop, 2018. URL https://arxiv.org/pdf/1801.02774.pdf.


Adversarial examples in computer vision

Tasks

  • Object recognition: see the survey Serban et al. (2020)[6]
  • Edge detection: Cosgrove and Yuille (2020)[12]
  • Semantic segmentation: Xie et al. (2017)[13]
  • Facial recognition: Sharif et al. (2016)[14]
  • Video classification: Li et al. (2018)[15]

TODO: Simen Thys, Wiebe Van Ranst, and Toon Goedemé. 2019. Fooling automated surveillance cameras: Adversarial patches to attack person detection. arXiv:1904.08653 (2019).

TODO: Xingxing Wei, Siyuan Liang, Xiaochun Cao, and Jun Zhu. 2018. Transferable adversarial attacks for image and video object detection. arXiv:1811.12641 (2018).

Attacks

TODO: find refs

  • Small perturbation/imperceptible
    • norm-constrained (l2, l-inf, etc.): lots of papers (a generic PGD sketch follows this list)
    • "perceptual feature fidelity" constraint? [16]
    • SmoothFool[17]
  • Color attacks
    • Negative images[18]
    • Random color substitution[19]
    • ColorFool[20]
    • Small recoloring (combined with perturbation)[21]
    • contrast, brightness, grayscale conversion, intensity, solarize: Volpi & Murino (2019)[22]
    • lots of filters: FilterFool (Shamsabadi et al. 2020)[23]
    • Adversarial color enhancement (Zhao et al. 2020)[24][25]
    • more color filters: Kantipudi et al. (2020)[26]
    • yet more color perturbation: Bhattad et al. (2020)[27]
  • structure-preserving
    • Peng et al. (2018)???[28]
    • sharpness: Volpi & Murino (2019)[22]
    • EdgeFool[29]
    • Shifting/deforming: [30]
    • Rotation & translation: Li et al. 2020[31]
  • camera shake???[32]
  • Few-pixel attack
    • One pixel
    • k pixels
  • Semantic attacks
  • Shadow attacks[33]
  • Juxtaposition/occlusion attacks
    • "Adversarial turtles"
    • "Invisible cloaks"
    • banners
    • Adversarial Laser Beam: Duan et al. (2021)[34]
    • Adversarial Camouflage[35]
  • Feature-space attacks (white box, using the internal features to craft images)
    • D2B (Xu et al. 2020)[36]
    • Xu et al. (2020)[37]
  • Generative attacks (not using internal features)
    • using GAN: Song et al. (2018)[38]
    • differentiable rendering: Jain (2020)[39]
    • VAE??[40]
    • pose???[32]
    • more GAN??[41]
    • yet more GAN?? [42]
    • style transfer (texture)?[27]
  • Structure-preserving attack? Peng et al. (2020)[43]
  • move across time (in a video): Shankar et al. (2019)[44]
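
For the norm-constrained attacks referenced near the top of this list, a common template is projected gradient descent (PGD) in an l-infinity ball: repeat small signed-gradient steps and project the result back into the epsilon-ball around the clean input. The sketch below is a generic illustration of that template rather than any specific paper's algorithm; `model`, `x`, and `y` are placeholders as before, and the step sizes are arbitrary examples.

  # Generic l-infinity PGD sketch: signed-gradient ascent on the loss, with
  # projection back into the epsilon-ball around the clean input after each step.
  import torch
  import torch.nn.functional as F

  def pgd_linf(model, x, y, epsilon=0.03, step=0.007, iters=10):
      x_adv = x.clone().detach()
      for _ in range(iters):
          x_adv.requires_grad_(True)
          loss = F.cross_entropy(model(x_adv), y)
          grad, = torch.autograd.grad(loss, x_adv)
          with torch.no_grad():
              x_adv = x_adv + step * grad.sign()                # ascend the loss
              x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project onto the l-inf ball
              x_adv = x_adv.clamp(0.0, 1.0)                     # keep inputs valid
      return x_adv.detach()

Swapping the l-infinity projection for an l2 projection gives the corresponding l2-constrained variant.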

Defences

Adversarial training: so far the most successful defense.
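
In outline, adversarial training fits the model on adversarial examples generated on the fly for each batch. The sketch below is a minimal, generic version; `model`, `loader`, and `optimizer` are placeholders, and `attack` could be, for instance, the PGD sketch from the Attacks section.

  # One epoch of adversarial training: craft adversarial inputs per batch and
  # fit the model on them instead of (or in addition to) the clean inputs.
  import torch.nn.functional as F

  def adversarial_training_epoch(model, loader, optimizer, attack):
      model.train()
      for x, y in loader:
          x_adv = attack(model, x, y)               # e.g. the pgd_linf sketch above
          optimizer.zero_grad()
          loss = F.cross_entropy(model(x_adv), y)   # train on the perturbed inputs
          loss.backward()
          optimizer.step()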

TODO: lots and lots of defences

  • SVD (Jere et al. 2020)[45]

Confirmed fails

TODO: a lot fall into this category

Adversarial examples in natural language processing

Tasks

Classifying text into categories (e.g. Sports, Business) and reviews into good/bad (Soll et al. 2019)[49]

Attacks

From Soll et al. (2019)[49] (a sketch of the synonym source follows the quoted list): "algorithm by Samanta and Mehta [22], where the candidate pool P, from which possible words for insertion and replacement are drawn, was created from the following sources:

  • Synonyms gathered from the WordNet dataset [5],
  • Typos from a dataset [16] to ensure that the typos inserted are not recognized as artificial since they occur in normal texts written by humans, and
  • Keywords specific for one input class which were found by looking at all training sentences and extracting words only found in one class."
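
As a rough sketch of the first source (WordNet synonyms), the snippet below collects candidate replacement words via NLTK's WordNet interface. This illustrates the idea rather than reproducing Samanta and Mehta's exact procedure, and the typo and class-specific-keyword sources are omitted because the underlying datasets are not specified here.

  # Synonym source of the candidate pool, via NLTK's WordNet corpus
  # (run nltk.download('wordnet') once beforehand).
  from nltk.corpus import wordnet

  def synonym_candidates(word):
      candidates = set()
      for synset in wordnet.synsets(word):
          for lemma in synset.lemmas():
              name = lemma.name().replace("_", " ")  # WordNet joins multi-word lemmas with "_"
              if name.lower() != word.lower():
                  candidates.add(name)
      return candidates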

Defenses

Defensive distillation is shown to be ineffective (again) (Soll et al. 2019)[49].

TODO: Jia and Liang (2017)[50]: data augmentation not effective?

Adversarial examples in other machine learning areas

TODO: in reinforcement learning

TODO: from Serban et al. (2020)[6]: "... is malware detection [68, 78, 94, 101, 179], because it implies direct consequences on security. Other tasks such as reinforcement learning [10, 80, 106], speech recognition [23, 27], facial recognition [150], semantic segmentation [178] [...] are also explored"

TODO: Yefet, N., Alon, U., & Yahav, E. (2020). Adversarial examples for models of code. Proceedings of the ACM on Programming Languages, 4(OOPSLA), 1-30.

Evaluation

Methodology

  • Be careful to avoid gradient obfuscation: Athalye et al. (2018)[46]
  • Always hand-design adaptive attacks for evaluation (unless simpler attacks suffice), and be careful, as many things can go wrong there: Tramer et al. (2020)[51]. The authors identify the following "themes" for creating effective adaptive attacks:
    • T0: Strive for simplicity
    • T1: Attack (a function close to) the full defense
    • T2: Identify and target important defense parts
    • T3: Adapt the objective to simplify the attack
    • T4: Ensure the loss function is consistent
    • T5: Optimize the loss function with different methods
    • T6: Use a strong adaptive attack for adversarial training

Datasets

CIFAR-10

TODO: Qin et al. (2019)[52]

Datasets with built-in perturbation

These are easier to use but provide a weaker test of robustness.

Software

References

  1. Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014. URL http://arxiv.org/abs/1312.6199.
  2. Jia, Robin, and Percy Liang. "Adversarial Examples for Evaluating Reading Comprehension Systems." arXiv preprint arXiv:1707.07328 (2017).
  3. Zhou, D., Liu, T., Han, B., Wang, N., Peng, C., & Gao, X. (2021). Towards Defending against Adversarial Examples via Attack-Invariant Features. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning (Vol. 139, pp. 12835–12845). PMLR. Retrieved from http://proceedings.mlr.press/v139/zhou21e.html
  4. Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
  5. Tramèr, Florian, et al. "The Space of Transferable Adversarial Examples." arXiv preprint arXiv:1704.03453 (2017).
  6. Serban, A., Poll, E., & Visser, J. (2020). Adversarial Examples on Object Recognition. ACM Computing Surveys, 53(3), 1–38. https://doi.org/10.1145/3398394
  7. Schmidt, L., Talwar, K., Santurkar, S., Tsipras, D., & Madry, A. (2018). Adversarially robust generalization requires more data. Advances in Neural Information Processing Systems, 2018-December (NeurIPS), 5014–5026.
  8. Bubeck, S., Price, E., & Razenshteyn, I. (2018). Adversarial examples from computational constraints, 1–19. Retrieved from http://arxiv.org/abs/1805.10204
  9. Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., & Goodfellow, I. (2018). The Relationship Between High-Dimensional Geometry and Adversarial Examples. Retrieved from http://arxiv.org/abs/1801.02774
  10. Mahloujifar, S., Diochnos, D. I., & Mahmoody, M. (2019). The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 4536–4543. https://doi.org/10.1609/aaai.v33i01.33014536
  11. Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). Adversarial Examples Are Not Bugs, They Are Features. Retrieved from http://arxiv.org/abs/1905.02175
  12. Cosgrove, C., & Yuille, A. L. (2020). Adversarial examples for edge detection: They exist, and they transfer. Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 1059–1068. https://doi.org/10.1109/WACV45572.2020.9093304
  13. Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., & Yuille, A. (2017). Adversarial Examples for Semantic Segmentation and Object Detection. Proceedings of the IEEE International Conference on Computer Vision, 2017-October, 1378–1387. https://doi.org/10.1109/ICCV.2017.153
  14. Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016). Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. Proceedings of the ACM Conference on Computer and Communications Security, 1528–1540. https://doi.org/10.1145/2976749.2978392
  15. Li, S., Neupane, A., Paul, S., Song, C., Krishnamurthy, S. V., Chowdhury, A. K. R., & Swami, A. (2018). Adversarial Perturbations Against Real-Time Video Classification Systems. ArXiv. https://doi.org/10.14722/ndss.2019.23202
  16. Quan, P., Guo, R., & Srivastava, M. (n.d.). Towards Imperceptible Query-limited Adversarial Attacks with Perceptual Feature Fidelity Loss, 1–11.
  17. Dabouei, A., Soleymani, S., Taherkhani, F., Dawson, J., & Nasrabadi, N. M. (2020). SmoothFool: An efficient framework for computing smooth adversarial perturbations. Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 2654–2663. https://doi.org/10.1109/WACV45572.2020.9093429
  18. Hosseini, H., Xiao, B., Jaiswal, M., & Poovendran, R. (2017). On the limitation of convolutional neural networks in recognizing negative images. Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, 352–358. https://doi.org/10.1109/ICMLA.2017.0-136
  19. Hosseini, H., & Poovendran, R. (2018). Semantic Adversarial Examples. CVPR 2018, 1727–1732. Retrieved from http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w32/Hosseini_Semantic_Adversarial_Examples_CVPR_2018_paper.pdf
  20. Shahin Shamsabadi, A., Sanchez-Matilla, R., & Cavallaro, A. (2020). ColorFool: Semantic Adversarial Colorization. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1148–1157. https://doi.org/10.1109/CVPR42600.2020.00123
  21. Laidlaw, C., & Feizi, S. (2019). Functional Adversarial Attacks. ArXiv, (NeurIPS), 1–16.
  22. Volpi, R., & Murino, V. (2019). Addressing model vulnerability to distributional shifts over image transformation sets. ArXiv.
  23. Peng, D., Zheng, Z., Luo, L., & Zhang, X. (2020). Structure matters: Towards generating transferable adversarial images. Frontiers in Artificial Intelligence and Applications, 325, 1419–1426. https://doi.org/10.3233/FAIA200247
  24. Zhao, Z., Liu, Z., & Larson, M. (2020). Adversarial color enhancement: Generating unrestricted adversarial images by optimizing a color filter. ArXiv, 1–14.
  25. Zhao, Z., Liu, Z., & Larson, M. (2020). Adversarial robustness against image color transformation within parametric filter space. ArXiv, 1–20.
  26. Kantipudi, J., Dubey, S. R., & Chakraborty, S. (2020). Color Channel Perturbation Attacks for Fooling Convolutional Neural Networks and A Defense Against Such Attacks. IEEE Transactions on Artificial Intelligence. Retrieved from http://arxiv.org/abs/2012.14456
  27. Bhattad, A., Chong, M. J., Liang, K., Li, B., & Forsyth, D. A. (2019). Unrestricted adversarial examples via semantic manipulation. ArXiv, (2018), 1–19.
  28. Peng, D., Zheng, Z., & Zhang, X. (2018). Structure-preserving transformation: Generating diverse and transferable adversarial examples. ArXiv.
  29. Shamsabadi, A. S., Oh, C., & Cavallaro, A. (2020). Edgefool: An Adversarial Image Enhancement Filter. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020-May(2), 1898–1902. https://doi.org/10.1109/ICASSP40776.2020.9054368
  30. Naderi, H., Goli, L., & Kasaei, S. (2021). Generating Unrestricted Adversarial Examples via Three Parameters. Retrieved from http://arxiv.org/abs/2103.07640
  31. Li, L., Weber, M., Xu, X., Rimanic, L., Xie, T., Zhang, C., & Li, B. (2020). Provable robust learning based on transformation-specific smoothing. In ICML Workshop on Uncertainty & Robustness in Deep Learning (UDL) 2020.
  32. Ho, C. H., Leung, B., Sandstrom, E., Chang, Y., & Vasconcelos, N. (2019). Catastrophic child’s play: Easy to perform, hard to defend adversarial attacks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019-June, 9221–9229. https://doi.org/10.1109/CVPR.2019.00945
  33. Ghiasi, A., Shafahi, A., & Goldstein, T. (2020). Breaking certified defenses: semantic adversarial examples with spoofed robustness certificates. In International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=HJxdTxHYvB
  34. Duan, R., Mao, X., Qin, A. K., Yang, Y., Chen, Y., Ye, S., & He, Y. (2021). Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink. Retrieved from http://arxiv.org/abs/2103.06504
  35. Duan, R., Ma, X., Wang, Y., Bailey, J., Qin, A. K., & Yang, Y. (2020). Adversarial camouflage: Hiding physical-world attacks with natural styles. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 997–1005. https://doi.org/10.1109/CVPR42600.2020.00108
  36. Xu, Q., Tao, G., & Zhang, X. (2020). D2B: Deep distribution bound for natural-looking adversarial attack. ArXiv, 1–26.
  37. Xu, Q., Tao, G., Cheng, S., Tan, L., & Zhang, X. (2020). Towards feature space adversarial attack. ArXiv.
  38. Song, Y., Kushman, N., Shu, R., & Ermon, S. (2018). Constructing unrestricted adversarial examples with generative models. Advances in Neural Information Processing Systems, 2018-December(NeurIPS), 8312–8323.
  39. Jain, L. (2020). Generating Semantic Adversarial Examples through Differentiable Rendering.
  40. Wang, D., Li, C., Wen, S., Nepal, S., & Xiang, Y. (2019). Man-in-the-middle attacks against machine learning classifiers via malicious generative models. ArXiv, (October), 1–12.
  41. Dunn, I., Hanu, L., Pouget, H., Kroening, D., & Melham, T. (2020). Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities. Retrieved from http://arxiv.org/abs/2001.11055
  42. Song, Y., Kushman, N., Shu, R., & Ermon, S. (2018). Generative Adversarial Examples, 8312–8323.
  43. Peng, D., Zheng, Z., Luo, L., & Zhang, X. (n.d.). Structure Matters : Towards Generating Transferable Adversarial Images.
  44. Shankar, V., Dave, A., Roelofs, R., Ramanan, D., Recht, B., & Schmidt, L. (2019). Do Image Classifiers Generalize Across Time? Retrieved from http://arxiv.org/abs/1906.02168
  45. Jere, M., Kumar, M., & Koushanfar, F. (2020). A singular value perspective on model robustness. ArXiv.
  46. Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. 35th International Conference on Machine Learning, ICML 2018, 1, 436–448.
  47. Croce, F., & Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. 37th International Conference on Machine Learning, ICML 2020, PartF168147-3, 2184–2194.
  48. Taghanaki, S. A., Abhishek, K., Azizi, S., and Hamarneh, G. A kernelized manifold mapping to diminish the effect of adversarial perturbations. In CVPR, 2019.
  49. Soll, M., Hinz, T., Magg, S., & Wermter, S. (2019). Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples. International Conference on Artificial Neural Networks (ICANN), 685–696. https://doi.org/10.1007/978-3-030-30508-6_54
  50. Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 2021–2031 (2017). DOI: 10.18653/v1/D17-1215
  51. Tramer, F., Carlini, N., Brendel, W., & Madry, A. (2020). On adaptive attacks to adversarial example defenses. ArXiv Preprint ArXiv:2002.08347.
  52. Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., … Kohli, P. (2019). Adversarial Robustness through Local Linearization, (NeurIPS), 1–17. Retrieved from http://arxiv.org/abs/1907.02610