University of Pittsburgh

Characterizing the hidden layer representation impact of FGSM adversarial attacks

Graduate Student
Friday, April 9, 2021 - 1:00pm - 1:30pm

Neural network research has progressed for many decades. Recent advancements in computing hardware, coupled with a proliferation of data, have led to widespread adoption of deep learning for solving an assortment of machine learning problems. Alongside this success, it has been shown that deep learning models can be tricked by adversarial examples: input instances that are carefully crafted to appear unaltered while causing incorrect model output.
Much of the existing work on adversarial examples seeks to develop new attack methods and/or defense techniques. Our work is instead motivated by understanding how adversarial attacks, generated in particular with the fast gradient sign method (FGSM), alter the hidden layer representations of their target networks. A deeper understanding of adversarial examples could yield insights into the inner workings of neural networks, including how their functioning for computer vision differs from human visual processing. We explore the layer-by-layer consequences of adversarial perturbations and characterize each layer by how readily its representations reveal perturbed inputs. We also show how combining representations from multiple models affects the performance of discriminative models trained to identify adversarial perturbations.
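For readers unfamiliar with FGSM, the attack perturbs an input by a small step in the direction of the sign of the loss gradient with respect to that input. The following is a minimal NumPy sketch on a toy logistic-regression "network"; the weights, data, and epsilon here are purely illustrative, not from the work described above:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Craft an FGSM adversarial example for a logistic-regression model.

    x: input vector; w, b: model weights and bias; y: true label (0 or 1);
    eps: perturbation budget. Returns x + eps * sign(grad_x loss).
    """
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid output
    grad_x = (p - y) * w           # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

# Toy example: a confidently classified input is pushed toward the boundary.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.0])           # w @ x = 2.0, so the model predicts class 1
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.6)
print(x_adv)                        # each coordinate shifted by +/- eps
```

After the perturbation the model's margin `w @ x_adv` shrinks from 2.0 to 0.2, illustrating how a small, sign-based step degrades confidence; in a deep network the same gradient step is taken through all layers, which is what makes the hidden layer representations of perturbed inputs worth characterizing.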
