University of Pittsburgh

Survey of the Effects of Image Domain Shifts for Different Methods of Visual Question Answering

Graduate Student
Friday, October 23, 2020 - 12:30pm - 12:50pm

Visual Question Answering (VQA) combines visual and linguistic recognition into a multi-modal reasoning task. Modern VQA systems are typically limited to a single VQA dataset, due to dataset-specific domain differences or to dataset overfitting. To better understand how VQA algorithms handle shifts in domain, we introduce a synthetic domain shift in the image space, using style transfer techniques to modify the look of the images. Five different VQA algorithms are compared, each of which uses a different methodology for representing the two modalities and a different reasoning mechanism within the combined space. Each algorithm's response to the domain shift is monitored, and various fine-tuning techniques and domain adaptation methods are explored to improve domain generalization. We demonstrate that different method types exhibit different levels of robustness to domain shift, and we provide insight into means of improving domain generalization in VQA.
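The abstract does not specify which style transfer technique is used to induce the image-space domain shift. As a minimal, hypothetical sketch of the idea, the snippet below shifts an image's per-channel color statistics toward those of a "style" image (the same statistics-matching operation that AdaIN-style neural methods apply in feature space, here applied directly to pixels); the actual talk presumably uses a learned neural style transfer model.

```python
import numpy as np

def match_channel_stats(content: np.ndarray, style: np.ndarray) -> np.ndarray:
    """Shift each color channel of `content` to match the mean/std of `style`.

    A crude pixel-space stand-in for neural style transfer, used here only
    to illustrate how a synthetic image-domain shift can be induced.
    """
    content = content.astype(np.float64)
    style = style.astype(np.float64)
    out = np.empty_like(content)
    for c in range(content.shape[-1]):
        c_mean, c_std = content[..., c].mean(), content[..., c].std()
        s_mean, s_std = style[..., c].mean(), style[..., c].std()
        # Normalize the content channel, then rescale to the style statistics.
        out[..., c] = (content[..., c] - c_mean) / (c_std + 1e-8) * s_std + s_mean
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy example: shift a random "photo" toward the color statistics of a "painting".
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(64, 64, 3))
painting = rng.integers(100, 200, size=(64, 64, 3))
shifted = match_channel_stats(photo, painting)
```

In the experimental setup described above, every image in a VQA dataset would be transformed this way while questions and answers are left unchanged, so any drop in accuracy isolates the effect of the visual domain shift.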
