Adversarial Learning With Multi-Modal Attention for Visual Question Answering


2020

Abstract

Visual question answering (VQA) has been proposed as a challenging task and has attracted extensive research attention. It aims to learn a joint representation of the question-image pair for answer inference. Most existing methods focus on exploring the multi-modal correlation between the question and the image to learn the joint representation. However, these methods do not fully capture answer-related information, so the learned representation fails to reflect the answer to the question effectively. To tackle this problem, we propose a novel model, adversarial learning with multi-modal attention (ALMA), for VQA. An adversarial learning-based framework is proposed to learn a joint representation that effectively reflects answer-related information. Specifically, multi-modal attention with a Siamese similarity learning method is designed to build two embedding generators: a question-image embedding generator and a question-answer embedding generator. Adversarial learning is then conducted as an interplay between the two embedding generators and an embedding discriminator. The generators aim to produce modality-invariant representations of the question-image and question-answer pairs, whereas the embedding discriminator aims to distinguish between the two representations. Both the multi-modal attention module and the adversarial networks are integrated into a unified end-to-end framework to infer the answer. Experiments on three benchmark data sets confirm the favorable performance of ALMA compared with state-of-the-art approaches.
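The abstract describes the ALMA pipeline only at a high level. As a rough illustration of the general pattern it names, the plain-Python sketch below uses toy vectors to show question-guided attention over image-region features, a Siamese-style cosine similarity between a question-image embedding and a question-answer embedding, and a GAN-style objective in which a discriminator tries to tell the two embeddings apart while the generators try to make them indistinguishable. Everything here is an assumption for illustration, not the authors' implementation: the function names, the scaled dot-product attention, the element-wise fusion, the logistic discriminator, and the binary cross-entropy losses are all hypothetical choices.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_attention(question: Vector, regions: List[Vector]) -> Vector:
    """Scaled dot-product attention of a question vector over image-region vectors (assumed form)."""
    scale = math.sqrt(len(question))
    weights = softmax([dot(question, r) / scale for r in regions])
    dim = len(regions[0])
    # Attended image vector: attention-weighted sum of the region vectors.
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def fuse(question: Vector, other: Vector) -> Vector:
    # Hypothetical fusion: element-wise product of the question and the other modality.
    return [q * o for q, o in zip(question, other)]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def siamese_loss(e_qi: Vector, e_qa: Vector) -> float:
    # Assumed Siamese-style term: pull the two pair embeddings toward each other.
    return 1.0 - cosine(e_qi, e_qa)


def discriminator(embedding: Vector, w: Vector, b: float) -> float:
    """Hypothetical logistic discriminator: P(embedding came from the question-answer generator)."""
    return 1.0 / (1.0 + math.exp(-(dot(w, embedding) + b)))


def discriminator_loss(p_qa: float, p_qi: float) -> float:
    # Binary cross-entropy: question-answer embeddings labelled 1, question-image labelled 0.
    return -(math.log(p_qa) + math.log(1.0 - p_qi))


def generator_loss(p_qa: float, p_qi: float) -> float:
    # Generators are trained against flipped labels, so the discriminator is fooled.
    return -(math.log(1.0 - p_qa) + math.log(p_qi))


# Toy usage with made-up 3-dimensional vectors.
question = [0.5, -0.2, 0.1]
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
answer = [0.4, 0.1, 0.3]

attended = question_guided_attention(question, regions)
e_qi = fuse(question, attended)   # question-image embedding
e_qa = fuse(question, answer)     # question-answer embedding

w, b = [0.3, -0.1, 0.2], 0.0
p_qa = discriminator(e_qa, w, b)
p_qi = discriminator(e_qi, w, b)
print(discriminator_loss(p_qa, p_qi), generator_loss(p_qa, p_qi), siamese_loss(e_qi, e_qa))
```

In an actual model, the discriminator and generator losses would be minimized in alternation by gradient descent over learned parameters; this sketch only evaluates each term once for fixed toy values.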

Reference Key	liu2020adversarialieee
Authors	Liu, Yun; Zhang, Xiaoming; Huang, Feiran; Cheng, Lei; Li, Zhoujun
Journal	IEEE Transactions on Neural Networks and Learning Systems
Year	2020
DOI	10.1109/TNNLS.2020.3016083
URL	https://doi.org/10.1109/TNNLS.2020.3016083