Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs.

2018
Image captioning aims to generate natural language sentences that describe the salient parts of a given image. Although neural networks have recently achieved promising results, a key problem is that they can only describe concepts seen in the training image-sentence pairs. Efficient learning of novel concepts has thus become a topic of recent interest, as a way to reduce the expensive manual effort of labeling data. In this paper, we propose a novel method, Image-Text Surgery, to synthesize pseudo image-sentence pairs. The pseudopairs are generated under the guidance of a knowledge base, with syntax taken from a seed data set (i.e., MSCOCO) and visual information taken from an existing large-scale image base (i.e., ImageNet). Using this pseudodata, the captioning model learns novel concepts without any corresponding human-labeled pairs. We further introduce adaptive visual replacement, which uses an attention mechanism to adaptively filter unnecessary visual features in the pseudodata. We evaluate our approach on a held-out subset of the MSCOCO data set. The experimental results demonstrate that the proposed approach provides significant performance improvements over state-of-the-art methods in terms of F1 score and sentence quality. An ablation study and the qualitative results further validate the effectiveness of our approach.
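The abstract does not spell out how the pseudopairs are built, so the following is only a toy illustration of the text side of the general idea: swapping a seen concept in a seed caption for a related novel concept looked up in a knowledge base. The dictionary TOY_KB, the function make_pseudo_captions, and the example caption are all hypothetical and are not taken from the paper; the visual side (features from an image base such as ImageNet, and adaptive visual replacement) is not modeled at all.

```python
import re

# Hypothetical toy knowledge base: maps a concept seen in the seed captions
# to novel concepts assumed to be closely related to it. The entries are
# illustrative only and do not come from the paper.
TOY_KB = {
    "dog": ["fox", "wolf"],
    "car": ["ambulance"],
}

def make_pseudo_captions(caption, kb):
    """Return (novel_concept, pseudo_caption) pairs for one seed caption.

    Each seen concept that occurs as a whole word in the caption is
    replaced by each of its related novel concepts, keeping the rest of
    the sentence (its syntax) unchanged.
    """
    pairs = []
    for seen, novel_list in kb.items():
        # \b restricts the match to whole words, so "car" does not
        # match inside "cartoon".
        pattern = re.compile(r"\b" + re.escape(seen) + r"\b")
        if not pattern.search(caption):
            continue
        for novel in novel_list:
            pairs.append((novel, pattern.sub(novel, caption)))
    return pairs

pseudo = make_pseudo_captions("a dog runs across the grass", TOY_KB)
for concept, sentence in pseudo:
    print(concept, "->", sentence)
```

This sketch ignores article agreement ("a ambulance"), plurals, and any check that the substituted sentence remains plausible; a real pipeline would also need to pair each pseudo caption with suitable visual features for the novel concept.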
Reference Key
fu2018imagetextieee
Authors Fu, Kun; Li, Jin; Jin, Junqi; Zhang, Changshui
Journal IEEE Transactions on Neural Networks and Learning Systems
Year 2018
DOI 10.1109/TNNLS.2018.2813306