Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications.

Senocak, Arda; Oh, Tae-Hyun; Kim, Junsik; Yang, Ming-Hsuan; Kweon, In So

doi:10.1109/TPAMI.2019.2952095

2019

Abstract

Visual events are usually accompanied by sounds in our daily lives. But can machines learn to correlate a visual scene with its sound, and localize the sound source, purely by observing them as humans do? To investigate the empirical learnability of this task, we first present a novel unsupervised algorithm for localizing sound sources in visual scenes. To this end, we develop a two-stream network, with one stream per modality, combined with an attention mechanism for sound source localization. The network naturally reveals the localized response in the scene without human annotation. In addition, we develop a new sound source dataset for performance evaluation. Nevertheless, our empirical evaluation shows that the unsupervised method draws false conclusions in some cases. We show that these false conclusions cannot be fixed without human prior knowledge, owing to the well-known mismatch between correlation and causality. We further show that they can be effectively corrected with even a small amount of supervision, i.e., in a semi-supervised setup. Finally, we demonstrate the versatility of the learned audio and visual embeddings on cross-modal content alignment, and we incorporate the proposed algorithm into sound-saliency-based automatic camera view panning in 360-degree videos.
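
The abstract does not spell out how the attention mechanism produces a localization map, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes the visual stream yields one feature vector per spatial location and the sound stream yields a single embedding of the same dimension; attention is then a softmax over the cosine similarities between the sound embedding and each location. All function names, the flattened feature layout, and the toy values are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def attention_map(visual_feats, sound_emb):
    """Softmax over cosine similarities between each location and the sound.

    visual_feats: list of per-location feature vectors (assumed layout: one
                  vector per spatial cell of a flattened feature map).
    sound_emb:    a single sound embedding of the same dimension.
    """
    h = l2_normalize(sound_emb)
    sims = [sum(a * b for a, b in zip(l2_normalize(v), h)) for v in visual_feats]
    m = max(sims)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def attended_feature(visual_feats, weights):
    """Attention-weighted sum of the location features."""
    dim = len(visual_feats[0])
    return [sum(w * v[d] for w, v in zip(weights, visual_feats)) for d in range(dim)]

# Toy example (made-up numbers): 4 locations with 3-dimensional features.
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.7, 0.7, 0.0]]
sound = [0.0, 2.0, 0.0]

weights = attention_map(feats, sound)
peak = max(range(len(weights)), key=weights.__getitem__)
z = attended_feature(feats, weights)
```

In this sketch, `weights` plays the role of the localization response over the scene, and `peak` is the index of the location most similar to the sound. In a trained system the features would come from learned visual and audio networks rather than hand-written lists.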

Reference Key	senocak2019learningieee
Authors	Senocak, Arda; Oh, Tae-Hyun; Kim, Junsik; Yang, Ming-Hsuan; Kweon, In So
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Year	2019
DOI	10.1109/TPAMI.2019.2952095
URL	https://doi.org/10.1109/TPAMI.2019.2952095