GASS: Generalizing Audio Source Separation with Large-scale Data
2023
Abstract
Universal source separation aims to separate the audio sources of an
arbitrary mix, removing the constraint to operate on a specific domain like
speech or music. Yet, the potential of universal source separation is limited
because most existing works focus on mixes with predominantly sound events, and
small training datasets also limit its potential for supervised learning. Here,
we study a single general audio source separation (GASS) model trained to
separate speech, music, and sound events in a supervised fashion with a
large-scale dataset. We assess GASS models on a diverse set of tasks. Our
strong in-distribution results show the feasibility of GASS models, and the
competitive out-of-distribution performance in sound event and speech
separation shows their generalization abilities. Yet, it is challenging for GASS
models to generalize to out-of-distribution cinematic and music
content. We also fine-tune GASS models on each dataset and consistently
outperform the ones without pre-training. All fine-tuned models (except the
music separation one) obtain state-of-the-art results in their respective
benchmarks.
| Reference Key | serrà2023gass |
|---|---|
| Authors | Jordi Pons; Xiaoyu Liu; Santiago Pascual; Joan Serrà |
| Journal | arXiv |
| Year | 2023 |
| DOI | DOI not found |