Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
2024
There exist three approaches for multilingual and crosslingual automatic
speech recognition (MCL-ASR): supervised pretraining with phonetic or
graphemic transcription, and self-supervised pretraining. We find that
pretraining with phonetic supervision has been underappreciated so far for
MCL-ASR, while conceptually it is more advantageous for information sharing
between different languages. This paper explores the approach of pretraining
with weakly phonetic supervision towards data-efficient MCL-ASR, which is
called Whistle. We relax the requirement of gold-standard human-validated
phonetic transcripts, and obtain International Phonetic Alphabet (IPA) based
transcription by leveraging the LanguageNet grapheme-to-phoneme (G2P) models.
We construct a common experimental setup based on the CommonVoice dataset,
called CV-Lang10, with 10 seen languages and 2 unseen languages. A set of
experiments is conducted on CV-Lang10 to compare, as fairly as possible, the
three approaches under the common setup for MCL-ASR. Experiments demonstrate
the advantages of phoneme-based models (Whistle) for MCL-ASR, in terms of
speech recognition for seen languages, crosslingual performance for unseen
languages with different amounts of few-shot data, overcoming catastrophic
forgetting, and training efficiency. It is found that when training data is
more limited, phoneme supervision can achieve better results compared to
subword supervision and self-supervision, thereby providing higher
data-efficiency. To support reproducibility and promote future research along
this direction, we release the code, models and data for the entire pipeline of
Whistle at https://github.com/thu-spmi/CAT/tree/master/egs/cv-lang10.
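The abstract's core argument is that IPA phoneme supervision lets different languages share modeling units, whereas grapheme/subword inventories are largely language-specific. The toy sketch below (not part of the Whistle release; the mini-lexicons are hand-picked hypothetical examples, not LanguageNet G2P output) illustrates the point by intersecting the IPA inventories of two small lexicons:

```python
# Toy illustration of unit sharing under IPA phoneme supervision.
# The word -> phoneme-sequence mappings below are hypothetical examples,
# not output of the LanguageNet G2P models used in the paper.

spanish = {"si": ["s", "i"], "no": ["n", "o"], "mesa": ["m", "e", "s", "a"]}
polish = {"nos": ["n", "o", "s"], "mama": ["m", "a", "m", "a"]}

def phoneme_inventory(lexicon):
    """Collect the set of IPA units used by a lexicon."""
    return {p for seq in lexicon.values() for p in seq}

es, pl = phoneme_inventory(spanish), phoneme_inventory(polish)
shared = es & pl  # units whose acoustic model can be trained on both languages
print(sorted(shared))  # -> ['a', 'm', 'n', 'o', 's']
```

Under this view, a phoneme like /s/ gets training examples from every language that uses it, which is the data-efficiency argument the paper makes for weakly phonetic supervision.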
| Reference Key | ou2024whistle |
|---|---|
| Authors | Saierdaer Yusuyin; Te Ma; Hao Huang; Wenbo Zhao; Zhijian Ou |
| Journal | arXiv |
| Year | 2024 |
| DOI | DOI not found |