UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
Abstract
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task manner. The resulting representations capture information more closely correlated with phonetic structure and generalize better across languages and domains. We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus. The results show that UniSpeech outperforms self-supervised pre-training and supervised transfer learning for speech recognition by a maximum of 13.4% and 17.8% relative phone error rate reductions, respectively (averaged over all testing languages). The transferability of UniSpeech is also demonstrated on a domain-shift speech recognition task, where it yields a 6% relative word error rate reduction over the previous approach.
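To make the multi-task objective in the abstract concrete, the sketch below shows one plausible way to combine a supervised phonetic CTC loss with a wav2vec 2.0-style contrastive loss in PyTorch. This is a minimal illustration, not the authors' released code: the function name `unispeech_style_loss`, the tensor shapes, and the weight `alpha` are hypothetical placeholders.

```python
# Minimal sketch (assumption, not the authors' implementation) of a
# multi-task loss: supervised phonetic CTC on labeled data plus an
# InfoNCE-style contrastive loss on masked self-supervised targets.
import torch
import torch.nn.functional as F

def unispeech_style_loss(log_probs, targets, input_lengths, target_lengths,
                         context, positives, negatives,
                         alpha=0.5, temperature=0.1):
    """Weighted combination of CTC and contrastive losses.

    log_probs:  (T, N, C) log-probabilities over phone labels for CTC.
    targets:    (N, S) phone label sequences.
    context:    (N, T, D) contextual encoder outputs at masked steps.
    positives:  (N, T, D) quantized targets for the same steps.
    negatives:  (N, T, K, D) distractor quantized vectors.
    """
    # Supervised branch: phonetic CTC loss on the labeled portion.
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)

    # Self-supervised branch: contrastive loss, assuming a
    # wav2vec 2.0-style setup with one positive and K distractors.
    pos_sim = F.cosine_similarity(context, positives, dim=-1)               # (N, T)
    neg_sim = F.cosine_similarity(context.unsqueeze(2), negatives, dim=-1)  # (N, T, K)
    logits = torch.cat([pos_sim.unsqueeze(2), neg_sim], dim=2) / temperature
    # The positive candidate sits at index 0 of every logit row.
    labels = torch.zeros(logits.shape[:2], dtype=torch.long, device=logits.device)
    contrastive = F.cross_entropy(logits.flatten(0, 1), labels.flatten())

    # Multi-task combination; alpha trades off the two objectives.
    return alpha * ctc + (1.0 - alpha) * contrastive
```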
Reference Key | huang2021unispeech
---|---
Authors | Chengyi Wang; Yu Wu; Yao Qian; Kenichi Kumatani; Shujie Liu; Furu Wei; Michael Zeng; Xuedong Huang
Journal | arXiv
Year | 2021
DOI | Not found
URL | Not available
Keywords | None listed

Use the reference key above to autocite this article in SciMatic Manuscript Manager or Thesis Manager.