Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Abstract
Recently, an end-to-end speaker-attributed automatic speech recognition (E2E
SA-ASR) model was proposed as a joint model of speaker counting, speech
recognition, and speaker identification for monaural overlapped speech. In the
previous study, the model parameters were trained based on the
speaker-attributed maximum mutual information (SA-MMI) criterion, with which
the joint posterior probability for multi-talker transcription and speaker
identification is maximized over the training data. Although SA-MMI training
showed promising results for overlapped speech consisting of various numbers of
speakers, the training criterion was not directly linked to the final
evaluation metric, i.e., the speaker-attributed word error rate (SA-WER). In this
paper, we propose a speaker-attributed minimum Bayes risk (SA-MBR) training
method in which the parameters are trained to directly minimize the expected
SA-WER over the training data. Experiments using the LibriSpeech corpus show
that the proposed SA-MBR training reduces the SA-WER by 9.0% relative compared
with the SA-MMI-trained model.
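The core idea of MBR training described above is to minimize the *expected* error rate, i.e., the per-hypothesis risk (here, SA-WER) weighted by the model's posterior probability of each hypothesis. The sketch below illustrates this objective on a toy N-best list; the function name, the N-best approximation, and the example numbers are illustrative assumptions, not the paper's implementation.

```python
import math

def expected_risk(hyp_log_probs, hyp_risks):
    """Expected risk over an N-best list: sum_i P(h_i) * risk(h_i).

    The posterior P is renormalized over the list with a numerically
    stable softmax of the hypothesis log-probabilities.
    """
    m = max(hyp_log_probs)
    exps = [math.exp(lp - m) for lp in hyp_log_probs]
    z = sum(exps)
    posteriors = [e / z for e in exps]
    return sum(p * r for p, r in zip(posteriors, hyp_risks))

# Toy N-best list: model log-probabilities and hypothetical
# SA-WER values for each hypothesis.
log_probs = [-1.0, -2.0, -3.0]
risks = [0.10, 0.25, 0.40]

loss = expected_risk(log_probs, risks)
```

Because the loss is a posterior-weighted average of the per-hypothesis error rates, it always lies between the best and worst SA-WER in the list, and gradient descent on it pushes probability mass toward low-error hypotheses.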
| Field | Value |
|---|---|
| Reference Key | yoshioka2020minimum |
| Authors | Naoyuki Kanda; Zhong Meng; Liang Lu; Yashesh Gaur; Xiaofei Wang; Zhuo Chen; Takuya Yoshioka |
| Journal | arXiv |
| Year | 2020 |
| DOI | DOI not found |
| URL | |
| Keywords | |