Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring
2023
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic
and prosodic features. Deep neural networks are commonly trained to map
fluency-related features to human scores. However, the effectiveness of
deep learning-based models is constrained by the limited number of labeled
training samples. To address this, we introduce a self-supervised learning
(SSL) approach for fluency scoring that is aware of both phonetic and prosodic
information. Specifically, we first pre-train the model with a
reconstruction loss, masking phones and their durations jointly on
a large amount of unlabeled speech and text prompts. We then fine-tune the
pre-trained model on human-annotated scoring data. Experimental results
on datasets including Speechocean762 and our non-native datasets show
that the proposed method outperforms the baseline systems in terms of Pearson
correlation coefficient (PCC). We also conduct an ablation study to
better understand the contribution of phonetic and prosodic factors during the
pre-training stage.
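
As a rough illustration of two ideas named in the abstract, the sketch below masks phones and their durations at the same positions and computes a Pearson correlation coefficient between predicted and human scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the mask token, the duration sentinel, the masking probability, and the function names `mask_jointly` and `pearson` are all hypothetical, and the model and reconstruction loss themselves are not shown.

```python
import math
import random

MASK_PHONE = "<mask>"   # hypothetical mask token for phones (assumption)
MASK_DURATION = -1.0    # hypothetical sentinel for a masked duration (assumption)


def mask_jointly(phones, durations, mask_prob=0.15, rng=None):
    """Mask each selected phone together with its duration.

    Returns the masked phone list, the masked duration list, and the
    masked positions, where a reconstruction loss would be evaluated.
    """
    rng = rng or random.Random()
    masked_phones, masked_durations, positions = [], [], []
    for i, (phone, duration) in enumerate(zip(phones, durations)):
        if rng.random() < mask_prob:
            # Hide the phone and its duration at the same position.
            masked_phones.append(MASK_PHONE)
            masked_durations.append(MASK_DURATION)
            positions.append(i)
        else:
            masked_phones.append(phone)
            masked_durations.append(duration)
    return masked_phones, masked_durations, positions


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In a pipeline of this kind, a model would be trained to recover the original phones and durations at the returned positions, and `pearson` would then compare the fine-tuned model's scores with the human annotations.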
Reference Key | ma2023phonetic |
---|---|
Authors | Kaiqi Fu; Shaojun Gao; Shuju Shi; Xiaohai Tian; Wei Li; Zejun Ma |
Journal | arXiv |
Year | 2023 |