Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

2023

Speech fluency and disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features to human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training data. To address this, we introduce a self-supervised learning (SSL) approach to fluency scoring that is both phonetic- and prosody-aware. Specifically, we first pre-train the model with a reconstruction loss, jointly masking phones and their durations on a large amount of unlabeled speech and text prompts. We then fine-tune the pre-trained model on human-annotated scoring data. Experimental results on datasets including Speechocean762 and our non-native datasets show that the proposed method outperforms the baseline systems in terms of Pearson correlation coefficient (PCC). We also conduct an ablation study to better understand the contribution of the phonetic and prosodic factors during the pre-training stage.
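
As a rough illustration of two ideas named in the abstract, the Python sketch below shows (1) masking the same positions in an aligned phone sequence and duration sequence, so that a model would have to reconstruct both, and (2) computing the Pearson correlation coefficient between predicted and human scores. This is not the authors' implementation. The names mask_jointly, pearson_cc, MASK_PHONE and MASK_DURATION, the per-position masking probability of 0.15, and the use of -1 as a duration placeholder are all assumptions made for the example; the abstract does not specify them.

```python
import math
import random

MASK_PHONE = "<mask>"  # hypothetical mask token for phones (assumption)
MASK_DURATION = -1     # hypothetical placeholder for a masked duration (assumption)


def mask_jointly(phones, durations, mask_prob=0.15, rng=None):
    """Mask the same positions in aligned phone and duration sequences.

    Returns (masked_phones, masked_durations, positions). The original
    values at `positions` would serve as the reconstruction targets.
    mask_prob=0.15 is an illustrative default, not a value from the paper.
    """
    if len(phones) != len(durations):
        raise ValueError("phones and durations must be aligned one-to-one")
    rng = rng or random.Random()
    # Choose each position independently with probability mask_prob.
    positions = [i for i in range(len(phones)) if rng.random() < mask_prob]
    masked_phones = list(phones)        # copy so the inputs stay untouched
    masked_durations = list(durations)
    for i in positions:
        # "Jointly": the phone and its duration are hidden together.
        masked_phones[i] = MASK_PHONE
        masked_durations[i] = MASK_DURATION
    return masked_phones, masked_durations, positions


def pearson_cc(predicted, reference):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_p * var_r)
```

For example, mask_jointly(["DH", "AH", "K"], [3, 5, 4], rng=random.Random(0)) returns masked copies of both lists together with the masked positions, and pearson_cc([1, 2, 3], [2, 4, 6]) evaluates to 1.0 because the two lists are perfectly linearly related. In a real system the durations would come from a forced alignment and the reconstruction would be done by a neural encoder; both are outside the scope of this sketch.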

Reference Key	ma2023phonetic
Authors	Kaiqi Fu; Shaojun Gao; Shuju Shi; Xiaohai Tian; Wei Li; Zejun Ma
Journal	arXiv
Year	2023
URL	http://arxiv.org/abs/2305.11438v1
Keywords	cs.CL; eess.AS