The Role of Phonetic Units in Speech Emotion Recognition

Abstract

We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. Different types of phonetic units are employed and compared in terms of accuracy and robustness of emotion recognition within and across datasets and languages. Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated in speech emotion recognition. Broad phonetic classes give the best performance. Further research is needed to investigate the optimal set of broad phonetic classes for the task of emotion recognition. Finally, we found that Wav2vec 2.0 can be fine-tuned to recognize coarser-grained or larger phonetic units than phonemes, such as broad phonetic classes and syllables.
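
As a rough illustration of what "emotion-dependent" phonetic units could look like, the sketch below maps phonemes to broad phonetic classes, tags each unit with the utterance's emotion, and recovers an utterance-level emotion from a recognized unit sequence by majority vote. This is one plausible reading of the abstract, not the authors' implementation: the class inventory in `BROAD_CLASS`, the `class_emotion` label format, the helper names, and the majority-vote decoding are all assumptions made for illustration, and the abstract itself says the optimal set of broad classes is still an open question.

```python
from collections import Counter

# Hypothetical ARPAbet-to-broad-class mapping (an assumption for illustration;
# the abstract does not specify the class set used in the paper).
BROAD_CLASS = {
    "P": "stop", "B": "stop", "T": "stop", "D": "stop", "K": "stop", "G": "stop",
    "F": "fricative", "V": "fricative", "S": "fricative", "Z": "fricative",
    "SH": "fricative", "HH": "fricative",
    "M": "nasal", "N": "nasal", "NG": "nasal",
    "L": "approximant", "R": "approximant", "W": "approximant", "Y": "approximant",
    "AA": "vowel", "AE": "vowel", "AH": "vowel", "EH": "vowel",
    "IH": "vowel", "IY": "vowel", "OW": "vowel", "UW": "vowel",
}


def to_emotion_units(phonemes, emotion):
    """Turn a phoneme sequence into emotion-tagged broad-class labels.

    Raises KeyError for phonemes missing from the illustrative mapping.
    """
    return [f"{BROAD_CLASS[p]}_{emotion}" for p in phonemes]


def vote_emotion(units):
    """Pick the utterance emotion as the most frequent tag among recognized units.

    Majority voting is an assumed decoding rule, not one stated in the abstract.
    """
    emotions = [u.rsplit("_", 1)[1] for u in units]
    return Counter(emotions).most_common(1)[0][0]


# Training-side view: label targets for one utterance annotated as "happy".
targets = to_emotion_units(["HH", "AH", "L", "OW"], "happy")

# Inference-side view: a recognizer's output may mix emotion tags.
predicted = vote_emotion(["stop_angry", "vowel_angry", "nasal_sad"])
```

Under this reading, a recognizer such as a fine-tuned Wav2vec 2.0 model would be trained to emit the tagged labels in `targets`, so the output vocabulary grows to roughly the number of units times the number of emotions. Coarser units such as broad classes keep that vocabulary small.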

Reference Key	church2021the
Authors	Jiahong Yuan; Xingyu Cai; Renjie Zheng; Liang Huang; Kenneth Church
Journal	arXiv
Year	2021
URL	http://arxiv.org/abs/2108.01132v1
Keywords	cs.CL
