Phonetic Segmentation of the UCLA Phonetics Lab Archive
2024
Abstract
Research in speech technologies and comparative linguistics depends on access
to diverse and accessible speech data. The UCLA Phonetics Lab Archive is one of
the earliest multilingual speech corpora, with long-form audio recordings and
phonetic transcriptions for 314 languages (Ladefoged et al., 2009). Recently,
95 of these languages were time-aligned with word-level phonetic transcriptions
(Li et al., 2021). Here we present VoxAngeles, a corpus of audited phonetic
transcriptions and phone-level alignments of the UCLA Phonetics Lab Archive,
which uses the 95-language CMU re-release as its starting point. VoxAngeles
also includes word- and phone-level segmentations from the original UCLA
corpus, as well as phonetic measurements of word and phone durations, vowel
formants, and vowel f0. This corpus enhances the usability of the original
data, particularly for quantitative phonetic typology, as demonstrated through
a case study of vowel intrinsic f0. We also discuss the utility of the
VoxAngeles corpus for general research and pedagogy in crosslinguistic
phonetics, as well as for low-resource and multilingual speech technologies.
VoxAngeles is free to download and use under a CC-BY-NC 4.0 license.
| Reference Key | moran2024phonetic |
|---|---|
| Authors | Eleanor Chodroff; Blaž Pažon; Annie Baker; Steven Moran |
| Journal | arXiv |
| Year | 2024 |