A Corpus for Large-Scale Phonetic Typology
2020
A major hurdle in data-driven research on typology is having sufficient data
in many languages to draw meaningful conclusions. We present VoxClamantis v1.0,
the first large-scale corpus for phonetic typology, with aligned segments and
estimated phoneme-level labels in 690 readings spanning 635 languages, along
with acoustic-phonetic measures of vowels and sibilants. Access to such data
can greatly facilitate investigation of phonetic typology at a large scale and
across many languages. However, it is non-trivial and computationally intensive
to obtain such alignments for hundreds of languages, many of which have few to
no resources presently available. We describe the methodology to create our
corpus, discuss caveats with current methods and their impact on the utility of
this data, and illustrate possible research directions through a series of case
studies on the 48 highest-quality readings. Our corpus and scripts are publicly
available for non-commercial use at https://voxclamantisproject.github.io.
Reference Key | eisner2020a |
---|---|
Authors | Elizabeth Salesky; Eleanor Chodroff; Tiago Pimentel; Matthew Wiesner; Ryan Cotterell; Alan W Black; Jason Eisner |
Journal | arXiv |
Year | 2020 |