Transliteration normalization for Information Extraction and Machine Translation

Yuval Marton; Imed Zitouni

doi:10.1016/j.jksuci.2014.06.011

2014

Abstract

Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural language processing (NLP) tasks. This work aims to identify and cluster name spelling variants using a Statistical Machine Translation method: word alignment. The variants are identified by being aligned to the same “pivot” name in another language (the source-language in Machine Translation settings). Based on word-to-word translation and transliteration probabilities, as well as the string edit distance metric, names with similar spellings in the target language are clustered and then normalized to a canonical form. With this approach, tens of thousands of high-precision name transliteration spelling variants are extracted from sentence-aligned bilingual corpora in Arabic and English (in both languages). When these normalized name spelling variants are applied to Information Extraction tasks, improvements over strong baseline systems are observed. When applied to Machine Translation tasks, a large improvement potential is shown.
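
The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input tuple layout, the thresholds `min_prob` and `max_dist`, the function names, and the choice of the most frequent spelling as the canonical form are all assumptions made for illustration, and the toy alignment entries are invented. The sketch groups target-language names aligned to the same pivot name, drops low-probability alignments, and maps each spelling within a small edit distance of the canonical form onto that form.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cluster_variants(alignments, min_prob=0.1, max_dist=2):
    """Map each spelling variant to a canonical form.

    alignments: iterable of (pivot, target_name, prob, count) tuples, where
    prob stands in for a word-alignment translation probability (assumed
    layout). min_prob and max_dist are hypothetical thresholds.
    """
    by_pivot = defaultdict(dict)
    for pivot, name, prob, count in alignments:
        if prob >= min_prob:  # keep only confident alignments
            by_pivot[pivot][name] = by_pivot[pivot].get(name, 0) + count

    normalization = {}
    for counts in by_pivot.values():
        # Assumption: the most frequent spelling serves as the canonical form
        # (ties broken alphabetically so the result is deterministic).
        canonical = max(counts, key=lambda n: (counts[n], n))
        for name in counts:
            if edit_distance(name.lower(), canonical.lower()) <= max_dist:
                normalization[name] = canonical
    return normalization


# Invented toy data: one Arabic pivot name with several English alignments.
toy_alignments = [
    ("محمد", "Mohammed", 0.50, 40),
    ("محمد", "Muhammad", 0.30, 25),
    ("محمد", "Mohamed", 0.15, 10),
    ("محمد", "president", 0.20, 3),   # confident but dissimilar spelling
    ("محمد", "prophet", 0.02, 1),     # filtered out by min_prob
]
print(cluster_variants(toy_alignments))
```

In this sketch the probability filter removes spurious alignments, while the edit-distance check keeps a confidently aligned but differently spelled word (here "president") out of the name cluster. A fuller treatment would also use the transliteration probabilities mentioned in the abstract, which this sketch omits.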

Citation key	marton2014journaltransliteration
Authors	Yuval Marton; Imed Zitouni
Journal	Journal of King Saud University - Computer and Information Sciences
Year	2014
DOI	10.1016/j.jksuci.2014.06.011
URL	http://www.sciencedirect.com/science/article/pii/S1319157814000354 https://doi.org/10.1016/j.jksuci.2014.06.011
Keywords	named entity recognition; Arabic; information extraction; computer science
