A7׳ta: Data on a monolingual Arabic parallel corpus for grammar checking.

Nora Madi; Hend S. Al-Khalifa

doi:10.1016/j.dib.2018.11.146



2019


Abstract


Grammar error correction can be framed as a "translation" problem in which an erroneous sentence is "translated" into a correct version of the same sentence in the same language. This can be done with techniques such as Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Training SMT or NMT models for grammar correction requires a monolingual parallel corpus in the target language. This data article presents a monolingual parallel corpus of Arabic text called A7׳ta. It contains 470 erroneous sentences and their 470 error-free counterparts. The corpus can be used as a linguistic resource for Arabic natural language processing (NLP), mainly to train sequence-to-sequence models for grammar checking. The sentences were manually collected from a book written as a guide to correct Arabic writing, grammar, and other linguistic features. Although several Arabic corpora of errors and corrections are available [2], such as QALB [10] and the Arabic Learner Corpus [11], the data presented in this article is an effort to increase the number of freely available corpora of this kind by providing a detailed error specification and leveraging the work of language experts.
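To make the "translation" framing concrete, here is a minimal sketch of how aligned erroneous and corrected sentences could be turned into source/target pairs for a sequence-to-sequence model. The abstract does not describe the corpus file layout, so the placeholder sentences, the function names `make_pairs` and `split_pairs`, and the 10% held-out fraction are all assumptions made for illustration, not part of the published dataset.

```python
import random


def make_pairs(erroneous, corrected):
    """Zip two aligned sentence lists into (source, target) pairs."""
    if len(erroneous) != len(corrected):
        raise ValueError("both sides of a parallel corpus must have equal length")
    return [(src.strip(), tgt.strip()) for src, tgt in zip(erroneous, corrected)]


def split_pairs(pairs, dev_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a fraction for evaluation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Placeholder strings standing in for the 470 sentence pairs (hypothetical data).
erroneous = [f"erroneous sentence {i}" for i in range(470)]
corrected = [f"corrected sentence {i}" for i in range(470)]

pairs = make_pairs(erroneous, corrected)
train, dev = split_pairs(pairs)
```

With only 470 pairs, a fixed seed keeps the split reproducible, which matters when comparing SMT and NMT systems trained on the same small corpus.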

Authors	Nora Madi; Hend S. Al-Khalifa
Journal	Data in Brief
Year	2019
DOI	10.1016/j.dib.2018.11.146
URL	https://doi.org/10.1016/j.dib.2018.11.146
Keywords	nlp arabic language error checking parallel corpus
