A7׳ta: Data on a monolingual Arabic parallel corpus for grammar checking.

Abstract
Grammar error correction can be framed as a "translation" problem in which an erroneous sentence is "translated" into a correct version of the same sentence in the same language. This can be done with techniques such as Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Building SMT or NMT models for grammar correction requires a monolingual parallel corpus in the language concerned. This data article presents a monolingual parallel corpus of Arabic text called A7׳ta. It contains 470 erroneous sentences and their 470 error-free counterparts. The corpus can be used as a linguistic resource for Arabic natural language processing (NLP), mainly to train sequence-to-sequence models for grammar checking. Sentences were manually collected from a book prepared as a guide to writing correctly and using Arabic grammar and other linguistic features. Although several Arabic corpora of errors and corrections are already available [2], such as QALB [10] and the Arabic Learner Corpus [11], the data presented in this article is an effort to increase the number of freely available Arabic corpora of errors and corrections by providing a detailed error specification and leveraging the work of language experts.
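To make the "translation" framing concrete, the following is a minimal Python sketch of how a monolingual parallel corpus might be arranged into (source, target) pairs for a sequence-to-sequence model. It is an illustration only: the function names `build_pairs` and `split_pairs`, the 10% test fraction, and the placeholder sentences are all assumptions, and the actual file layout of the corpus is not described in the abstract.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (erroneous sentence, corrected sentence)


def build_pairs(erroneous: List[str], corrected: List[str]) -> List[Pair]:
    """Align erroneous sentences with their corrections by position."""
    if len(erroneous) != len(corrected):
        raise ValueError("both sides of a parallel corpus must have the same length")
    return [(src.strip(), tgt.strip()) for src, tgt in zip(erroneous, corrected)]


def split_pairs(
    pairs: List[Pair], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle reproducibly, then split into train and test portions."""
    shuffled = pairs[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder strings standing in for the 470 sentence pairs (NOT real corpus data).
erroneous = [f"erroneous sentence {i}" for i in range(470)]
corrected = [f"corrected sentence {i}" for i in range(470)]

pairs = build_pairs(erroneous, corrected)
train, test = split_pairs(pairs)
print(len(train), len(test))
```

In this arrangement the erroneous sentence plays the role of the source language and the corrected sentence the role of the target, which is the input shape SMT and NMT toolkits generally expect.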
Authors Madi, Nora; Al-Khalifa, Hend S.
Journal Data in Brief
Year 2019
DOI 10.1016/j.dib.2018.11.146