Text Filtering through Multi-Pattern Matching: A Case Study of Wu–Manber–Uy on the Language of Uyghur
2019
Abstract
Text filtering through pattern matching is broadly applicable and time-efficient on large data sets, and in recent years it has attracted increasing attention from the information retrieval and natural language processing (NLP) research communities. It remains unclear, however, how this technique and its algorithms (e.g., Wu–Manber, which is considered in this paper) can be applied properly and effectively to Uyghur, a low-resource language spoken mostly by the ethnic Uyghur group, a population of more than eleven million in Xinjiang, China. We observe that the technical challenge stems mainly from two factors: (1) vowel weakening and (2) semantic mismatch between affixes and stems. Accordingly, we propose Wu–Manber–Uy, an improved variant of Wu–Manber designed specifically for the Uyghur language. Wu–Manber–Uy implements a pattern expansion strategy based on stem deformation, which reduces pattern mismatches caused by vowel weakening and spelling errors. It also uses a two-way strategy that monitors and controls changes in the lexical meaning of stems during word-building. Word2vec and a dictionary are additionally incorporated into the system for processing Uyghur. Our experimental results consistently demonstrate the high performance of Wu–Manber–Uy.
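To make the idea of pattern expansion on top of multi-pattern matching more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the matcher is a simplified textbook form of Wu–Manber (shift table plus suffix-block buckets, with no prefix hash), and `expand_stem` is a hypothetical stand-in for the paper's stem-deformation rules. It assumes Latin-script input and assumes that a stem-final `a` or `e` may surface as `i` or `é` before a suffix (for example, `bala` appearing as `bali` in `balilar`). The names `expand_stem`, `build_tables`, and `wm_search` are illustrative only.

```python
def expand_stem(stem, weak_finals="ae", raised="ié"):
    """Return the stem plus hypothetical vowel-weakened variants.

    Assumption for illustration: only a stem-final vowel is weakened.
    """
    variants = [stem]
    if stem and stem[-1] in weak_finals:
        variants += [stem[:-1] + v for v in raised]
    return variants


def build_tables(patterns, block=2):
    """Build the shift table and suffix-block buckets used by the search."""
    m = min(len(p) for p in patterns)  # only the first m chars drive shifting
    if m < block:
        raise ValueError("every pattern must be at least `block` characters long")
    default_shift = m - block + 1      # shift for blocks that occur in no pattern
    shift, buckets = {}, {}
    for p in patterns:
        for j in range(block, m + 1):  # j is the end of the block within p[:m]
            blk = p[j - block:j]
            shift[blk] = min(shift.get(blk, default_shift), m - j)
        # Patterns are grouped by the last block of their length-m prefix.
        buckets.setdefault(p[m - block:m], []).append(p)
    return m, default_shift, shift, buckets


def wm_search(text, patterns, block=2):
    """Return (start_index, pattern) for every occurrence of any pattern."""
    m, default_shift, shift, buckets = build_tables(patterns, block)
    hits = []
    end = m                            # exclusive end of the current window
    while end <= len(text):
        blk = text[end - block:end]
        s = shift.get(blk, default_shift)
        if s == 0:
            start = end - m
            # Verify each candidate in full; patterns may be longer than m.
            for p in buckets.get(blk, []):
                if text.startswith(p, start):
                    hits.append((start, p))
            end += 1
        else:
            end += s
    return hits


patterns = expand_stem("bala")
hits = wm_search("u balilar bilen bala", patterns)
```

In this sketch the inflected form `balilar` is caught only because the expanded pattern `bali` was added to the pattern set; searching for `bala` alone would miss it. A real system would need language-specific deformation rules and, as the abstract notes, a check that the matched stem still carries the intended meaning.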
| Reference Key | tohti2019textinformation |
|---|---|
| Authors | Tohti, Turdi; Huang, Jimmy; Hamdulla, Askar; Tan, Xing |
| Journal | Information |
| Year | 2019 |
| DOI | DOI not found |
| Keywords | Biology (General); Information technology; Technology; Language. Linguistic theory. Comparative grammar; Computer applications to medicine. Medical informatics; history of scholarship and learning. the humanities; electronic computers. computer science; mining engineering. metallurgy; computer software; computational linguistics. natural language processing |