Examining the performance of classification algorithms for imbalanced data sets in web author identification
2016
Abstract
Individuals, criminals, and even terrorist organizations can use web communication for criminal purposes, and they try to hide their identity to avoid prosecution. To increase the level of safety on the Web, we have to improve author (or web-user) identification and authentication procedures. In web author identification, imbalanced data sets arise rather frequently: the number of texts by one author significantly exceeds the number of texts by others. This is a common situation on the modern web (social networks, blogs, emails, etc.). Author identification is a kind of classification task. To develop methods, techniques, and tools for web author identification, we have to examine the performance of classification algorithms on imbalanced data sets. In this work, several modern classification algorithms were tested on data sets with various levels of class imbalance and different numbers of available web posts. The best accuracy in all experiments was achieved with the Random Forest algorithm.
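The experimental setup described in the abstract (data sets with a controlled level of class imbalance) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the helper names `make_imbalanced`, `accuracy`, and `macro_recall`, the toy corpus, and the 8:1 ratio are hypothetical and are not taken from the paper. Macro-averaged recall is shown next to accuracy only to illustrate why imbalance matters. The abstract itself reports accuracy.

```python
import random
from collections import Counter

def make_imbalanced(posts_by_author, majority, ratio, minority_size, seed=0):
    """Subsample posts so the majority author has ratio * minority_size
    posts and every other author has minority_size posts (hypothetical helper)."""
    rng = random.Random(seed)
    sample = []
    for author, posts in posts_by_author.items():
        k = minority_size * ratio if author == majority else minority_size
        k = min(k, len(posts))  # never request more posts than exist
        sample.extend((text, author) for text in rng.sample(posts, k))
    rng.shuffle(sample)
    return sample

def accuracy(y_true, y_pred):
    """Fraction of posts attributed to the correct author."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Per-author recall averaged with equal weight for each author."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[a] / n for a, n in totals.items()) / len(totals)

# Toy corpus (assumption): three authors with 100 placeholder posts each.
corpus = {a: [f"{a}-post-{i}" for i in range(100)] for a in ("a", "b", "c")}
data = make_imbalanced(corpus, majority="a", ratio=8, minority_size=10)

y_true = [author for _, author in data]
y_pred = ["a"] * len(y_true)  # trivial baseline: always predict the majority author

counts = Counter(y_true)
acc = accuracy(y_true, y_pred)
rec = macro_recall(y_true, y_pred)
print(counts, acc, rec)
```

With an 8:1 imbalance, the trivial baseline that always predicts the majority author is correct on 80 of the 100 sampled posts, yet it never identifies either minority author. This is why a comparison of classifiers on imbalanced data is usually run across several imbalance levels rather than at a single one.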
| Reference Key | vorobeva2016examiningproceedings |
|---|---|
| Authors | Vorobeva, Alisa A. |
| Journal | Proceedings of the XXth Conference of Open Innovations Association FRUCT |
| Year | 2016 |
| DOI | Not found |
| URL | |
| Keywords | |