Examining the performance of classification algorithms for imbalanced data sets in web author identification

2016
Abstract
Individuals, criminals, and even terrorist organizations can use web communication for criminal purposes, and they try to hide their identity to avoid prosecution. To increase the level of safety on the Web, author (or web-user) identification and authentication procedures need to be improved. In web author identification, imbalanced data sets arise frequently: the number of texts by one author significantly exceeds the number by others. This is common on the modern web: social networks, blogs, email, etc. Author identification is a kind of classification task, so developing methods, techniques, and tools for web author identification requires examining the performance of classification algorithms on imbalanced data sets. In this work, several modern classification algorithms were tested on data sets with various levels of class imbalance and different numbers of available web posts. The best accuracy in all experiments was achieved with the Random Forest algorithm.
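To make the notion of "level of class imbalance" concrete, here is a minimal sketch in plain Python. It is an illustration only and not the procedure used in the paper: the three-author corpus, the function names, and the choice of balanced accuracy as a comparison metric are all assumptions introduced here. It shows how a trivial classifier that always predicts the most prolific author can score high plain accuracy on an imbalanced set.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Post count of the most prolific author divided by that of the least."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline_accuracy(labels):
    """Plain accuracy obtained by always predicting the most frequent author."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_accuracy(y_true, y_pred):
    """Mean per-author recall, so every author weighs equally."""
    recalls = []
    for author in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == author)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == author and p == author)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical corpus (assumption): author "A" wrote 90 posts, "B" and "C" 5 each.
labels = ["A"] * 90 + ["B"] * 5 + ["C"] * 5
# Baseline "classifier": always guess the majority author.
predictions = ["A"] * len(labels)

print(imbalance_ratio(labels))                 # ratio of largest to smallest class
print(majority_baseline_accuracy(labels))      # plain accuracy of the baseline
print(balanced_accuracy(labels, predictions))  # per-author recall, averaged
```

On this hypothetical corpus the baseline identifies only one of the three authors, yet its plain accuracy stays high. That gap is one reason classifier comparisons on imbalanced data are usually repeated across several imbalance levels.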
Reference Key
vorobeva2016examiningproceedings
Authors Vorobeva, Alisa A.
Journal Proceedings of the xxth Conference of Open Innovations Association FRUCT
Year 2016