The Use of Entropy Measure for Higher Quality Machine Learning Algorithms in Text Data Processing

Cited by: 1
Authors
Guseva, Anna I. [1 ]
Kuznetsov, Igor A. [1 ]
Affiliations
[1] Natl Res Nucl Univ, MEPhI Moscow Engn Phys Inst, Fac Business Informat & Complex Syst Management, Moscow, Russia
Source
2017 5TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD WORKSHOPS (FICLOUDW) 2017 | 2017
Keywords
entropy measure; classification; text mining; big data; machine learning; algorithm quality comparison;
DOI
10.1109/FiCloudW.2017.84
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
A large volume of text data is regularly published on social networks and in the media. Processing and analyzing such information is in high demand. This paper focuses on how to use an entropy measure when classifying large volumes of text data. The entropy measure serves as an algorithm quality criterion when assigning a class within a data set. The work also presents a comparative efficiency analysis of the proposed approach for different numbers of classes and various machine learning algorithms. The entropy measure can subsequently also be used to build a committee of voting algorithms.
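
The abstract does not give the paper's exact formula, so the following is only a minimal sketch of one common reading: Shannon entropy computed over the class probabilities a classifier assigns to each document, averaged over a data set, with lower values indicating more decisive predictions. The function names `shannon_entropy` and `mean_prediction_entropy` and the sample probabilities are hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution.

    Inputs are renormalized to sum to 1; zero-probability classes
    contribute nothing to the sum.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    h = 0.0
    for p in probs:
        q = p / total
        if q > 0:
            h -= q * math.log(q, base)
    return h


def mean_prediction_entropy(prob_rows, base=2.0):
    """Average entropy over per-document class-probability rows.

    Assumption: a lower mean is read as a more decisive classifier.
    This is an illustrative criterion, not the paper's stated one.
    """
    if not prob_rows:
        raise ValueError("need at least one row of probabilities")
    return sum(shannon_entropy(row, base) for row in prob_rows) / len(prob_rows)


# Hypothetical outputs of two classifiers on the same three documents.
confident = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [1.0, 0.0, 0.0]]
uncertain = [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33], [0.5, 0.25, 0.25]]

score_confident = mean_prediction_entropy(confident)
score_uncertain = mean_prediction_entropy(uncertain)
```

Under this reading, `score_confident` comes out lower than `score_uncertain`. Because the maximum entropy grows with the number of classes, comparing scores across tasks with different class counts would likely require dividing by `math.log(k, 2)` for `k` classes.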
Pages: 47-52
Page count: 6