A comparative study of feature selection methods for binary text streams classification

Cited by: 0
Authors
Matheus Bernardelli de Moraes
Andre Leon Sampaio Gradvohl
Affiliation
[1] University of Campinas, School of Technology
Source
Evolving Systems | 2021 / Vol. 12
Keywords
Text streams; Feature drift; Feature selection; Evolving regularization; Binary classification; Concept drift
DOI
Not available
Abstract
Text streams are continuous flows of high-dimensional text transmitted at high volume and high velocity. They are expected to be classified in real time, which is challenging because of the high dimensionality of the feature space. Applying feature selection algorithms is one way to reduce the feature space of text streams and improve the learning process. However, because text streams are potentially unbounded, their probability distribution is expected to change over time, a phenomenon known as concept drift. Concept drift affects feature selection through feature drift, in which the relevance of features also changes over time. This paper presents a comparative study of six feature selection methods for binary text stream classification, including in the presence of feature drift. We also propose the Online Feature Selection with Evolving Regularization (OFSER) algorithm, a modified version of the Online Feature Selection (OFS) algorithm that uses evolving regularization to dynamically penalize model complexity, reducing the impact of feature drift on the feature selection process. We conducted the experimental analysis on eleven real-world datasets commonly used for text classification. The OFSER algorithm achieved F1-scores up to 12.92% higher than the other algorithms in some cases. Results from the Iman and Davenport and Bergmann–Hommel tests show that the OFSER algorithm is statistically superior to the Information Gain and Extremal Feature Selection algorithms in improving the predictive power of the base classifier.
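To make the idea in the abstract concrete, the following is a minimal sketch of an OFS-style online learner: a linear model updated one example at a time, shrunk by an L2 regularization weight, and truncated so that at most `budget` features stay active. It is not the paper's OFSER update rule, which the abstract does not specify. The function names, the default parameters, and in particular the schedule that raises the regularization weight with the recent mistake rate are hypothetical placeholders chosen only for illustration.

```python
import math


def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:budget])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def ofs_step(w, x, y, budget, eta=0.2, lam=0.01):
    """One OFS-style update on example (x, y), with y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin > 1.0:
        # Simplification: leave the model untouched when the margin is large.
        return list(w)
    # Gradient step with L2 shrinkage controlled by lam.
    w = [(1.0 - lam * eta) * wi + eta * y * xi for wi, xi in zip(w, x)]
    # Project onto the L2 ball of radius 1/sqrt(lam).
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm > 0.0:
        scale = min(1.0, (1.0 / math.sqrt(lam)) / norm)
        w = [scale * wi for wi in w]
    # Enforce the feature budget: at most `budget` nonzero weights survive.
    return truncate(w, budget)


def run_stream(stream, n_features, budget, eta=0.2, lam0=0.01, window=50):
    """Process (x, y) pairs in order and return the final sparse weights."""
    w = [0.0] * n_features
    recent = []  # 1 = mistake, 0 = correct, over the last `window` examples
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if score >= 0 else -1
        recent = (recent + [int(pred != y)])[-window:]
        # HYPOTHETICAL schedule (an assumption, not taken from the paper):
        # penalize complexity more when recent mistakes are frequent.
        lam = lam0 * (1.0 + sum(recent) / len(recent))
        w = ofs_step(w, x, y, budget, eta, lam)
    return w
```

Calling `run_stream` on an iterable of `(x, y)` pairs, where `x` is a list of feature values and `y` is `-1` or `+1`, returns a weight vector with at most `budget` nonzero entries; the indices of those entries are the currently selected features. Passing the regularization weight to `ofs_step` on every call is what lets it vary over the stream, so a different schedule can be swapped in without touching the update itself.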
Pages: 997-1013
Page count: 16