A comparative study of feature selection methods for binary text streams classification

Authors
Matheus Bernardelli de Moraes
Andre Leon Sampaio Gradvohl
Affiliations
[1] University of Campinas, School of Technology
Source
Evolving Systems | 2021 / Vol. 12
Keywords
Text streams; Feature drift; Feature selection; Evolving regularization; Binary classification; Concept drift;
Abstract
Text streams are continuous flows of high-dimensional text transmitted at high volume and high velocity. They are expected to be classified in real time, which is challenging because of the high dimensionality of the feature space. Applying feature selection algorithms is one way to reduce the feature space of text streams and improve the learning process. However, since text streams are potentially unbounded, their probability distribution is expected to change over time, a phenomenon known as concept drift. Concept drift affects the feature selection process through feature drift, in which the relevance of features also changes over time. This paper presents a comparative study of six feature selection methods for binary text stream classification, including in the presence of feature drift. We also propose the Online Feature Selection with Evolving Regularization (OFSER) algorithm, a modified version of the Online Feature Selection (OFS) algorithm that uses evolving regularization to dynamically penalize model complexity, reducing the impact of feature drift on the feature selection process. We conducted the experimental analysis on eleven real-world datasets commonly used for text classification. The OFSER algorithm achieved F1-scores up to 12.92% higher than the other algorithms in some cases. The results of the Iman and Davenport test and the Bergmann–Hommel test show that the OFSER algorithm is statistically superior to the Information Gain and Extremal Feature Selection algorithms in terms of improving the predictive power of the base classifier.
Pages: 997 - 1013
Number of pages: 16
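
The abstract describes OFS as an online feature selection method and OFSER as a variant whose regularization evolves over time. The Python sketch below illustrates the general idea of a truncation-based online update in the spirit of OFS: a linear model is updated on misclassified documents, shrunk by a regularization strength, and then cut back to a fixed number of features. It is a simplified illustration, not the authors' implementation. The function names (`truncate`, `ofs_update`), the mistake-driven trigger, and the default values of `eta`, `lam`, and `budget` are all assumptions made for this example. The abstract does not say how OFSER adjusts its regularization, so `lam` is simply an argument the caller may vary from step to step.

```python
import math


def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and drop the rest."""
    if len(w) <= budget:
        return dict(w)
    keep = sorted(w, key=lambda f: abs(w[f]), reverse=True)[:budget]
    return {f: w[f] for f in keep}


def ofs_update(w, x, y, eta=0.2, lam=0.01, budget=3):
    """One OFS-style step (illustrative; all defaults are assumptions).

    w: sparse weight vector {term: weight}
    x: sparse document {term: value}
    y: label, +1 or -1
    lam: regularization strength; a caller could change it over time
    """
    margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
    if margin > 0:
        # Correctly classified: leave the model unchanged.
        return w
    # Shrink existing weights (L2 penalty), then step toward the example.
    new_w = {f: (1.0 - lam * eta) * v for f, v in w.items()}
    for f, v in x.items():
        new_w[f] = new_w.get(f, 0.0) + eta * y * v
    # Project onto the L2 ball of radius 1 / sqrt(lam).
    norm = math.sqrt(sum(v * v for v in new_w.values()))
    radius = 1.0 / math.sqrt(lam)
    if norm > radius:
        new_w = {f: v * radius / norm for f, v in new_w.items()}
    # Feature selection: keep at most `budget` non-zero weights.
    return truncate(new_w, budget)


# Toy stream of (document, label) pairs; the terms are made up.
stream = [
    ({"free": 1.0, "win": 1.0, "now": 1.0, "call": 1.0}, 1),
    ({"meeting": 1.0, "notes": 1.0}, -1),
    ({"free": 1.0, "call": 1.0}, 1),
]
weights = {}
for doc, label in stream:
    weights = ofs_update(weights, doc, label)
print(sorted(weights))
```

Because truncation runs after every update, the model never holds more than `budget` features, which is what keeps the selected feature set small as the vocabulary of the stream grows.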