Lazy learner text categorization algorithm based on embedded feature selection

Cited by: 0
Authors
Yan Peng; Zheng Xuefeng; Zhu Jianyong; Xiao Yunhong
Affiliations
2. China State Information Center
Keywords
machine learning; text categorization; embedded feature selection; lazy learner; cosine similarity
DOI
Not available
Chinese Library Classification number
TP181 [Automated reasoning, machine learning]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss, which in turn degrades the overall performance of TC algorithms. On the basis of the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the feature dimension without any information loss, which greatly improves both the efficiency and the performance of the algorithm. The experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than some other classical TC algorithms.
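The abstract does not spell out the LFS procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: a lazy (k-nearest-neighbor) learner scores each training document by cosine similarity, and the dot product visits only the terms that are nonzero in the query document. Terms absent from the query contribute zero to the dot product, so restricting the computation to them loses no information. The function names (`vectorize`, `build_index`, `cosine_on_query_terms`, `classify`), the whitespace tokenizer, and the similarity-weighted vote are all hypothetical choices made for illustration, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn a text into a sparse term-frequency vector (hypothetical tokenizer)."""
    return Counter(text.lower().split())


def norm(vec):
    """Euclidean norm of a sparse vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def build_index(labeled_texts):
    """Precompute (vector, norm, label) for each training document."""
    index = []
    for text, label in labeled_texts:
        vec = vectorize(text)
        index.append((vec, norm(vec), label))
    return index


def cosine_on_query_terms(query, doc, doc_norm):
    """Cosine similarity that iterates only over the query's nonzero terms.

    Features that are zero in the query cannot change the dot product,
    so skipping them is exact rather than approximate.
    """
    query_norm = norm(query)
    if query_norm == 0 or doc_norm == 0:
        return 0.0
    dot = sum(w * doc.get(term, 0) for term, w in query.items())
    return dot / (query_norm * doc_norm)


def classify(query_text, index, k=3):
    """Assumed decision rule: similarity-weighted vote among the k nearest documents."""
    query = vectorize(query_text)
    scored = sorted(
        ((cosine_on_query_terms(query, vec, n), label) for vec, n, label in index),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


if __name__ == "__main__":
    index = build_index([
        ("stock market shares rise", "finance"),
        ("bank interest rates fall", "finance"),
        ("team wins football match", "sports"),
        ("player scores goal in match", "sports"),
    ])
    print(classify("football match goal", index))
```

In this sketch the feature set is chosen per query at classification time, which is why no global, lossy feature-selection step is needed before training.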
Pages: 651 - 659
Page count: 9
Related papers
50 in total
  • [1] Lazy learner text categorization algorithm based on embedded feature selection
    Yan Peng
    Zheng Xuefeng
    Zhu Jianyong
    Xiao Yunhong
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2009, 20 (03) : 651 - 659
  • [2] A novel feature selection algorithm for text categorization
    Shang, Wenqian
    Huang, Houkuan
    Zhu, Haibin
    Lin, Yongmin
    Qu, Youli
    Wang, Zhihai
    EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (01) : 1 - 5
  • [3] Hybrid feature selection based on enhanced genetic algorithm for text categorization
    Ghareb, Abdullah Saeed
    Abu Bakar, Azuraliza
    Hamdan, Abdul Razak
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 49 : 31 - 47
  • [4] Research on the algorithm of feature selection based on Gini index for text categorization
    Shang, Wenqian
    Huang, Houkuan
    Liu, Yuling
    Lin, Yongmin
    Qu, Youli
    Dong, Hongbin
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2006, 43 (10) : 1688 - 1694
  • [5] Novel feature selection algorithm for Chinese text categorization based on CHI
    Cai Zhenliang
    Wang Jian
    Liu Jiqiang
    PROCEEDINGS OF 2016 IEEE 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2016), 2016 : 1035 - 1039
  • [6] An Algorithm of Feature Selection in Text Categorization Based on Gini-index
    Zhu, Wei-Dong
    Wang, Bo
    Lin, Yong-Min
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND MANAGEMENT INNOVATION, 2015, 6 : 272 - 278
  • [7] An Improved Strategy of the Feature Selection Algorithm for the Text Categorization
    Yang, Jieming
    Lu, Yixin
    Liu, Zhiying
    2019 20TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2019 : 3 - 7
  • [8] Text Categorization Based on Clustering Feature Selection
    Zhou, Xiaofei
    Hu, Yue
    Guo, Li
    2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 398 - 405
  • [9] Feature selection based on feature interactions with application to text categorization
    Tang, Xiaochuan
    Dai, Yuanshun
    Xiang, Yanping
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 120 : 207 - 216
  • [10] GU metric - A new feature selection algorithm for text categorization
    Uchyigit, Gulden
    Clark, Keith
    ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2007 : 399 - 402