Analysis of inverse class frequency in centroid-based text classification

Cited by: 0
Authors
Lertnattee, V [1]
Theeramunkong, T [1]
Affiliation
[1] Sirindhorn Int Inst Technol, Informat Technol Program, Bangkadi 12000, Maung, Thailand
Source
IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2: SMART INFO-MEDIA SYSTEMS | 2004
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most previous work on text categorization has used term frequency and inverse document frequency to represent the importance of terms. This paper analyzes inverse class frequency in centroid-based text categorization. It has two aims: to find appropriate functions of inverse class frequency, and to identify the key factors that determine when inverse class frequency is useful. The experimental results show that the key factors affecting classification accuracy are the numbers of few-class terms and most-class terms. When a data set yields large numbers of few-class and most-class terms, the logarithmic function of inverse class frequency is the most effective when combined with term frequency. The square root of inverse class frequency, incorporated into TFIDF, works well when a data set contains only a small number of few-class and most-class terms. Several measures increase the numbers of these effective terms: using higher-gram models (longer n-grams), a small number of classes, and a large training set.
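The weighting scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the definitions here are assumptions: inverse class frequency is taken as the number of classes divided by the number of classes in which a term occurs, passed through either a logarithm or a square root, and multiplied by raw term frequency. The function names, the toy data, and the use of cosine similarity against normalized class centroids are all hypothetical choices for illustration, not the authors' implementation. The TFIDF variant mentioned in the abstract would multiply in an additional inverse document frequency factor, which is omitted here.

```python
import math
from collections import Counter, defaultdict


def class_frequency(docs, labels):
    """Map each term to the number of distinct classes it occurs in."""
    classes_of = defaultdict(set)
    for tokens, label in zip(docs, labels):
        for term in tokens:
            classes_of[term].add(label)
    return {term: len(cs) for term, cs in classes_of.items()}


def icf_weights(docs, labels, fn="log"):
    """Assumed ICF: fn(number of classes / class frequency of the term)."""
    n_classes = len(set(labels))
    cf = class_frequency(docs, labels)
    if fn == "log":
        return {t: math.log(n_classes / c) for t, c in cf.items()}
    if fn == "sqrt":
        return {t: math.sqrt(n_classes / c) for t, c in cf.items()}
    raise ValueError("fn must be 'log' or 'sqrt'")


def weighted_vector(tokens, weights):
    """Term frequency times ICF weight; unseen terms are dropped."""
    tf = Counter(tokens)
    return {t: n * weights[t] for t, n in tf.items() if t in weights}


def normalize(vec):
    """Scale a sparse vector to unit length (empty if the norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def train_centroids(docs, labels, weights):
    """Sum the normalized document vectors per class, then normalize."""
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(docs, labels):
        for t, v in normalize(weighted_vector(tokens, weights)).items():
            sums[label][t] += v
    return {label: normalize(vec) for label, vec in sums.items()}


def classify(tokens, centroids, weights):
    """Return the class whose centroid has the highest cosine similarity."""
    query = normalize(weighted_vector(tokens, weights))

    def cosine(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in query.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Toy corpus (hypothetical): "market" occurs in two classes, all other
# terms occur in exactly one class.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["vote", "party", "election"],
    ["election", "market", "vote"],
]
labels = ["sports", "sports", "finance", "finance", "politics", "politics"]

log_icf = icf_weights(docs, labels, fn="log")
sqrt_icf = icf_weights(docs, labels, fn="sqrt")
centroids = train_centroids(docs, labels, log_icf)
predicted = classify(["goal", "team", "market"], centroids, log_icf)
```

Under these assumed definitions, a term that occurs in every class receives a logarithmic weight of zero and is effectively removed, while the square-root variant still gives it a weight of one. This is one way to see why the two functions might behave differently depending on how many few-class and most-class terms a data set contains.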
Pages: 1171-1176
Number of pages: 6