Analysis of inverse class frequency in centroid-based text classification

Citations: 0
Authors
Lertnattee, V [1]
Theeramunkong, T [1]
Affiliations
[1] Sirindhorn Int Inst Technol, Informat Technol Program, Bangkadi 12000, Maung, Thailand
Source
IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2: SMART INFO-MEDIA SYSTEMS | 2004
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Most previous work on text categorization has used term frequency and inverse document frequency to represent the importance of terms. This paper presents an analysis of inverse class frequency in centroid-based text categorization. It has two aims: first, to find appropriate functions of inverse class frequency; second, to identify the key factors that determine when inverse class frequency is useful. The experimental results show that the key factors that improve classification accuracy are the numbers of few-class terms and most-class terms. When a data set contains large numbers of few-class and most-class terms, the logarithmic function of inverse class frequency is the most effective when combined with term frequency. The square root of inverse class frequency, incorporated into TFIDF, works well when a data set includes only a small number of few-class and most-class terms. The numbers of these effective terms can be increased by using higher-gram models, a smaller number of classes, and a larger training set.
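The abstract does not give the weighting formulas, so the following is only a minimal sketch of how an inverse class frequency (ICF) weight might be combined with term frequency in a centroid-based classifier. It assumes ICF is the number of classes divided by the number of classes in which a term occurs, and that the "function" of ICF is either a logarithm or a square root. All names (`class_frequency`, `icf_weights`, `weigh`, `centroids`, `classify`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def class_frequency(docs, labels):
    """Map each term to the number of distinct classes it occurs in."""
    classes_of = defaultdict(set)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            classes_of[term].add(label)
    return {term: len(cs) for term, cs in classes_of.items()}

def icf_weights(docs, labels, func=math.log):
    """Assumed definition: weight(t) = func(|C| / cf(t))."""
    n_classes = len(set(labels))
    cf = class_frequency(docs, labels)
    return {term: func(n_classes / c) for term, c in cf.items()}

def weigh(tokens, weights):
    """TF x ICF vector for one document, scaled to unit length."""
    tf = Counter(tokens)
    vec = {t: n * weights.get(t, 0.0) for t, n in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def centroids(docs, labels, weights):
    """Sum the unit-length document vectors of each class."""
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(docs, labels):
        for t, v in weigh(tokens, weights).items():
            sums[label][t] += v
    return {label: dict(vec) for label, vec in sums.items()}

def classify(tokens, cents, weights):
    """Return the class whose centroid has the highest cosine similarity."""
    query = weigh(tokens, weights)
    def cosine(centroid):
        dot = sum(v * centroid.get(t, 0.0) for t, v in query.items())
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        return dot / norm if norm else 0.0
    return max(cents, key=lambda label: cosine(cents[label]))

# Toy data (hypothetical): "team" occurs in every class, the rest in one class.
docs = [["goal", "match", "team"], ["team", "score", "goal"],
        ["stock", "market", "team"], ["market", "price", "stock"]]
labels = ["sport", "sport", "finance", "finance"]

log_icf = icf_weights(docs, labels, func=math.log)
sqrt_icf = icf_weights(docs, labels, func=math.sqrt)
cents = centroids(docs, labels, log_icf)
predicted = classify(["goal", "team"], cents, log_icf)
```

Under these assumptions the logarithm gives a term that occurs in every class a weight of zero, whereas the square root leaves it with a weight of one, which is one plausible reading of why the two functions behave differently as the numbers of few-class and most-class terms change. A TFIDF variant could multiply an inverse document frequency factor into `weigh`; it is omitted here to keep the sketch short.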
Pages: 1171-1176
Number of pages: 6