Term-length normalization for centroid-based text categorization

Cited by: 0
Authors
Lertnattee, V [1]
Theeramunkong, T [1]
Affiliation
[1] Thammasat Univ, Sirindhorn Int Inst Technol, Informat Technol Program, Pathum Thani 12121, Thailand
Source
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS | 2003 / Vol. 2773
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Centroid-based categorization is one of the most popular algorithms in text classification. Normalization is an important factor in improving the performance of a centroid-based classifier when the documents in a collection vary considerably in size. Previously, normalization involved only document-length or class-length normalization. In this paper, we propose a new type of normalization, called term-length normalization, which considers the distribution of terms within a class. The performance of this normalization is investigated in three settings of a standard centroid-based classifier (TFIDF): (1) without class-length normalization, (2) with cosine class-length normalization, and (3) with summing weight normalization. The results suggest that term-length normalization is useful for improving classification accuracy in all three cases.
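The abstract gives no formulas, so the following is only a minimal Python sketch of a standard TFIDF centroid-based classifier with the class-length normalization settings it names: none, cosine, and summing weight. Several details are assumptions not stated in the source: weights are raw term frequency times log(N/df), each class centroid is the sum of its training documents' TFIDF vectors, and a test document goes to the class whose centroid gives the highest dot product. The helper names (`idf_table`, `tfidf`, `normalize`, `train_centroids`, `classify`) are hypothetical. The proposed term-length normalization is deliberately not implemented, because its definition is not given here.

```python
import math
from collections import Counter, defaultdict


def idf_table(train_docs):
    # Assumed IDF: log(N / df(t)) computed over the training documents.
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(doc))
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(doc, idf):
    # Assumed weighting: raw term frequency times IDF.
    # Terms never seen in training have no IDF and are dropped.
    tf = Counter(doc)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def normalize(vec, mode):
    # Class-length normalization settings named in the abstract:
    #   "none"   -> leave the centroid as is
    #   "cosine" -> divide by the Euclidean (L2) length
    #   "sum"    -> summing weight normalization: divide by the sum of weights
    if mode == "none":
        return dict(vec)
    if mode == "cosine":
        length = math.sqrt(sum(w * w for w in vec.values()))
    elif mode == "sum":
        length = sum(vec.values())
    else:
        raise ValueError(f"unknown normalization mode: {mode}")
    return {t: w / length for t, w in vec.items()} if length > 0 else dict(vec)


def train_centroids(train_docs, labels, mode="cosine"):
    # Hypothetical helper: a class centroid is assumed to be the sum of the
    # TFIDF vectors of that class's documents, then normalized per `mode`.
    idf = idf_table(train_docs)
    sums = defaultdict(Counter)
    for doc, label in zip(train_docs, labels):
        sums[label].update(tfidf(doc, idf))
    centroids = {c: normalize(v, mode) for c, v in sums.items()}
    return centroids, idf


def classify(doc, centroids, idf):
    # Assign the class whose centroid has the largest dot product with the
    # query's TFIDF vector.
    q = tfidf(doc, idf)

    def score(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in q.items())

    return max(centroids, key=lambda c: score(centroids[c]))
```

For example, after `centroids, idf = train_centroids(docs, labels, mode="sum")` on tokenized training documents, `classify(["goal", "team"], centroids, idf)` returns the best-matching label. Switching `mode` between `"none"`, `"cosine"`, and `"sum"` mirrors the three settings described in the abstract. Without normalization, classes with many or long training documents get large centroids and tend to dominate the scores, which is the size effect that normalization is meant to counter.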
Pages: 850-856
Page count: 7
Related papers (50 total)
  • [21] A Centroid-Based Outlier Detection Method
    Wang, Xiaochun
    Chen, Yiqin
    Wang, Xia Li
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 1411 - 1416
  • [22] Graph and Centroid-based Word Clustering
    Thaiprayoon, Santipong
    Unger, Herwig
    Kubek, Mario
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 163 - 168
  • [23] Centroid-based maximum intensity projections
    Cash, DM
    Palmisano, MG
    Galloway, RL
    JOURNAL OF COMPUTER ASSISTED TOMOGRAPHY, 2002, 26 (01) : 73 - 83
  • [24] Centroid-based summarization of multiple documents
    Radev, DR
    Jing, HY
    Stys, M
    Tam, D
    INFORMATION PROCESSING & MANAGEMENT, 2004, 40 (06) : 919 - 938
  • [25] Centroid-Based Classification of Categorical Data
    Chen, Lifei
    Guo, Gongde
    WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 472 - 475
  • [26] A centroid-based nonparametric regression estimator
    Barry, RP
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 1996, 25 (01) : 81 - 97
  • [27] Centroid-based sifting for empirical mode decomposition
    Hong Hong
    Xin-long Wang
    Zhi-yong Tao
    Shuan-ping Du
    Journal of Zhejiang University SCIENCE C, 2011, 12 : 88 - 95
  • [28] Centroid-based Clustering for Graph Datasets
    Chen, Lifei
    Wang, Shengrui
    Yan, Xuanhui
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2144 - 2147
  • [29] CenDE: Centroid-based Differential Evolution
    Salehinejad, Hojjat
    Rahnamayan, Shahryar
    Tizhoosh, Hamid R.
    2018 IEEE CANADIAN CONFERENCE ON ELECTRICAL & COMPUTER ENGINEERING (CCECE), 2018,
  • [30] Algorithm for centroid-based tracking of moving objects
    Nascimento, Jacinto C.
    Abrantes, Arnaldo J.
    Marques, Jorge S.
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1999, 6 : 3305 - 3308