Term-length normalization for centroid-based text categorization

Cited by: 0
Authors
Lertnattee, V [1]
Theeramunkong, T [1]
Affiliation
[1] Thammasat Univ, Sirindhorn Int Inst Technol, Informat Technol Program, Pathum Thani 12121, Thailand
Source
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS | 2003 / Vol. 2773
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Centroid-based categorization is one of the most popular algorithms in text classification. Normalization is an important factor in improving the performance of a centroid-based classifier when documents in a text collection have quite different sizes. In the past, normalization involved only document-length or class-length normalization. In this paper, we propose a new type of normalization, called term-length normalization, which considers term distribution in a class. The performance of this normalization is investigated in three environments of a standard centroid-based classifier (TFIDF): (1) without class-length normalization, (2) with cosine class-length normalization, and (3) with summing weight normalization. The results suggest that our term-length normalization is useful for improving classification accuracy in all cases.
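For readers unfamiliar with the baseline, the following is a minimal Python sketch of a TFIDF centroid-based classifier with the three class-length settings named in the abstract. It is not the authors' implementation: the function names, the `mode` values, the logarithmic IDF, and the reading of "summing weight normalization" as division by the sum of weights are all assumptions made for illustration. The proposed term-length normalization is not reproduced here because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} dict."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}  # assumed IDF variant
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def normalize(vec, mode):
    """Scale a vector: 'none', 'cosine' (L2 length), or 'sum' (sum of weights)."""
    if mode == "cosine":
        length = math.sqrt(sum(w * w for w in vec.values()))
    elif mode == "sum":
        length = sum(abs(w) for w in vec.values())
    else:
        length = 1.0
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return {t: w / length for t, w in vec.items()} if length else dict(vec)


def train_centroids(docs, labels, mode="cosine"):
    """Average the TFIDF vectors of each class, then apply class-length scaling."""
    vecs, idf = tfidf_vectors(docs)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vecs, labels):
        for t, w in vec.items():
            sums[label][t] += w
    centroids = {
        c: normalize({t: w / counts[c] for t, w in s.items()}, mode)
        for c, s in sums.items()
    }
    return centroids, idf


def classify(doc, centroids, idf):
    """Return the class whose centroid has the largest dot product with the query."""
    query = normalize(
        {t: tf * idf.get(t, 0.0) for t, tf in Counter(doc).items()}, "cosine"
    )

    def score(c):
        return sum(w * centroids[c].get(t, 0.0) for t, w in query.items())

    return max(centroids, key=score)
```

With a toy corpus of two classes, calling `train_centroids(docs, labels, mode)` once per mode and then `classify(tokens, centroids, idf)` lets the three settings be compared on the same query; real experiments would need a proper tokenizer and a held-out test set.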
Pages: 850-856
Number of pages: 7
Related papers
50 in total
  • [1] Class normalization in centroid-based text categorization
    Lertnattee, Verayuth
    Theeramunkong, Thanaruk
    INFORMATION SCIENCES, 2006, 176 (12) : 1712 - 1738
  • [2] Effect of term distributions on centroid-based text categorization
    Lertnattee, V
    Theeramunkong, T
    INFORMATION SCIENCES, 2004, 158 : 89 - 115
  • [3] Supervised term weighting centroid-based classifiers for text categorization
    Nguyen, Tam T.
    Chang, Kuiyu
    Hui, Siu Cheung
    KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 35 (01) : 61 - 85
  • [4] Supervised term weighting centroid-based classifiers for text categorization
    Tam T. Nguyen
    Kuiyu Chang
    Siu Cheung Hui
    Knowledge and Information Systems, 2013, 35 : 61 - 85
  • [5] A Framework of Centroid-Based Methods for Text Categorization
    Wang, Dandan
    Chen, Qingcai
    Wang, Xiaolong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (02) : 245 - 254
  • [6] A New Centroid-Based Classifier for Text Categorization
    Chen, Lifei
    Ye, Yanfang
    Jiang, Qingshan
    2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3, 2008 : 1217 - +
  • [7] A new Centroid-Based Classification model for text categorization
    Liu, Chuan
    Wang, Wenyong
    Tu, Guanghui
    Xiang, Yu
    Wang, Siyang
    Lv, Fengmao
    KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 15 - 26
  • [8] Semi-supervised Single-label Text Categorization using Centroid-based Classifiers
    Cardoso-Cachopo, Ana
    Oliveira, Arlindo L.
    APPLIED COMPUTING 2007, VOL 1 AND 2, 2007 : 844 - +
  • [9] An improvement of centroid-based classification algorithm for text classification
    Cataltepe, Zehra
    Aygun, Eser
    2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1-2, 2007 : 952 - 956
  • [10] Combining homogeneous classifiers for centroid-based text classification
    Lertnattee, V
    Theeramunkong, T
    ISCC 2002: SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, PROCEEDINGS, 2002 : 1034 - 1039