Term-length normalization for centroid-based text categorization

Cited by: 0
Authors
Lertnattee, V [1]
Theeramunkong, T [1]
Affiliations
[1] Thammasat Univ, Sirindhorn Int Inst Technol, Informat Technol Program, Pathum Thani 12121, Thailand
Source
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS | 2003 / Vol. 2773
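Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract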
Centroid-based categorization is one of the most popular algorithms in text classification. Normalization is an important factor in improving the performance of a centroid-based classifier when the documents in a text collection differ considerably in size. In the past, normalization involved only document-length or class-length normalization. In this paper, we propose a new type of normalization, called term-length normalization, which considers the term distribution in a class. The performance of this normalization is investigated in three environments of a standard centroid-based classifier (TFIDF): (1) without class-length normalization, (2) with cosine class-length normalization, and (3) with summing weight normalization. The results suggest that our term-length normalization is useful for improving classification accuracy in all cases.
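The three class-length settings named in the abstract can be illustrated with a minimal sketch of a TFIDF centroid classifier. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `log(N/df)` form of IDF, the use of summed document vectors as the class prototype, and the dot-product scoring are all assumptions. The proposed term-length normalization is not defined in the abstract, so it is deliberately left out.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn token lists into sparse {term: tf * idf} vectors (assumed idf = log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def class_centroids(vectors, labels, norm="none"):
    """Sum document vectors per class, then apply a class-length normalization.

    norm="none"   : no class-length normalization
    norm="cosine" : divide by the Euclidean length of the class vector
    norm="sum"    : divide by the sum of the class vector's weights
    """
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    centroids = {}
    for label, vec in sums.items():
        if norm == "cosine":
            length = math.sqrt(sum(w * w for w in vec.values()))
        elif norm == "sum":
            length = sum(vec.values())
        else:
            length = 1.0
        length = length or 1.0  # guard against an all-zero class vector
        centroids[label] = {t: w / length for t, w in vec.items()}
    return centroids


def classify(tokens, centroids, idf):
    """Assign the class whose centroid has the largest dot product with the query."""
    query = {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}

    def score(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in query.items())

    return max(centroids, key=lambda label: score(centroids[label]))


# Toy data, invented purely for illustration.
docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
labels = ["sport", "sport", "finance", "finance"]
vectors, idf = tfidf_vectors(docs)
```

Calling `class_centroids(vectors, labels, norm="cosine")` and passing the result to `classify` gives one of the three baseline configurations; switching `norm` to `"none"` or `"sum"` gives the other two.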
Pages: 850-856
Page count: 7
Related papers
50 in total
[41]   Support-vector-based iteratively adjusted centroid classifier for text categorization [J].
Wang, Deqing ;
Zhang, Hui .
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2013, 39 (02) :269-274
[42]   Large margin DragPushing strategy for centroid text categorization [J].
Tan, Songbo .
EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (01) :215-220
[43]   An effective approach to enhance centroid classifier for text categorization [J].
Tan, Songbo ;
Cheng, Xueqi .
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2007, PROCEEDINGS, 2007, 4702 :581-588
[44]   Quick Induction of NNTrees for Text Categorization Based on Discriminative Multiple Centroid Approach [J].
Hayashi, Hirotomo ;
Zhao, Qiangfu .
2010 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010,
[45]   CENTROID-BASED TEXTURE CLASSIFICATION USING THE SIRV REPRESENTATION [J].
Schutz, Aurelien ;
Bombrun, Lionel ;
Berthoumieu, Yannick .
2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, :3810-3814
[46]   Using Different Term Weighting Schemes of Centroid-based Classifiers to Classify Drug Monographs [J].
Lertnattee, Verayuth ;
Lueviphan, Chanisara .
PROGRESS IN MECHATRONICS AND INFORMATION TECHNOLOGY, PTS 1 AND 2, 2014, 462-463 :968-973
[47]   Multi-Document Summarization with Centroid-Based Pretraining [J].
Puduppully, Ratish ;
Jain, Parag ;
Chen, Nancy F. ;
Steedman, Mark .
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, :128-138
[48]   Enhanced centroid-based classification technique by filtering outliers [J].
Shin, Kwangcheol ;
Abraham, Ajith ;
Han, SangYong .
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2006, 4188 :159-163
[49]   A Study on Intrusion Detection Using Centroid-Based Classification [J].
Setiawan, Bambang ;
Djanali, Supeno ;
Ahmad, Tohari .
4TH INFORMATION SYSTEMS INTERNATIONAL CONFERENCE (ISICO 2017), 2017, 124 :672-681
[50]   Centroid-Based Efficient Minimum Bayes Risk Decoding [J].
Deguchi, Hiroyuki ;
Sakai, Yusuke ;
Kamigaito, Hidetaka ;
Watanabe, Taro ;
Tanaka, Hideki ;
Utiyama, Masao .
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, :11009-11018