Towards enhancing centroid classifier for text classification: A border-instance approach

Cited by: 15
Authors
Wang, Deqing [2]
Wu, Junjie [1]
Zhang, Hui [2]
Xu, Ke [2]
Lin, Mengxiang [2]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing Key Lab Emergency Support Simulat Technol, Beijing 100191, Peoples R China
[2] Beihang Univ, State Key Lab Software Dev Environm, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification/categorization; Centroid-based classifier; Border instance; Iterative adjustment; Support vector machines (SVMs)
DOI
10.1016/j.neucom.2012.08.019
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification/categorization (TC) assigns new, unlabeled natural-language documents to predefined thematic categories. The centroid-based classifier (CC) has been widely used for TC because of its simplicity and efficiency. However, it has long been criticized for its relatively low classification accuracy compared with state-of-the-art classifiers such as support vector machines (SVMs). In this paper, we find that CC can obtain higher generalization accuracy when its centroid vectors are constructed from border instances only, rather than from all instances. Along this line, we propose the Border-Instance-based Iteratively Adjusted Centroid Classifier (IACC_BI), which relies on border instances found by some routine, e.g. the 1-Nearest-and-1-Furthest-Neighbors strategy, to construct the centroid vectors for CC. IACC_BI then iteratively adjusts the initial centroid vectors according to the misclassified training instances. Our extensive experiments on 11 real-world text corpora demonstrate that IACC_BI greatly improves the performance of centroid-based classifiers and obtains classification accuracy competitive with the well-known SVMs, at significantly lower computational cost. (C) 2012 Elsevier B.V. All rights reserved.
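The two stages named in the abstract (centroids built from border instances, then iterative adjustment on misclassified training instances) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the border-selection routine, the similarity measure, or the update rule, so everything below is an assumption. The hypothetical `border_indices` keeps each instance's most similar neighbour from a different class (a stand-in, not the 1-Nearest-and-1-Furthest-Neighbors strategy), similarity is taken to be cosine, and the hypothetical `fit` applies a simple per-instance additive correction with assumed parameters `epochs` and `lr`.

```python
import math

def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def border_indices(X, y):
    """Assumed border routine (hypothetical): keep, for every instance,
    its most similar neighbour that carries a different label."""
    keep = set()
    for i, xi in enumerate(X):
        rivals = [j for j in range(len(X)) if y[j] != y[i]]
        if rivals:
            keep.add(max(rivals, key=lambda j: dot(xi, X[j])))
    return keep

def predict_one(x, centroids):
    """Return the label of the centroid most similar to x."""
    x = unit(x)
    return max(centroids, key=lambda c: dot(x, centroids[c]))

def fit(X, y, epochs=10, lr=0.1):
    """Build centroids from border instances, then adjust them on errors."""
    X = [unit(x) for x in X]
    keep = border_indices(X, y)
    centroids = {}
    for c in sorted(set(y)):
        members = [i for i in range(len(X)) if y[i] == c]
        # Fall back to all members if a class has no border instance.
        chosen = [i for i in members if i in keep] or members
        summed = [sum(col) for col in zip(*(X[i] for i in chosen))]
        centroids[c] = unit(summed)
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            guess = predict_one(xi, centroids)
            if guess != yi:
                errors += 1
                # Assumed update: pull the true centroid toward the
                # instance and push the wrongly chosen centroid away.
                centroids[yi] = unit(
                    [c + lr * x for c, x in zip(centroids[yi], xi)])
                centroids[guess] = unit(
                    [c - lr * x for c, x in zip(centroids[guess], xi)])
        if errors == 0:
            break
    return centroids

# Toy two-dimensional data standing in for term-weight vectors.
X = [[1, 0], [0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9], [0, 1]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit(X, y)
```

Prediction costs one dot product per class, which is in line with the low computational cost the abstract attributes to centroid-based classifiers; the pairwise search in `border_indices` is quadratic in the number of training instances and is only meant for a toy example of this size.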
Pages: 299-308
Number of pages: 10