Clustering Inside Classes Improves Performance of Linear Classifiers

Cited by: 5
Author
Fradkin, Dmitriy [1 ]
Affiliation
[1] Siemens Corp Res, Princeton, NJ 08540 USA
Source
20th IEEE International Conference on Tools with Artificial Intelligence, Vol 2, Proceedings | 2008
Keywords
DOI
10.1109/ICTAI.2008.29
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work systematically examines a Clustering Inside Classes (CIC) approach to classification. In CIC, each class is partitioned into subclasses based on cluster analysis. We find that CIC, by extracting local structure and producing compact subclasses, can improve the performance of linear classifiers such as SVM and logistic regression. We compare CIC against a global classifier on four benchmark datasets and empirically analyze the effects of training-set size and of the number of clusters per class on its results. We also examine the use of an automated method for selecting the number of clusters for each class.
Pages: 439-442
Page count: 4
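As a rough illustration of the CIC idea summarized in the abstract, the sketch below partitions each class with k-means and trains one multinomial logistic regression over the resulting subclass labels, mapping predictions back to the original classes. The use of k-means, scikit-learn, the iris demo data, and all function names are assumptions made for illustration; they are not taken from the paper.

# Minimal sketch of Clustering Inside Classes (CIC); k-means, logistic
# regression, and the iris demo are illustrative choices, not the paper's setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def cic_fit(X, y, clusters_per_class=2, random_state=0):
    """Partition each class into subclasses via k-means, then train one
    linear classifier over all subclass labels."""
    sub_labels = np.empty(len(y), dtype=int)
    sub_to_class = {}          # maps subclass id -> original class label
    next_sub = 0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = min(clusters_per_class, len(idx))   # guard against tiny classes
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        assignments = km.fit_predict(X[idx])
        for j in range(k):
            sub_to_class[next_sub + j] = c
        sub_labels[idx] = assignments + next_sub
        next_sub += k
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, sub_labels)
    return clf, sub_to_class


def cic_predict(clf, sub_to_class, X):
    """Predict subclass labels, then map them back to the original classes."""
    sub_pred = clf.predict(X)
    return np.array([sub_to_class[s] for s in sub_pred])


if __name__ == "__main__":
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf, mapping = cic_fit(X_tr, y_tr, clusters_per_class=2)
    print("CIC accuracy:", accuracy_score(y_te, cic_predict(clf, mapping, X_te)))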