Oriented principal component analysis for large margin classifiers

Cited by: 49
Authors
Bermejo, S [1]
Cabestany, J [1]
Affiliations
[1] Univ Politecn Cataluna, Dept Elect Engn, ES-08034 Barcelona, Spain
Keywords
large margin classifiers; oriented principal component analysis; co-operative learning; principal component neural networks; learning-to-learn algorithms; feature extraction; online gradient descent; pattern recognition
DOI
10.1016/S0893-6080(01)00106-X
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large margin classifiers (such as MLPs) are designed to assign training samples to one of the classes with high confidence (or margin). Recent theoretical results for these systems show why the use of regularisation terms and feature-extraction techniques can enhance their generalisation properties. Since the optimal subset of features depends not only on the classification problem but also on the particular classifier with which those features are used, global learning algorithms that combine large margin classifiers with feature-extraction techniques are desirable. A direct approach is to optimise a cost function based on the margin error that also incorporates regularisation terms for controlling capacity; these terms must penalise the classifier that attains the largest margin for the problem at hand. Our work shows that a PCA term can be employed for this purpose. Since PCA achieves an optimal discriminatory projection only for certain data distributions, the margin of the classifier can then be effectively controlled. We also propose a simple constrained search for the global algorithm, in which the feature extractor and the classifier are trained separately. This allows flexibility for including heuristics that can enhance both the search and the performance of the computed solution. Experimental results demonstrate the potential of the proposed method. (C) 2001 Elsevier Science Ltd. All rights reserved.
Pages: 1447-1461
Page count: 15