Margin maximization with feed-forward neural networks: a comparative study with SVM and AdaBoost

Cited by: 27
Authors
Romero, E [1 ]
Màrquez, L [1 ]
Carreras, X [1 ]
Affiliations
[1] Univ Politecn Cataluna, Dept Llenguatges & Sist Informat, ES-08034 Barcelona, Spain
Keywords
margin maximization; feed-forward neural networks; support vector machines; AdaBoost; NLP classification problems;
DOI
10.1016/j.neucom.2003.10.011
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feed-forward Neural Networks (FNN) and Support Vector Machines (SVM) are two machine learning frameworks developed from very different points of view. In this work a new learning model for FNN is proposed such that, in the linearly separable case, it tends to obtain the same solution as SVM. The key idea of the model is a weighting of the sum-of-squares error function, which is inspired by the AdaBoost algorithm. As in SVM, the hardness of the margin can be controlled, so that this model can also be used for the non-linearly separable case. In addition, it is not restricted to the use of kernel functions, and it allows dealing with multiclass and multilabel problems as FNN usually do. Finally, it is independent of the particular algorithm used to minimize the error function. Theoretical and experimental results on synthetic and real-world problems are shown to confirm these claims. Several empirical comparisons among this new model, SVM, and AdaBoost have been made in order to study the agreement between the predictions made by the respective classifiers. Additionally, the results obtained show that similar performance does not imply similar predictions, suggesting that different models can be combined, leading to better performance. (C) 2003 Elsevier B.V. All rights reserved.
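The central idea in the abstract, a margin-dependent weighting of the sum-of-squares error, can be sketched as follows. This is a hypothetical illustration, not the paper's exact formulation: the specific weighting function and the margin-hardness parameter `beta` are assumptions, chosen to mimic AdaBoost's exponential emphasis on small- or negative-margin examples.

```python
import numpy as np

def weighted_sse_loss(margins, beta=1.0):
    """Weighted sum-of-squares error over per-example margins m_i = y_i * f(x_i).

    Each example's squared error (1 - m_i)^2 is scaled by an
    AdaBoost-style weight exp(-beta * m_i), so misclassified or
    small-margin points dominate the gradient, pushing the network
    toward a large-margin (SVM-like) solution.  `beta` plays the role
    of a margin-hardness control: larger values concentrate the loss
    on the hardest examples.
    """
    margins = np.asarray(margins, dtype=float)
    weights = np.exp(-beta * margins)     # AdaBoost-inspired emphasis
    errors = (1.0 - margins) ** 2         # plain sum-of-squares term
    return float(np.sum(weights * errors))

# A misclassified point (negative margin) contributes far more to the
# loss than a confidently correct one at the same |margin|.
print(weighted_sse_loss([-0.5], beta=2.0))  # weight e^{1} * error 2.25
print(weighted_sse_loss([0.5], beta=2.0))   # weight e^{-1} * error 0.25
```

Because the weighting only rescales a standard squared error, any gradient-based FNN training algorithm can minimize it unchanged, which matches the abstract's claim that the model is independent of the particular optimization method.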
Pages: 313-344
Page count: 32