Bidirectional discrimination with application to data visualization

Cited: 10
Authors
Huang, Hanwen [1 ]
Liu, Yufeng [2 ]
Marron, J. S. [2 ]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Ctr Clin & Translat Sci, Houston, TX 77030 USA
[2] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
Funding
U.S. National Science Foundation; U.S. National Institutes of Health
Keywords
Asymptotics; Classification; High-dimensional data; Initial value; Iteration; Optimization; Visualization; HIGH-DIMENSION; GEOMETRIC REPRESENTATION;
DOI
10.1093/biomet/ass029
Chinese Library Classification (CLC)
Q [Biological Sciences];
Subject Classification Code
07 ; 0710 ; 09 ;
Abstract
Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. These methods also provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data.
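To make the multiple-hyperplane idea concrete, the following is a minimal sketch in Python (assuming numpy and scikit-learn are available): one class with two subpopulations is split into subclusters, a separate linear SVM is fitted for each subcluster against the other class, and a point is labelled positive when the larger of the two linear scores is positive. This only illustrates why combining hyperplanes helps when subclusters are present; it is not the authors' bidirectional discrimination algorithm, which fits the hyperplanes jointly rather than through a separate clustering step.

```python
# Minimal sketch: two hyperplanes versus one for a class with two subpopulations.
# Assumes numpy and scikit-learn; not the published bidirectional discrimination method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy data: class +1 has two well-separated subclusters, class -1 sits between them.
pos = np.vstack([rng.normal([+4.0, +4.0], 0.7, size=(50, 2)),
                 rng.normal([-4.0, -4.0], 0.7, size=(50, 2))])
neg = rng.normal([0.0, 0.0], 0.7, size=(100, 2))
X = np.vstack([pos, neg])
y = np.hstack([np.ones(100), -np.ones(100)])

# Baseline: a single hyperplane for the whole problem.
single = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

# Two hyperplanes: split the +1 class into two subclusters and fit one
# linear SVM per subcluster against the entire -1 class.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pos)
planes = []
for k in (0, 1):
    Xk = np.vstack([pos[labels == k], neg])
    yk = np.hstack([np.ones((labels == k).sum()), -np.ones(len(neg))])
    planes.append(LinearSVC(C=1.0, max_iter=10000).fit(Xk, yk))

def predict_two_planes(x):
    """Assign +1 if either hyperplane's signed score is positive."""
    scores = [p.decision_function(np.atleast_2d(x))[0] for p in planes]
    return 1 if max(scores) > 0 else -1

test_points = [[4.0, 4.0], [-4.0, -4.0], [0.0, 0.0]]       # true labels: +1, +1, -1
print([int(single.predict([t])[0]) for t in test_points])  # no single hyperplane can get all three right
print([predict_two_planes(t) for t in test_points])        # two hyperplanes recover all three
```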
Pages: 851-864
Number of pages: 14