Extreme entropy machines: robust information theoretic classification

Cited by: 0
Authors
Wojciech Marian Czarnecki
Jacek Tabor
Affiliations
[1] Jagiellonian University, Faculty of Mathematics and Computer Science
Source
Pattern Analysis and Applications | 2017 / Volume 20
Keywords
Rapid learning; Extreme learning machines; Classification; Random projections; Entropy
DOI
Not available
Abstract
Most existing classification methods aim to minimize empirical risk (a simple point-based error measured with a loss function) plus a regularization term. We propose to approach the classification problem by using entropy measures as the model objective function. We focus on quadratic Renyi's entropy and the connected Cauchy–Schwarz divergence, which lead to the construction of extreme entropy machines (EEM). The main contribution of this paper is a model based on information-theoretic concepts that, on the one hand, offers a new, entropic perspective on known linear classifiers and, on the other, yields a very robust method competitive with state-of-the-art non-information-theoretic ones (including support vector machines and extreme learning machines). Evaluation on numerous problems, ranging from small, simple ones from the UCI repository to large (hundreds of thousands of samples), extremely unbalanced (up to 100:1 class ratios) datasets, shows the wide applicability of the EEM to real-life problems. Furthermore, it scales better than all considered competitive methods.
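The approach sketched in the abstract combines an ELM-style random projection with Gaussian class models compared through information-theoretic quantities. The following is a minimal illustrative sketch, not the authors' implementation: the class name `EEMSketch`, the tanh activation, and the simple ridge shrinkage of the covariance are all assumptions made here for the example. It also includes the closed-form Cauchy–Schwarz divergence between two Gaussians, which is the quantity the abstract connects to quadratic Renyi's entropy.

```python
import numpy as np

def gaussian_log_pdf(x, mean, cov):
    """Log-density of a multivariate Gaussian N(mean, cov) at point x."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

def cs_divergence(m1, S1, m2, S2):
    """Closed-form Cauchy-Schwarz divergence between two Gaussians.

    Uses the identity  integral N(x; ma, Sa) N(x; mb, Sb) dx
                     = N(ma - mb; 0, Sa + Sb).
    """
    d = len(m1)
    def log_int_prod(ma, Sa, mb, Sb):
        return gaussian_log_pdf(ma - mb, np.zeros(d), Sa + Sb)
    return -(log_int_prod(m1, S1, m2, S2)
             - 0.5 * log_int_prod(m1, S1, m1, S1)
             - 0.5 * log_int_prod(m2, S2, m2, S2))

class EEMSketch:
    """Illustrative EEM-style classifier: random projection + Gaussian models.

    Hypothetical sketch; the paper's exact formulation (e.g. its
    Ledoit-Wolf-style covariance estimation) may differ.
    """
    def __init__(self, n_hidden=100, ridge=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge          # shrinkage toward identity (assumption)
        self.rng = np.random.default_rng(seed)

    def _project(self, X):
        # ELM-style random nonlinear projection (tanh is an assumption)
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._project(X)
        # Model each class as a Gaussian in the projected space,
        # with a regularized covariance estimate.
        self.params = []
        for c in (0, 1):
            Hc = H[y == c]
            mean = Hc.mean(axis=0)
            cov = np.cov(Hc.T) + self.ridge * np.eye(self.n_hidden)
            self.params.append((mean, cov))
        return self

    def predict(self, X):
        H = self._project(X)
        scores = np.array([[gaussian_log_pdf(h, m, S) for m, S in self.params]
                           for h in H])
        return scores.argmax(axis=1)
```

Because both class densities are Gaussian in the projected space, the Cauchy–Schwarz divergence between them has a closed form (no sampling needed), which is what makes this family of objectives cheap to evaluate; the divergence is zero for identical Gaussians and positive otherwise.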
Pages: 383–400
Number of pages: 17