Feature extraction using information-theoretic learning

Cited by: 99
Authors
Hild, Kenneth E., II
Erdogmus, Deniz
Torkkola, Kari
Principe, Jose C.
Affiliations
[1] Univ Calif San Francisco, Biomagnet Imaging Lab, San Francisco, CA 94122 USA
[2] Oregon Hlth & Sci Univ, Dept Comp Sci & Engn, OGI Sch Sci & Engn, Beaverton, OR 97006 USA
[3] Oregon Hlth & Sci Univ, Dept Biomed Engn, OGI Sch Sci & Engn, Beaverton, OR 97006 USA
[4] Motorola Labs, Ctr Human Interact Res, Tempe, AZ 85282 USA
[5] Univ Florida, Computat NeuroEngn Lab, Gainesville, FL 32611 USA
Funding
US National Science Foundation;
Keywords
feature extraction; information theory; classification; nonparametric statistics;
DOI
10.1109/TPAMI.2006.186
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A classification system typically consists of both a feature extractor (preprocessor) and a classifier. These two components can be trained either independently or simultaneously. The former option has an implementation advantage, since the extractor need only be trained once for use with any classifier, whereas the latter can minimize classification error directly. Certain criteria, such as Minimum Classification Error, are better suited for simultaneous training, whereas other criteria, such as Mutual Information, are amenable to training the feature extractor either independently or simultaneously. Herein, an information-theoretic criterion is introduced and evaluated for training the extractor independently of the classifier. The proposed method uses nonparametric estimation of Rényi's entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the feature extractor. The evaluations show that the proposed method, even though it uses independent training, performs at least as well as three feature extraction methods that train the extractor and classifier simultaneously.
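To make the criterion concrete, the following is a minimal sketch, assuming a linear extractor y = Wx: it estimates Rényi's quadratic entropy with Gaussian Parzen windows and performs gradient ascent on H(Y) minus the class-weighted conditional entropies, one common approximation of the mutual information between class labels and extractor output. This illustrates the general information-theoretic-learning technique, not the paper's exact algorithm (the paper derives analytic stochastic-information-gradient updates); all function names, the kernel size sigma, the learning rate, and the finite-difference gradient here are illustrative assumptions.

import numpy as np

def renyi_quadratic_entropy(y, sigma=1.0):
    """H2(Y) = -log( (1/N^2) sum_ij G(y_i - y_j; 2*sigma^2) ), Gaussian Parzen windows."""
    n, d = y.shape
    diff = y[:, None, :] - y[None, :, :]               # all pairwise differences
    sq = np.sum(diff ** 2, axis=-1)                    # squared pairwise distances
    var = 2.0 * sigma ** 2                             # kernel variance doubles under convolution
    gauss = np.exp(-sq / (2.0 * var)) / ((2.0 * np.pi * var) ** (d / 2.0))
    return -np.log(np.mean(gauss))                     # -log(information potential)

def mi_criterion(W, X, labels, sigma=1.0):
    """Approximate I(Y; C) as H2(Y) minus the class-weighted conditional entropies."""
    Y = X @ W.T
    h_marginal = renyi_quadratic_entropy(Y, sigma)
    h_cond = 0.0
    for c in np.unique(labels):
        Yc = Y[labels == c]
        h_cond += (len(Yc) / len(Y)) * renyi_quadratic_entropy(Yc, sigma)
    return h_marginal - h_cond

def train_extractor(X, labels, out_dim=1, lr=0.1, steps=200, eps=1e-4, seed=0):
    """Gradient ascent on the criterion via central finite differences (for brevity)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(out_dim, X.shape[1]))
    for _ in range(steps):
        grad = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            Wp, Wm = W.copy(), W.copy()
            Wp[idx] += eps
            Wm[idx] -= eps
            grad[idx] = (mi_criterion(Wp, X, labels) - mi_criterion(Wm, X, labels)) / (2 * eps)
        W += lr * grad
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep projection directions bounded
    return W

if __name__ == "__main__":
    # Toy two-class problem: only the first coordinate carries class information.
    rng = np.random.default_rng(1)
    X0 = rng.normal(loc=[-2, 0, 0], size=(60, 3))
    X1 = rng.normal(loc=[+2, 0, 0], size=(60, 3))
    X = np.vstack([X0, X1])
    labels = np.array([0] * 60 + [1] * 60)
    W = train_extractor(X, labels)
    print("learned projection:", W)

On the toy problem the learned projection should load almost entirely on the first coordinate, the only direction that separates the two classes, which is what a mutual-information-maximizing extractor is meant to find.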
Pages: 1385-1392
Page count: 8
References
38 in total
[1] [Anonymous], Pattern Recognition, 1995.
[2] Battiti, R., "Using mutual information for selecting features in supervised neural-net learning," IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 537-550, 1994.
[3] Beirlant, J., Int. J. Math. Stat. Sci., vol. 6, p. 17, 1997.
[4] Biem, A., Katagiri, S., Juang, B.H., "Pattern recognition using discriminative feature extraction," IEEE Transactions on Signal Processing, vol. 45, no. 2, pp. 500-504, 1997.
[5] Biem, A., Proc. 1993 IEEE Worksh. N., p. 392, 1993.
[6] Bishop, C.M., Neural Networks for Pattern Recognition, 1996.
[7] Bollacker, K.D., IEEE IJCNN, p. 1528, 1996. DOI: 10.1109/ICNN.1996.549127.
[8] Cover, T.M., Elements of Information Theory, 2006.
[9] Erdogmus, D., Principe, J.C., "Lower and upper bounds for misclassification probability based on Rényi's information," Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 37, no. 2-3, pp. 305-317, 2004.
[10] Erdogmus, D., Hild, K.E., Principe, J.C., "Online entropy manipulation: Stochastic information gradient," IEEE Signal Processing Letters, vol. 10, no. 8, pp. 242-245, 2003.