Feature and instance reduction for PNN classifiers based on fuzzy rough sets

Cited by: 34
Authors
Tsang, Eric C. C. [1 ]
Hu, Qinghua [2 ]
Chen, Degang [3 ]
Affiliations
[1] Macau Univ Sci & Technol, Fac Informat Technol, Macau, Peoples R China
[2] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[3] North China Elect Power Univ, Sch Math & Phys, Beijing, Peoples R China
Keywords
Nearest neighbor rule; Feature reduction; Instance reduction; Fuzzy rough set; Nearest-neighbor search; Selection; Model; Algorithms; Generation; Rule
DOI
10.1007/s13042-014-0232-6
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Instance reduction for K-nearest-neighbor (KNN) classification rules has attracted much attention in recent years, yet most existing approaches lose the probabilistic semantics of the original data. In this work, we propose a new reduced KNN rule, called FAIR-KNN, which performs feature and instance reduction based on fuzzy rough set theory. First, we use fuzzy rough sets to evaluate candidate features and select the most informative ones. The feature selection algorithm returns the selected features together with the membership values of samples to the lower approximations of their classes. These values reflect the distances of the samples to the classification boundary and are used to compute the probabilities with which samples are subsampled. We then introduce a weighted Parzen window technique to estimate the probability density from the weighted subsampled data. Thus we not only reduce the features and samples of the original data, but also preserve the probabilistic semantics. Finally, the memberships of samples to the lower and upper approximations of decisions are interpreted as the certainty and possibility degrees, respectively, with which samples belong to the corresponding decisions. The probability-weighted averages of these lower- and upper-approximation memberships are therefore output as the certainty and possibility degrees with which unseen samples belong to given decisions, which enriches the semantics of KNN. Numerical experiments on artificial and real-world data validate the effectiveness of the proposed technique.
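The pipeline described in the abstract can be sketched in three steps: compute fuzzy-rough lower-approximation memberships, subsample instances with probability proportional to those memberships, and classify with a weighted Parzen window. The sketch below is an illustrative assumption, not the authors' exact algorithm: the Gaussian similarity kernel, the min-based lower-approximation formula, and all function names are choices made here for concreteness.

```python
import numpy as np

def fuzzy_lower_membership(X, y, kernel_width=0.5):
    """Membership of each sample to the fuzzy lower approximation of its class.

    Uses a common fuzzy-rough definition (an assumption here): the membership
    of sample x is the minimum, over samples of a different class, of
    1 - similarity(x, .), with a Gaussian similarity kernel.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    sim = np.exp(-d2 / (2 * kernel_width ** 2))           # Gaussian similarities
    low = np.empty(len(X))
    for i in range(len(X)):
        other = y != y[i]
        low[i] = 1.0 if not other.any() else (1.0 - sim[i, other]).min()
    return low

def subsample(X, y, low, m, rng):
    """Draw m instances with probability proportional to lower membership."""
    p = low / low.sum()
    idx = rng.choice(len(X), size=m, replace=False, p=p)
    return X[idx], y[idx], p[idx]

def weighted_parzen_predict(x, Xs, ys, w, h=0.5):
    """Weighted Parzen-window class estimate at query point x."""
    k = np.exp(-((Xs - x) ** 2).sum(-1) / (2 * h ** 2)) * w  # weighted kernel mass
    classes = np.unique(ys)
    scores = np.array([k[ys == c].sum() for c in classes])
    return classes[scores.argmax()]
```

Samples near the classification boundary get small lower-approximation memberships, so the subsampling step here would retain confident interior samples with higher probability; the retained probabilities are then reused as Parzen-window weights.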
Pages: 1-11 (11 pages)