Active learning of constraints for weighted feature selection

Cited by: 3
Authors
Hijazi, Samah [1]
Hamad, Denis [1]
Kalakech, Mariam [2]
Kalakech, Ali [2]
Affiliations
[1] Univ Littoral Cote d'Opale, Lab LISIC EA 4491, F-62228 Calais, France
[2] Lebanese Univ, Dept Management Informat Syst, Hadath, Lebanon
Keywords
Feature selection; Active learning; Pairwise constraint selection; Constraint propagation; Graph Laplacian; Uncertainty reduction; Matrix perturbation; SUPERVISED FEATURE-SELECTION; MUTUAL INFORMATION; CLASSIFICATION; EFFICIENT; TEXTURE; RELEVANCE; SCORE;
DOI
10.1007/s11634-020-00408-5
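Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code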
020208 ; 070103 ; 0714 ;
Abstract
Pairwise constraints are a cheaper kind of supervision information that does not require revealing the class labels of data points. They were initially proposed to improve the performance of clustering algorithms, and researchers have recently become interested in using them for feature selection. However, in most current methods, pairwise constraints are provided passively: they are generated randomly over multiple algorithmic runs, and the results are averaged. This requires a large number of constraints, which may be redundant, unnecessary, and under some circumstances even detrimental to the algorithm's performance. It also masks the individual effect of each constraint set and imposes a human labeling cost. In this paper, we therefore propose a framework for actively selecting and then propagating constraints for feature selection. To do so, we rely on the graph Laplacian defined on the similarity matrix. We assume that when a small perturbation of the similarity value between a pair of data points leads to a better-separated cluster indicator, based on the second eigenvector of the graph Laplacian, this pair is expected to be a pairwise query with a higher and more significant impact. Constraint propagation, in turn, increases the supervision information while reducing the cost of human labor. Finally, experimental results validate our proposal against other known feature selection methods, where it performs favorably.
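The selection idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it recomputes the second eigenvector by brute force after each perturbation instead of using matrix perturbation theory, and the `separation` score (the L1 norm of the unit-length eigenvector, scaled to lie in (0, 1]) is an assumed proxy for how "well separated" the cluster indicator is. All function names and the `eps` and `sigma` parameters are hypothetical.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian similarity matrix with a zero diagonal."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def fiedler_vector(W):
    """Second eigenvector of the unnormalized graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return vecs[:, 1]


def separation(v):
    """Assumed proxy: equals 1 only when all entries have magnitude 1/sqrt(n)."""
    return float(np.abs(v).sum() / np.sqrt(len(v)))


def rank_pairs(W, eps=1e-2):
    """Rank pairs (i, j) by the best separation gain from a +/- eps perturbation."""
    base = separation(fiedler_vector(W))
    n = W.shape[0]
    scores = []
    for i in range(n):
        for j in range(i + 1, n):
            gains = []
            for delta in (eps, -eps):
                Wp = W.copy()
                # Keep the matrix symmetric and the similarity non-negative.
                Wp[i, j] = Wp[j, i] = max(W[i, j] + delta, 0.0)
                gains.append(separation(fiedler_vector(Wp)) - base)
            scores.append((max(gains), i, j))
    scores.sort(reverse=True)  # highest gain first
    return scores


# Two loose groups of points on a line (toy data, not from the paper).
X = np.array([[0.0], [0.3], [0.6], [3.0], [3.3], [3.6]])
W = gaussian_similarity(X)
v = fiedler_vector(W)
ranked = rank_pairs(W)
```

In this sketch, the pairs at the top of `ranked` would be the first candidates to query. The brute-force loop costs one eigendecomposition per pair and sign, so it only suits very small datasets.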
Pages: 337-377
Number of pages: 41