MGFS: A multi-label graph-based feature selection algorithm via PageRank centrality

Cited by: 92

Authors:

Hashemi, Amin [1]

Dowlatshahi, Mohammad Bagher [1]

Nezamabadi-pour, Hossein [2]

Affiliations:

[1] Lorestan Univ, Fac Engn, Dept Comp Engn, Khorramabad, Iran

[2] Shahid Bahonar Univ Kerman, Dept Elect Engn, Kerman, Iran

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2020 / Vol. 142

Keywords:

Multi-label feature selection; Correlation distance matrix; Feature-label graph; PageRank centrality; GRAVITATIONAL SEARCH ALGORITHM;

DOI:

10.1016/j.eswa.2019.113024

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In multi-label data, each instance is associated with a set of labels rather than a single label: in the data set, an instance is assigned 1 in the column of each label it belongs to and 0 in the column of each label it does not belong to. Such data are usually high-dimensional, so many machine learning methods seek the best subset of features to reduce the dimensionality of the data before building an acceptable classification model. In this paper, we design a fast feature selection algorithm for multi-label data based on the PageRank algorithm, an effective method for calculating the importance of web pages on the Internet. The algorithm, called multi-label graph-based feature selection (MGFS), first constructs an M x L matrix, called the Correlation Distance Matrix (CDM), where M is the number of features and L is the number of class labels. MGFS then creates a complete weighted graph, called the Feature-Label Graph (FLG), in which each feature is a vertex and the weight of the edge between two vertices (features) is their Euclidean distance in the CDM. Finally, the importance of each vertex (feature) is estimated via the PageRank algorithm. The number of features to select can be set by the user. To evaluate the proposed algorithm, we compared it with several multi-label feature selection methods on several multi-label datasets of different dimensions. The results show the superiority of the proposed method in terms of classification criteria and run-time. (C) 2019 Elsevier Ltd. All rights reserved.
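
The three steps named in the abstract (CDM, FLG, PageRank) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say which correlation measure fills the CDM, how PageRank is parameterised, or whether high or low scores are preferred, so the following are assumptions made only for illustration: the CDM entries are 1 - |Pearson correlation|, PageRank is computed by plain power iteration with a damping factor of 0.85, and the k highest-scoring features are kept. All function names are hypothetical.

```python
import numpy as np


def correlation_distance_matrix(X, Y):
    """M x L matrix of 1 - |Pearson correlation| (assumed distance measure)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    x_norm = np.linalg.norm(Xc, axis=0)
    y_norm = np.linalg.norm(Yc, axis=0)
    # Constant columns have zero norm; use 1.0 so their correlation becomes 0.
    x_norm[x_norm == 0] = 1.0
    y_norm[y_norm == 0] = 1.0
    corr = (Xc.T @ Yc) / np.outer(x_norm, y_norm)
    return 1.0 - np.abs(corr)


def feature_label_graph(cdm):
    """Complete-graph weights: pairwise Euclidean distances between CDM rows."""
    diff = cdm[:, None, :] - cdm[None, :, :]  # shape (M, M, L)
    return np.sqrt((diff ** 2).sum(axis=2))


def weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a weighted graph (damping is assumed)."""
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise the weights; all-zero columns become uniform.
    P = np.where(col_sums > 0, W / safe, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (P @ rank) + (1.0 - damping) / n
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def mgfs_sketch(X, Y, k):
    """Indices of the k features with the highest PageRank score (assumed rule)."""
    cdm = correlation_distance_matrix(X, Y)
    scores = weighted_pagerank(feature_label_graph(cdm))
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))                    # 40 instances, M = 6 features
    Y = (rng.random((40, 3)) > 0.5).astype(float)   # L = 3 binary labels
    print(mgfs_sketch(X, Y, k=3))
```

The pairwise-distance step builds an M x M x L intermediate array, which is fine for a small illustration but would need a more memory-efficient formulation for data sets with many features.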


Pages: 14
