MGFS: A multi-label graph-based feature selection algorithm via PageRank centrality

Cited by: 84
Authors
Hashemi, Amin [1]
Dowlatshahi, Mohammad Bagher [1]
Nezamabadi-pour, Hossein [2]
Affiliations
[1] Lorestan Univ, Fac Engn, Dept Comp Engn, Khorramabad, Iran
[2] Shahid Bahonar Univ Kerman, Dept Elect Engn, Kerman, Iran
Keywords
Multi-label feature selection; Correlation distance matrix; Feature-label graph; PageRank centrality; Gravitational search algorithm
DOI
10.1016/j.eswa.2019.113024
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In multi-label data, each instance is associated with a set of labels rather than a single label: in the data set, each label has its own column, in which instances that belong to the label are assigned 1 and instances that do not are assigned 0. Such data are usually high-dimensional, so many machine-learning methods seek the best subset of features to reduce the dimensionality of the data and then build an acceptable classification model. In this paper, we design a fast feature selection algorithm for multi-label data based on PageRank, an effective method for calculating the importance of web pages on the Internet. The algorithm, called multi-label graph-based feature selection (MGFS), first constructs an M × L matrix, called the Correlation Distance Matrix (CDM), where M is the number of features and L is the number of class labels. MGFS then creates a complete weighted graph, called the Feature-Label Graph (FLG), in which each feature is a vertex and the weight of the edge between two vertices (features) is their Euclidean distance in the CDM. Finally, the importance of each vertex (feature) is estimated via the PageRank algorithm. The number of features to select can be set by the user. To evaluate the proposed algorithm, we compare it with several multi-label feature selection methods on several multi-label datasets of different dimensions. The results show the superiority of the proposed method in terms of classification criteria and run-time. (C) 2019 Elsevier Ltd. All rights reserved.
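The three steps described in the abstract (CDM, FLG, PageRank) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the abstract does not state: the correlation distance is taken to be 1 minus the Pearson correlation, PageRank is run by power iteration with a damping factor of 0.85, and the features with the highest PageRank scores are the ones kept. All function names are hypothetical.

```python
import numpy as np


def correlation_distance_matrix(X, Y):
    """Return an M x L matrix of feature-label correlation distances.

    Assumption: distance = 1 - Pearson correlation (the abstract does not
    give the exact formula). X is (n_samples, M), Y is (n_samples, L).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    x_norm = np.linalg.norm(Xc, axis=0)
    y_norm = np.linalg.norm(Yc, axis=0)
    # Constant columns have zero norm; treat their correlation as 0.
    x_norm[x_norm == 0] = 1.0
    y_norm[y_norm == 0] = 1.0
    corr = (Xc / x_norm).T @ (Yc / y_norm)
    return 1.0 - corr


def feature_label_graph(cdm):
    """Return the M x M weight matrix of the complete graph over features.

    The weight of edge (i, j) is the Euclidean distance between rows i and j
    of the CDM. This builds an M x M x L array, so it only suits small M.
    """
    diff = cdm[:, None, :] - cdm[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=1000):
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; isolated vertices jump uniformly.
    P = np.where(col_sums > 0, W / safe_sums, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) / n
        delta = np.abs(r_next - r).sum()
        r = r_next
        if delta < tol:
            break
    return r


def mgfs_sketch(X, Y, k):
    """Return indices of the k features with the highest PageRank score.

    Assumption: a higher score means a more important feature.
    """
    cdm = correlation_distance_matrix(X, Y)
    scores = weighted_pagerank(feature_label_graph(cdm))
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 6))                      # 30 instances, M = 6
    Y = (rng.random((30, 3)) > 0.5).astype(float)     # L = 3 binary labels
    print(mgfs_sketch(X, Y, k=3))
```

The value passed as `k` corresponds to the user-chosen number of features mentioned in the abstract. For larger feature counts, a pairwise-distance routine that avoids the intermediate M × M × L array would be a more practical choice.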
Pages: 14