Multi-objective Cuckoo Search-based Streaming Feature Selection for Multi-label Dataset

Cited by: 18
Authors
Paul, Dipanjyoti [1]
Kumar, Rahul [2]
Saha, Sriparna [3]
Mathew, Jimson [3]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 801106, Bihar, India
[2] Indian Inst Technol Patna, Elect Engn, Patna 801106, Bihar, India
[3] Indian Inst Technol Patna, Comp Sci & Engn, Patna 801106, Bihar, India
Keywords
Pareto optimal front; label exploitation; online feature selection
DOI
10.1145/3447586
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Feature selection is the process of selecting only the relevant features by removing irrelevant or redundant ones from the large number of features used to represent data. Nowadays, many application domains, especially social media networks, generate new features continuously at different time stamps. When features arrive in an online fashion, the selection task must also be a continuous process to cope with their continuous arrival. A streaming feature selection approach is therefore needed: every time a new feature or a group of features arrives, the feature selection process is invoked. In addition, in recent years many application domains generate data in which a sample may belong to more than one class; such data are called multi-label datasets. The multiple labels associated with an instance may have dependencies among themselves, and finding the correlation among the class labels helps to select features that are discriminative across multiple labels. In this article, we develop streaming feature selection methods for multi-label data in which the multiple labels are reduced to a lower-dimensional space. Similar labels are grouped together before the selection method is performed, both to improve selection quality and to make the model time-efficient. A multi-objective version of the cuckoo search-based approach is used to select the optimal feature set. Two versions of the streaming feature selection method are developed: (1) when the features arrive individually and (2) when the features arrive in the form of a batch. Various multi-label datasets from domains such as text, biology, and audio have been used to test the developed streaming feature selection methods.
The proposed methods are compared with many previous feature selection methods, and the comparison establishes the superiority of using multiple objectives and label correlation in the feature selection process.
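The abstract combines two ideas: a selection step invoked on every arrival (a single feature or a batch), and a multi-objective criterion whose answer is a set of Pareto-optimal (non-dominated) feature subsets. The Python sketch below illustrates only those two ideas under loudly stated assumptions; it is not the authors' method. The names `dominates`, `pareto_front`, `on_arrival`, `toy_evaluate`, and the `relevance` scores are hypothetical. The toy objectives (subset size, negated relevance) stand in for the paper's actual objectives, label grouping is omitted, and exhaustive enumeration of candidates is swapped in for cuckoo search purely to keep the example small.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Objectives = Tuple[float, ...]          # every objective is minimized
Front = Dict[FrozenSet[int], Objectives]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored: Front) -> Front:
    """Keep only the candidate subsets that no other candidate dominates."""
    return {
        subset: obj
        for subset, obj in scored.items()
        if not any(dominates(other, obj) for other in scored.values())
    }


def on_arrival(
    front: Front,
    new_features: Sequence[int],
    evaluate: Callable[[FrozenSet[int]], Objectives],
) -> Front:
    """Re-run selection when one feature (batch of size 1) or a batch arrives.

    ASSUMPTION: candidates are every current front member extended by every
    non-empty combination of the newly arrived features. This exhaustive
    enumeration is a stand-in for cuckoo search, not the paper's procedure.
    """
    candidates: Front = dict(front)
    bases = list(front) or [frozenset()]  # first arrival starts from the empty set
    for base in bases:
        for r in range(1, len(new_features) + 1):
            for combo in combinations(new_features, r):
                subset = base | frozenset(combo)
                if subset not in candidates:
                    candidates[subset] = evaluate(subset)
    return pareto_front(candidates)


# Hypothetical per-feature relevance scores (e.g. averaged over label groups).
relevance = {0: 0.9, 1: 0.0, 2: 0.5, 3: 0.0}


def toy_evaluate(subset: FrozenSet[int]) -> Objectives:
    # Objective 1: fewer features; objective 2: higher total relevance (negated).
    return (float(len(subset)), -sum(relevance[f] for f in subset))


front: Front = {}
for arrival in ([0], [1], [2, 3]):  # two single features, then a batch of two
    front = on_arrival(front, arrival, toy_evaluate)
print(sorted(sorted(s) for s in front))  # [[0], [0, 2]]
```

Feature 1 is discarded because adding it enlarges the subset without raising relevance, and the batch `[2, 3]` contributes only feature 2. Enumerating every combination grows exponentially with batch size, which is where a population-based metaheuristic such as the cuckoo search named in the abstract would take over.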
Pages: 24