A Recursive Regularization Based Feature Selection Framework for Hierarchical Classification

Cited by: 58
Authors
Zhao, Hong [1 ,2 ]
Hu, Qinghua [3 ,4 ]
Zhu, Pengfei [3 ,4 ]
Wang, Yu [3 ,4 ]
Wang, Ping [5 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Machine Learning, Tianjin 300354, Peoples R China
[2] Minnan Normal Univ, Fujian Key Lab Granular Comp & Applicat, Zhangzhou 363000, Peoples R China
[3] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300354, Peoples R China
[4] Tianjin Key Lab Machine Learning, Tianjin 300354, Peoples R China
[5] Tianjin Univ, Sch Math, Sch Comp Software, Tianjin 300354, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; hierarchical classification; recursive regularization; semantic hyponymy;
DOI
10.1109/TKDE.2019.2960251
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The sizes of datasets, in terms of the number of samples, features, and classes, have increased dramatically in recent years. In particular, when a classification task involves hundreds of classes, a hierarchical structure usually exists among the class labels. We call such tasks hierarchical classification; the hierarchy helps divide a very large task into a collection of relatively small subtasks. Various algorithms have been developed to select informative features for flat classification. However, these algorithms ignore the semantic hyponymy encoded in the class hierarchy and select a single, uniform feature subset for all classes. In this paper, we propose a new feature selection framework with recursive regularization for hierarchical classification. This framework takes the hierarchical information of the class structure into account. In contrast to flat feature selection, we select a different feature subset for each node of the hierarchical tree via recursive regularization. The proposed framework exploits parent-child, sibling, and family relationships for hierarchical regularization. By imposing ℓ2,1-norm regularization on different parts of the hierarchical classes, we learn a sparse matrix that yields a feature ranking at each node. Extensive experiments on public datasets demonstrate the effectiveness and efficiency of the proposed algorithms.
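As a rough illustration of the per-node optimization the abstract describes, the sketch below learns a weight matrix for one tree node by proximal gradient descent on a least-squares fit, plus an ℓ2,1 penalty (which zeroes whole feature rows), plus a quadratic term pulling the node's weights toward its parent's. This is a minimal sketch, not the authors' released code: the exact loss form, the coupling strength mu, the solver, and the names fit_node and prox_l21 are assumptions made for illustration.

import numpy as np

def prox_l21(W, t):
    # Proximal operator of t * ||W||_{2,1}: row-wise soft thresholding.
    # Rows whose l2 norm falls below t are zeroed, i.e., the feature is dropped.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

def fit_node(X, Y, W_parent, lam=0.1, mu=0.05, iters=500):
    # Proximal gradient descent on (assumed objective):
    #   0.5*||X W - Y||_F^2 + 0.5*mu*||W - W_parent||_F^2 + lam*||W||_{2,1}
    # X: (n_samples, n_features); Y: (n_samples, n_classes) one-hot labels;
    # W_parent: the parent node's weights, coupling this node to the hierarchy.
    d, k = X.shape[1], Y.shape[1]
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 + mu)  # 1 / Lipschitz constant of the smooth part
    W = np.zeros((d, k))
    for _ in range(iters):
        grad = X.T @ (X @ W - Y) + mu * (W - W_parent)
        W = prox_l21(W - lr * grad, lr * lam)
    return W

# Rank features at this node by the row norms of the learned sparse matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
Y = np.eye(3)[rng.integers(0, 3, size=200)]
W = fit_node(X, Y, W_parent=np.zeros((50, 3)))
ranking = np.argsort(-np.linalg.norm(W, axis=1))  # most informative features first
print(ranking[:10])

Per the abstract, sibling and family relationships would add analogous coupling terms, and the "recursive" aspect comes from updating all nodes jointly so the regularization propagates through the whole tree; the single-node, parent-only version above is meant only to show how the ℓ2,1 penalty produces a per-node feature ranking.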
Pages: 2833-2846
Page count: 14