MI-MCF: A Mutual Information-Based Multilabel Causal Feature Selection

Cited by: 0
Authors
Ma, Lin [1]
Hu, Liang [1]
Li, Yonghao [2]
Ding, Weiping [3, 4]
Gao, Wanfu [1]
Affiliations
[1] Jilin Univ, Dept Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Chengdu 611130, Sichuan, Peoples R China
[3] Nantong Univ, Sch Artificial Intelligence & Comp Sci, Nantong 226019, Peoples R China
[4] City Univ Macau, Fac Data Sci, Taipa, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Feature extraction; Correlation; Learning systems; Redundancy; Optimization; Mutual information; Manifolds; Electronic mail; Computational complexity; Accuracy; Causal feature selection; Markov Blanket (MB); multilabel learning; mutual information (MI); LABEL FEATURE-SELECTION;
DOI
10.1109/TNNLS.2025.3556128
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multilabel causal feature selection has attracted extensive attention in recent years. Current multilabel causal feature selection algorithms typically employ existing Markov Blanket (MB) search methods for the initial construction of the MB, followed by further optimization. These methods generally treat labels and features as equally weighted nodes during the MB construction process. However, the search for spouse sets often involves extensive conditional independence (CI) tests, which are time-consuming. Furthermore, these methods fail to consider the distinct contributions of labels and features to the target nodes. Information theory is often used to evaluate the contributions of nodes. Inspired by this, we carry out a theoretical investigation into the causal relationships within multilabel datasets and propose the mutual information-based multilabel causal feature selection (MI-MCF) method. First, MI-MCF employs mutual information (MI) and conditional MI (CMI) instead of CI tests when constructing the MB of labels, without incurring significant time overhead. Then, MI-MCF uses MI to compare the contributions of features and labels to the target nodes. This helps identify which nodes should be retained when recovering features hindered by strong label correlation. Finally, MI-MCF eliminates spurious nodes through a symmetry check. Experiments on real-world datasets demonstrate that MI-MCF can autonomously determine the optimal number of selected features and consistently outperforms the compared methods. The code is available at https://github.com/malinjlu/MI-MCF.
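The abstract describes replacing CI tests with MI and CMI. The following is a minimal sketch of those two quantities for discrete variables, using plug-in (frequency count) estimates. It is not taken from the authors' repository; the function names and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with counts.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    # I(X;Y|Z) = sum p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ).
    return sum(
        (c / n) * log2(c * pz[k] / (pxz[(a, k)] * pyz[(b, k)]))
        for (a, b, k), c in joint.items()
    )


# Toy data (hypothetical): X and Y are independent, and Z = X XOR Y,
# so X and Y behave like "spouses" that share the common child Z.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

mi_xy = mutual_information(x, y)
cmi_xy_given_z = conditional_mutual_information(x, y, z)
```

On this toy data X and Y carry no information about each other marginally, yet become fully dependent once Z is conditioned on. That is the pattern a spouse search looks for, and it shows why a CMI score can stand in for a CI test. How MI-MCF thresholds or combines such scores is not specified in the abstract and is not modeled here.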
Pages: 9864-9878
Page count: 15