Semi-supervised partially labeled heterogeneous feature selection based on information-theoretic three-way decision model

Citations: 0
Authors
Sun, Qianqian [1]
Zhang, Hongying [1]
Ding, Weiping [1, 2]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Neighborhood rough set; Unlabeled sample selection; Three-way decision; Entropy;
DOI
10.1016/j.asoc.2025.112880
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection plays an increasingly vital role in handling large-scale, partially labeled heterogeneous data. Three-way decision (TWD) theory is an important extension of classical two-way decision: it partitions the universe into an acceptance region, a rejection region and a boundary region, where the boundary region captures uncertain information. In this paper, considering heterogeneous data with a large number of unlabeled samples, we present two kinds of feature representation metrics based on an unlabeled sample selection mechanism to construct more effective feature selection models. Specifically, a generalized variable-precision neighborhood rough set model is first proposed based on a TWD model developed from an optimal threshold pair, which describes the relationships between features and labels at a finer-grained level. Second, an unlabeled sample selection framework is proposed to comprehensively measure the importance of unlabeled samples based on their uncertainty, graph density and label transfer ability. We then define six TWD-based measures that reveal nonlinear correlation and inconsistency between features and labels by means of extended information entropy and complementary entropy, respectively. Furthermore, unified feature measures are established to boost global feature selection in partially labeled heterogeneous datasets. Finally, the corresponding feature selection algorithm is designed, and comparative experiments demonstrate its effectiveness and efficiency.
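As a rough illustration of the ternary partition the abstract describes, the sketch below assigns each sample to an acceptance, rejection or boundary region by comparing a neighborhood-based conditional probability with a threshold pair (alpha, beta). It is a generic textbook-style sketch, not the paper's generalized variable-precision model: the function name `three_way_regions`, the Euclidean neighborhood radius `delta`, the fixed thresholds and the toy data are all assumptions made for illustration, and the paper's optimal threshold pair is not reproduced here.

```python
from math import dist


def three_way_regions(samples, labels, target, delta, alpha, beta):
    """Split sample indices into acceptance, rejection and boundary regions.

    Hypothetical helper: a sample is accepted for `target` when the share of
    `target` labels in its delta-neighborhood is >= alpha, rejected when the
    share is <= beta, and placed in the boundary region otherwise.
    """
    if not 0 <= beta < alpha <= 1:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    accept, reject, boundary = [], [], []
    for i, x in enumerate(samples):
        # delta-neighborhood of x; it always contains x itself
        neigh = [j for j, z in enumerate(samples) if dist(x, z) <= delta]
        # conditional probability of the target class inside the neighborhood
        p = sum(labels[j] == target for j in neigh) / len(neigh)
        if p >= alpha:
            accept.append(i)
        elif p <= beta:
            reject.append(i)
        else:
            boundary.append(i)
    return accept, reject, boundary


# Toy one-dimensional data (assumed for illustration only)
samples = [(0.0,), (0.1,), (0.2,), (0.5,), (0.9,), (1.0,)]
labels = ["a", "a", "b", "a", "b", "b"]
regions = three_way_regions(samples, labels, "a", delta=0.15, alpha=0.7, beta=0.3)
print(regions)
```

On this toy data the samples whose neighborhoods mix both labels fall into the boundary region, which is where the uncertain information mentioned in the abstract would sit.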
Pages: 19