Measures of uncertainty for partially labeled categorical data based on an indiscernibility relation: an application in semi-supervised attribute reduction

Times cited: 1
Authors
He, Jiali [1 ]
Zhang, Gangqiang [2 ]
Huang, Dan [3 ]
Wang, Pei [1 ]
Yu, Guangji [4 ]
Affiliations
[1] Yulin Normal Univ, Ctr Appl Math Guangxi, Yulin 537000, Guangxi, Peoples R China
[2] Guangxi Minzu Univ, Sch Artificial Intelligence, Nanning 530006, Guangxi, Peoples R China
[3] Yulin Normal Univ, Sch Comp Sci & Engn, Yulin 537000, Guangxi, Peoples R China
[4] Guangxi Univ Finance & Econ, Sch Big Data & Artificial Intelligence, Nanning 530003, Guangxi, Peoples R China
Keywords
p-CDIS; Uncertainty measurement; Indiscernibility relation; Conditional information entropy; Conditional information amount; Semi-supervised attribute reduction; Feature selection; Granulation; Entropy; Rules
DOI
10.1007/s10489-023-05078-2
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many practical applications of machine learning, only part of the data is labeled because the cost of obtaining class labels is relatively high. This paper concentrates on measures of uncertainty for a partially labeled categorical decision information system (p-CDIS) and considers an application to semi-supervised attribute reduction. Firstly, a p-CDIS (U, C, d) induces two decision information systems (DISs): one for the labeled categorical data (U_l, C, d) and one for the unlabeled categorical data (U_u, C, d), and the missing rate of labels in (U, C, d) is introduced. Existing research on partially labeled data did not take the missing rate of labels into account and considered only one importance measure for each attribute subset. Then, four importance measures of an attribute subset P ⊆ C in (U, C, d) are defined based on an indiscernibility relation; each is a weighted sum of the importance of P in (U_l, C, d) and in (U_u, C, d), with the weights determined by the missing rate of labels. These four importance measures can be regarded as four uncertainty measurements (UMs) for (U, P, d). Next, numerical experiments and statistical tests are carried out on 15 UCI datasets to examine the advantages and disadvantages of the four UMs. Finally, as an application of UMs in a p-CDIS, the two better-performing UMs are used for semi-supervised attribute reduction, and two corresponding algorithms are designed that automatically adapt to different missing rates of labels. The experimental results show the feasibility and superiority of the designed algorithms.
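The abstract does not spell out the four importance formulas (based on conditional information entropy and conditional information amount), so the Python sketch below is only an illustration under assumed definitions: the helper names `indiscernibility_classes`, `conditional_entropy`, `granularity`, and `weighted_importance` are hypothetical, not taken from the paper. It shows the general recipe described above: compute an indiscernibility-based importance of an attribute subset P separately on the labeled part U_l and the unlabeled part U_u, then combine the two scores with weights given by the missing rate of labels.

```python
import math
from collections import defaultdict

def indiscernibility_classes(rows, attrs):
    # Group objects that take identical values on every attribute in `attrs`;
    # each group is one equivalence class of the indiscernibility relation.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def conditional_entropy(rows, labels, attrs):
    # Illustrative conditional information entropy H(d | P) on the labeled data:
    # label entropy inside each indiscernibility class, weighted by class size.
    n = len(rows)
    h = 0.0
    for block in indiscernibility_classes(rows, attrs):
        counts = defaultdict(int)
        for i in block:
            counts[labels[i]] += 1
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * math.log2(p)
    return h

def granularity(rows, attrs):
    # Illustrative knowledge-granularity measure on the unlabeled data:
    # sum over classes of (|class| / |U_u|)^2; smaller means a finer partition.
    n = len(rows)
    return sum((len(b) / n) ** 2 for b in indiscernibility_classes(rows, attrs))

def weighted_importance(labeled_rows, labels, unlabeled_rows, attrs):
    # Combine the two scores with weights given by the missing rate of labels
    # r = |U_u| / |U|, as the abstract describes; the concrete component
    # measures used in the paper may differ from these stand-ins.
    r = len(unlabeled_rows) / (len(labeled_rows) + len(unlabeled_rows))
    return (1 - r) * conditional_entropy(labeled_rows, labels, attrs) \
           + r * granularity(unlabeled_rows, attrs)

# Toy usage: two categorical attributes, three labeled and two unlabeled objects.
labeled = [{"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 0}]
labels = ["yes", "no", "yes"]
unlabeled = [{"a": 1, "b": 1}, {"a": 1, "b": 0}]
print(weighted_importance(labeled, labels, unlabeled, ["a", "b"]))
```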
Pages: 29486-29513
Number of pages: 28