Outlier detection for partially labeled categorical data based on conditional information entropy

Cited by: 4
Authors
Zhao, Zhengwei [1]
Wang, Rongrong [2]
Huang, Dan [3]
Li, Zhaowen [4]
Affiliations
[1] Guangxi Minzu Univ, Sch Math & Phys, Nanning 530006, Guangxi, Peoples R China
[2] Guangxi Minzu Univ, Elect & Informat Engn, Nanning 530000, Guangxi, Peoples R China
[3] Yulin Normal Univ, Sch Comp Sci & Engn, Yulin 537000, Guangxi, Peoples R China
[4] Putian Univ, Key Lab Appl Math Fujian Prov Univ, Fujian Key Lab Financial Informat Proc, Putian 351100, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Partially labeled categorical data; Partially labeled categorical decision information system; Outlier detection; Conditional information entropy; ALGORITHMS; CLUSTERS
DOI
10.1016/j.ijar.2023.109086
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Labeling large amounts of data is exceptionally costly and often practically infeasible, so available data may have missing labels. This article investigates outlier detection for partially labeled categorical data based on conditional information entropy. First, equivalence classes in a partially labeled categorical decision information system (p-CDIS) are introduced so that missing labels can be predicted using conditional probability. Then, the conditional information entropy of a p-CDIS is calculated, which provides a more comprehensive measure of uncertainty. In addition, relative information entropy and relative cardinality in a p-CDIS are proposed. Next, the degree of outlierness and a weight function are presented for determining outlier factors. Finally, an outlier detection method for a p-CDIS based on conditional information entropy is proposed, and a corresponding algorithm (CEOF) is designed. To evaluate the stability of CEOF, experiments are performed on ten datasets from the UCI Machine Learning Repository. Compared with five other algorithms, the proposed method shows good effectiveness and adaptability for categorical data.
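The first steps described in the abstract (equivalence classes in a p-CDIS, filling missing labels by conditional probability, and conditional information entropy) can be illustrated with the textbook rough-set definitions. The sketch below is a minimal illustration under stated assumptions, not the authors' CEOF algorithm: it assumes each object is a tuple of categorical values, a missing decision label is `None`, a missing label is filled with the most probable label among the labeled objects in the same equivalence class, and entropy uses base-2 logarithms. The function names `equivalence_classes`, `predict_missing_labels`, and `conditional_entropy` are hypothetical. Relative information entropy, relative cardinality, the outlierness degree, and the weight function are left out because the abstract does not define them.

```python
from collections import Counter, defaultdict
from math import log2

def equivalence_classes(rows, attrs):
    """Partition object indices by their values on the attribute indices `attrs`
    (the indiscernibility relation of rough set theory)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def predict_missing_labels(rows, labels, attrs):
    """Assumed rule: replace each missing label (None) with the label of highest
    conditional probability P(d | [x]), estimated from the labeled objects in x's
    class. Ties go to the label encountered first (Counter.most_common ordering)."""
    filled = list(labels)
    for cls in equivalence_classes(rows, attrs):
        counts = Counter(labels[i] for i in cls if labels[i] is not None)
        if not counts:
            continue  # no labeled object in this class: leave its labels missing
        best = counts.most_common(1)[0][0]
        for i in cls:
            if filled[i] is None:
                filled[i] = best
    return filled

def conditional_entropy(rows, labels, attrs):
    """H(D | B) = -sum_X P(X) sum_d P(d | X) log2 P(d | X), summed over the
    equivalence classes X of B; objects whose label is None are ignored."""
    idx = [i for i, d in enumerate(labels) if d is not None]
    n = len(idx)
    h = 0.0
    for cls in equivalence_classes([rows[i] for i in idx], attrs):
        counts = Counter(labels[idx[j]] for j in cls)
        for c in counts.values():
            p = c / len(cls)
            h -= (len(cls) / n) * p * log2(p)
    return h

# Toy p-CDIS: two condition attributes, one missing decision label.
rows = [("red", "S"), ("red", "S"), ("red", "S"),
        ("blue", "L"), ("blue", "L"), ("blue", "L")]
labels = ["a", "a", None, "b", "b", "a"]

filled = predict_missing_labels(rows, labels, attrs=(0, 1))
h = conditional_entropy(rows, filled, attrs=(0, 1))
print(filled)       # ['a', 'a', 'a', 'b', 'b', 'a']
print(round(h, 4))  # 0.4591
```

In this toy example the unlabeled third object inherits label `a` because both labeled objects in its class carry `a`. All of the remaining uncertainty comes from the `("blue", "L")` class, whose labels split 2:1, giving (3/6) times the binary entropy of 2/3, about 0.4591 bits.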
Pages: 25