Outlier detection in a multiset-valued information system based on rough set theory and granular computing

Times cited: 20
Authors
Song, Yan [1]
Lin, Hai [2]
Li, Zhaowen [3]
Affiliations
[1] Yulin Normal Univ, Ctr Appl Math Guangxi, Yulin 537000, Guangxi, Peoples R China
[2] Guangxi Univ, Sch Math & Informat Sci, Nanning 530000, Guangxi, Peoples R China
[3] Putian Univ, Key Lab Appl Math Fujian Prov Univ, Fujian Key Lab Financial Informat Proc, Putian 351100, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rough set theory; Granular computing; Multiset-valued data; Missing data; Outlier detection; ALGORITHMS
DOI
10.1016/j.ins.2023.119950
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection on data with missing information values is especially tricky because the uncertainty introduced by the missing values may itself contribute to an object being an outlier. A multiset-valued information system (MSVIS) is an information system (IS) whose information values are multisets; this kind of IS offers a useful way to handle datasets with missing information values. In this paper, we study outlier detection in an MSVIS based on rough set theory and granular computing. First, we review some concepts of multisets and probability distribution sets and show that a weak one-to-one correspondence exists between multisets and rational probability distribution sets, so that multisets can be treated as rational probability distribution sets. Then, we show that an MSVIS can be induced by an incomplete information system (IIS) and viewed as the result of fusing multiple categorical ISs. Next, we construct a tolerance relation in an MSVIS from the induced rational probability distribution sets. We then formulate the outlier factor in an MSVIS and propose the corresponding outlier detection algorithm. Finally, a performance evaluation using AUC (area under the curve) and F1-score shows that the proposed algorithm outperforms several existing algorithms.
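The pipeline described in the abstract (multiset to rational distribution, IIS to MSVIS, tolerance relation, outlier score) can be illustrated with a small sketch. This is not the authors' method; every construction below is an assumption made for illustration. In particular: (1) a missing value is assumed to become the multiset of all known values in that attribute's column; (2) the tolerance relation is a hypothetical one based on the Bhattacharyya coefficient with a threshold `delta`; (3) `granule_outlier_score` is a placeholder that simply rewards small tolerance classes and is not the paper's outlier factor. The names `MISSING`, `induce_msvis`, `tolerance_classes`, and `granule_outlier_score` are all hypothetical. The "weak" correspondence shows up directly: multisets with proportional multiplicities, such as {a, b} and {a, a, b, b}, map to the same distribution.

```python
from collections import Counter
from fractions import Fraction
import math

MISSING = None  # hypothetical marker for a missing information value


def to_distribution(multiset: Counter) -> dict:
    """Map a multiset (value -> multiplicity) to a rational probability
    distribution by dividing each multiplicity by the total count."""
    total = sum(multiset.values())
    return {v: Fraction(m, total) for v, m in multiset.items()}


def induce_msvis(table: list) -> list:
    """Assumed IIS -> MSVIS construction: a known value v becomes the
    multiset {v}; a missing value becomes the multiset of all known
    values observed in that attribute's column."""
    n_attrs = len(table[0])
    column_bags = [
        Counter(row[j] for row in table if row[j] is not MISSING)
        for j in range(n_attrs)
    ]
    return [
        [Counter([row[j]]) if row[j] is not MISSING else column_bags[j]
         for j in range(n_attrs)]
        for row in table
    ]


def bhattacharyya_coefficient(p: dict, q: dict) -> float:
    """Overlap of two discrete distributions: 1 if identical, 0 if disjoint."""
    return sum(math.sqrt(p[v] * q[v]) for v in p.keys() & q.keys())


def tolerance_classes(msvis: list, delta: float) -> list:
    """Hypothetical tolerance relation: x ~ y iff, on every attribute, the
    Bhattacharyya coefficient of their distributions is at least delta.
    For 0 < delta < 1 it is reflexive and symmetric, but not necessarily
    transitive. Returns the tolerance class of each object."""
    dists = [[to_distribution(bag) for bag in row] for row in msvis]
    n = len(dists)
    return [
        {y for y in range(n)
         if all(bhattacharyya_coefficient(px, py) >= delta
                for px, py in zip(dists[x], dists[y]))}
        for x in range(n)
    ]


def granule_outlier_score(classes: list) -> list:
    """Illustrative score only (not the paper's outlier factor): objects
    whose tolerance class is small relative to |U| score closer to 1."""
    n = len(classes)
    return [1 - len(c) / n for c in classes]


# Usage on a toy incomplete table (4 objects, 2 categorical attributes).
table = [["a", "x"], ["a", MISSING], ["b", "x"], ["a", "y"]]
msvis = induce_msvis(table)
classes = tolerance_classes(msvis, delta=0.5)
print(classes)                          # [{0, 1}, {0, 1, 3}, {2}, {1, 3}]
print(granule_outlier_score(classes))   # [0.5, 0.25, 0.75, 0.5]
```

In this toy table, object 2 is the only one taking value `b` on the first attribute, so it ends up alone in its tolerance class and receives the highest placeholder score. The missing value of object 1 becomes the multiset {x, x, y}, which lets it tolerate both object 0 and object 3.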
Pages: 24