Outlier detection in a multiset-valued information system based on rough set theory and granular computing

Cited by: 19
Authors
Song, Yan [1]
Lin, Hai [2]
Li, Zhaowen [3]
Affiliations
[1] Yulin Normal Univ, Ctr Appl Math Guangxi, Yulin 537000, Guangxi, Peoples R China
[2] Guangxi Univ, Sch Math & Informat Sci, Nanning 530000, Guangxi, Peoples R China
[3] Putian Univ, Key Lab Appl Math Fujian Prov Univ, Fujian Key Lab Financial Informat Proc, Putian 351100, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rough set theory; Granular computing; Multiset-valued data; Missing data; Outlier detection; Algorithms
DOI
10.1016/j.ins.2023.119950
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection on data with missing information values is especially tricky because the uncertainty caused by missing information values may contribute to an object being an outlier. A multiset-valued information system (MSVIS) is an information system (IS) in which information values are multisets. This kind of IS is a useful way of handling datasets with missing information values. In this paper, we study outlier detection in an MSVIS based on rough set theory and granular computing. First, some concepts of multisets and probability distribution sets are reviewed, and the fact that a weak one-to-one correspondence exists between multisets and rational probability distribution sets is illustrated. In this way, multisets may be treated as rational probability distribution sets. Then, an MSVIS can be induced by an incomplete information system (IIS) and viewed as the result of information fusion of multiple categorical ISs. Next, a tolerance relation in an MSVIS is constructed with the induced rational probability distribution sets. Then, the outlier factor in an MSVIS is formulated, and the corresponding outlier detection algorithm is proposed. Finally, the performance evaluation by AUC (area under the curve) and F1-score shows the superiority of the proposed algorithm over some existing algorithms.
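The correspondence between multisets and rational probability distribution sets mentioned in the abstract can be illustrated with a minimal sketch. It assumes the natural normalization (each multiplicity divided by the total count), which is one plausible reading and not necessarily the paper's exact construction. The function name `to_distribution` and the sample values are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def to_distribution(multiset):
    """Normalize multiplicities into exact rational probabilities.

    Assumption: the distribution induced by a multiset is obtained by
    dividing each multiplicity by the total count.
    """
    total = sum(multiset.values())
    if total == 0:
        raise ValueError("an empty multiset induces no distribution")
    return {value: Fraction(count, total) for value, count in multiset.items()}


# Two different multisets over the same attribute domain (made-up values).
m1 = Counter({"red": 1, "blue": 2})
m2 = Counter({"red": 2, "blue": 4})

p1 = to_distribution(m1)
p2 = to_distribution(m2)

# Both multisets induce the same distribution, so the mapping is not
# injective on raw multisets.
same_distribution = (p1 == p2)
```

Under this assumption, multisets that differ only by a common scaling of their multiplicities share one distribution, which may be why the abstract calls the correspondence "weak" rather than strictly one-to-one.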
Pages: 24