An outlier detection approach in large-scale data stream using rough set

Cited by: 10
Authors
Singh, Manmohan [1]
Pamula, Rajendra [1]
Affiliations
[1] Indian Sch Mines, Indian Inst Technol, Dept Comp Sci & Engn, Dhanbad 826004, Jharkhand, India
Keywords
Relative information entropy; Outlier detection; Rough sets; Data mining; Indiscernible sets; Information entropy; Uncertainty; Reduction
DOI
10.1007/s00521-019-04421-4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection has become an important research area in stream data mining because of its wide range of applications. Many methods have been proposed in the literature, but they work well only for simple and positive regions of outliers and give little importance to boundary regions. Moreover, an algorithm that processes stream data must be effective and able to handle unbounded data in one pass or a limited number of passes. These problems motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs the concept of relative cardinality, the entropy outlier factor theory of information-based systems, and a size-variant sliding window over the stream data. In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is run on nine benchmark datasets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. Experimental results show that the proposed method outperforms the six existing methods in terms of the receiver operating characteristic curve, precision-recall, and computational time, for positive regions as well as boundary regions.
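As a rough illustration of the ideas named in the abstract, the sketch below scores each arriving record by how much the information entropy of the window's indiscernibility partition drops when that record is removed. It is a simplified sketch under stated assumptions, not the authors' algorithm: the function names (`partition_entropy`, `relative_entropy`, `score_stream`) are hypothetical, the window has a fixed size rather than the size-variant window the abstract describes, and relative cardinality and concept drift adaptation are not modelled.

```python
from collections import Counter, deque
from math import log2


def partition_entropy(rows):
    """Shannon entropy of the partition of `rows` into indiscernibility classes.

    Rows with identical attribute tuples are treated as indiscernible and
    fall into the same class.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    counts = Counter(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def relative_entropy(rows, index):
    """Relative drop in entropy when rows[index] is removed, clipped at 0."""
    full = partition_entropy(rows)
    if full == 0.0:
        # Every row is indiscernible from the others: nothing stands out.
        return 0.0
    reduced = partition_entropy(rows[:index] + rows[index + 1:])
    return max(0.0, 1.0 - reduced / full)


def score_stream(stream, window_size):
    """Yield (row, score) for each arriving row, scored against a sliding window.

    Assumption: a fixed-size window; the oldest row is evicted automatically.
    """
    window = deque(maxlen=window_size)
    for row in stream:
        window.append(tuple(row))
        rows = list(window)
        yield row, relative_entropy(rows, len(rows) - 1)


if __name__ == "__main__":
    # Seven identical records followed by one that differs on every attribute.
    stream = [("a", 1)] * 7 + [("b", 9)]
    for row, score in score_stream(stream, window_size=8):
        print(row, round(score, 3))
```

Records whose removal sharply lowers the window's entropy receive scores near 1 and are candidate outliers; records that belong to a large indiscernibility class score near 0. Recomputing the entropy from scratch for every arrival costs time linear in the window size, which is acceptable for a sketch but not for a one-pass method at scale.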
Pages: 9113-9127
Page count: 15
Related papers
50 in total
[21]   Mining pinyin-to-character conversion rules from large-scale corpus: A rough set approach [J].
Wang, XL ;
Chen, QC ;
Yeung, DS .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2004, 34 (02) :834-844
[22]   Data Intensive vs Sliding Window Outlier Detection in the Stream Data - An Experimental Approach [J].
Kalisch, Mateusz ;
Michalak, Marcin ;
Sikora, Marek ;
Wrobel, Lukasz ;
Przystalka, Piotr .
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, (ICAISC 2016), PT II, 2016, 9693 :73-87
[23]   Distance-Based k-Nearest Neighbors Outlier Detection Method in Large-Scale Traffic Data [J].
Dang, Taurus T. ;
Ngan, Henry Y. T. ;
Liu, Wei .
2015 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2015, :507-510
[24]   A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems [J].
Zhang, Junbo ;
Wong, Jian-Syuan ;
Li, Tianrui ;
Pan, Yi .
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2014, 55 (03) :896-907
[25]   Outlier detection for incomplete real-valued data via rough set theory and granular computing [J].
Zhao, Zhengwei ;
Yang, Genteng ;
Li, Zhaowen ;
Yu, Guangji .
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (04) :6247-6271
[26]   Research on outlier detection for high dimensional data stream [J].
Yu, Liping ;
Li, Yunfei ;
Jia, Juncheng .
PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND ENGINEERING APPLICATIONS, 2016, 63 :395-398
[27]   An Incremental Local Outlier Detection Method in the Data Stream [J].
Yao, Haiqing ;
Fu, Xiuwen ;
Yang, Yongsheng ;
Postolache, Octavian .
APPLIED SCIENCES-BASEL, 2018, 8 (08)
[28]   A Comparative Study of Outlier Detection for Large-scale Traffic Data by One-Class SVM and Kernel Density Estimation [J].
Ngan, Henry Y. T. ;
Yung, Nelson H. C. ;
Yeh, Anthony G. O. .
IMAGE PROCESSING: MACHINE VISION APPLICATIONS VIII, 2015, 9405
[29]   Large Scale, data driven, Digital Twin Models: Outlier Detection and Imputation [J].
Wieser, Raymond ;
Fan, Yangxin ;
Yu, Xuanji ;
Braid, Jennifer ;
Shaton, Avishai ;
Hoffma, Adam ;
Spurgeon, Ben ;
Gibbons, Daniel ;
Bruckman, Laura S. ;
Wu, Yinghui ;
French, Roger H. .
2024 IEEE 52ND PHOTOVOLTAIC SPECIALIST CONFERENCE, PVSC, 2024, :0902-0905
[30]   A rough set approach to attribute generalization in data mining [J].
Chan, CC .
INFORMATION SCIENCES, 1998, 107 (1-4) :169-176