An outlier detection approach in large-scale data stream using rough set

Cited by: 10
Authors
Singh, Manmohan [1]
Pamula, Rajendra [1]
Affiliations
[1] Indian Sch Mines, Indian Inst Technol, Dept Comp Sci & Engn, Dhanbad 826004, Jharkhand, India
Keywords
Relative information entropy; Outlier detection; Rough sets; Data mining; Indiscernible sets; Information entropy; Uncertainty; Reduction
DOI
10.1007/s00521-019-04421-4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection has become an important research area in stream data mining because of its wide range of applications. Many methods have been proposed in the literature, but they work well mainly for simple, positive regions of outliers and give little attention to boundary regions. Moreover, an algorithm that processes stream data must be effective and able to handle unbounded data in one pass or a limited number of passes. These problems motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs the concept of relative cardinality, the entropy outlier factor theory of information-based systems, and a size-variant sliding window over the stream data. In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is evaluated on nine benchmark datasets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. Experimental results show that the proposed method outperforms all six in terms of receiver operating characteristic curve, precision-recall, and computational time, for positive regions as well as boundary regions.
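The abstract names three ingredients (rough-set indiscernibility, relative cardinality with an entropy-based factor, and a sliding window) without giving formulas. The following is only a rough illustrative sketch of how such pieces can fit together, not the authors' algorithm: the function names, the rarity score (one minus the relative cardinality of a row's indiscernibility class, averaged over attributes), and the fixed-size window (the paper uses a size-variant one) are all assumptions made for illustration.

```python
from collections import deque
from math import log2


def indiscernibility_classes(rows, attrs):
    """Group row indices that share the same values on `attrs`."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return list(classes.values())


def partition_entropy(rows, attrs):
    """Shannon entropy (bits) of the partition induced by `attrs`."""
    n = len(rows)
    if n == 0:
        return 0.0
    sizes = [len(c) for c in indiscernibility_classes(rows, attrs)]
    return -sum((s / n) * log2(s / n) for s in sizes)


def rarity_scores(rows, attrs):
    """Hypothetical score: mean over single attributes of
    1 - |class of row| / n. Higher means rarer in the window."""
    n = len(rows)
    scores = [0.0] * n
    for a in attrs:
        for cls in indiscernibility_classes(rows, (a,)):
            for i in cls:
                scores[i] += 1.0 - len(cls) / n
    return [s / len(attrs) for s in scores]


def stream_scores(stream, attrs, window_size=4):
    """Score each arriving row against a fixed-size sliding window
    (a simplification; the paper describes a size-variant window)."""
    window = deque(maxlen=window_size)
    for row in stream:
        window.append(row)
        yield rarity_scores(list(window), attrs)[-1]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
    print(list(stream_scores(data, (0, 1))))
    print(partition_entropy(data, (0, 1)))
```

In this toy stream the last row differs from the others on both attributes, so it receives the highest score once the window is full. A real detector would also need a threshold and a policy for resizing the window under concept drift, neither of which is specified in the abstract.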
Pages: 9113-9127
Number of pages: 15