An outlier detection approach in large-scale data stream using rough set

Cited by: 8
Authors
Singh, Manmohan [1 ]
Pamula, Rajendra [1 ]
Affiliations
[1] Indian Sch Mines, Indian Inst Technol, Dept Comp Sci & Engn, Dhanbad 826004, Jharkhand, India
Keywords
Relative information entropy; Outlier detection; Rough sets; Data mining; Indiscernible sets; Information entropy; Uncertainty; Reduction
DOI
10.1007/s00521-019-04421-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection has become an important research area in stream data mining because of its wide range of applications. Many methods have been proposed in the literature, but they work well only for simple, positive regions of outliers and give little attention to boundary regions. Moreover, an algorithm that processes stream data must be effective and able to handle unbounded data in one pass or a limited number of passes. These problems motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs the concepts of relative cardinality, the entropy outlier factor theory of information-based systems, and a size-variant sliding window over the stream. In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is evaluated on nine benchmark datasets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. Experimental results show that the proposed method outperforms these six methods in terms of the receiver operating characteristic curve, precision-recall, and computational time, for positive regions as well as boundary regions.
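The abstract names its building blocks (indiscernible sets, relative cardinality, a sliding window) without giving formulas, so the following Python sketch is only a rough illustration of the general idea and is not the authors' algorithm. It assumes categorical records, a fixed-size window (the paper uses a size-variant one), and a hypothetical rarity score defined as one minus the relative cardinality of a record's equivalence class, averaged over attributes. The names `rarity_scores`, `stream_outliers`, `window_size`, and `threshold` are all assumptions introduced here.

```python
from collections import Counter, deque


def rarity_scores(window):
    """Score each record by how small its equivalence classes are.

    For each attribute a, records sharing the same value form one
    indiscernibility class. The per-attribute rarity of a record is
    1 - |class| / |window|; the score is the mean over all attributes.
    This is a hypothetical simplification, not the paper's outlier factor.
    """
    n = len(window)
    n_attrs = len(window[0])
    # One value-frequency table per attribute.
    counts = [Counter(rec[a] for rec in window) for a in range(n_attrs)]
    return [
        sum(1 - counts[a][rec[a]] / n for a in range(n_attrs)) / n_attrs
        for rec in window
    ]


def stream_outliers(stream, window_size=5, threshold=0.7):
    """Yield (record, score) for arrivals that look rare in the current window."""
    window = deque(maxlen=window_size)  # oldest record is dropped automatically
    for rec in stream:
        window.append(rec)
        if len(window) == window_size:
            # Only the newest record is scored, so each record is judged once.
            score = rarity_scores(list(window))[-1]
            if score > threshold:
                yield rec, score


if __name__ == "__main__":
    data = [("a", "x")] * 4 + [("b", "y")]
    for rec, score in stream_outliers(data):
        print(rec, round(score, 3))
```

Because the window is bounded, each arriving record is scored once against a limited amount of history, which is in the spirit of the one-pass requirement stated in the abstract. Concept drift adaptation and the entropy-based factor are not modeled here.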
Pages: 9113-9127
Page count: 15
Related papers
50 in total
  • [1] An outlier detection approach in large-scale data stream using rough set
    Manmohan Singh
    Rajendra Pamula
    Neural Computing and Applications, 2020, 32 : 9113 - 9127
  • [2] A rough set approach to outlier detection
    Jiang, Feng
    Sui, Yuefei
    Cao, Cungen
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2008, 37 (05) : 519 - 536
  • [3] An information entropy-based approach to outlier detection in rough sets
    Jiang, Feng
    Sui, Yuefei
    Cao, Cungen
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (09) : 6338 - 6344
  • [4] Outlier Detection Forest for Large-Scale Categorical Data Sets
    Sun, Zhipeng
    Du, Hongwei
    Ye, Qiang
    Liu, Chuang
    Kibenge, Patricia Lilian
    Huang, Hui
    Li, Yuying
    COMPUTATIONAL DATA AND SOCIAL NETWORKS, 2019, 11917 : 45 - 56
  • [5] An Algorithm for Outlier Detection Using Rough Set Theory
    Lou, Mingzhu
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING APPLICATIONS (CSEA 2015), 2015, : 99 - 103
  • [6] Information-Theoretic Outlier Detection for Large-Scale Categorical Data
    Wu, Shu
    Wang, Shengrui
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (03) : 589 - 602
  • [7] Outlier Detection Based on Fuzzy Rough Granules in Mixed Attribute Data
    Yuan, Zhong
    Chen, Hongmei
    Li, Tianrui
    Sang, Binbin
    Wang, Shu
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (08) : 8399 - 8412
  • [8] Fast outlier detection method based on Rough set
    El Meziati, Marouane
    Ziyati, Houssaine
    9TH INTERNATIONAL SYMPOSIUM ON SIGNAL, IMAGE, VIDEO AND COMMUNICATIONS (ISIVC 2018), 2018, : 60 - 66
  • [9] Outlier Detection and Elimination in Stream Data - An Experimental Approach
    Kalisch, Mateusz
    Michalak, Marcin
    Przystalka, Piotr
    Sikora, Marek
    Wrobel, Lukasz
    ROUGH SETS, (IJCRS 2016), 2016, 9920 : 416 - 426
  • [10] An outlier detection algorithm based on information entropy and rough set
    Li, Hui
    Zhang, Shu
    Wang, Xia
    International Journal of Digital Content Technology and its Applications, 2012, 6 (20) : 97 - 106