An outlier detection approach in large-scale data stream using rough set

Cited by: 10
Authors
Singh, Manmohan [1]
Pamula, Rajendra [1]
Affiliations
[1] Indian Sch Mines, Indian Inst Technol, Dept Comp Sci & Engn, Dhanbad 826004, Jharkhand, India
Keywords
Relative information entropy; Outlier detection; Rough sets; Data mining; Indiscernible sets; INFORMATION-ENTROPY; UNCERTAINTY; REDUCTION
DOI
10.1007/s00521-019-04421-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection has become an important research area in stream data mining because of its wide range of applications. Many methods have been proposed in the literature, but they work well mainly for simple, positive regions of outliers and give little importance to boundary regions. Moreover, an algorithm that processes stream data must be effective and able to handle unbounded data in one pass or a limited number of passes. These problems motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs the concept of relative cardinality, the entropy outlier factor theory of information-based systems, and a size-variant sliding window over the stream data. In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is evaluated on nine benchmark datasets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. Experimental results show that the proposed method outperforms these six methods in terms of receiver operating characteristic curve, precision-recall, and computational time, for positive regions as well as for boundary regions.
Pages: 9113-9127
Page count: 15