Outlier Mining Methods Based on Graph Structure Analysis

被引:5
作者
Amil, Pablo [1 ]
Almeira, Nahuel [2 ,3 ]
Masoller, Cristina [1 ]
机构
[1] Univ Politecn Cataluna, Dept Phys, Barcelona, Spain
[2] Univ Nacl Cordoba, Fac Matemat Astron Fis & Computac, Cordoba, Argentina
[3] Consejo Nacl Invest Cient & Tecn, Inst Fis Enrique Gaviola, Cordoba, Argentina
基金
欧盟地平线“2020”;
关键词
outlier mining; anomaly detection; complex networks; machine learning; unsupervised learning; supervised learning; percolation; CARD FRAUD DETECTION; ANOMALY DETECTION; LOCALIZATION;
D O I
10.3389/fphy.2019.00194
中图分类号
O4 [物理学];
学科分类号
0702 ;
摘要
Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that has also practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform, or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that is parameter free and therefore, it does not require any training; on the other hand, the IsoMap method has two integer number parameters, and when they are appropriately selected, the method performs similar to or better than all the other methods tested.
引用
收藏
页数:11
相关论文
共 62 条
[1]   Mobile-Assisted Anchor Outlier Detection for Localization in Wireless Sensor Networks [J].
Abukhalaf, Hala ;
Wang, Jianxin ;
Zhang, Shigeng .
INTERNATIONAL JOURNAL OF FUTURE GENERATION COMMUNICATION AND NETWORKING, 2016, 9 (07) :63-75
[2]   Outlier Detection Techniques for Localization in Wireless Sensor Networks: A Survey [J].
Abukhalaf, Hala ;
Wang, Jianxin ;
Zhang, Shigeng .
INTERNATIONAL JOURNAL OF FUTURE GENERATION COMMUNICATION AND NETWORKING, 2015, 8 (06) :99-113
[3]  
Agovic A, 2009, IND INNOV SER, P81
[4]   Anomaly detection using manifold embedding and its applications in transportation corridors [J].
Agovic, Amrudin ;
Banerjee, Arindam ;
Ganguly, Auroop ;
Protopopescu, Vladimir .
INTELLIGENT DATA ANALYSIS, 2009, 13 (03) :435-455
[5]   Survey on Anomaly Detection using Data Mining Techniques [J].
Agrawal, Shikha ;
Agrawal, Jitendra .
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS 19TH ANNUAL CONFERENCE, KES-2015, 2015, 60 :708-713
[6]   Roadmap on optical rogue waves and extreme events [J].
Akhmediev, Nail ;
Kibler, Bertrand ;
Baronio, Fabio ;
Belic, Milivoj ;
Zhong, Wei-Ping ;
Zhang, Yiqi ;
Chang, Wonkeun ;
Soto-Crespo, Jose M. ;
Vouzas, Peter ;
Grelu, Philippe ;
Lecaplain, Caroline ;
Hammani, K. ;
Rica, S. ;
Picozzi, A. ;
Tlidi, Mustapha ;
Panajotov, Krassimir ;
Mussot, Arnaud ;
Bendahmane, Abdelkrim ;
Szriftgiser, Pascal ;
Genty, Goery ;
Dudley, John ;
Kudlinski, Alexandre ;
Demircan, Ayhan ;
Morgner, Uwe ;
Amiraranashvili, Shalva ;
Bree, Carsten ;
Steinmeyer, Guenter ;
Masoller, C. ;
Broderick, Neil G. R. ;
Runge, Antoine F. J. ;
Erkintalo, Miro ;
Residori, S. ;
Bortolozzo, U. ;
Arecchi, F. T. ;
Wabnitz, Stefan ;
Tiofack, C. G. ;
Coulibaly, S. ;
Taki, M. .
JOURNAL OF OPTICS, 2016, 18 (06)
[7]   CARDWATCH: A neural network based database mining system for credit card fraud detection [J].
Aleskerov, E ;
Freisleben, B ;
Rao, B .
PROCEEDINGS OF THE IEEE/IAFE 1997 COMPUTATIONAL INTELLIGENCE FOR FINANCIAL ENGINEERING (CIFER), 1997, :220-226
[8]   Unsupervised feature extraction of anterior chamber OCT images for ordering and classification [J].
Amil, Pablo ;
Gonzalez, Laura ;
Arrondo, Elena ;
Salinas, Cecilia ;
Guell, J. L. ;
Masoller, Cristina ;
Parlitz, Ulrich .
SCIENTIFIC REPORTS, 2019, 9 (1)
[9]   Outlier mining in large high-dimensional data sets [J].
Angiulli, F ;
Pizzuti, C .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) :203-215
[10]   DOLPHIN: An Efficient Algorithm for Mining Distance-Based Outliers in Very Large Datasets [J].
Angiulli, Fabrizio ;
Fassetti, Fabio .
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (01)