A mathematical assessment of the isolation random forest method for anomaly detection in big data

Cited by: 0
Authors
Morales, Fernando A. [1, 3]
Ramirez, Jorge M. [1, 2]
Ramos, Edgar A. [1]
Affiliations
[1] Univ Nacl Colombia, Escuela Matemat, Antioquia, Colombia
[2] Oak Ridge Natl Lab, Comp Sci & Math, Oak Ridge, TN USA
[3] Carrera 65 59A 110,43-106, Medellin, Colombia
Keywords
anomaly detection; isolation random forest; Monte Carlo methods; probabilistic algorithms
DOI
10.1002/mma.8570
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We present a mathematical analysis of the Isolation Random Forest method (IRF method) for anomaly detection, which Liu F.T., Ting K.M. and Zhou Z.H. proposed in their seminal work as a heuristic for anomaly detection in Big Data. We prove that the IRF space can be endowed with a probability measure induced by the Isolation Tree algorithm (iTree). In this setting, we prove the convergence of the IRF method using the Law of Large Numbers. A couple of counterexamples show that the method is inconclusive and that no certificate of quality can be given when it is used to detect anomalies. We therefore propose an alternative version of the method whose mathematical foundation is fully justified. We also present a criterion for choosing the number of sampled trees needed to guarantee confidence intervals for the numerical results. Finally, numerical experiments compare the performance of the classic method with that of the proposed one.
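The Monte Carlo idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional data with distinct values, estimates the expected isolation depth of a point by averaging over randomly sampled iTree paths (the quantity whose convergence follows from the Law of Large Numbers), and uses a Hoeffding bound as a stand-in for the paper's criterion on the number of trees, which this record does not state. The names `isolation_depth`, `mean_depth` and `hoeffding_tree_count` are hypothetical.

```python
import math
import random


def isolation_depth(data, target_index, rng):
    """Number of random splits needed to isolate data[target_index].

    Assumption: 1-D data with distinct values. Each split point is drawn
    uniformly between the current minimum and maximum, as in an iTree.
    """
    points = list(data)
    x = points[target_index]
    depth = 0
    while len(points) > 1:
        lo, hi = min(points), max(points)
        if lo == hi:  # only duplicates remain; they cannot be separated
            break
        split = rng.uniform(lo, hi)
        if split <= lo:  # degenerate draw that separates nothing; resample
            continue
        # Follow the branch that contains the target point.
        if x < split:
            points = [p for p in points if p < split]
        else:
            points = [p for p in points if p >= split]
        depth += 1
    return depth


def mean_depth(data, target_index, n_trees, seed=0):
    """Monte Carlo estimate of the expected isolation depth."""
    rng = random.Random(seed)
    total = sum(isolation_depth(data, target_index, rng) for _ in range(n_trees))
    return total / n_trees


def hoeffding_tree_count(depth_range, eps, delta):
    """Trees needed so that |estimate - expectation| < eps with
    probability at least 1 - delta, by Hoeffding's inequality.

    Assumption: this generic bound replaces the paper's own criterion.
    depth_range is the width of the interval containing every depth.
    """
    return math.ceil(depth_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0]  # the last value is an obvious outlier
    # Depths lie in [0, len(sample) - 1], so that is the range used in the bound.
    n = hoeffding_tree_count(len(sample) - 1, eps=0.5, delta=0.05)
    print("trees:", n)
    print("outlier mean depth:", mean_depth(sample, 5, n))
    print("inlier mean depth:", mean_depth(sample, 2, n))
```

In this sketch a shorter mean depth suggests that a point is easier to isolate. As the abstract cautions, such a score on its own does not certify that the point is an anomaly.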
Pages: 1156-1177
Page count: 22
Related papers
50 items in total
  • [31] A UAV flight data anomaly detection method for multidimensional data and random noise
    Li, Shaobo
    Wang, Yan
    Yang, Lei
    Zhang, Ansi
    Li, Chuanjiang
    Zhongguo Guanxing Jishu Xuebao/Journal of Chinese Inertial Technology, 2024, 32 (07): : 733 - 742
  • [32] Medicare Fraud Detection using Random Forest with Class Imbalanced Big Data
    Bauder, Richard A.
    Khoshgoftaar, Taghi M.
    2018 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2018, : 80 - 87
  • [33] A Multilevel Deep Learning Method for Data Fusion and Anomaly Detection of Power Big Data
    Liu, Dong-Lan
    Liu, Xin
    Yu, Hao
    Wang, Wen-Ting
    Zhao, Xiao-Hong
    Chen, Jian-Fei
    PROCEEDINGS OF THE 3RD ANNUAL INTERNATIONAL CONFERENCE ON ELECTRONICS, ELECTRICAL ENGINEERING AND INFORMATION SCIENCE (EEEIS 2017), 2017, 131 : 533 - 539
  • [34] Anomaly Detection Guidelines for Data Streams in Big Data
    Rana, Annie Ibrahim
    Estrada, Giovani
    Sole, Marc
    Muntes, Victor
    2016 3RD INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2016), 2016, : 94 - 98
  • [35] Random forest algorithm in big data environment
    Liu, Yingchun
    Computer Modelling and New Technologies, 2014, 18 (12): : 147 - 151
  • [36] An enhanced variable selection and Isolation Forest based methodology for anomaly detection with OES data
    Puggini, Luca
    McLoone, Sean
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2018, 67 : 126 - 135
  • [37] Spectral-Spatial Anomaly Detection of Hyperspectral Data Based on Improved Isolation Forest
    Song, Xiangyu
    Aryal, Sunil
    Ting, Kai Ming
    Liu, Zhen
    He, Bin
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [38] Bilateral-Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data
    Hannak, Gabor
    Horvath, Gabor
    Kadar, Attila
    Szalai, Mark Daniel
    STATISTICAL ANALYSIS AND DATA MINING, 2023, 16 (03) : 215 - 223
  • [39] Anomaly Detection for Data Streams Based on Isolation Forest Using Scikit-Multiflow
    Togbe, Maurras Ulbricht
    Barry, Mariam
    Boly, Aliou
    Chabchoub, Yousra
    Chiky, Raja
    Montiel, Jacob
    Tran, Vinh-Thuy
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2020, PART IV, 2020, 12252 : 15 - 30
  • [40] Contextual Anomaly Detection in Big Sensor Data
    Hayes, Michael A.
    Capretz, Miriam A. M.
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 64 - 71