A mathematical assessment of the isolation random forest method for anomaly detection in big data

Authors
Morales, Fernando A. [1, 3]
Ramirez, Jorge M. [1, 2]
Ramos, Edgar A. [1 ]
Affiliations
[1] Univ Nacl Colombia, Escuela Matemat, Antioquia, Colombia
[2] Oak Ridge Natl Lab, Comp Sci & Math, Oak Ridge, TN USA
[3] Carrera 65 59A 110,43-106, Medellin, Colombia
Keywords
anomaly detection; isolation random forest; Monte Carlo methods; probabilistic algorithms
DOI
10.1002/mma.8570
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We present the mathematical analysis of the Isolation Random Forest Method (IRF Method) for anomaly detection, proposed by Liu F. T., Ting K. M., and Zhou Z. H. in their seminal work as a heuristic method for anomaly detection in Big Data. We prove that the IRF space can be endowed with a probability induced by the Isolation Tree algorithm (iTree). In this setting, the convergence of the IRF method is proved using the Law of Large Numbers. Two counterexamples are presented to show that the method is inconclusive and that no certificate of quality can be given when it is used to detect anomalies. Hence, an alternative version of the method is proposed whose mathematical foundation is fully justified. Furthermore, a criterion is presented for choosing the number of sampled trees needed to guarantee confidence intervals for the numerical results. Finally, numerical experiments are presented to compare the performance of the classic method with the proposed one.
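The abstract's two quantitative ideas, averaging isolation depths over many random trees and choosing the number of trees to reach a target confidence, can be illustrated with a small sketch. This is not the authors' algorithm or their criterion: it is a minimal 1-D illustration under assumptions made here. The function names (`isolation_depth`, `mean_depth`, `trees_needed`) are hypothetical, splits are drawn uniformly between the current minimum and maximum, and the tree count comes from the generic Hoeffding inequality for a bounded random variable, which may differ from the bound derived in the paper.

```python
import math
import random


def isolation_depth(points, target, rng):
    """Number of random splits needed to isolate `target` in a 1-D sample.

    Assumes `target` is an element of `points` (illustrative setup only).
    """
    current = list(points)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            # Only duplicates of the target remain; they cannot be separated.
            break
        split = rng.uniform(lo, hi)
        # Keep the side of the split that contains the target.
        if target < split:
            kept = [x for x in current if x < split]
        else:
            kept = [x for x in current if x >= split]
        if len(kept) == len(current):
            # Degenerate split that separated nothing; draw again.
            continue
        current = kept
        depth += 1
    return depth


def mean_depth(points, target, n_trees, seed=0):
    """Monte Carlo estimate of the expected isolation depth of `target`."""
    rng = random.Random(seed)
    total = sum(isolation_depth(points, target, rng) for _ in range(n_trees))
    return total / n_trees


def trees_needed(max_depth, eps, delta):
    """Hoeffding-based tree count (an assumption, not the paper's criterion).

    Guarantees |estimate - expectation| < eps with probability >= 1 - delta
    when every depth lies in the interval [0, max_depth].
    """
    return math.ceil(max_depth ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 0.3, 10.0]
    # With distinct points, each effective split removes at least one point,
    # so the depth is at most len(data) - 1.
    n = trees_needed(max_depth=len(data) - 1, eps=0.5, delta=0.05)
    print("trees used:", n)
    print("mean depth of 10.0:", mean_depth(data, 10.0, n))
    print("mean depth of 0.1:", mean_depth(data, 0.1, n))
```

In this toy sample the isolated point 10.0 is usually separated by the first split, so its average depth is smaller than that of a point inside the cluster. By the Law of Large Numbers the averages stabilize as the number of trees grows; the Hoeffding count is only one conservative way to decide when to stop.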
Pages: 1156-1177
Page count: 22
Related papers
50 in total
  • [1] An Improved Data Anomaly Detection Method Based on Isolation Forest
    Xu, Dong
    Wang, Yanjun
    Meng, Yulong
    Zhang, Ziying
    2017 10TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2017, : 287 - 291
  • [2] Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
    Laskar, Md Tahmid Rahman
    Huang, Jimmy Xiangji
    Smetana, Vladan
    Stewart, Chris
    Pouw, Kees
    An, Aijun
    Chan, Stephen
    Liu, Lei
    ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS, 2021, 5 (04)
  • [3] Distribution Forest: An Anomaly Detection Method Based on Isolation Forest
    Yao, Chengfei
    Ma, Xiaoqing
    Chen, Biao
    Zhao, Xiaosong
    Bai, Gang
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES (APPT 2019), 2019, 11719 : 135 - 147
  • [4] Leveraging an Isolation Forest to Anomaly Detection and Data Clustering
    Yepmo, Veronne
    Smits, Gregory
    Lesot, Marie-Jeanne
    Pivert, Olivier
    DATA & KNOWLEDGE ENGINEERING, 2024, 151
  • [5] Anomaly Detection in Streaming Data using Isolation Forest
    Kareem, Mohammed Shaker
    Muhammed, Lamia AbedNoor
    PROCEEDINGS 2024 SEVENTH INTERNATIONAL WOMEN IN DATA SCIENCE CONFERENCE AT PRINCE SULTAN UNIVERSITY, WIDS-PSU 2024, 2024, : 223 - 228
  • [6] On the statistical properties of the isolation forest anomaly detection method
    Pelletier, Bruno
    ELECTRONIC JOURNAL OF STATISTICS, 2024, 18 (02): : 4322 - 4381
  • [7] An anomaly detection method based on random convolutional kernel and isolation forest for equipment state monitoring
    Shu, Xinhao
    Zhang, Shigang
    Li, Yue
    Chen, Mengqiao
    EKSPLOATACJA I NIEZAWODNOSC-MAINTENANCE AND RELIABILITY, 2022, 24 (04): : 758 - 770
  • [8] Financial Data Anomaly Detection Method Based on Decision Tree and Random Forest Algorithm
    Zhang, Qingyang
    JOURNAL OF MATHEMATICS, 2022, 2022
  • [9] Similarity-Measured Isolation Forest: Anomaly Detection Method for Machine Monitoring Data
    Li, Changgen
    Guo, Liang
    Gao, Hongli
    Li, Yi
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2021, 70 (70)
  • [10] Anomaly credit data detection based on enhanced Isolation Forest
    Zhang, Xiaodong
    Yao, Yuan
    Lv, Congdong
    Wang, Tao
    INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2022, 122 (01): : 185 - 192