Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Cited by: 7
Authors
Braca, Paolo [1 ]
Millefiori, Leonardo M. [1 ]
Aubry, Augusto [2 ]
Marano, Stefano [3 ]
De Maio, Antonio [2 ]
Willett, Peter [4 ]
Affiliations
[1] Ctr Maritime Res & Experimentat, Res Dept, I-19126 La Spezia, SP, Italy
[2] Univ Naples Federico II, DIETI, I-80125 Naples, NA, Italy
[3] Univ Salerno, DIEM, I-84084 Fisciano, SA, Italy
[4] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Source
IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2022 / Vol. 3
Keywords
Error probability; Training; Artificial intelligence; Convergence; Error analysis; Surveillance; Signal processing; Machine learning; deep learning; large deviations principle; exact asymptotics; statistical hypothesis testing; Fenchel-Legendre transform; extended target detection; radar; sonar detection; X-band maritime radar; EXTENDED TARGET TRACKING; DISTRIBUTED DETECTION; ARTIFICIAL-INTELLIGENCE; MARITIME SURVEILLANCE; MULTIPLE SENSORS; NEURAL-NETWORK; DEEP; CLASSIFICATION; ALGORITHMS; CONSENSUS;
DOI
10.1109/OJSP.2022.3232284
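Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract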
We study the performance of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions under which an ML classifier exhibits error probabilities that vanish exponentially, say as exp(-n I), where n is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and I is the error rate. These conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., the quantity that is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and the related error rate I depend on the given training set. The conditions for exponential convergence can be verified and tested numerically using the available dataset or a synthetic dataset generated according to the underlying statistical model. Consistent with large deviations theory, we also establish the convergence of the normalized D3F statistic to a Gaussian distribution. Furthermore, a refined asymptotic derivation provides approximate error probability curves of the form zeta(n) exp(-n I), where zeta(n) collects the most representative sub-exponential terms of the error probabilities. Leveraging these refined asymptotics, we compute an accurate analytical approximation of the classification performance for both small and large values of n. The theoretical findings are corroborated by extensive numerical simulations and by real-world data acquired by an X-band maritime radar system for surveillance.
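The numerical check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the decision statistic is an average of n i.i.d. per-observation terms, and it uses standard Gaussian samples as a stand-in for D3F outputs under one hypothesis. The function names, the threshold `gamma`, and the grid `t_grid` are hypothetical choices made for illustration. The sketch estimates the cumulant-generating function from samples, takes its Fenchel-Legendre transform on a grid, and forms the leading-order approximation exp(-n I), leaving out the sub-exponential factor zeta(n).

```python
import numpy as np

def empirical_cgf(samples, t_grid):
    """Estimate log E[exp(t X)] from samples for each t in t_grid."""
    out = np.empty(len(t_grid))
    for i, t in enumerate(t_grid):
        z = t * samples
        m = z.max()  # shift by the maximum for numerical stability
        out[i] = m + np.log(np.mean(np.exp(z - m)))
    return out

def rate_function(samples, gamma, t_grid):
    """Grid approximation of the Fenchel-Legendre transform sup_t [t*gamma - cgf(t)]."""
    phi = empirical_cgf(samples, t_grid)
    return np.max(t_grid * gamma - phi)

# Assumption: standard Gaussian samples stand in for per-observation D3F outputs.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=50_000)

# Assumption: threshold above the mean, so only t >= 0 is searched (upper tail).
gamma = 0.5
t_grid = np.linspace(0.0, 2.0, 201)

rate = rate_function(samples, gamma, t_grid)

# Leading-order approximation exp(-n I); the factor zeta(n) is not modeled here.
n = 20
approx_error = np.exp(-n * rate)
print(f"estimated rate I = {rate:.4f}, exp(-n I) for n = {n}: {approx_error:.4f}")
```

For a standard Gaussian the transform has the closed form gamma^2 / 2, so the estimate can be compared with 0.125 for gamma = 0.5. With real data, the samples would instead be D3F outputs computed on a held-out or synthetic dataset.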
Pages: 464-495
Number of pages: 32