Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Cited by: 7
Authors
Braca, Paolo [1]
Millefiori, Leonardo M. [1]
Aubry, Augusto [2]
Marano, Stefano [3]
De Maio, Antonio [2]
Willett, Peter [4]
Affiliations
[1] Ctr Maritime Res & Experimentat, Res Dept, I-19126 La Spezia, SP, Italy
[2] Univ Naples Federico II, DIETI, I-80125 Naples, NA, Italy
[3] Univ Salerno, DIEM, I-84084 Fisciano, SA, Italy
[4] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Source
IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2022 / Vol. 3
Keywords
Error probability; Training; Artificial intelligence; Convergence; Error analysis; Surveillance; Signal processing; Machine learning; deep learning; large deviations principle; exact asymptotics; statistical hypothesis testing; Fenchel-Legendre transform; extended target detection; radar; sonar detection; X-band maritime radar; EXTENDED TARGET TRACKING; DISTRIBUTED DETECTION; ARTIFICIAL-INTELLIGENCE; MARITIME SURVEILLANCE; MULTIPLE SENSORS; NEURAL-NETWORK; DEEP; CLASSIFICATION; ALGORITHMS; CONSENSUS;
DOI
10.1109/OJSP.2022.3232284
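Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract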
We study the performance of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say exp(-n I), where n is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and I is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and the related error rate I depend on the given training set. The conditions for exponential convergence can be verified and tested numerically using the available dataset or a synthetic dataset generated according to the underlying statistical model. Consistent with large deviations theory, we can also establish the convergence of the normalized D3F statistic to a Gaussian distribution. Furthermore, approximate error probability curves zeta(n) exp(-n I) are provided, thanks to the refined asymptotic derivation, where zeta(n) represents the most representative sub-exponential terms of the error probabilities. Leveraging the refined asymptotics, we are able to compute an accurate analytical approximation of the classification performance for both small and large values of n. Theoretical findings are corroborated by extensive numerical simulations and by the use of real-world data acquired by an X-band maritime radar system for surveillance.
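The numerical check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the decision statistic is an average of n i.i.d. per-observation scores, estimates their cumulant-generating function empirically, and takes the Fenchel-Legendre transform by a grid search. The names `empirical_cgf` and `rate_function` are hypothetical, and standard Gaussian samples stand in for real D3F scores.

```python
import numpy as np


def empirical_cgf(samples, t_grid):
    """Empirical cumulant-generating function log E[exp(t X)] on a grid of t.

    Uses a log-mean-exp with max subtraction for numerical stability.
    """
    a = np.outer(t_grid, samples)            # shape (len(t_grid), len(samples))
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.mean(np.exp(a - m), axis=1, keepdims=True))).ravel()


def rate_function(samples, gamma, t_grid):
    """Fenchel-Legendre transform sup_t [t * gamma - Lambda(t)] over t_grid.

    With t_grid >= 0 and gamma above the sample mean, this approximates the
    exponent I in P(average score > gamma) ~ exp(-n I).
    """
    cgf = empirical_cgf(samples, t_grid)
    return float(np.max(t_grid * gamma - cgf))


# Illustrative stand-in data (assumption): per-observation scores ~ N(0, 1).
rng = np.random.default_rng(0)
scores = rng.standard_normal(20_000)
t_grid = np.linspace(0.0, 2.0, 201)

gamma = 0.5                                   # hypothetical decision threshold
I_hat = rate_function(scores, gamma, t_grid)
# For N(0, 1) the analytical rate is gamma**2 / 2 = 0.125; the empirical
# estimate should be close to it but will not match exactly.
print(f"estimated rate I({gamma}) = {I_hat:.4f}")
```

A strictly positive estimate at the chosen threshold is consistent with exponential decay of the corresponding error probability. The empirical cumulant-generating function becomes unreliable for large t, where a few extreme samples dominate the average, so the grid should stay within the range the data supports.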
Pages: 464 - 495
Number of pages: 32
Related papers
50 in total
  • [1] Machine learning-based statistical testing hypothesis for fault detection in photovoltaic systems
    Fazai, R.
    Abodayeh, K.
    Mansouri, M.
    Trabelsi, M.
    Nounou, H.
    Nounou, M.
    Georghiou, G. E.
    SOLAR ENERGY, 2019, 190 : 405 - 413
  • [2] Machine Learning-Based Statistical Hypothesis Testing for Fault Detection
    Fazai, Radhia
    Mansouri, Majdi
    Abodayeh, Kamal
    Trabelsi, Mohamed
    Nounou, Hazem
    Nounou, Mohamed
    2019 4TH CONFERENCE ON CONTROL AND FAULT TOLERANT SYSTEMS (SYSTOL), 2019, : 38 - 43
  • [3] Framework for Testing Robustness of Machine Learning-Based Classifiers
    Chuah, Joshua
    Kruger, Uwe
    Wang, Ge
    Yan, Pingkun
    Hahn, Juergen
    JOURNAL OF PERSONALIZED MEDICINE, 2022, 12 (08):
  • [4] Lung disease recognition methods using audio-based analysis with machine learning
    Sabry, Ahmad H.
    Bashi, Omar I. Dallal
    Ali, N. H. Nik
    Al Kubaisi, Yasir Mahmood
    HELIYON, 2024, 10 (04)
  • [5] Large and Small Deviations for Statistical Sequence Matching
    Zhou, Lin
    Wang, Qianyun
    Wang, Jingjing
    Bai, Lin
    Hero III, Alfred O.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (11) : 7532 - 7562
  • [6] Demystifying image-based machine learning: a practical guide to automated analysis of field imagery using modern machine learning tools
    Belcher, Byron T.
    Bower, Eliana H.
    Burford, Benjamin
    Celis, Maria Rosa
    Fahimipour, Ashkaan K.
    Guevara, Isabela L.
    Katija, Kakani
    Khokhar, Zulekha
    Manjunath, Anjana
    Nelson, Samuel
    Olivetti, Simone
    Orenstein, Eric
    Saleh, Mohamad H.
    Vaca, Brayan
    Valladares, Salma
    Hein, Stella A.
    Hein, Andrew M.
    FRONTIERS IN MARINE SCIENCE, 2023, 10
  • [7] Machine Learning Models for Statistical Analysis
    Grebovic, Marko
    Filipovic, Luka
    Katnic, Ivana
    Vukotic, Milica
    Popovic, Tomo
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2023, 20 (3A) : 505 - 514
  • [8] Parametric Circuit Fault Diagnosis Through Oscillation-Based Testing in Analogue Circuits: Statistical and Deep Learning Approaches
    Cloete, Jacob B.
    Stander, Tinus
    Wilke, Daniel N.
    IEEE ACCESS, 2022, 10 : 15671 - 15680
  • [9] HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models
    Wang, Qianwen
    Alexander, William
    Pegg, Jack
    Qu, Huamin
    Chen, Min
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2021, 27 (02) : 1417 - 1426
  • [10] Machine learning based small bowel video capsule endoscopy analysis: Challenges and opportunities
    Wahab, Haroon
    Mehmood, Irfan
    Ugail, Hassan
    Sangaiah, Arun Kumar
    Muhammad, Khan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 143 : 191 - 214