Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Cited by: 7
Authors
Braca, Paolo [1]
Millefiori, Leonardo M. [1]
Aubry, Augusto [2]
Marano, Stefano [3]
De Maio, Antonio [2]
Willett, Peter [4]
Affiliations
[1] Ctr Maritime Res & Experimentat, Res Dept, I-19126 La Spezia, SP, Italy
[2] Univ Naples Federico II, DIETI, I-80125 Naples, NA, Italy
[3] Univ Salerno, DIEM, I-84084 Fisciano, SA, Italy
[4] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Source
IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2022 / Vol. 3
Keywords
Error probability; Training; Artificial intelligence; Convergence; Error analysis; Surveillance; Signal processing; Machine learning; deep learning; large deviations principle; exact asymptotics; statistical hypothesis testing; Fenchel-Legendre transform; extended target detection; radar; sonar detection; X-band maritime radar; EXTENDED TARGET TRACKING; DISTRIBUTED DETECTION; ARTIFICIAL-INTELLIGENCE; MARITIME SURVEILLANCE; MULTIPLE SENSORS; NEURAL-NETWORK; DEEP; CLASSIFICATION; ALGORITHMS; CONSENSUS;
DOI
10.1109/OJSP.2022.3232284
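Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract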
We study the performance of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say exp(-n I), where n is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and I is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and the related error rate I depend on the given training set. The conditions for exponential convergence can be verified and tested numerically by exploiting the available dataset or a synthetic dataset generated according to the underlying statistical model. Consistent with large deviations theory, we can also establish the convergence of the normalized D3F statistic to a Gaussian distribution. Furthermore, approximate error probability curves zeta(n) exp(-n I) are provided, thanks to the refined asymptotic derivation, where zeta(n) represents the most representative sub-exponential terms of the error probabilities. Leveraging the refined asymptotics, we are able to compute an accurate analytical approximation of the classification performance for both small and large values of n. Theoretical findings are corroborated by extensive numerical simulations and by the use of real-world data, acquired by an X-band maritime radar system for surveillance.
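The numerical check described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes, purely for illustration, that the decision statistic is an average of i.i.d. per-observation scores, and it uses standard Gaussian scores as a stand-in for a learned D3F. The function names `empirical_cgf` and `rate_function`, the threshold `gamma`, and the grid `t_grid` are hypothetical. The sketch estimates the cumulant-generating function from samples and takes its Fenchel-Legendre transform on a grid to approximate the error rate I.

```python
import numpy as np

def empirical_cgf(samples, t):
    """Empirical cumulant-generating function log E[exp(t*X)] for each t."""
    samples = np.asarray(samples, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    z = t[:, None] * samples[None, :]           # shape (len(t), len(samples))
    zmax = z.max(axis=1)                        # log-sum-exp shift for stability
    return zmax + np.log(np.mean(np.exp(z - zmax[:, None]), axis=1))

def rate_function(samples, gamma, t_grid):
    """Grid approximation of sup_t [t*gamma - CGF(t)] (Fenchel-Legendre transform)."""
    values = t_grid * gamma - empirical_cgf(samples, t_grid)
    return float(values.max())

# Assumption: per-observation scores under one hypothesis are N(0, 1).
rng = np.random.default_rng(0)
scores = rng.normal(0.0, 1.0, size=20_000)

gamma = 1.0                                     # hypothetical per-observation threshold
t_grid = np.linspace(0.0, 2.0, 101)             # t >= 0 because gamma exceeds the mean

I_hat = rate_function(scores, gamma, t_grid)
I_exact = gamma**2 / 2                          # closed form for N(0, 1) scores

n = 20
print(f"estimated rate I: {I_hat:.3f} (Gaussian closed form: {I_exact:.3f})")
print(f"leading-order error probability exp(-n I) at n={n}: {np.exp(-n * I_hat):.2e}")
```

For Gaussian scores the cumulant-generating function is t^2/2, so the transform at gamma equals gamma^2/2, which gives a reference value for the estimate. The empirical estimate becomes unreliable for large t, where the sample mean of exp(t X) is dominated by a few extreme samples, so the grid is kept to a moderate range. The sub-exponential factor zeta(n) mentioned in the abstract is not modeled here.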
Pages: 464 - 495
Page count: 32
Related papers
50 in total
  • [21] Machine Learning Tools for Image-Based Glioma Grading and the Quality of Their Reporting: Challenges and Opportunities
    Merkaj, Sara
    Bahar, Ryan C.
    Zeevi, Tal
    Lin, MingDe
    Ikuta, Ichiro
    Bousabarah, Khaled
    Cassinelli Petersen, Gabriel I.
    Staib, Lawrence
    Payabvash, Seyedmehdi
    Mongan, John T.
    Cha, Soonmee
    Aboian, Mariam S.
    CANCERS, 2022, 14 (11)
  • [22] The ethics of machine learning-based clinical decision support: an analysis through the lens of professionalisation theory
    Heyen, Nils B.
    Salloch, Sabine
    BMC MEDICAL ETHICS, 2021, 22 (01)
  • [23] Performance of Radiomics-based machine learning and deep learning-based methods in the prediction of tumor grade in meningioma: a systematic review and meta-analysis
    Tavanaei, Roozbeh
    Akhlaghpasand, Mohammadhosein
    Alikhani, Alireza
    Hajikarimloo, Bardia
    Ansari, Ali
    Yong, Raymund L.
    Margetis, Konstantinos
    NEUROSURGICAL REVIEW, 2025, 48 (01)
  • [24] Analysis of Machine Learning Techniques for Information Classification in Mobile Applications
    Arteaga, Sandra Perez
    Orozco, Ana Lucila Sandoval
    Villalba, Luis Javier Garcia
    APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [25] Primer on machine learning: utilization of large data set analyses to individualize pain management
    Rashidi, Parisa
    Edwards, David A.
    Tighe, Patrick J.
    CURRENT OPINION IN ANESTHESIOLOGY, 2019, 32 (05) : 653 - 660
  • [26] Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies
    Mieth, Bettina
    Kloft, Marius
    Rodriguez, Juan Antonio
    Sonnenburg, Soren
    Vobruba, Robin
    Morcillo-Suarez, Carlos
    Farre, Xavier
    Marigorta, Urko M.
    Fehr, Ernst
    Dickhaus, Thorsten
    Blanchard, Gilles
    Schunk, Daniel
    Navarro, Arcadi
    Mueller, Klaus-Robert
    SCIENTIFIC REPORTS, 2016, 6
  • [27] Review of machine learning-based Mineral Resource estimation
    Mahoob, M. A.
    Celik, T.
    Genc, B.
    JOURNAL OF THE SOUTHERN AFRICAN INSTITUTE OF MINING AND METALLURGY, 2022, 122 (11) : 655 - 664
  • [28] Image-Based Cardiac Diagnosis With Machine Learning: A Review
    Martin-Isla, Carlos
    Campello, Victor M.
    Izquierdo, Cristian
    Raisi-Estabragh, Zahra
    Baessler, Bettina
    Petersen, Steffen E.
    Lekadir, Karim
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2020, 7
  • [29] Big data analysis and artificial intelligence in epilepsy - common data model analysis and machine learning-based seizure detection and forecasting
    Chung, Yoon Gi
    Jeon, Yonghoon
    Yoo, Sooyoung
    Kim, Hunmin
    Hwang, Hee
    CLINICAL AND EXPERIMENTAL PEDIATRICS, 2022, 65 (06) : 272 - 282
  • [30] Using machine learning to identify clotted specimens in coagulation testing
    Fang, Kui
    Dong, Zheqing
    Chen, Xiling
    Zhu, Ji
    Zhang, Bing
    You, Jinbiao
    Xiao, Yingjun
    Xia, Wenjin
    CLINICAL CHEMISTRY AND LABORATORY MEDICINE, 2021, 59 (07) : 1289 - 1297