Statistical monitoring applied to data science in classification: continuous validation in predictive models

Cited by: 0
Authors
Bueno, Carlos Renato [1]
Sordan, Juliano Endrigo [1]
Oprime, Pedro Carlos [1]
Vicentin, Damaris Chieregato [1]
Conde, Giovanni Claudio Pinto [1]
Affiliations
[1] Univ Fed Sao Carlos, Dept Prod Engn, Sao Carlos, Brazil
Keywords
Industry 4.0; Big data; Data mining; Predictive models; Continuous validation; CONTROL CHART; PERFORMANCE; INDUSTRY; SYSTEM; COEFFICIENT; UNIVARIATE; CAPABILITY; AGREEMENT; MORTALITY
DOI
10.1108/BIJ-02-2024-0171
Chinese Library Classification (CLC) number
C93 [Management Science];
Discipline classification code
12 ; 1201 ; 1202 ; 120202 ;
Abstract
Purpose: This study aims to analyze the performance of quality indices to continuously validate a predictive model focused on the control chart classification.
Design/methodology/approach: The research method used analytical statistical methods to propose a classification model. The project science research concepts were integrated with the statistical process monitoring (SPM) concepts using the modeling methods applied in the data science (DS) area. For the integration development, SPM Phases I and II were associated, generating models with a structured data analysis process and creating a continuous validation approach.
Findings: Validation was performed by simulation and analytical techniques applied to the Cohen's Kappa index, supported by voluntary comparisons in the Matthews correlation coefficient (MCC) and the Youden index, generating prescriptive criteria for the classification. Kappa-based control charts performed well for m = 5 samples of size n = 500 when Pe is less than 0.8. The simulations also showed that Kappa control requires fewer samples than the other indices studied.
Originality/value: The main contributions of this study to both theory and practice are summarized as follows: (1) it proposes DS and SPM integration; (2) it develops a tool for continuous validation of predictive classification models; (3) it compares different indices for model quality, indicating their advantages and disadvantages; (4) it defines sampling criteria and a procedure for SPM application considering the technique's Phases I and II; and (5) the validated approach serves as a basis for various analyses, enabling an objective comparison among all alternative designs.
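The three indices named in the abstract can all be computed from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' implementation; the function name `classification_indices` and its argument names are hypothetical, and Pe is taken here to be the chance-agreement term in Cohen's Kappa, which is an assumption about the abstract's notation.

```python
from math import sqrt


def classification_indices(tp, fp, fn, tn):
    """Return Cohen's Kappa, its chance-agreement term Pe, MCC and Youden's J.

    Hypothetical helper for illustration. Assumes a binary confusion matrix
    with at least one actual positive and one actual negative.
    """
    n = tp + fp + fn + tn

    # Observed agreement: share of cases where prediction matches the truth.
    po = (tp + tn) / n

    # Expected chance agreement from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2

    # Cohen's Kappa; undefined when Pe equals 1, reported as 0.0 here.
    kappa = (po - pe) / (1 - pe) if pe < 1 else 0.0

    # Matthews correlation coefficient; 0.0 when any marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Youden index: sensitivity + specificity - 1.
    youden = tp / (tp + fn) + tn / (tn + fp) - 1

    return {"kappa": kappa, "pe": pe, "mcc": mcc, "youden": youden}


if __name__ == "__main__":
    # Made-up counts for one sample of n = 500 classified items.
    print(classification_indices(tp=200, fp=50, fn=50, tn=200))
```

For a balanced matrix such as the one above, Pe is 0.5 and the three indices coincide at about 0.6; they diverge as the classes become imbalanced, which is where a comparison between them becomes informative.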
Pages: 28