Anomaly analytics in data-driven machine learning applications

Cited by: 3
Authors
Azimi, Shelernaz [1]
Pahl, Claus [1]
Affiliations
[1] Free Univ Bozen Bolzano, Fac Engn, I-39100 Bolzano, Italy
Keywords
Data quality; Machine learning; ML model quality; Anomaly detection; Data analysis; Root cause analysis; Data quality remediation; Explainable AI
DOI
10.1007/s41060-024-00593-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning is used widely to create a range of prediction or classification models. The quality of the machine learning (ML) models depends not only on the model creation process, but also on the input data quality. We investigate here the impact of data quality on the quality of the ML model in a generic way. The aim is to identify a possible data quality problem based on observed anomalies in the ML model over time. This is achieved in the form of a root cause analysis of anomalies detected in the ML model. We develop a generic anomaly detection and analysis framework and demonstrate its application to two prediction scenarios based on sensor data.
Pages: 155-180
Page count: 26