Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data

Cited by: 2

Authors:

Dix, Marcel [1]

Manca, Gianluca [1]

Okafor, Kenneth Chigozie [2]

Borrison, Reuben [1]

Kirchheim, Konstantin [2]

Sharma, Divyasheel [3]

Chandrika, K. R. [3]

Maduskar, Deepti [3]

Ortmeier, Frank [2]

Affiliations:

[1] ABB Corp Res Ctr, Ind AI, Ladenburg, Germany

[2] Otto von Guericke Univ, Dept Comp Sci, Magdeburg, Germany

[3] ABB Corp Res Ctr, Ind Software Res, Bangalore, Karnataka, India

Source:

2023 IEEE 21ST INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS, INDIN | 2023

Keywords:

Data quality; time series data; ML model robustness testing;

DOI:

10.1109/INDIN51400.2023.10218129

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203; 0835

Abstract:

The performance of machine learning models can be significantly impacted by variations in data quality. Typically, conventional model testing does not examine how robust the model would be in the face of potential data quality deterioration. In an industrial use case, however, data quality is a pertinent issue, as sensors are susceptible to a variety of technical and external issues that may result in poor data quality over time. In order to develop robust machine learning models, industrial data scientists must understand the sensitivity of their models against data quality issues, through the application of an appropriate and comprehensive testing solution. In this work, we propose a generic framework for systematically analyzing the impact of data quality issues on the performance of machine learning models by intentionally applying gradual perturbations to the original time series data. The evaluation is performed using a benchmark industrial process consisting of multivariate time series from sensors in a complex chemical process.
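The idea of gradually perturbing the data and tracking model performance can be illustrated with a minimal sketch. This is not the authors' framework: the function names (`add_gaussian_noise`, `robustness_curve`), the synthetic two-class data, the nearest-centroid classifier, and the chosen noise levels are all assumptions made for illustration, and Gaussian sensor noise stands in for just one of many possible data quality issues.

```python
import numpy as np

def add_gaussian_noise(X, level, rng):
    """Hypothetical perturbation: add zero-mean noise whose standard
    deviation is `level` times each sensor's own standard deviation."""
    scale = level * X.std(axis=0, keepdims=True)
    return X + rng.normal(size=X.shape) * scale

def robustness_curve(predict, X, y, perturb, levels, seed=0):
    """Evaluate accuracy of `predict` on increasingly perturbed copies of X."""
    rng = np.random.default_rng(seed)
    curve = {}
    for level in levels:
        X_bad = perturb(X, level, rng)
        curve[level] = float(np.mean(predict(X_bad) == y))
    return curve

# Synthetic stand-in for sensor data: 400 samples, 4 sensors, 2 classes.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 200)
X = rng.normal(size=(400, 4)) + 3.0 * y[:, None]

# Fit a nearest-centroid classifier on the clean training half only.
train = np.arange(400) % 2 == 0
centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in (0, 1)])

def predict(X_new):
    """Assign each sample to the class with the closest centroid."""
    dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Perturb only the held-out half and record accuracy per noise level.
levels = [0.0, 0.5, 1.0, 2.0, 5.0]
curve = robustness_curve(predict, X[~train], y[~train], add_gaussian_noise, levels)
for level, acc in curve.items():
    print(f"noise level {level:>4}: accuracy {acc:.3f}")
```

Plotting accuracy against the perturbation level gives a simple sensitivity curve. Other issue types (missing values, drift, stuck sensors) could be tested by passing a different `perturb` function with the same signature.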

Pages: 8

References: 29 in total

