Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data

Cited by: 2
Authors
Dix, Marcel [1]
Manca, Gianluca [1]
Okafor, Kenneth Chigozie [2]
Borrison, Reuben [1]
Kirchheim, Konstantin [2]
Sharma, Divyasheel [3]
Chandrika, K. R. [3]
Maduskar, Deepti [3]
Ortmeier, Frank [2]
Affiliations
[1] ABB Corp Res Ctr, Ind AI, Ladenburg, Germany
[2] Otto von Guericke Univ, Dept Comp Sci, Magdeburg, Germany
[3] ABB Corp Res Ctr, Ind Software Res, Bangalore, Karnataka, India
Source
2023 IEEE 21st International Conference on Industrial Informatics (INDIN) | 2023
Keywords
Data quality; time series data; ML model robustness testing;
DOI
10.1109/INDIN51400.2023.10218129
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
The performance of machine learning models can be significantly impacted by variations in data quality. Typically, conventional model testing does not examine how robust the model would be in the face of potential data quality deterioration. In an industrial use case, however, data quality is a pertinent issue, as sensors are susceptible to a variety of technical and external issues that may result in poor data quality over time. In order to develop robust machine learning models, industrial data scientists must understand the sensitivity of their models against data quality issues, through the application of an appropriate and comprehensive testing solution. In this work, we propose a generic framework for systematically analyzing the impact of data quality issues on the performance of machine learning models by intentionally applying gradual perturbations to the original time series data. The evaluation is performed using a benchmark industrial process consisting of multivariate time series from sensors in a complex chemical process.
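The idea of applying gradual perturbations and tracking model performance can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two perturbation types (`add_gaussian_noise`, `impute_missing`), the stand-in `NearestCentroid` classifier, the synthetic data from `make_synthetic_data`, and the use of accuracy as the metric. It shows only the general pattern of sweeping a severity level and recording how a trained model degrades.

```python
import numpy as np


def add_gaussian_noise(x, severity, rng):
    """Hypothetical perturbation: zero-mean noise with std = severity * per-sensor std."""
    scale = severity * x.std(axis=0, keepdims=True)
    return x + rng.normal(size=x.shape) * scale


def impute_missing(x, severity, rng):
    """Hypothetical perturbation: blank out a `severity` fraction of readings
    and fill them with the per-sensor mean."""
    mask = rng.random(x.shape) < severity
    return np.where(mask, x.mean(axis=0, keepdims=True), x)


class NearestCentroid:
    """Tiny stand-in classifier; any model with a predict() method would do."""

    def fit(self, x, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([x[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, x):
        # Distance from every sample to every class centroid.
        dist = np.linalg.norm(x[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[dist.argmin(axis=1)]


def robustness_curve(model, x_test, y_test, perturb, severities, seed=0):
    """Return {severity: accuracy} for a trained model on perturbed test data."""
    rng = np.random.default_rng(seed)
    curve = {}
    for s in severities:
        y_pred = model.predict(perturb(x_test, s, rng))
        curve[s] = float(np.mean(y_pred == y_test))
    return curve


def make_synthetic_data(n=400, n_sensors=5, seed=0):
    """Two Gaussian classes standing in for labelled sensor readings (not real plant data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, n_sensors)) + 3.0 * y[:, None]
    return x, y


if __name__ == "__main__":
    x, y = make_synthetic_data()
    model = NearestCentroid().fit(x[:200], y[:200])
    for name, perturb in [("noise", add_gaussian_noise), ("missing", impute_missing)]:
        print(name, robustness_curve(model, x[200:], y[200:], perturb, [0.0, 0.5, 1.0, 2.0]))
```

The model is trained once on clean data and only the test inputs are perturbed, so a severity of 0 reproduces the clean baseline and each higher level isolates the effect of that one data quality issue.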
Pages: 8
Related papers
29 in total
  • [1] Revision of the Tennessee Eastman Process Model
    Bathelt, Andreas
    Ricker, N. Lawrence
    Jelali, Mohieddine
    [J]. IFAC PAPERSONLINE, 2015, 48 (08): 309-314
  • [2] Batini C., 2016, DATA INFORM QUALITY
  • [3] Methodologies for Data Quality Assessment and Improvement
    Batini, Carlo
    Cappiello, Cinzia
    Francalanci, Chiara
    Maurino, Andrea
    [J]. ACM COMPUTING SURVEYS, 2009, 41 (03)
  • [4] Binary Shapelet Transform for Multiclass Time Series Classification
    Bostrom, Aaron
    Bagnall, Anthony
    [J]. TRANSACTIONS ON LARGE-SCALE DATA- AND KNOWLEDGE-CENTERED SYSTEMS XXXII, 2017, 10420: 24-46
  • [5] Budach L, 2022, arXiv, DOI 10.48550/arXiv.2207.14529
  • [6] Dix Marcel, 2022, International Conference on Deep Learning, Big Data and Blockchain (Deep-BDB 2021). Lecture Notes in Networks and Systems (309), P15, DOI 10.1007/978-3-030-84337-3_2
  • [7] A PLANT-WIDE INDUSTRIAL-PROCESS CONTROL PROBLEM
    DOWNS, JJ
    VOGEL, EF
    [J]. COMPUTERS & CHEMICAL ENGINEERING, 1993, 17 (03): 245-255
  • [8] Ehrlinger Lisa, 2022, FRONT BIG DATA, P28
  • [9] Gitzel R., 2016, 18 IEEE C BUS INF IN, P41
  • [10] Requirements for Data Quality Metrics
    Heinrich, Bernd
    Hristova, Diana
    Klier, Mathias
    Schiller, Alexander
    Szubartowicz, Michael
    [J]. ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2018, 9 (02)