Data-driven approach for labelling process plant event data

Cited by: 4
Authors
Correa, Debora [1 ,2 ]
Polpo, Adriano [2 ,3 ]
Small, Michael [1 ,2 ,4 ]
Srikanth, Shreyas [5 ]
Hollins, Kylie [2 ,5 ]
Hodkiewicz, Melinda [2 ,6 ]
Affiliations
[1] Univ Western Australia, Complex Syst Grp, Dept Math & Stat, Crawley, WA 6009, Australia
[2] Univ Western Australia, ARC Ind Transformat Training Ctr Transforming Mai, Crawley, WA 6009, Australia
[3] Univ Western Australia, Dept Math & Stat, Crawley, WA 6009, Australia
[4] CSIRO, Mineral Resources, Kensington, WA 6151, Australia
[5] Alcoa Australia, Continuous Improvement Ctr Excellence, Booragoon, WA 6154, Australia
[6] Univ Western Australia, Sch Engn, Crawley, WA 6009, Australia
Funding
Australian Research Council;
Keywords
CLUSTERS; NUMBER;
DOI
10.36001/IJPHM.2022.v13i1.3045
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
An essential requirement in any data analysis is to have a response variable representing the aim of the analysis. Much academic work is based on laboratory or simulated data, where the experiment is controlled and the ground truth is clearly defined. This is seldom the reality for equipment performance in an industrial environment, and it is common to find issues with the response variable in industry situations. We discuss this matter using a case study where the problem is to detect an asset event (failure) using the available data, but for which no ground truth is available from historical records. Our data frame contains measurements of 14 sensors recorded every minute from a process control system and 4 current motors on the asset of interest over a three-year period. In this situation, how to label the event of interest is of fundamental importance. Different labelling strategies will generate different models, with direct impact on the in-service fault detection efficacy of the resulting model. We discuss a data-driven approach to label a binary response variable (fault/anomaly detection) and compare it to a rule-based approach. Labelling of the time series was performed using dynamic time warping followed by agglomerative hierarchical clustering to group events with similar event dynamics. Both data sets have significant imbalance, with 1,200,000 non-event data points but only 150 events in the rule-based data set and 64 events in the data-driven data set. We study the performance of the models based on these two different labelling strategies, treating each data set independently. We describe decisions made in window-size selection, managing imbalance, hyper-parameter tuning, and training and test selection, and use two models, logistic regression and random forest, for event detection. We estimate useful models for both data sets. By useful, we mean that the models could detect events during the first four months of the test set. However, as the months progressed, the performance of both models deteriorated, with an increasing number of false positives, reflecting possible changes in the dynamics of the system. This work raises questions such as "what are we detecting?" and "is there a right way to label?" and presents a data-driven approach to support labelling of historical events in process plant data for event detection in the absence of ground truth data.
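The labelling step named in the abstract (dynamic time warping followed by agglomerative hierarchical clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the univariate toy windows, the absolute-difference local cost, average linkage, the choice of two clusters, and the helper names `dtw_distance` and `cluster_events`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def cluster_events(windows, n_clusters=2):
    """Group candidate event windows by the similarity of their dynamics."""
    k = len(windows)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(windows[i], windows[j])
    # linkage expects a condensed distance vector; average linkage is an assumption.
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy windows: three "drop" shapes with different timing, two flat ones.
windows = [
    np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]),
    np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]),
    np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0]),
    np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0]),
    np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0]),
]
labels = cluster_events(windows, n_clusters=2)
print(labels)
```

In this toy case the three drops land in one cluster regardless of when the drop occurs, because warping absorbs the timing shift, while the flat windows form the other cluster. Windows in a cluster judged to represent the event of interest could then be labelled as positives for a downstream classifier.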
Pages: 15