Semi-Supervised Robust Hidden Markov Regression for Large-Scale Time-Series Industrial Data Analytics and Its Applications to Soft Sensing

Authors
Shao, Weiming [1]
Han, Wenxue [1]
Xiao, Chuanfa [1]
Chen, Lei [2]
Yu, Meng-Qin [3]
Chen, Junghui [3]
Affiliations
[1] China Univ Petr, Coll New Energy, Qingdao 266580, Peoples R China
[2] China Univ Petr, Coll Pipeline & Civil Engn, Qingdao 266580, Peoples R China
[3] Chung Yuan Christian Univ, Dept Chem Engn, Taoyuan 32023, Taiwan
Funding
National Natural Science Foundation of China
Keywords
Soft sensor; robust hidden Markov regression; semi-supervised learning; large-scale time-series data; distributed computing; SENSORS; MODEL;
DOI
10.1109/TASE.2024.3417019
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hidden Markov models (HMMs) for time-series data analysis are attracting wide interest in industry because they can model the dynamics and non-Gaussianity that are widespread in industrial processes. In this paper, with a focus on industrial soft sensor applications, a semi-supervised robust hidden Markov regression (SsRHMR) model is first proposed to improve the performance of HMMs in two challenging industrial scenarios, namely the scarcity of labeled samples and the presence of outlying data, both of which may prevent HMMs from learning well-suited parameters. Furthermore, a distributed learning algorithm for the SsRHMR (termed D-SsRHMR) is developed to overcome two limitations of HMMs in modeling large-scale time-series data: high computational complexity and the inability to handle long-period missing values. Both the SsRHMR and the D-SsRHMR are evaluated on a synthetic case and an actual process, and the results demonstrate the effectiveness and feasibility of the proposed models and learning algorithms in improving prediction accuracy and accelerating training.

Note to Practitioners: Before applying the SsRHMR to industrial soft sensing, we advise first selecting features based on process mechanisms and expert knowledge, that is, carefully selecting the secondary variables so as to reduce the dimensionality of the input space. In general, the lower the dimensionality of the secondary variables, the more accurate the estimated distributions of those variables and the more efficient the training of the SsRHMR. In addition, the D-SsRHMR benefits from equal-sized subsets, because the efficiency of the distributed learning algorithm is limited by the most computationally demanding slave computer, such as the one processing the most data. Therefore, in practice it is preferable for the D-SsRHMR to partition the entire time-series dataset into subsets of as equal size as possible.
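The practitioner advice on equal-sized subsets can be illustrated with a small sketch. It assumes the time series is split into `k` contiguous, time-ordered chunks (so that each chunk keeps its internal temporal order) and uses a toy cost model in which a distributed step finishes only when the slowest worker does, with cost proportional to chunk length. The helpers `partition_equal` and `parallel_cost` are hypothetical illustrations, not the partitioning or scheduling procedure used by the D-SsRHMR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_equal(series: Sequence[T], k: int) -> List[Sequence[T]]:
    """Hypothetical helper: split a time-ordered sequence into k contiguous
    chunks whose lengths differ by at most one sample."""
    if k <= 0:
        raise ValueError("k must be positive")
    base, extra = divmod(len(series), k)
    chunks = []
    start = 0
    for i in range(k):
        # The first `extra` chunks receive one additional sample each.
        size = base + (1 if i < extra else 0)
        chunks.append(series[start:start + size])
        start += size
    return chunks


def parallel_cost(chunk_sizes: Sequence[int], cost_per_sample: float = 1.0) -> float:
    """Toy cost model (assumption): the step is as slow as the worker
    holding the largest chunk."""
    return max(chunk_sizes) * cost_per_sample


if __name__ == "__main__":
    data = list(range(10))
    sizes = [len(c) for c in partition_equal(data, 3)]
    print(sizes)                      # [4, 3, 3]
    print(parallel_cost(sizes))       # 4.0
    print(parallel_cost([7, 2, 1]))   # 7.0, same total work but slower
```

Under this simple model, both splits process 10 samples in total, but the unbalanced split `[7, 2, 1]` takes nearly twice as long as the balanced one, which is the intuition behind partitioning the dataset as evenly as possible.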
Pages: 1-15
Number of pages: 15