Meta-learning for vessel time series data imputation method recommendation

Cited by: 2
Authors
Fatyanosa, Tirana Noor [1, 2]
Firdausanti, Neni Alya [1]
Prayoga, Putu Hangga Nan [3]
Kuriu, Minoki [1]
Aritsugi, Masayoshi [1]
Mendonca, Israel [1]
Affiliations
[1] Kumamoto Univ, 2 Chome, 39-1 Kurokami, Chuo ku, Kumamoto 8608555, Japan
[2] Brawijaya Univ, Malang 65145, Jawa Timur, Indonesia
[3] MTI Co Ltd, Yusen Bldg, 3-2 Marunouchi, 2 Chome, Chiyoda ku, Tokyo 1000005, Japan
Keywords
Time series; Data imputation; Data preprocessing; Meta-learning; MISSING DATA; RECOVERY; SYSTEM
DOI
10.1016/j.eswa.2024.124016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A missing data problem is inevitable when collecting time series datasets from marine sensors. Due to this, sensor data is not reliable enough to assist decision-making. To impute missing values, a number of methods are available. Choosing the best imputation method, however, is not a trivial task, as it usually involves domain expertise and trial-and-error iterations. Additionally, if imputations are done carelessly, they produce a high error, resulting in incorrect assumptions by stakeholders. In this paper, a meta-learning approach is presented that can be used to extract characteristics of the underlying data, and based on that, a less error-prone imputation method is recommended. Ten commercial ocean-going vessel datasets are used to evaluate our proposed method. A total of 29,527 data samples were generated, comprising 22 inputs and 1 target. The proposed method achieves a weighted F1-Score of 87.5% when utilizing stratified 10-fold cross-validation. Our approach can improve the average imputation score up to 86%, with the worst-case improvement being 5%. This demonstrates that our proposed approach is efficient and effective in recommending the best imputation methods.
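
The abstract describes the general idea of building a meta-dataset: characteristics of a series become the inputs, and the best-performing imputation method becomes the target. The sketch below illustrates one way such a sample could be generated. It is not the authors' pipeline: the three candidate imputers, the four meta-features, the masking scheme, and all function names are assumptions chosen for illustration, since the abstract does not list the 22 inputs or the methods compared.

```python
import math
import random
from statistics import mean, pstdev


def impute_mean(xs):
    """Fill every gap with the mean of the observed values."""
    fill = mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]


def impute_ffill(xs):
    """Carry the last observed value forward (leading gaps use the first one)."""
    last = next(x for x in xs if x is not None)
    out = []
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out


def impute_linear(xs):
    """Interpolate linearly between the nearest observed neighbours."""
    idx = [i for i, x in enumerate(xs) if x is not None]
    out = list(xs)
    for i, x in enumerate(xs):
        if x is not None:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            out[i] = xs[right]
        elif right is None:
            out[i] = xs[left]
        else:
            w = (i - left) / (right - left)
            out[i] = xs[left] + w * (xs[right] - xs[left])
    return out


# Hypothetical candidate set; the paper's actual methods are not given here.
CANDIDATES = {"mean": impute_mean, "ffill": impute_ffill, "linear": impute_linear}


def meta_features(xs):
    """Illustrative meta-features computed from the observed values only."""
    obs = [x for x in xs if x is not None]
    return {
        "missing_rate": 1 - len(obs) / len(xs),
        "mean": mean(obs),
        "std": pstdev(obs),
        "mean_abs_diff": mean(abs(b - a) for a, b in zip(obs, obs[1:])),
    }


def make_meta_sample(truth, missing_rate=0.2, seed=0):
    """Hide interior points, score each imputer by RMSE, label with the best."""
    rng = random.Random(seed)
    n_hidden = int(missing_rate * len(truth))
    hidden = set(rng.sample(range(1, len(truth) - 1), n_hidden))
    masked = [None if i in hidden else x for i, x in enumerate(truth)]
    errors = {}
    for name, impute in CANDIDATES.items():
        filled = impute(masked)
        errors[name] = math.sqrt(mean((truth[i] - filled[i]) ** 2 for i in hidden))
    return meta_features(masked), min(errors, key=errors.get), errors


features, best_method, errors = make_meta_sample([float(i) for i in range(50)])
print(best_method, features)
```

Repeating this over many series and masking patterns would yield (features, label) pairs on which a classifier could be trained to recommend a method for unseen data. The classifier and its evaluation are left out of this sketch.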
Pages: 16