Handling Missing Data with Markov Boundary

Cited by: 0
Authors
Mohammed, Azhar [1]
Nguyen, Dang [1]
Duong, Bao [1]
Nichols, Melanie [1]
Nguyen, Thin [1]
Affiliations
[1] Deakin Univ, Appl Artificial Intelligence Inst A2I2, Geelong, Vic, Australia
Source
ADVANCED DATA MINING AND APPLICATIONS (ADMA 2022), PT I | 2022 / Vol. 13725
Funding
National Health and Medical Research Council (Australia);
Keywords
Data imputation; Causal graphical model; Markov boundary; DATA IMPUTATION; ALGORITHMS; SELECTION;
DOI
10.1007/978-3-031-22064-7_24
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In machine learning (ML) applications, high-quality data are essential for training a well-performing model that can provide robust predictions and responsible decisions. A common problem in ML applications, e.g., healthcare, is that the training dataset often contains samples (or records) with missing values. As a result, the ML model cannot use such samples in its training phase. Handling missing data is thus an important and open research problem. In this paper, we propose a method to predict missing values within a causal graphical model framework. Our method exploits the Markov boundary, which encapsulates all necessary information about the missing variables. Using the information encoded in the Markov boundary, we formulate a predictive function for each feature that has missing values. Unlike existing methods, our predictive function is trained with only the features in the Markov boundary. To demonstrate the effectiveness of our proposed method, we compare its imputation performance with that of state-of-the-art imputation methods in a comprehensive experiment on seven real-world datasets. Our empirical results show that our method achieves significantly lower imputation error than the baselines, thanks to the information in the Markov boundary.
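The idea in the abstract, training one predictor per incomplete feature on only that feature's Markov boundary, can be illustrated with a minimal sketch. The abstract does not state how the Markov boundary is discovered or which predictive function is used, so this sketch makes its own assumptions: the causal graph is taken as already known (given as a parent map), the predictor is ordinary least squares, and missing predictor values are mean-filled. The names `markov_boundary` and `impute_with_markov_boundary` are hypothetical and are not taken from the paper.

```python
import numpy as np


def markov_boundary(target, parents):
    """Parents, children, and spouses of `target` in an assumed, known DAG.

    `parents` maps every node name to the set of its parent names.
    """
    children = {v for v, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents[c]} - {target}
    return set(parents.get(target, ())) | children | spouses


def impute_with_markov_boundary(X, columns, parents):
    """Fill NaNs column by column using only Markov-boundary features.

    Assumption (not from the paper): a linear least-squares predictor,
    with NaNs in the predictor columns replaced by column means.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    index = {name: i for i, name in enumerate(columns)}
    # Mean-filled copy, used only as predictor input.
    filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)

    for name in columns:
        j = index[name]
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue  # nothing to impute, or nothing to learn from
        cols = [index[m] for m in sorted(markov_boundary(name, parents))]
        # Design matrix: intercept plus Markov-boundary features only.
        # With an empty boundary this reduces to mean imputation.
        A = np.column_stack([np.ones(len(X)), filled[:, cols]])
        coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, j], rcond=None)
        out[missing, j] = A[missing] @ coef
    return out
```

For a chain A -> B -> C with an unrelated feature D, `markov_boundary("B", parents)` returns `{"A", "C"}`, so the model that fills in B never sees D. Restricting each predictor's inputs this way is the design choice the abstract highlights.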
Pages: 319-333
Number of pages: 15