Handling Missing Data with Markov Boundary

Cited by: 0
Authors
Mohammed, Azhar [1]
Nguyen, Dang [1]
Duong, Bao [1]
Nichols, Melanie [1]
Nguyen, Thin [1]
Affiliations
[1] Deakin Univ, Appl Artificial Intelligence Inst A2I2, Geelong, Vic, Australia
Source
ADVANCED DATA MINING AND APPLICATIONS (ADMA 2022), PT I | 2022 / Vol. 13725
Funding
National Health and Medical Research Council (Australia)
Keywords
Data imputation; Causal graphical model; Markov boundary; Algorithms; Selection
DOI
10.1007/978-3-031-22064-7_24
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In machine learning (ML) applications, high-quality data are essential for training a well-performing model that provides robust predictions and supports responsible decisions. A common problem in ML applications such as healthcare is that the training dataset often contains samples (or records) with missing values, which the ML model cannot use in its training phase. Handling missing data is thus an important and open research problem. In this paper, we propose a method that predicts missing values within a causal graphical model framework. Our method exploits the Markov boundary, which encapsulates all necessary information about the missing variables. Using the information encoded in the Markov boundary, we formulate a predictive function for each feature that has missing values. In contrast to existing methods, our predictive function is trained with only the features in the Markov boundary. To demonstrate the effectiveness of the proposed method, we compare its imputation performance with that of state-of-the-art imputation methods in a comprehensive experiment on seven real-world datasets. Our empirical results show that our method achieves significantly lower imputation error than the baselines, thanks to the information carried by the Markov boundary.
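The idea in the abstract, fitting one predictor per incomplete feature using only the features in its Markov boundary, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state how the Markov boundary is discovered or which predictive function is used, so the sketch assumes the boundary is already known (the hypothetical `markov_boundary` dictionary) and swaps in ordinary least squares as the predictor. The function name `impute_with_markov_boundary` is also an assumption made for illustration.

```python
import numpy as np

def impute_with_markov_boundary(X, markov_boundary):
    """Impute NaNs in X, one target column at a time.

    X: 2-D float array, missing entries encoded as np.nan.
    markov_boundary: hypothetical mapping {target column -> list of
        predictor columns}; assumed to be given, not learned here.
    """
    X_imputed = X.copy()
    col_means = np.nanmean(X, axis=0)
    for target, mb in markov_boundary.items():
        missing = np.isnan(X[:, target])
        if not missing.any() or not mb:
            continue
        # Simplification: gaps in predictor columns are mean-filled.
        P = np.where(np.isnan(X[:, mb]), col_means[mb], X[:, mb])
        A = np.column_stack([np.ones(len(X)), P])  # intercept + predictors
        # Assumed predictor: linear least squares on rows where the
        # target is observed.
        coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, target], rcond=None)
        X_imputed[missing, target] = A[missing] @ coef
    return X_imputed

# Synthetic example: column 2 depends only on column 0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=50)
x1 = rng.normal(size=50)          # unrelated feature, excluded from the boundary
x2 = 2.0 * x0 + 1.0
X = np.column_stack([x0, x1, x2])
X[:5, 2] = np.nan                 # hide five values of the target
X_hat = impute_with_markov_boundary(X, {2: [0]})
```

Restricting the predictors to the boundary keeps the unrelated column `x1` out of the regression, which is the contrast with full-feature imputers that the abstract draws.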
Pages: 319 - 333
Number of pages: 15
Related papers
50 in total
  • [1] Handling missing wind data using a modified Markov Chains approach
    Siripitayananon, P
    Chen, HC
    Jin, KR
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVIII, PROCEEDINGS: INFORMATION SYSTEMS, CONCEPTS AND APPLICATIONS OF SYSTEMICS, CYBERNETICS AND INFORMATICS, 2002, : 468 - 473
  • [2] Handling of Missing Data
    Budhiraja, Pooja
    Kaplan, Bruce
    Mustafa, Reem A.
    TRANSPLANTATION, 2020, 104 (01) : 24 - 26
  • [3] HANDLING OF MISSING DATA
    Torres, F.
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2011, 109 : 17 - 17
  • [4] Handling missing data
    Unknown
    CURRENT PROBLEMS IN CANCER, 2005, 29 (06) : 317 - 325
  • [5] PROPOSAL FOR HANDLING MISSING DATA
    GLEASON, TC
    STAELIN, R
    PSYCHOMETRIKA, 1975, 40 (02) : 229 - 252
  • [6] Conservative handling of missing data
    Berger, Vance W.
    CONTEMPORARY CLINICAL TRIALS, 2012, 33 (03) : 460 - 460
  • [7] The prevention and handling of the missing data
    Kang, Hyun
    KOREAN JOURNAL OF ANESTHESIOLOGY, 2013, 64 (05) : 402 - 406
  • [8] Best Practices for Handling Missing Data
    Srijan, Shukla
    Rajagopalan, Iyer R.
    ANNALS OF SURGICAL ONCOLOGY, 2024, 31 (01) : 12 - 13
  • [9] Handling Missing Data in CGM Records
    Zulj, Sara
    Carvalho, Paulo
    Ribeiro, Rogerio
    Magjarevic, Ratko
    FUTURE TRENDS IN BIOMEDICAL AND HEALTH INFORMATICS AND CYBERSECURITY IN MEDICAL DEVICES, ICBHI 2019, 2020, 74 : 420 - 427
  • [10] Handling missing values in trait data
    Johnson, Thomas F.
    Isaac, Nick J. B.
    Paviolo, Agustin
    Gonzalez-Suarez, Manuela
    GLOBAL ECOLOGY AND BIOGEOGRAPHY, 2021, 30 (01) : 51 - 62