Handling Missing Data with Markov Boundary

Authors
Mohammed, Azhar [1]
Nguyen, Dang [1]
Duong, Bao [1]
Nichols, Melanie [1]
Nguyen, Thin [1]
Affiliations
[1] Deakin Univ, Appl Artificial Intelligence Inst A2I2, Geelong, Vic, Australia
Source
ADVANCED DATA MINING AND APPLICATIONS (ADMA 2022), PT I | 2022, Vol. 13725
Funding
National Health and Medical Research Council (Australia);
Keywords
Data imputation; Causal graphical model; Markov boundary; Algorithms; Selection;
DOI
10.1007/978-3-031-22064-7_24
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In machine learning (ML) applications, high-quality data are essential for training a well-performing model that provides robust predictions and supports responsible decisions. A common problem in ML applications such as healthcare is that the training dataset often contains samples (or records) with missing values, which the ML model cannot use in its training phase. Handling missing data is thus an important and open research problem. In this paper, we propose a method that predicts missing values within a causal graphical model framework. Our method exploits the Markov boundary, which encapsulates all the information needed about a variable with missing values. Using the information encoded in the Markov boundary, we formulate a predictive function for each feature that has missing values. Unlike existing methods, our predictive function is trained on only the features in the Markov boundary. To demonstrate the effectiveness of the proposed method, we compare its imputation performance with that of state-of-the-art imputation methods in a comprehensive experiment on seven real-world datasets. Our empirical results show that our method achieves significantly lower imputation error than the baselines, thanks to the information in the Markov boundary.
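The idea in the abstract, predicting a feature's missing values from only the features in its Markov boundary, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the Markov boundary is obtained or which predictive function is used. The sketch assumes a known causal DAG (the hypothetical `parents` mapping), takes the Markov boundary as the parents, children, and other parents of the children, assumes the boundary features are fully observed, and swaps in ordinary least squares as the predictive function. All variable names and the synthetic data are illustrative.

```python
import numpy as np

def markov_boundary(parents, target):
    """Parents, children, and spouses (other parents of children) of `target` in a DAG."""
    children = {v for v, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents[target]) | children | spouses) - {target}

def impute_with_boundary(X, names, parents, target):
    """Fill NaNs in column `target` by least squares on its Markov boundary columns."""
    X = X.copy()
    t = names.index(target)
    cols = [names.index(v) for v in sorted(markov_boundary(parents, target))]
    missing = np.isnan(X[:, t])
    # Design matrix with an intercept; assumes the boundary columns have no NaNs.
    A = np.column_stack([np.ones(len(X)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, t], rcond=None)
    X[missing, t] = A[missing] @ coef
    return X

# Hypothetical DAG: a -> t -> c <- s, plus an unrelated variable z.
parents = {"a": [], "s": [], "z": [], "t": ["a"], "c": ["t", "s"]}
names = ["a", "s", "z", "t", "c"]

# Synthetic linear-Gaussian data that follows the DAG above.
rng = np.random.default_rng(0)
n = 200
a, s, z = rng.normal(size=(3, n))
t = 2.0 * a + 0.1 * rng.normal(size=n)
c = t - s + 0.1 * rng.normal(size=n)
X_true = np.column_stack([a, s, z, t, c])

X_obs = X_true.copy()
X_obs[:20, names.index("t")] = np.nan  # hide the first 20 values of t
X_imp = impute_with_boundary(X_obs, names, parents, "t")
rmse = float(np.sqrt(np.mean((X_imp[:20, 3] - X_true[:20, 3]) ** 2)))
print(sorted(markov_boundary(parents, "t")), rmse)
```

In this toy setup the unrelated variable `z` is excluded from the predictor's inputs because it lies outside the Markov boundary of `t`. On real data the graph would have to be learned, and the boundary features may themselves contain missing values, neither of which this sketch handles.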
Pages: 319-333
Page count: 15