DualBoost: Handling Missing Values with Feature Weights and Weak Classifiers that Abstain

Cited by: 3
Authors
Wang, Weihong [1]
Xu, Jie [1]
Wang, Yang [1]
Cai, Chen [1]
Chen, Fang [1]
Affiliation
[1] CSIRO, Data61, Sydney, NSW, Australia
Source
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2018
Keywords
Boosting; missing values; feature weights; weak classifiers that abstain;
DOI
10.1145/3269206.3269319
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Missing values are a common issue in real-world datasets. Handling them is one of the most important aspects of data mining, as missing values can seriously degrade the performance of predictive models. In this paper we propose a unified Boosting framework that consolidates model construction and missing value handling. At each Boosting iteration, weights are assigned to both the samples and the features. The sample weights make difficult samples the focus of learning, while the feature weights enable critical features to be compensated by less critical features when they are unavailable. A weak classifier that abstains (i.e., produces no prediction when a required feature value is missing) is learned on a data subset determined by the feature weights. Experimental results demonstrate the efficacy and robustness of the proposed method over existing Boosting algorithms.
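The idea of a weak classifier that abstains can be illustrated with a short sketch. The code below is not the DualBoost algorithm: the abstract does not specify how the feature weights are computed or how the data subset is chosen, so both are left out. Instead it shows a generic confidence-rated AdaBoost variant in which each decision stump outputs 0 when its feature is missing, and abstained samples keep their weight. All names (`stump_predict`, `fit_stump`, `boost`, `predict`), the smoothing constant `eps`, and the toy data are assumptions made for illustration.

```python
import math

def stump_predict(x, feature, threshold):
    """Return +1 or -1, or 0 (abstain) when the required feature is missing."""
    value = x[feature]
    if value is None:
        return 0
    return 1 if value > threshold else -1

def fit_stump(X, y, w):
    """Pick the stump minimising Z = W_abstain + 2 * sqrt(W_correct * W_wrong)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X if x[f] is not None})
        # Candidate thresholds: midpoints between consecutive observed values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            w_correct = w_wrong = w_abstain = 0.0
            for xi, yi, wi in zip(X, y, w):
                h = stump_predict(xi, f, t)
                if h == 0:
                    w_abstain += wi
                elif h == yi:
                    w_correct += wi
                else:
                    w_wrong += wi
            z = w_abstain + 2 * math.sqrt(w_correct * w_wrong)
            if best is None or z < best[0]:
                best = (z, f, t, w_correct, w_wrong)
    return best

def boost(X, y, rounds=5, eps=1e-6):
    """Boost abstaining stumps; eps is an assumed smoothing constant."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = fit_stump(X, y, w)
        if best is None:  # no feature has two distinct observed values
            break
        _, f, t, w_correct, w_wrong = best
        # A negative alpha simply flips the direction of the stump.
        alpha = 0.5 * math.log((w_correct + eps) / (w_wrong + eps))
        ensemble.append((alpha, f, t))
        # Abstained samples (h == 0) are multiplied by exp(0) = 1.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote; stumps that abstain contribute nothing."""
    score = sum(alpha * stump_predict(x, f, t) for alpha, f, t in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): None marks a missing feature value.
X = [[1.0, None], [2.0, 5.0], [None, 1.0], [3.0, 6.0], [None, 0.5], [0.5, 1.5]]
y = [-1, 1, -1, 1, -1, -1]
model = boost(X, y)
predictions = [predict(model, x) for x in X]
```

In this sketch a sample whose feature is missing is neither rewarded nor penalised by the current stump, so after normalisation it gains relative weight and later rounds tend to select a stump on a feature that the sample does have.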
Pages: 1543-1546
Number of pages: 4
Related papers
50 in total
  • [21] A Review of Missing Values Handling Methods on Time-Series Data
    Pratama, Irfan
    Permanasari, Adhistya Erna
    Ardiyanto, Igi
    Indrayani, Rini
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY SYSTEMS AND INNOVATION (ICITSI), 2016,
  • [22] Handling Missing Values in Information Systems Research: A Review of Methods and Assumptions
    Peng, Jiaxu
    Hahn, Jungpil
    Huang, Ke-Wei
    INFORMATION SYSTEMS RESEARCH, 2023, 34 (01) : 5 - 26
  • [23] The Effect of Methods for Handling Missing Values on the Performance of the MEWMA Control Chart
    Madbuly, Doaa F.
    Maravelakis, Petros E.
    Mahmoud, Mahmoud A.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2013, 42 (06) : 1437 - 1454
  • [24] An investigation of solutions for handling incomplete online review datasets with missing values
    Hu, Ya-Han
    Tsai, Chih-Fong
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2022, 34 (06) : 971 - 987
  • [25] A novel weighted distance threshold method for handling medical missing values
    Cheng, Ching-Hsue
    Chang, Jing-Rong
    Huang, Hao-Hsuan
    COMPUTERS IN BIOLOGY AND MEDICINE, 2020, 122 (122)
  • [26] PARAFACM: A second-order calibration algorithm for handling data with missing values
    Dong, Ming-Yue
    Wu, Hai-Long
    Wang, Tong
    Huang, Kun
    Ren, Hang
    Yu, Ru-Qin
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2024, 244
  • [27] Handling missing values for mining gradual patterns from NoSQL graph databases
    Shah, Faaiz
    Castelltort, Arnaud
    Laurent, Anne
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 111 : 523 - 538
  • [28] Scalable Data Quality for Big Data: The Pythia Framework for Handling Missing Values
    Cahsai, Atoshum
    Anagnostopoulos, Christos
    Triantafillou, Peter
    BIG DATA, 2015, 3 (03) : 159 - 172
  • [29] A Safe-Region Imputation Method for Handling Medical Data with Missing Values
    Huang, Shu-Fen
    Cheng, Ching-Hsue
    SYMMETRY-BASEL, 2020, 12 (11) : 1 - 19
  • [30] Effective Handling of Missing Values in Datasets for Classification Using Machine Learning Methods
    Palanivinayagam, Ashokkumar
    Damasevicius, Robertas
    INFORMATION, 2023, 14 (02)