An efficient and effective wrapper based on paired t-test for learning naive Bayes classifiers from large-scale domains

Cited by: 7
Authors
Kim, Chanju [1]
Li, Honglan [1]
Shin, Soo-Yong [2]
Hwang, Kyu-Baek [1]
Affiliations
[1] Soongsil Univ, Sch Comp Sci & Engn, Seoul 156743, South Korea
[2] Univ Ulsan Coll Med, Asan Med Ctr, Dept Clin Epidemiol & Biostat, Seoul 138736, South Korea
Source
4TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS-BIOLOGY AND BIOINFORMATICS (CSBIO2013) | 2013 / Vol. 23
Keywords
feature selection; wrappers; Naive Bayes classifiers; microarray data; GENE SELECTION; CLASSIFICATION; CANCER; PREDICTION;
DOI
10.1016/j.procs.2013.10.014
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Feature selection is a crucial step in supervised learning that influences the entire subsequent classification (or regression) process. Approaches to this task fall largely into two categories: filter-based and wrapper-based methods. The latter generally produce better results than the former for a given learning method, though they consume more computational resources searching the feature subset space. In this paper, we propose an Efficient wRapper based on a Paired t-Test (ERPT) for choosing features from large-scale data consisting of thousands of variables, such as microarrays. Statistical tests are a reasonable option when the number of features is very large because they behave more predictably and can be more efficient than most search methods. The proposed method consists of two phases: a decrement phase and an increment phase. In the decrement phase, it selects strongly relevant features. In the increment phase, it adds weakly relevant features, given the previously selected features. Our method, combined with naive Bayes classifiers, has been tested in an extensive set of experiments on University of California Irvine (UCI) Machine Learning Repository data. The results showed that the performance of the proposed method is comparable to that of the backward search-based wrapper and superior to that of the forward search-based wrapper. Furthermore, it demonstrated much better performance than the forward search-based wrapper when applied to three microarray data sets, for which the backward search-based wrapper was impractical because of the computational burden involved. The proposed method has three merits: (1) it is applicable to data sets having thousands of variables, (2) it provides a theoretically sound and controllable criterion for thresholding features, and (3) it finds feature subsets that maximize classification performance on sparse domains. (C) 2013 The Authors. Published by Elsevier B.V.
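The abstract gives only an outline of the two phases, so the following Python sketch is one possible reading of it, not the authors' ERPT implementation. It assumes that a feature counts as strongly relevant when removing it from the full set significantly lowers per-fold accuracy, and as weakly relevant when adding it to the already selected set significantly raises it. The names `paired_t_statistic`, `two_phase_select`, `fold_scores`, `t_crit`, and the toy scorer are all hypothetical; in practice `fold_scores` would return cross-validated naive Bayes accuracies.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of the paired differences a_k - b_k."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate case: every fold shows exactly the same difference.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def two_phase_select(features, fold_scores, t_crit):
    """Assumed reading of the decrement/increment idea (hypothetical helper).

    fold_scores(subset) must return one accuracy per cross-validation fold,
    using the same folds for every subset so that the scores are paired.
    """
    full = list(features)
    base = fold_scores(full)

    # Decrement phase: keep features whose removal significantly hurts.
    selected = []
    for f in full:
        without_f = fold_scores([g for g in full if g != f])
        if paired_t_statistic(base, without_f) > t_crit:
            selected.append(f)

    # Increment phase: add features that significantly help the current set.
    for f in full:
        if f in selected:
            continue
        gain = paired_t_statistic(fold_scores(selected + [f]),
                                  fold_scores(selected))
        if gain > t_crit:
            selected.append(f)
    return selected


def toy_fold_scores(subset):
    """Toy stand-in for 5-fold accuracies (made-up numbers, not real data).

    'a' always helps; 'b' and 'c' are redundant with each other; 'd' is noise.
    """
    s = set(subset)
    scores = []
    for k in range(5):
        acc = 0.50 + 0.01 * k
        if "a" in s:
            acc += 0.20 + 0.01 * (k % 2)
        if "b" in s or "c" in s:
            acc += 0.10 + 0.01 * ((k + 1) % 2)
        scores.append(acc)
    return scores


# 2.132 is the one-sided 5% critical value of Student's t with 4 degrees
# of freedom (5 folds).
chosen = two_phase_select(["a", "b", "c", "d"], toy_fold_scores, t_crit=2.132)
print(chosen)  # ['a', 'b']
```

In this toy run, only `a` survives the decrement step because `b` and `c` cover for each other in the full set; `b` is then picked up in the increment step, after which `c` adds nothing. Using a critical value as the threshold reflects the abstract's point that the test gives a controllable selection criterion.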
Pages: 102-112
Page count: 11