Classification of mislabelled microarrays using robust sparse logistic regression

Cited by: 31
Authors
Bootkrajang, Jakramate [1]
Kaban, Ata [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Keywords
DISCRIMINANT-ANALYSIS; INITIAL SAMPLES; GENE SELECTION; CANCER;
DOI
10.1093/bioinformatics/btt078
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Previous studies have reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance depends crucially on some tuning parameters.

Results: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is set automatically using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach counters the adverse effects of labelling errors on predictive performance, identifies marker genes effectively and, at the same time, detects mislabelled arrays with high accuracy.
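The abstract does not give the model equations, so the following is only a minimal sketch of how a label-flipping process can be attached to a binary logistic regression classifier in general. It assumes a 2 x 2 flip matrix `gamma` with `gamma[j, k] = P(observed = k | true = j)`, and it uses a plain L1 penalty with a hand-set weight `lam` as a stand-in for the Bayesian regularization the authors describe. All function and parameter names are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def observed_label_probs(w, X, gamma):
    """P(observed label = 1 | x) under an assumed label-flipping model.

    gamma[j, k] = P(observed = k | true = j); each row sums to 1.
    """
    p1 = sigmoid(X @ w)  # P(true label = 1 | x) from the classifier
    return (1.0 - p1) * gamma[0, 1] + p1 * gamma[1, 1]

def neg_log_likelihood(w, X, y_obs, gamma, lam=0.0):
    """Negative log-likelihood of the observed (possibly noisy) labels.

    The L1 term is a simple stand-in for a sparsity-inducing prior;
    lam is a hypothetical hand-set weight, not the paper's scheme.
    """
    q1 = observed_label_probs(w, X, gamma)
    q = np.where(y_obs == 1, q1, 1.0 - q1)
    return -np.sum(np.log(q + 1e-12)) + lam * np.sum(np.abs(w))

def mislabel_posterior(w, X, y_obs, gamma):
    """P(true label != observed label | x, observed label) per sample."""
    p1 = sigmoid(X @ w)
    p_true = np.stack([1.0 - p1, p1], axis=1)   # shape (n, 2)
    # joint[i, j] = P(true = j | x_i) * P(observed = y_obs[i] | true = j)
    joint = p_true * gamma[:, y_obs].T
    post = joint / joint.sum(axis=1, keepdims=True)
    return 1.0 - post[np.arange(len(y_obs)), y_obs]
```

In a sketch like this, samples with a high `mislabel_posterior` would be the candidates for mislabelled arrays, and setting `gamma` to the identity matrix recovers ordinary logistic regression.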
Pages: 870 - 877
Page count: 8
Related papers
50 in total
  • [31] Parametric classification with soft labels using the evidential EM algorithm: linear discriminant analysis versus logistic regression
    Quost, Benjamin
    Denoeux, Thierry
    Li, Shoumei
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (04) : 659 - 690
  • [32] Parametric classification with soft labels using the evidential EM algorithm: linear discriminant analysis versus logistic regression
    Benjamin Quost
    Thierry Denœux
    Shoumei Li
    Advances in Data Analysis and Classification, 2017, 11 : 659 - 690
  • [33] Classification of Breast Cancer Histopathological Images using Adaptive Penalized Logistic Regression with Wilcoxon Rank Sum Test
    Alkahya, Mohammed Abdulrazaq
    Alreahan, Hussein Obeid
    Algamal, Zakariya Yahya
    ELECTRONIC JOURNAL OF APPLIED STATISTICAL ANALYSIS, 2023, 16 (03) : 507 - 518
  • [34] A novel active learning approach for the classification of hyperspectral imagery using quasi-Newton multinomial logistic regression
    Tan, Kun
    Wang, Xue
    Zhu, Jishuai
    Hu, Jun
    Li, Jun
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2018, 39 (10) : 3029 - 3054
  • [35] Network-Regularized Sparse Logistic Regression Models for Clinical Risk Prediction and Biomarker Discovery
    Min, Wenwen
    Liu, Juan
    Zhang, Shihua
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2018, 15 (03) : 944 - 953
  • [36] Sparse Bayesian multinomial probit regression model with correlation prior for high-dimensional data classification
    Yang Aijun
    Jiang Xuejun
    Liu Pengfei
    Lin Jinguan
    STATISTICS & PROBABILITY LETTERS, 2016, 119 : 241 - 247
  • [37] RAMRSGL: A Robust Adaptive Multinomial Regression Model for Multicancer Classification
    Wang, Lei
    Li, Juntao
    Liu, Juanfang
    Chang, Mingming
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2021, 2021
  • [38] The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects
    Worth, AP
    Cronin, MTD
    JOURNAL OF MOLECULAR STRUCTURE-THEOCHEM, 2003, 622 (1-2) : 97 - 111
  • [39] Prediction of Protein Solubility in Escherichia coli Using Logistic Regression
    Diaz, Armando A.
    Tomba, Emanuele
    Lennarson, Reese
    Richard, Rex
    Bagajewicz, Miguel J.
    Harrison, Roger G.
    BIOTECHNOLOGY AND BIOENGINEERING, 2010, 105 (02) : 374 - 383
  • [40] Bankruptcy prediction using Partial Least Squares Logistic Regression
    Ben Jabeur, Sami
    JOURNAL OF RETAILING AND CONSUMER SERVICES, 2017, 36 : 197 - 202