Classification of mislabelled microarrays using robust sparse logistic regression

Cited by: 31
Authors
Bootkrajang, Jakramate [1]
Kaban, Ata [1]
Affiliations
[1] University of Birmingham, School of Computer Science, Birmingham B15 2TT, West Midlands, England
Keywords
DISCRIMINANT-ANALYSIS; INITIAL SAMPLES; GENE SELECTION; CANCER
DOI
10.1093/bioinformatics/btt078
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Previous studies have reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance depends crucially on tuning parameters.
Results: In this article, we develop a new method that detects mislabelled arrays while simultaneously learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is set automatically using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach counters the adverse effects of labelling errors on predictive performance, is effective at identifying marker genes, and simultaneously detects mislabelled arrays with high accuracy.
Pages: 870 - 877
Page count: 8
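The label-flipping process mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary problem, a hypothetical 2x2 flip matrix `gamma` where `gamma[j][k]` is the probability of observing label `k` when the true label is `j`, and a fixed L1 weight `lam` standing in for the Bayesian regularization that the paper uses to set the parameter automatically. All function names are illustrative.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def true_label_prob(w, x):
    # P(true label = 1 | x) under an ordinary logistic regression model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def observed_label_prob(w, x, gamma):
    # P(observed label = 1 | x): the true-label probabilities are mixed
    # through the (assumed) flip matrix gamma[true][observed].
    p1 = true_label_prob(w, x)
    return gamma[0][1] * (1.0 - p1) + gamma[1][1] * p1


def penalized_nll(w, X, y_obs, gamma, lam):
    # Negative log-likelihood of the observed (possibly noisy) labels
    # plus an L1 penalty that encourages sparse weights.
    nll = 0.0
    for x, y in zip(X, y_obs):
        s1 = observed_label_prob(w, x, gamma)
        p = s1 if y == 1 else 1.0 - s1
        nll -= math.log(max(p, 1e-12))
    return nll + lam * sum(abs(wi) for wi in w)


def posterior_true_label(w, x, y, gamma):
    # P(true label = 1 | x, observed label y), by Bayes' rule.
    p1 = true_label_prob(w, x)
    num = gamma[1][y] * p1
    den = gamma[0][y] * (1.0 - p1) + num
    return num / den


def looks_mislabelled(w, x, y, gamma):
    # Flag a sample when the more probable true label differs from y.
    post1 = posterior_true_label(w, x, y, gamma)
    most_likely = 1 if post1 > 0.5 else 0
    return most_likely != y


if __name__ == "__main__":
    w = [2.0, -1.0]                     # illustrative weights
    gamma = [[0.9, 0.1], [0.2, 0.8]]    # illustrative flip probabilities
    x, y = [3.0, 0.0], 0                # strongly class-1 point labelled 0
    print(looks_mislabelled(w, x, y, gamma))
```

In a full method the weights and the flip matrix would be estimated jointly from the data; here they are fixed by hand only to show how the noisy-label likelihood and the mislabelling check fit together.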
Related papers
50 in total
  • [1] Classification of COVID19 Patients Using Robust Logistic Regression
    Ghosh, Abhik
    Jaenada, Maria
    Pardo, Leandro
    JOURNAL OF STATISTICAL THEORY AND PRACTICE, 2022, 16 (04)
  • [2] Prediction of siRNA Potency Using Sparse Logistic Regression
    Hu, Wei
    Hu, John
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2014, 21 (06) : 420 - 427
  • [3] Gene Selection in Cancer Classification Using Sparse Logistic Regression with L1/2 Regularization
    Wu, Shengbing
    Jiang, Hongkun
    Shen, Haiwei
    Yang, Ziyi
    APPLIED SCIENCES-BASEL, 2018, 8 (09)
  • [4] Classification of breast lesions in ultrasonography using sparse logistic regression and morphology-based texture features
    Nemat, Hoda
    Fehri, Hamid
    Ahmadinejad, Nasrin
    Frangi, Alejandro F.
    Gooya, Ali
    MEDICAL PHYSICS, 2018, 45 (09) : 4112 - 4124
  • [5] Large-Scale Sparse Logistic Regression
    Liu, Jun
    Chen, Jianhui
    Ye, Jieping
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 547 - 555
  • [7] Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification
    Liang, Yong
    Liu, Cheng
    Luan, Xin-Ze
    Leung, Kwong-Sak
    Chan, Tak-Ming
    Xu, Zong-Ben
    Zhang, Hai
    BMC BIOINFORMATICS, 2013, 14
  • [8] Classification of the insurance sector with logistic regression
    Ruzgar, Bahadtin
    Ruzgar, Nursel Selver
    PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON AUTOMATIC CONTROL, MODELING & SIMULATION, 2007, : 52 - +
  • [9] Robust adaptive LASSO in high-dimensional logistic regression
    Basu, Ayanendranath
    Ghosh, Abhik
    Jaenada, Maria
    Pardo, Leandro
    STATISTICAL METHODS AND APPLICATIONS, 2024
  • [10] Nonconvex Sparse Logistic Regression With Weakly Convex Regularization
    Shen, Xinyue
    Gu, Yuantao
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (12) : 3199 - 3211