Classification of mislabelled microarrays using robust sparse logistic regression

Cited by: 31
Authors
Bootkrajang, Jakramate [1]
Kaban, Ata [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Keywords
DISCRIMINANT-ANALYSIS; INITIAL SAMPLES; GENE SELECTION; CANCER;
DOI
10.1093/bioinformatics/btt078
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters.

Results: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach counters the adverse effects of labelling errors on predictive performance, identifies marker genes effectively and simultaneously detects mislabelled arrays with high accuracy.
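The label-flipping idea in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption for illustration, not the authors' implementation: a flip matrix `gamma` with `gamma[j, k]` = P(observed label = k | true label = j), a plain logistic model for the true label, and hypothetical function names. The sparsity prior and the Bayesian regularization step are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_label_prob(X, w, gamma):
    """P(observed label = 1 | x) under an assumed label-flipping model.

    gamma[j, k] = P(observed = k | true = j); each row sums to 1.
    """
    p1 = sigmoid(X @ w)   # P(true = 1 | x)
    p0 = 1.0 - p1         # P(true = 0 | x)
    return gamma[0, 1] * p0 + gamma[1, 1] * p1

def neg_log_likelihood(X, y_obs, w, gamma, eps=1e-12):
    """Negative log-likelihood of the observed (possibly flipped) labels."""
    s1 = noisy_label_prob(X, w, gamma)
    return -np.sum(y_obs * np.log(s1 + eps)
                   + (1 - y_obs) * np.log(1.0 - s1 + eps))

def prob_mislabelled(X, y_obs, w, gamma):
    """Posterior probability that the true label differs from the observed one."""
    p1 = sigmoid(X @ w)
    p_true = np.stack([1.0 - p1, p1], axis=1)   # shape (n, 2)
    # joint[i, j] = P(true = j | x_i) * P(observed = y_obs[i] | true = j)
    joint = p_true * gamma[:, y_obs].T
    post = joint / joint.sum(axis=1, keepdims=True)
    return 1.0 - post[np.arange(len(y_obs)), y_obs]

# Toy data: one feature, two samples, both observed as class 1.
X = np.array([[2.0], [-2.0]])
w = np.array([1.0])
y_obs = np.array([1, 1])
gamma = np.array([[0.8, 0.2],
                  [0.1, 0.9]])
suspect = prob_mislabelled(X, y_obs, w, gamma)
```

In this toy case the second sample lies on the class-0 side of the decision boundary but carries an observed label of 1, so its mislabelling probability is much higher than that of the first sample. In a full method the weights and flip probabilities would be estimated jointly from data; here they are fixed by hand.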
Pages: 870-877
Page count: 8
    INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2024, 20 (01): : 107 - 122