Outlier detection and ambiguity detection for microarray data in probabilistic discriminant partial least squares regression

Cited by: 7
Authors
Botella, C. [1]
Ferre, J. [1]
Boque, R. [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Analyt Chem & Organ Chem, Tarragona 43007, Spain
Keywords
outlier detection; ambiguous samples; discriminant partial least squares; reject option; EXPRESSION; CLASSIFICATION; PREDICTION; ERROR; CALIBRATION; DESIGN;
DOI
10.1002/cem.1304
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The reject option plays an important role in the classification of microarray data. In this work, a reject option is implemented in the probabilistic discriminant partial least squares (p-DPLS) method so that the classifier declines to classify both outliers and ambiguous samples. Microarray data are prone to outliers because of the many steps involved in the experimental process. During the development of the classifier, outliers in the training data may strongly influence the model and degrade its performance. Some future samples to be classified may also be outliers, and these will most probably be misclassified. Ambiguous samples are samples that cannot be assigned to any of the classes with high confidence. In this work, outlier detection and ambiguity detection are implemented taking into account the x-residuals, the leverage and the predicted value ŷ. The method was applied to oligonucleotide microarray data and cDNA microarray data. For the first dataset (prostate cancer dataset), the outlier detection criteria allowed us to remove nine samples from the training set. The model without those samples had better classification ability, with a decrease in the classification cost per sample from 0.10 to 0.07. The method was also used on a second dataset (small round blue cell tumours of childhood dataset) to detect prediction outliers, so that most of the outliers were rejected rather than classified and misclassifications were reduced from 100% to 5%. Copyright © 2010 John Wiley & Sons, Ltd.
Pages: 434-443
Page count: 10
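The abstract names three diagnostics for the reject option: the x-residuals, the leverage and the predicted ŷ. The following is only a minimal sketch of how such a rule could be wired together, and it is not the authors' p-DPLS procedure. It assumes a plain two-class PLS model fitted with NIPALS on a 0/1 class coding, 95th-percentile training cut-offs for leverage and squared x-residual, and a fixed band around ŷ = 0.5 in place of the paper's probabilistic treatment of ŷ. The function names, the thresholds, the `margin` parameter and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_pls1(X, y, n_comp):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centred data, deflated in the loop
    W, P, T, q = [], [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)             # unit-length weight vector
        t = E @ w                          # scores for this component
        p = E.T @ t / (t @ t)              # x-loadings
        c = (f @ t) / (t @ t)              # y-loading
        E = E - np.outer(t, p)             # deflate X
        f = f - c * t                      # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    W, P, T = np.column_stack(W), np.column_stack(P), np.column_stack(T)
    R = W @ np.linalg.inv(P.T @ W)         # maps centred x directly to scores
    return {"x_mean": x_mean, "y_mean": y_mean, "P": P, "R": R,
            "q": np.array(q), "TtT_inv": np.linalg.inv(T.T @ T), "n": len(y)}


def diagnostics(model, X):
    """Return leverage, squared x-residual and predicted y-hat per sample."""
    Xc = np.atleast_2d(X) - model["x_mean"]
    T = Xc @ model["R"]
    resid = Xc - T @ model["P"].T
    q_res = np.sum(resid ** 2, axis=1)
    lev = 1.0 / model["n"] + np.sum((T @ model["TtT_inv"]) * T, axis=1)
    y_hat = model["y_mean"] + T @ model["q"]
    return lev, q_res, y_hat


def decide(model, x, lev_max, q_max, margin=0.2):
    """Hypothetical reject rule: outlier check first, then ambiguity check."""
    lev, q_res, y_hat = (v[0] for v in diagnostics(model, x))
    if lev > lev_max or q_res > q_max:
        return "rejected: outlier"
    if abs(y_hat - 0.5) < margin:          # too close to the class boundary
        return "rejected: ambiguous"
    return "class 1" if y_hat > 0.5 else "class 0"


# Synthetic two-class data (illustrative only, not microarray data).
rng = np.random.default_rng(0)
mu1 = np.zeros(20)
mu1[:5] = 2.0                              # class 1 differs in five variables
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(scale=0.5, size=(40, 20)) + np.outer(y, mu1)

model = fit_pls1(X, y, n_comp=2)
lev_tr, q_tr, _ = diagnostics(model, X)
lev_max, q_max = np.percentile(lev_tr, 95), np.percentile(q_tr, 95)

typical = decide(model, mu1, lev_max, q_max)         # sample at the class 1 centre
midpoint = decide(model, 0.5 * mu1, lev_max, q_max)  # halfway between the classes
corrupted = mu1.copy()
corrupted[10:] += 10.0                               # gross error in ten variables
outlier = decide(model, corrupted, lev_max, q_max)
print(typical, midpoint, outlier, sep="\n")
```

Checking for outliers before ambiguity is a design choice of this sketch: a sample with a large x-residual or leverage lies outside the region the model describes, so its ŷ is not trustworthy enough to support either a class assignment or an ambiguity verdict.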