Detecting Outlier Samples in Microarray Data

Times cited: 53
Authors
Shieh, Albert D. [1]
Hung, Yeung Sam [2]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
Keywords
GENE-EXPRESSION DATA; MULTIVARIATE LOCATION; CLASSIFICATION; IDENTIFICATION; PARAMETERS; SELECTION; TUMOR
DOI
10.2202/1544-6115.1426
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline Codes
071010; 081704
Abstract
In this paper, we address the problem of detecting outlier samples in microarray data, meaning samples whose expression patterns differ markedly from the rest. Although outliers are uncommon, they appear even in widely used benchmark data sets and can adversely affect microarray data analysis. Identifying them matters for two reasons: it lets researchers investigate underlying experimental or biological problems, and it allows erroneous data to be removed. We propose a fully automatic outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances. We demonstrate that the method identifies biologically significant outliers with high accuracy and that removing outliers improves the prediction accuracy of classifiers. Because our method is closely related to existing robust PCA methods, we compare it with a prominent robust PCA method.
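The abstract describes the method only at a high level, as PCA followed by robust Mahalanobis distances. The following Python sketch illustrates that general idea. It is not the authors' algorithm. It rests on several assumptions:

* Classical PCA is computed by power iteration on the sample Gram matrix.
* Each component is scaled independently by its median and MAD, in place of a full robust covariance estimator.
* The number of components `k` is fixed by hand. The paper's method is described as fully automatic, but its selection rule is not given here.
* Samples are flagged above a 0.975 chi-square cutoff, approximated with the Wilson-Hilferty formula.

All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical sketch of PCA + robust-distance outlier flagging; NOT the
# authors' method. Assumptions: fixed k, per-component median/MAD scaling,
# chi-square cutoff via the Wilson-Hilferty approximation.


def pca_scores(rows, k, iters=300, seed=0):
    """First k principal-component scores per sample (rows = samples).

    Works on the n x n Gram matrix of the mean-centred data, which stays
    small when genes far outnumber samples. k must not exceed the rank.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Gram matrix G = X X^T, one entry per pair of samples.
    g = [[sum(a * b for a, b in zip(x[i], x[m])) for m in range(n)] for i in range(n)]
    rng = random.Random(seed)
    scores = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration from a random, normalised start vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(g[i][m] * v[m] for m in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient gives the eigenvalue (a squared singular value).
        gv = [sum(g[i][m] * v[m] for m in range(n)) for i in range(n)]
        lam = max(sum(a * b for a, b in zip(v, gv)), 0.0)
        # With X = U S V^T, the PC scores X V equal U S.
        for i in range(n):
            scores[i][c] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next component.
        g = [[g[i][m] - lam * v[i] * v[m] for m in range(n)] for i in range(n)]
    return scores


def robust_sq_distances(scores):
    """Squared distances using per-component median and scaled MAD.

    Simplification: each PC is scaled separately (classical PC scores are
    uncorrelated) instead of using a full robust covariance matrix.
    """
    n, k = len(scores), len(scores[0])
    dists = [0.0] * n
    for c in range(k):
        col = [s[c] for s in scores]
        med = statistics.median(col)
        mad = 1.4826 * statistics.median([abs(t - med) for t in col])
        if mad == 0.0:
            continue  # degenerate component: skip rather than divide by zero
        for i in range(n):
            dists[i] += ((col[i] - med) / mad) ** 2
    return dists


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def detect_outliers(rows, k=2, prob=0.975):
    """Indices of samples whose robust distance exceeds the cutoff."""
    dists = robust_sq_distances(pca_scores(rows, k))
    cutoff = chi2_quantile(prob, k)
    return [i for i, d in enumerate(dists) if d > cutoff], dists


# Toy data: 10 samples x 4 genes, with sample 7 given a very different profile.
base = [1.0, 2.0, 3.0, 4.0]
rows = [[base[j] + 0.1 * ((i * 3 + j * 5) % 7 - 3) for j in range(4)] for i in range(10)]
rows[7] = [9.0, -6.0, 12.0, -5.0]
flagged, dists = detect_outliers(rows, k=2)
print(flagged)
```

On this toy data, sample 7 is among the flagged indices. A real microarray matrix has thousands of genes per sample. Working with the n × n Gram matrix keeps the eigenproblem small in that setting. Because classical PCA is itself sensitive to outliers, the robust median/MAD step is what keeps a single aberrant sample from inflating the scale used to judge the others.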
Pages: 26
References (28 in total)
  • [1] Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
    Alon, U
    Barkai, N
    Notterman, DA
    Gish, K
    Ybarra, S
    Mack, D
    Levine, AJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) : 6745 - 6750
  • [2] [Anonymous], 1985, MATH STAT APPL, V8, P283, DOI 10.1007/978-94-009-5438-0_20
  • [3] [Anonymous], CHEMOM INTELL LAB SY
  • [4] Barnett V., 1994, Wiley series in probability and mathematical statistics applied probability and statistics, P224
  • [5] Stability of gene contributions and identification of outliers in multivariate analysis of microarray data
    Baty, Florent
    Jaeger, Daniel
    Preiswerk, Frank
    Schumacher, Martin M.
    Brutsche, Martin H.
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [6] A comparison of normalization methods for high density oligonucleotide array data based on variance and bias
    Bolstad, BM
    Irizarry, RA
    Åstrand, M
    Speed, TP
    [J]. BIOINFORMATICS, 2003, 19 (02) : 185 - 193
  • [7] Between-group analysis of microarray data
    Culhane, AC
    Perrière, G
    Considine, EC
    Cotter, TG
    Higgins, DG
    [J]. BIOINFORMATICS, 2002, 18 (12) : 1600 - 1608
  • [9] Comparison of discrimination methods for the classification of tumors using gene expression data
    Dudoit, S
    Fridlyand, J
    Speed, TP
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (457) : 77 - 87
  • [10] Outlier detection in multivariate analytical chemical data
    Egan, WJ
    Morgan, SL
    [J]. ANALYTICAL CHEMISTRY, 1998, 70 (11) : 2372 - 2379