Identification of differentially expressed genes by means of outlier detection

Cited by: 5
Authors
Irigoien, Itziar [1]
Arenas, Concepcion [2]
Affiliations
[1] Univ Basque Country, UPV EHU, Dept Computat Sci & Artificial Intelligence, Donostia San Sebastian, Spain
[2] Univ Barcelona, Dept Genet Microbiol & Stat, Barcelona, Spain
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
Differentially expressed gene; Multivariate statistics; Outlier; Quantile; DISCRIMINANT-ANALYSIS; MICROARRAY; DISCOVERY; CLASSIFICATION; CANCER; TUMOR;
DOI
10.1186/s12859-018-2318-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: An important problem in microarray data analysis is selecting, from thousands of genes, a small number of informative differentially expressed (DE) genes that may be key elements of a disease. If each gene is analyzed individually, there is a large number of hypotheses to test and a multiple comparison correction method must be used. Consequently, the resulting cut-off value may be too small. A further important issue is the replicability of the DE gene selection. We present a new method, called ORdensity, to obtain a reproducible selection of DE genes. Unlike the techniques usually applied to DE gene selection, it is not a gene-by-gene approach: it takes into account the relations among all genes.
Results: The proposed method returns three measures, related to the concepts of outlier and density of false positives in a neighbourhood, which allow us to identify the DE genes with high classification accuracy. To assess the performance of ORdensity, we used simulated microarray data and four real microarray cancer data sets. The results indicated that the method correctly detects the DE genes; it is competitive with other well-accepted methods; the list of DE genes that it obtains is useful for the correct classification or diagnosis of new samples; and, in general, it is more stable than other procedures.
Conclusions: ORdensity is a new method for identifying DE genes that avoids some of the shortcomings of individual gene identification, and it is stable when the original sample is replaced by subsamples.
Pages: 20