A New Ensemble Method for Detecting Anomalies in Gene Expression Matrices

Cited by: 12
Authors
Selicato, Laura [1, 2]
Esposito, Flavia [1, 2]
Gargano, Grazia [1]
Vegliante, Maria Carmela [3]
Opinto, Giuseppina [3]
Zaccaria, Gian Maria [3]
Ciavarella, Sabino [3]
Guarini, Attilio [3]
Del Buono, Nicoletta [1, 2]
Affiliations
[1] Univ Bari Aldo Moro, Dept Math, I-70125 Bari, Italy
[2] Ist Nazl Alta Matemat, GNCS, Ple Aldo Moro 5, I-00185 Rome, Italy
[3] IRCCS Ist Tumori Giovanni Paolo II, Hematol & Cell Therapy Unit, I-70124 Bari, Italy
Keywords
anomaly; low rank decomposition; gene expression; clustering; outliers; FOLLICULAR LYMPHOMA; MICROARRAY; MUTATIONS; NUMBER
DOI
10.3390/math9080882
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
One of the main problems in the analysis of real data is often related to the presence of anomalies. Namely, anomalous cases can both spoil the resulting analysis and contain valuable information at the same time. In both cases, the ability to detect these occurrences is very important. In the biomedical field, a correct identification of outliers could allow the development of new biological hypotheses that are not considered when looking at experimental biological data. In this work, we address the problem of detecting outliers in gene expression data, focusing on microarray analysis. We propose an ensemble approach for detecting anomalies in gene expression matrices based on the use of Hierarchical Clustering and Robust Principal Component Analysis, which allows us to derive a novel pseudo-mathematical classification of anomalies.
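The abstract names the two ingredients of the ensemble (Hierarchical Clustering and Robust Principal Component Analysis) but not how they are combined. The following is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a genes x samples matrix, solves Robust PCA by principal component pursuit with a standard augmented Lagrangian iteration, scores each sample by the norm of its column in the sparse component, and separately marks samples that fall into very small hierarchical clusters. The function names, the MAD-based threshold, and the parameters `n_clusters`, `min_cluster_size`, and `z_thresh` are hypothetical choices made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def soft_threshold(x, tau):
    """Elementwise shrinkage operator used for both S and the singular values."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal component pursuit: split M into low-rank L plus sparse S."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse update: elementwise shrinkage.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


def flag_anomalous_samples(X, n_clusters=3, min_cluster_size=2, z_thresh=3.0):
    """Return per-sample RPCA scores and two boolean flags (RPCA, clustering).

    X is assumed to be genes x samples (an assumption of this sketch).
    """
    # Robust PCA view: energy of each sample's column in the sparse part.
    _, S = rpca_pcp(X)
    score = np.linalg.norm(S, axis=0)
    med = np.median(score)
    mad = np.median(np.abs(score - med)) + 1e-12
    rpca_flag = (score - med) / (1.4826 * mad) > z_thresh

    # Hierarchical clustering view: samples that end up in tiny clusters.
    Z = linkage(X.T, method="average", metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    sizes = np.bincount(labels)  # labels are 1-based; index 0 stays empty
    hc_flag = sizes[labels] < min_cluster_size

    return score, rpca_flag, hc_flag
```

A sample flagged by both views is a stronger anomaly candidate than one flagged by only one of them; how the two outputs should be reconciled in practice is a design decision this sketch leaves open.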
Pages: 26