The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

Cited by: 7
Authors
Aghdam, Rosa [1]
Baghfalaki, Taban [2]
Khosravi, Pegah [1,3,4]
Ansari, Elnaz Saberi [1,5]
Affiliations
[1] Inst Res Fundamental Sci IPM, Sch Biol Sci, Tehran 193955746, Iran
[2] Tarbiat Modares Univ, Fac Math Sci, Dept Stat, Tehran 14115111, Iran
[3] Weill Cornell Med Coll, Inst Computat Biomed, Dept Physiol & Biophys, New York, NY 10021 USA
[4] Weill Cornell Med Coll, Inst Precis Med, New York, NY 10021 USA
[5] Univ Paris 05, CNRS, INSERM, Inst Cochin, U1016, UMR S1016, UMR 8104, F-75014 Paris, France
Keywords
Gene expression; Missing data; Imputation method; Significant genes; Pathway enrichment; MISSING VALUE ESTIMATION; SOMATIC MUTATIONS; EXPRESSION; INFERENCE; SELECTION; VALUES; SAM;
DOI
10.1016/j.gpb.2017.08.003
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In this study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of the genes were chosen at random to carry missing values under two types of missingness mechanism, ignorable and non-ignorable, at various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlap between the significant genes detected from the original data and those detected from the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways using ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available at http://profiles.bs.ipm.ir/softwares/imputation_methods/.
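The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are simulated, row-mean imputation stands in for the 10 imputation methods, a Welch-type t statistic stands in for SAM, only the ignorable (completely at random) mechanism is shown, and the chi-squared comparison is left out. All function names and default parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_expression(n_genes=100, n_samples=10, n_shifted=10, seed=0):
    """Simulated genes-by-samples matrix; the first n_shifted genes differ between groups."""
    rng = random.Random(seed)
    labels = [0] * (n_samples // 2) + [1] * (n_samples - n_samples // 2)
    expr = []
    for g in range(n_genes):
        shift = 3.0 if g < n_shifted else 0.0
        expr.append([rng.gauss(0.0, 1.0) + shift * lab for lab in labels])
    return expr, labels


def make_missing(expr, gene_fraction=0.05, missing_rate=0.2, seed=0):
    """Remove values completely at random in a random fraction of genes (None = missing)."""
    rng = random.Random(seed)
    masked = [row[:] for row in expr]
    n_genes, n_samples = len(expr), len(expr[0])
    n_selected = max(1, round(gene_fraction * n_genes))
    # Leave at least one observed value per gene so a row mean always exists.
    n_missing = min(n_samples - 1, max(1, round(missing_rate * n_samples)))
    selected = rng.sample(range(n_genes), n_selected)
    for g in selected:
        for s in rng.sample(range(n_samples), n_missing):
            masked[g][s] = None
    return masked, selected


def impute_row_mean(masked):
    """Fill each missing value with the mean of the observed values of the same gene."""
    filled = []
    for row in masked:
        mean = statistics.fmean(v for v in row if v is not None)
        filled.append([mean if v is None else v for v in row])
    return filled


def top_genes(expr, labels, k):
    """Indices of the k genes with the largest absolute t statistic (stand-in for SAM)."""
    scores = []
    for g, row in enumerate(expr):
        a = [v for v, lab in zip(row, labels) if lab == 0]
        b = [v for v, lab in zip(row, labels) if lab == 1]
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        t = (statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0
        scores.append((abs(t), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}


def overlap_proportion(original, imputed):
    """Share of the originally significant genes that are recovered after imputation."""
    return len(original & imputed) / len(original)


if __name__ == "__main__":
    expr, labels = simulate_expression(seed=1)
    masked, _ = make_missing(expr, gene_fraction=0.05, missing_rate=0.2, seed=2)
    filled = impute_row_mean(masked)
    original = top_genes(expr, labels, k=10)
    recovered = top_genes(filled, labels, k=10)
    print("overlap proportion:", overlap_proportion(original, recovered))
```

In a fuller comparison, the overlap proportions obtained for each imputation method would be the quantities compared with the chi-squared test.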
Pages: 396-404
Number of pages: 9
Related papers
46 in total
[1]   Dealing with missing values in large-scale studies: microarray data imputation and beyond [J].
Aittokallio, Tero .
BRIEFINGS IN BIOINFORMATICS, 2010, 11 (02) :253-264
[2]   NCBI GEO: archive for functional genomics data sets-update [J].
Barrett, Tanya ;
Wilhite, Stephen E. ;
Ledoux, Pierre ;
Evangelista, Carlos ;
Kim, Irene F. ;
Tomashevsky, Maxim ;
Marshall, Kimberly A. ;
Phillippy, Katherine H. ;
Sherman, Patti M. ;
Holko, Michelle ;
Yefanov, Andrey ;
Lee, Hyeseung ;
Zhang, Naigong ;
Robertson, Cynthia L. ;
Serova, Nadezhda ;
Davis, Sean ;
Soboleva, Alexandra .
NUCLEIC ACIDS RESEARCH, 2013, 41 (D1) :D991-D995
[3]   A comparison of normalization methods for high density oligonucleotide array data based on variance and bias [J].
Bolstad, BM ;
Irizarry, RA ;
Åstrand, M ;
Speed, TP .
BIOINFORMATICS, 2003, 19 (02) :185-193
[4]   Multiple Imputation for Missing Data via Sequential Regression Trees [J].
Burgette, Lane F. ;
Reiter, Jerome P. .
AMERICAN JOURNAL OF EPIDEMIOLOGY, 2010, 172 (09) :1070-1076
[5]   The genetic basis of colorectal cancer: Insights into critical pathways of tumorigenesis [J].
Chung, DC .
GASTROENTEROLOGY, 2000, 119 (03) :854-865
[6]   Impact of missing data imputation methods on gene expression clustering and classification [J].
de Souto, Marcilio C. P. ;
Jaskowiak, Pablo A. ;
Costa, Ivan G. .
BMC BIOINFORMATICS, 2015, 16
[7]   Somatic mutations affect key pathways in lung adenocarcinoma [J].
Ding, Li ;
Getz, Gad ;
Wheeler, David A. ;
Mardis, Elaine R. ;
McLellan, Michael D. ;
Cibulskis, Kristian ;
Sougnez, Carrie ;
Greulich, Heidi ;
Muzny, Donna M. ;
Morgan, Margaret B. ;
Fulton, Lucinda ;
Fulton, Robert S. ;
Zhang, Qunyuan ;
Wendl, Michael C. ;
Lawrence, Michael S. ;
Larson, David E. ;
Chen, Ken ;
Dooling, David J. ;
Sabo, Aniko ;
Hawes, Alicia C. ;
Shen, Hua ;
Jhangiani, Shalini N. ;
Lewis, Lora R. ;
Hall, Otis ;
Zhu, Yiming ;
Mathew, Tittu ;
Ren, Yanru ;
Yao, Jiqiang ;
Scherer, Steven E. ;
Clerc, Kerstin ;
Metcalf, Ginger A. ;
Ng, Brian ;
Milosavljevic, Aleksandar ;
Gonzalez-Garay, Manuel L. ;
Osborne, John R. ;
Meyer, Rick ;
Shi, Xiaoqi ;
Tang, Yuzhu ;
Koboldt, Daniel C. ;
Lin, Ling ;
Abbott, Rachel ;
Miner, Tracie L. ;
Pohl, Craig ;
Fewell, Ginger ;
Haipek, Carrie ;
Schmidt, Heather ;
Dunford-Shore, Brian H. ;
Kraja, Aldi ;
Crosby, Seth D. ;
Sawyer, Christopher S. .
NATURE, 2008, 455 (7216) :1069-1075
[8]   Multiple hypothesis testing in microarray experiments [J].
Dudoit, S ;
Shaffer, JP ;
Boldrick, JC .
STATISTICAL SCIENCE, 2003, 18 (01) :71-103
[9]   Empirical Bayes analysis of a microarray experiment [J].
Efron, B ;
Tibshirani, R ;
Storey, JD ;
Tusher, V .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1151-1160
[10]   Modulation of apoptosis signaling for cancer therapy [J].
Fulda, Simone ;
Debatin, Klaus-Michael .
ARCHIVUM IMMUNOLOGIAE ET THERAPIAE EXPERIMENTALIS, 2006, 54 (03) :173-175