Principal component analysis based unsupervised feature extraction applied to budding yeast temporally periodic gene expression

Times cited: 23
Authors
Taguchi, Y-h [1]
Affiliations
[1] Chuo Univ, Dept Phys, Bunkyo Ku, 1-13-27 Kasuga, Tokyo 1128551, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Principal component analysis; Feature extraction; Budding yeast; Cell division cycle; Gene expression; CYCLE; DISCOVERY; FKH2;
DOI
10.1186/s13040-016-0101-9
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Background: The recently proposed principal component analysis (PCA) based unsupervised feature extraction (FE) has successfully been applied to various bioinformatics problems, ranging from biomarker identification to the screening of disease-causing genes using gene expression/epigenetic profiles. However, the conditions required for its successful use and the mechanisms by which it outperforms supervised methods are unknown, because PCA based unsupervised FE has only been applied to challenging (i.e., not well known) problems. Results: In this study, PCA based unsupervised FE was applied to an extensively studied organism, budding yeast. When applied to two gene expression profiles expected to be temporally periodic, yeast metabolic cycle (YMC) and yeast cell division cycle (YCDC), PCA based unsupervised FE outperformed a simple but powerful conventional method, sinusoidal fitting, in several respects: (i) feasible biological term enrichment without assuming periodicity for YMC; (ii) identification of periodic profiles whose period was half as long as the cell division cycle for YMC; and (iii) the identification of no more than 37 genes associated with the enrichment of biological terms related to the cell division cycle in the integrated analysis of seven YCDC profiles, for which sinusoidal fitting failed. The reasons for the differences between the methods, and the conditions required for success, were determined by comparing PCA based unsupervised FE with fittings to various periodic (artificial, thus pre-defined) profiles. Furthermore, four popular unsupervised clustering algorithms applied to YMC were not as successful as PCA based unsupervised FE. Conclusions: PCA based unsupervised FE is a useful and effective unsupervised method to investigate YMC and YCDC. This study identified why the unsupervised method without pre-judged criteria outperformed supervised methods requiring human-defined criteria.
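The abstract names the method but does not describe its steps. The following is a minimal Python sketch of one plausible reading, not the paper's implementation. It assumes that genes are embedded by their scores on the first two principal components, that each gene receives a P-value from a chi-squared distribution with two degrees of freedom on its standardized scores, and that genes are kept when the Benjamini-Hochberg adjusted P-value falls below a threshold. The function names, the toy data, and the threshold of 0.01 are all hypothetical.

```python
import numpy as np


def benjamini_hochberg(p):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def pca_unsupervised_fe(profiles, alpha=0.01):
    """Select outlier genes on the first two PCs (assumed procedure).

    profiles: array of shape (genes, time points).
    Returns the indices of the selected genes.
    """
    # Center each time point so that genes are the embedded objects.
    centered = profiles - profiles.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * s[:2]            # gene scores on PC1 and PC2
    z = scores / scores.std(axis=0)      # standardize each PC
    chi2 = (z ** 2).sum(axis=1)
    # Survival function of chi-squared with 2 degrees of freedom.
    p = np.exp(-chi2 / 2.0)
    return np.flatnonzero(benjamini_hochberg(p) < alpha)


def make_toy_profiles(n_genes=500, n_periodic=20, n_times=24, seed=0):
    """Toy data (an assumption): a few sinusoidal genes among noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times)
    data = rng.normal(0.0, 0.5, size=(n_genes, n_times))
    phases = np.linspace(0.0, 2.0 * np.pi, n_periodic, endpoint=False)
    data[:n_periodic] += 3.0 * np.sin(2.0 * np.pi * t / 12.0 + phases[:, None])
    return data


profiles = make_toy_profiles()
selected = pca_unsupervised_fe(profiles)
print(selected)
```

No period or waveform is supplied to `pca_unsupervised_fe`; the periodic genes stand out only because they dominate the leading components. This illustrates, on toy data, the abstract's claim that periodicity need not be assumed.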
Pages: 23