Knowledge-guided multi-scale independent component analysis for biomarker identification

Times cited: 19
Authors
Chen, Li [1 ]
Xuan, Jianhua [1 ]
Wang, Chen [1 ]
Shih, Ie-Ming [2 ,3 ,4 ]
Wang, Yue [1 ]
Zhang, Zhen [2 ,3 ,4 ]
Hoffman, Eric [5 ]
Clarke, Robert [6 ,7 ]
Affiliations
[1] Virginia Polytech Inst & State Univ, Dept Elect & Comp Engn, Arlington, VA USA
[2] Johns Hopkins Univ, Sch Med, Dept Pathol, Baltimore, MD USA
[3] Johns Hopkins Univ, Sch Med, Dept Gynecol, Baltimore, MD USA
[4] Johns Hopkins Univ, Sch Med, Dept Oncol, Baltimore, MD USA
[5] Childrens Natl Med Ctr, Res Ctr Genet Med, Washington, DC 20010 USA
[6] Georgetown Univ, Sch Med, Dept Oncol, Washington, DC USA
[7] Georgetown Univ, Sch Med, Dept Physiol & Biophys, Washington, DC USA
DOI
10.1186/1471-2105-9-416
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to the specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data.
Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta-data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally, disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification.
Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify the corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promise for inferring novel biomarkers for ovarian cancer and extending current knowledge.
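The abstract mentions a statistical significance test for transcription factor enrichment in the extracted gene set but does not say which test is used. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of motif enrichment question. The function name `enrichment_p_value` and all counts in the usage lines are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment_p_value(population, with_motif, selected, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population : total number of genes considered (assumed background set)
    with_motif : genes in the background that carry the motif
    selected   : size of the extracted gene set
    overlap    : genes in the extracted set that carry the motif
    """
    if not (0 <= with_motif <= population and 0 <= selected <= population):
        raise ValueError("counts must lie within the population size")
    max_overlap = min(with_motif, selected)
    start = max(overlap, 0)
    # Number of ways to draw the gene set at all.
    total = comb(population, selected)
    # Number of draws that contain at least `overlap` motif-bearing genes.
    tail = sum(
        comb(with_motif, k) * comb(population - with_motif, selected - k)
        for k in range(start, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 6000 background genes, 200 with the motif,
# 50 genes extracted, 10 of which carry the motif.
p = enrichment_p_value(6000, 200, 50, 10)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value here would suggest that the motif occurs in the extracted gene set more often than expected by chance. In practice the p-values would also need correction for testing many transcription factors at once.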
Pages: 16
Related papers
36 in total
  • [1] *AFF, 2005, GUID PROB LOG INT ER
  • [2] [Anonymous], INDEPENDENT COMPONEN
  • [3] Reverse engineering of regulatory networks in human B cells
    Basso, K
    Margolin, AA
    Stolovitzky, G
    Klein, U
    Dalla-Favera, R
    Califano, A
    [J]. NATURE GENETICS, 2005, 37 (04) : 382 - 390
  • [4] The properties of high-dimensional data spaces: implications for exploring gene and protein expression data
    Clarke, Robert
    Ressom, Habtom W.
    Wang, Antai
    Xuan, Jianhua
    Liu, Minetta C.
    Gehan, Edmund A.
    Wang, Yue
    [J]. NATURE REVIEWS CANCER, 2008, 8 (01) : 37 - 49
  • [5] maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments
    Conesa, A
    Nueda, MJ
    Ferrer, A
    Talón, M
    [J]. BIOINFORMATICS, 2006, 22 (09) : 1096 - 1102
  • [6] Integrating regulatory motif discovery and genome-wide expression analysis
    Conlon, EM
    Liu, XS
    Lieb, JD
    Liu, JS
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (06) : 3339 - 3344
  • [7] Devore J., 1997, Statistics: the exploration and analysis of data
  • [8] Independent component analysis reveals new and biologically significant structures in micro array data
    Frigyesi, Attila
    Veerla, Srinivas
    Lindgren, David
    Hoglund, Mattias
    [J]. BMC BIOINFORMATICS, 2006, 7 (1)
  • [9] Gong T, 2007, GENE REGUL SYST BIO, V1, P349
  • [10] Hartigan J. A., 1979, Applied Statistics, V28, P100, DOI 10.2307/2346830