iBBiG: iterative binary bi-clustering of gene sets

Cited by: 36
Authors
Gusenleitner, Daniel [1]
Howe, Eleanor A. [1,2]
Bentink, Stefan [1,3]
Quackenbush, John [1,3,4]
Culhane, Aedin C. [1,3]
Affiliations
[1] Dana Farber Canc Inst, Dept Biostat & Computat Biol, Boston, MA 02115 USA
[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[3] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[4] Dana Farber Canc Inst, Dept Canc Biol, Boston, MA 02115 USA
Funding
US National Institutes of Health (NIH)
Keywords
ENRICHMENT ANALYSIS; BIOLOGICAL PROCESSES; MICROARRAY DATA; EXPRESSION DATA; DISEASES; CCL5;
DOI
10.1093/bioinformatics/bts438
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but it is an emerging field with few established statistical methods.

Results: We transform gene expression profiles into binary gene set profiles by discretizing the results of gene set enrichment analyses, and we apply a new iterative binary bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics datasets that may have unmatched samples, and it does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to a meta-analysis of breast cancer studies, where iBBiG extracted novel gene set-phenotype associations that predicted tumor metastases within tumor subtypes.
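The discretization step described in the abstract (turning enrichment results into a binary gene set profile) can be illustrated with a minimal Python sketch. It assumes that enrichment results arrive as a matrix of p-values (rows are gene sets, columns are phenotypes or study contrasts) and that each column is corrected with the Benjamini-Hochberg procedure before thresholding at a chosen FDR level. Both the correction method and the helper names `benjamini_hochberg` and `binarize_enrichment` are illustrative assumptions, not the authors' published discretization rule.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q-values for a 1-D array of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: take the running minimum from the largest rank down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def binarize_enrichment(pvals, alpha=0.05):
    """Convert a (gene sets x phenotypes) p-value matrix into a 0/1 matrix.

    Hypothetical rule: a cell is 1 when its BH-adjusted q-value, computed
    separately within each phenotype column, falls below alpha.
    """
    p = np.asarray(pvals, dtype=float)
    q = np.column_stack([benjamini_hochberg(p[:, j]) for j in range(p.shape[1])])
    return (q < alpha).astype(int)


if __name__ == "__main__":
    # Toy p-values for four gene sets (rows) in two phenotypes (columns).
    toy = [[0.001, 0.2], [0.01, 0.03], [0.5, 0.04], [0.02, 0.9]]
    print(binarize_enrichment(toy, alpha=0.1).tolist())
    # -> [[1, 0], [1, 1], [0, 1], [1, 0]]
```

A binary matrix of this shape is the kind of input a binary bi-clustering method operates on: a bicluster is a subset of rows (gene sets) and columns (phenotypes) that is dense in ones. Correcting within each column rather than across the whole matrix is one design choice among several; a pooled correction would make the threshold stricter when many phenotypes are analysed together.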
Pages: 2484-2492
Number of pages: 9