iBBiG: iterative binary bi-clustering of gene sets

Cited by: 36

Authors:

Gusenleitner, Daniel [1]

Howe, Eleanor A. [1,2]

Bentink, Stefan [1,3]

Quackenbush, John [1,3,4]

Culhane, Aedin C. [1,3]

Affiliations:

[1] Dana-Farber Cancer Institute, Department of Biostatistics and Computational Biology, Boston, MA 02115, USA

[2] University of Oxford, Department of Statistics, Oxford OX1 3TG, England

[3] Harvard University, School of Public Health, Department of Biostatistics, Boston, MA 02115, USA

[4] Dana-Farber Cancer Institute, Department of Cancer Biology, Boston, MA 02115, USA

Source:

BIOINFORMATICS | 2012 / Volume 28 / Issue 19

Funding:

National Institutes of Health (USA);

Keywords:

ENRICHMENT ANALYSIS; BIOLOGICAL PROCESSES; MICROARRAY DATA; EXPRESSION DATA; DISEASES; CCL5;

DOI:

10.1093/bioinformatics/bts438

Chinese Library Classification number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but it is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing the results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics datasets that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to a meta-analysis of breast cancer studies, where iBBiG extracted novel gene set-phenotype associations that predicted tumor metastases within tumor subtypes.
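
The discretization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the iBBiG implementation: it only shows one plausible way to turn a matrix of gene set enrichment p-values into a binary gene set profile. The function name `binarize_enrichment`, the cutoff `alpha=0.05`, and the example p-values are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def binarize_enrichment(p_values, alpha=0.05):
    """Return a 0/1 matrix with 1 where the enrichment p-value is below alpha.

    alpha is a hypothetical cutoff chosen for this sketch.
    """
    p = np.asarray(p_values, dtype=float)
    return (p < alpha).astype(np.int8)

# Hypothetical p-values: rows are gene sets, columns are phenotype comparisons.
p_values = np.array([
    [0.001, 0.20, 0.03],
    [0.40, 0.04, 0.90],
])

binary = binarize_enrichment(p_values)
print(binary)
```

A binary matrix of this shape (gene sets by phenotypes) is the kind of input a bi-clustering method would then search for blocks of co-occurring ones.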

Citation

Pages: 2484-2492

Page count: 9
