Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering

Cited by: 75
Authors
Sun, Peng [1,2]
Speicher, Nora K. [1,2]
Roettger, Richard [1,2]
Guo, Jiong [2]
Baumbach, Jan [1,3]
Affiliations
[1] Univ Saarland, Max Planck Inst Informat, D-66123 Saarbrucken, Germany
[2] Univ Saarland, Cluster Excellence Multimodel Comp & Interact, D-66123 Saarbrucken, Germany
[3] Univ Southern Denmark, Inst Math & Comp Sci, DK-5230 Odense M, Denmark
Keywords
MICROARRAY DATA; BIOLOGICAL DATA; ALGORITHMS;
DOI
10.1093/nar/gku201
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
The explosion of biological data has dramatically reshaped today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully used to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic, 'Bi-Force'. It is based on the weighted bicluster editing model and performs biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing it with two existing algorithms in the BiCluE software package. We then followed the biclustering evaluation protocol of a recent review paper by Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. Bi-Force thus outperformed the existing tools, at least when following the evaluation protocols of Eren et al. Bi-Force is implemented in Java and integrated into the open source software package BiCluE. The software as well as all datasets used are publicly available at http://biclue.mpi-inf.mpg.de.
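To make the weighted bicluster editing model mentioned in the abstract more concrete, the following is a minimal sketch of how the editing cost of one candidate biclustering could be scored. It assumes the common formulation in which a pair (row, column) counts as an edge when its similarity reaches a threshold, and in which inserting or deleting an edge costs the absolute distance between the similarity and the threshold. The function name, the parameters and the toy matrix are hypothetical and are not taken from Bi-Force or BiCluE; the sketch only evaluates a given solution and does not search for one.

```python
def bicluster_editing_cost(sim, row_labels, col_labels, threshold):
    """Cost of editing the thresholded bipartite graph into the bicliques
    described by row_labels and col_labels (assumed cost: |s - threshold|)."""
    cost = 0.0
    for i, row in enumerate(sim):
        for j, s in enumerate(row):
            same_cluster = row_labels[i] == col_labels[j]
            has_edge = s >= threshold
            # An edit is needed when an edge crosses clusters (deletion)
            # or when a pair inside one cluster has no edge (insertion).
            if same_cluster != has_edge:
                cost += abs(s - threshold)
    return cost


# Toy similarity matrix: 3 rows (e.g. genes) x 3 columns (e.g. conditions).
sim = [
    [0.90, 0.80, 0.10],
    [0.70, 0.20, 0.30],
    [0.10, 0.20, 0.95],
]

# Candidate A: two biclusters; candidate B: everything in one bicluster.
cost_two = bicluster_editing_cost(sim, [0, 0, 1], [0, 0, 1], threshold=0.5)
cost_one = bicluster_editing_cost(sim, [0, 0, 0], [0, 0, 0], threshold=0.5)
print(cost_two, cost_one)
```

In this toy case the two-bicluster candidate needs only one edge insertion, whereas the single-bicluster candidate needs five, so a heuristic minimizing this cost would prefer the former.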
Pages: 12