ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES

Cited by: 0
Authors
Wiwie, Christian [1]
Rottger, Richard [1]
Affiliations
[1] Univ Southern Denmark, Dept Math & Comp Sci, Odense, Denmark
Source
PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017 | 2017
Keywords
Protein Classification; Protein Evolution; Clustering; SEARCH; MODEL;
DOI
Not available
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Over the last decades, we have observed tremendous and ongoing growth in available sequencing data, fueled by advances in wet-lab technology. The sequence information is only the starting point for understanding how organisms survive and prosper. It is, for instance, equally important to unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work, we have intensively analyzed various clustering tools on a large scale. We used the data to investigate the behavior of the tools' parameters, underlining the diversity of the protein families. Furthermore, we trained regression models to predict the expected performance of a clustering tool on an unknown data set, and aimed to suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of clustering proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters.
Pages: 39-50
Page count: 12