Using homology relations within a database markedly boosts protein sequence similarity search

Cited by: 8
Authors
Tong, Jing [1]
Sadreyev, Ruslan I. [2,3,4]
Pei, Jimin [5]
Kinch, Lisa N. [5]
Grishin, Nick V. [1,5]
Affiliations
[1] Univ Texas SW Med Ctr Dallas, Dept Mol Biophys, Dallas, TX 75390 USA
[2] Massachusetts Gen Hosp, Dept Mol Biol, Boston, MA 02114 USA
[3] Massachusetts Gen Hosp, Dept Pathol, Boston, MA 02114 USA
[4] Harvard Univ, Sch Med, Boston, MA 02114 USA
[5] Univ Texas SW Med Ctr Dallas, Howard Hughes Med Inst, Dallas, TX 75390 USA
Funding
US National Institutes of Health (NIH);
Keywords
homology detection; remote sequence; similarity search; homology network; protein modeling; similarity score; PSI-BLAST; GENERATION; PREDICTIONS; ALIGNMENTS; PROFILES; COMPASS; IMPACT; DOMAIN;
DOI
10.1073/pnas.1424324112
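Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code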
07 ; 0710 ; 09 ;
Abstract
Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
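The idea in the abstract can be illustrated with a small sketch. It is not the COMPADRE scoring scheme, which the abstract does not specify. It assumes a simple weighted average of the direct query-hit score and the mean score between the query and the hit's known homologs. The function names, the `weight` parameter, and the toy scores are all hypothetical.

```python
from math import ceil
from typing import Dict, List, Optional, Set


def rescore_hits(direct_scores: Dict[str, float],
                 homologs: Dict[str, List[str]],
                 weight: float = 0.5) -> Dict[str, float]:
    """Blend each hit's direct score with the query's scores against its known homologs.

    Assumption: a plain weighted average; the published method may differ.
    """
    rescored = {}
    for hit, score in direct_scores.items():
        # Scores between the query and the database homologs of this hit.
        neighbor_scores = [direct_scores[h] for h in homologs.get(hit, [])
                           if h in direct_scores]
        if neighbor_scores:
            support = sum(neighbor_scores) / len(neighbor_scores)
            rescored[hit] = (1 - weight) * score + weight * support
        else:
            # No known homologs: keep the direct score unchanged.
            rescored[hit] = score
    return rescored


def precision_at_coverage(ranked_hits: List[str],
                          true_homologs: Set[str],
                          coverage: float = 0.5) -> Optional[float]:
    """Precision at the rank where the given fraction of true homologs is recovered."""
    needed = ceil(coverage * len(true_homologs))
    found = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit in true_homologs:
            found += 1
            if found >= needed:
                return found / rank
    return None  # the requested coverage is never reached


# Toy data (hypothetical): B is a true homolog with a weak direct score,
# X is an unrelated sequence with a slightly higher direct score.
direct = {"A": 10.0, "B": 2.0, "C": 8.0, "X": 3.0}
known = {"A": ["C"], "B": ["A", "C"], "C": ["A"]}
truth = {"A", "B", "C"}

rescored = rescore_hits(direct, known)
rank_direct = sorted(direct, key=direct.get, reverse=True)
rank_rescored = sorted(rescored, key=rescored.get, reverse=True)
```

In this toy case the weak hit B rises above the unrelated X because its known homologs A and C score well against the query, which is the effect the abstract describes. The precision function mirrors the "precision at half-coverage" measure when called with `coverage=0.5`.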
Pages: 7003-7008
Page count: 6