Fingerprinting protein structures effectively and efficiently

Times cited: 8
Authors
Cui, Xuefeng [1]
Li, Shuai Cheng [2]
He, Lin [1]
Li, Ming [1]
Affiliations
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
STRUCTURE ALIGNMENT; CLASSIFICATION; ALGORITHM; SEQUENCES;
DOI
10.1093/bioinformatics/btt659
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: One common task in structural biology is to assess the similarities and differences among protein structures. A variety of structure alignment algorithms and programs have been designed and implemented for this purpose. A major drawback of existing structure alignment programs is that they require a large amount of computational time, which makes them infeasible for pairwise alignment across large collections of structures. To overcome this drawback, a fragment alphabet learned from known structures has been introduced. That method, however, considers only local similarity and therefore occasionally assigns high scores to structures that are similar only in local fragments.

Method: We propose a novel approach that eliminates false positives by comparing both local and remote similarity, with little compromise in speed. Two kinds of contact libraries (ContactLib) are introduced to fingerprint protein structures effectively and efficiently. Each contact group in a contact library consists of either one local fragment or two remote fragments and is represented by a concise vector. These vectors are then indexed and used to calculate a new combined hit-rate score that identifies similar protein structures.

Results: We tested our method on the high-quality protein structure subset of SCOP30, which contains 3297 protein structures. For each protein structure in the subset, we retrieved its neighboring protein structures from the rest of the subset. The best area under the receiver operating characteristic (ROC) curve achieved by ContactLib is 0.960, a significant improvement over 0.747, the best result achieved by FragBag. We also demonstrated that incorporating remote contact information is critical for consistently retrieving accurate neighbor structures for all-beta query protein structures.
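The abstract describes ContactLib only at a high level: contact groups are encoded as vectors, matched against other structures, and combined into a hit-rate score, and retrieval quality is measured by the area under the ROC curve. The toy Python sketch below is not the authors' implementation. It assumes that a "hit" means a query vector lies within a Euclidean `radius` of some target vector, and that the combined score is a weighted mean of local and remote hit rates. The names `hit_rate`, `combined_score`, `radius`, and `w_remote` are hypothetical, and the sketch scans vectors by brute force instead of using an index. The `roc_auc` function uses the standard interpretation of ROC area as the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def hit_rate(query: Sequence[Vector], target: Sequence[Vector], radius: float) -> float:
    """Fraction of query vectors with at least one target vector within `radius`.

    Hypothetical hit definition; brute-force scan (a real system would use an index).
    """
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(math.dist(q, t) <= radius for t in target))
    return hits / len(query)


def combined_score(query: Dict[str, List[Vector]], target: Dict[str, List[Vector]],
                   radius: float, w_remote: float = 0.5) -> float:
    """Hypothetical combined score: weighted mean of local and remote hit rates."""
    local = hit_rate(query["local"], target["local"], radius)
    remote = hit_rate(query["remote"], target["remote"], radius)
    return (1.0 - w_remote) * local + w_remote * remote


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: A is a true neighbor, B matches only locally, C is unrelated.
query = {"local": [(0.0, 0.0), (1.0, 1.0)], "remote": [(5.0, 5.0)]}
targets = {
    "A": {"local": [(0.1, 0.0), (1.0, 1.1)], "remote": [(5.0, 5.2)]},
    "B": {"local": [(0.0, 0.1), (0.9, 1.0)], "remote": [(9.0, 9.0)]},
    "C": {"local": [(3.0, 3.0)], "remote": [(0.0, 0.0)]},
}
labels = [True, False, False]  # is each target a true neighbor of the query?

# Local-only scoring (remote weight 0) versus combined local + remote scoring.
local_only = [combined_score(query, t, radius=0.3, w_remote=0.0) for t in targets.values()]
combined = [combined_score(query, t, radius=0.3, w_remote=0.5) for t in targets.values()]

print(local_only, roc_auc(local_only, labels))  # [1.0, 1.0, 0.0] 0.75
print(combined, roc_auc(combined, labels))      # [1.0, 0.5, 0.0] 1.0
```

With the remote weight set to zero, target B, which resembles the query only in local fragments, ties with the true neighbor A, and the AUC drops to 0.75. Adding the remote term separates A from B. This mirrors the false-positive argument in the Motivation paragraph on invented data only and says nothing about the paper's reported numbers.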
Pages: 949-955
Number of pages: 7
References
(29 in total)
[1] Akutsu T, 1996, IEICE T INF SYST, VE79D, P1629
[2] Aung, Zeyar; Tan, Kian-Lee. Rapid retrieval of protein structures from databases. DRUG DISCOVERY TODAY, 2007, 12 (17-18): 732-739
[3] Budowski-Tal, Inbal; Nov, Yuval; Kolodny, Rachel. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (08): 3481-3486
[4] Canutescu, AA; Shelenkov, AA; Dunbrack, RL. A graph-theory algorithm for rapid protein side-chain prediction. PROTEIN SCIENCE, 2003, 12 (09): 2001-2014
[5] Chandonia, JM; Hon, G; Walker, NS; Lo Conte, L; Koehl, P; Levitt, M; Brenner, SE. The ASTRAL Compendium in 2004. NUCLEIC ACIDS RESEARCH, 2004, 32: D189-D192
[6] Choi, IG; Kwon, J; Kim, SH. Local feature frequency profile: A method to measure structural similarity in proteins. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 (11): 3797-3802
[7] Cui, Xuefeng; Li, Shuai Cheng; Bu, Dongbo; Alipanahi, Babak; Li, Ming. Protein Structure Idealization: How accurately is it possible to model protein structures with dihedral angles? ALGORITHMS FOR MOLECULAR BIOLOGY, 2013, 8
[8] Dey, Fabian; Zhang, Qiangfeng Cliff; Petrey, Donald; Honig, Barry. Toward a "Structural BLAST": Using structural relationships to infer function. PROTEIN SCIENCE, 2013, 22 (04): 359-366
[9] Engh R.A., 2006, INT TABLES CRYSTALLO, P382, DOI 10.1107/97809553602060000695
[10] Fawcett, Tom. An introduction to ROC analysis. PATTERN RECOGNITION LETTERS, 2006, 27 (08): 861-874