iDNA-Prot|dis: Identifying DNA-Binding Proteins by Incorporating Amino Acid Distance-Pairs and Reduced Alphabet Profile into the General Pseudo Amino Acid Composition

Cited by: 267
Authors
Liu, Bin [1, 2, 3, 4]
Xu, Jinghao [1]
Lan, Xun [5]
Xu, Ruifeng [1, 2]
Zhou, Jiyun [1]
Wang, Xiaolong [1, 2]
Chou, Kuo-Chen [4, 6]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen, Guangdong, Peoples R China
[2] Harbin Inst Technol, Shenzhen Grad Sch, Key Lab Network Oriented Intelligent Computat, Shenzhen, Guangdong, Peoples R China
[3] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[4] Gordon Life Sci Inst, Belmont, MA USA
[5] Stanford Univ, Stanford, CA 94305 USA
[6] King Abdulaziz Univ, CEGMR, Jeddah 21413, Saudi Arabia
Funding
National Natural Science Foundation of China
Keywords
SUPPORT VECTOR MACHINES; REMOTE HOMOLOGY DETECTION; SUBCELLULAR LOCATION PREDICTION; FUNCTIONAL DOMAIN COMPOSITION; TUPLE NUCLEOTIDE COMPOSITION; SECONDARY STRUCTURE-CONTENT; STRUCTURAL CLASSES; SIGNAL PEPTIDES; CHOUS PSEAAC; PHYSICOCHEMICAL PROPERTIES;
DOI
10.1371/journal.pone.0106691
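Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]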
Discipline classification codes
07; 0710; 09
Abstract
Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential components of both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called "iDNA-Prot|dis", was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins so as to enhance the prediction quality, while the latter can reduce the dimension of the PseAAC vector so as to speed up the prediction process. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to obtain the desired results without the need to follow the complicated mathematical equations, which are presented in this paper only for the integrity of the development process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
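The two ingredients named in the abstract, distance-pair coupling and a reduced alphabet, can be illustrated with a minimal sketch. It assumes a hypothetical five-group clustering of the 20 amino acids and a hypothetical maximum distance; the group definitions, function names, and default parameter are illustrative choices and are not taken from the paper, whose actual clustering and formulation may differ.

```python
from itertools import product

# Hypothetical 5-group reduced alphabet (an illustrative assumption,
# not the clustering used by iDNA-Prot|dis).
GROUPS = ["AVLIMC", "FWYH", "STNQGP", "KR", "DE"]
TO_REDUCED = {aa: str(idx) for idx, group in enumerate(GROUPS) for aa in group}
LETTERS = [str(idx) for idx in range(len(GROUPS))]


def reduce_sequence(seq):
    """Map each standard residue to its group label; skip unknown symbols."""
    return "".join(TO_REDUCED[aa] for aa in seq.upper() if aa in TO_REDUCED)


def distance_pair_features(seq, max_distance=3):
    """Return group composition followed by pair frequencies at distances 1..max_distance.

    The vector length is n + n * n * max_distance for an alphabet of n groups.
    """
    reduced = reduce_sequence(seq)
    length = len(reduced)

    # Distance 0: plain composition over the reduced alphabet.
    features = [reduced.count(a) / length if length else 0.0 for a in LETTERS]

    # Distance d >= 1: frequency of each ordered pair (a, b) found at positions i and i + d.
    for d in range(1, max_distance + 1):
        counts = {pair: 0 for pair in product(LETTERS, repeat=2)}
        for i in range(length - d):
            counts[(reduced[i], reduced[i + d])] += 1
        total = length - d
        for pair in product(LETTERS, repeat=2):
            features.append(counts[pair] / total if total > 0 else 0.0)
    return features
```

With five groups and a maximum distance of 3, the vector has 5 + 25 * 3 = 80 entries, whereas the same construction over the full 20-letter alphabet would give 20 + 400 * 3 = 1220. This is the kind of dimension reduction the abstract attributes to the reduced alphabet profile.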
Pages: 12