Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation

Cited by: 72
Authors
Xu, Ruifeng [1, 2]
Zhou, Jiyun [1]
Wang, Hongpeng [1]
He, Yulan [3]
Wang, Xiaolong [1, 2]
Liu, Bin [1, 2]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen, Guangdong, Peoples R China
[2] Harbin Inst Technol, Key Lab Network Oriented Intelligent Computat, Shenzhen Grad Sch, Shenzhen, Guangdong, Peoples R China
[3] Aston Univ, Sch Engn & Appl Sci, Birmingham B4 7ET, W Midlands, England
Funding
National Natural Science Foundation of China
Keywords
REMOTE HOMOLOGY DETECTION; SEQUENCE-BASED PREDICTOR; AMINO-ACID-COMPOSITION; RNA-BINDING; WEB SERVER; IDENTIFICATION; GENOME; SITES; CLASSIFIER; PROTOCOL;
DOI
10.1186/1752-0509-9-S1-S10
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities, ranging from DNA replication to gene expression control. Identifying DNA-binding proteins is one of the major challenges in genome annotation. Several computational methods for DNA-binding protein identification have been proposed in the literature. However, most of them cannot provide a valuable knowledge base for understanding DNA-protein interactions.
Results: We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are converted into uniform numeric representations by the distance transformation scheme. Lastly, the resulting representations are fed into an SVM classifier, which predicts whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods.
Conclusions: The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
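The abstract does not state the formula behind the distance transformation, so the following is only a minimal sketch of how a variable-length PSSM could be turned into a fixed-length vector. It assumes that the transformation averages the product of PSSM scores for every ordered pair of amino-acid columns at positions separated by a fixed gap. The function name `pssm_distance_transform`, the parameter `max_gap`, and the averaging scheme are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def pssm_distance_transform(
    pssm: Sequence[Sequence[float]], max_gap: int
) -> List[float]:
    """Map an L x K PSSM to a vector of length max_gap * K * K.

    pssm[j][a] is the score of amino-acid type `a` at sequence position `j`.
    ASSUMPTION: each feature is the mean, over all position pairs (j, j + gap),
    of pssm[j][a] * pssm[j + gap][b]. This is one plausible reading of a
    "distance transformation", not a formula taken from the abstract.
    """
    length = len(pssm)
    n_types = len(pssm[0])
    if not 1 <= max_gap < length:
        raise ValueError("max_gap must be at least 1 and below the sequence length")

    features: List[float] = []
    for gap in range(1, max_gap + 1):
        n_pairs = length - gap  # number of position pairs at this distance
        for a in range(n_types):
            for b in range(n_types):
                total = sum(
                    pssm[j][a] * pssm[j + gap][b] for j in range(n_pairs)
                )
                features.append(total / n_pairs)
    return features
```

Under these assumptions the output length depends only on `max_gap` and the number of PSSM columns, so proteins of different lengths yield vectors of the same size, which is what a classifier such as an SVM requires as input.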
Pages: 12