Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

Cited by: 19
Authors
Zhang, Jian [1, 2]
Chai, Haiting [1 ]
Yang, Guifu [1 ]
Ma, Zhiqiang [1 ]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Jilin Province, Peoples R China
[2] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Henan Province, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bioluminescent proteins; Sequence-derived; Feature analysis; Lineage-specific; SUPPORT VECTOR MACHINES; COLOR; CLASSIFICATION; RESIDUES;
DOI
10.1186/s12859-017-1709-6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Bioluminescent proteins (BLPs) are found in many living organisms. Because BLPs can emit light, they can serve as easily detected biomarkers in biomedical research, for example in gene expression analysis and studies of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose an accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs. Results: We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that combining all four types of features outperforms any other combination or any individual feature type. To remove potentially irrelevant or redundant features, we also use the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which proves more effective than traditional universal approaches. Conclusion: Experiments on benchmark datasets demonstrate the robustness of PredBLP. We show that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on newly deposited BLPs in UniProt. PredBLP outperforms many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
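Two of the feature types named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. The abstract does not give the exact encoding used by PredBLP, so the following assumes the common definitions (relative frequency of each of the 20 standard residues, and of each of the 400 ordered residue pairs). The function names and the handling of non-standard residues are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    toy = "ACAC"  # toy sequence, not a real BLP
    features = amino_acid_composition(toy) + dipeptide_composition(toy)
    print(len(features))  # length of the concatenated feature vector
```

Concatenating the two vectors gives a fixed-length representation regardless of sequence length, which is what makes such features usable as input to a classifier or a feature selection step.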
Pages: 13