Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

Cited by: 20
Authors
Zhang, Jian [1, 2]
Chai, Haiting [1]
Yang, Guifu [1]
Ma, Zhiqiang [1]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Jilin Province, Peoples R China
[2] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Henan Province, Peoples R China
Source
BMC Bioinformatics | 2017 / Vol. 18
Funding
National Natural Science Foundation of China
Keywords
Bioluminescent proteins; Sequence-derived; Feature analysis; Lineage-specific; SUPPORT VECTOR MACHINES; COLOR; CLASSIFICATION; RESIDUES;
DOI
10.1186/s12859-017-1709-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Bioluminescent proteins (BLPs) exist widely in living organisms. Because BLPs can emit light, they can serve as biomarkers that are easily detected in biomedical research, such as gene expression analysis and the study of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose a novel, accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs.
Results: We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that combining all four feature types outperforms any other combination of feature types or any individual type. To remove potentially irrelevant or redundant features, we also introduce the Fisher Markov Selector together with a Sequential Backward Selection strategy to select optimal feature subsets. Additionally, we design a lineage-specific scheme, which proves more effective than traditional universal approaches.
Conclusion: Experiments on benchmark datasets demonstrate the robustness of PredBLP. We show that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent test datasets as well as on BLPs newly deposited in UniProt. PredBLP is shown to outperform many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
Pages: 13
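The abstract names amino acid composition and dipeptide composition among PredBLP's sequence-derived features. As a minimal sketch, assuming the standard textbook definitions (the fraction of each of the 20 standard residues, and the fraction of each of the 400 adjacent residue pairs), these two feature groups could be computed as below. The function names are hypothetical, skipping non-standard residues is an assumption, and this is not PredBLP's published implementation; it also omits the motif and physicochemical features and the Fisher Markov Selector step.

```python
from itertools import product

# Hypothetical helpers: a sketch of standard AAC/DPC features,
# not PredBLP's published code.

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in the sequence (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for aa in seq.upper():
        # Assumption: non-standard letters (X, B, U, ...) are skipped.
        if aa in counts:
            counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each of the 400 adjacent residue pairs (400 values)."""
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        dp = seq[i:i + 2]
        # Pairs containing a non-standard residue are skipped, so for a
        # fully standard sequence the denominator equals len(seq) - 1.
        if dp in counts:
            counts[dp] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def composition_features(seq: str) -> list[float]:
    """Concatenate AAC (20) and DPC (400) into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    vec = composition_features("MKTAYIAK")
    print(len(vec))
```

For example, `composition_features("MKTAYIAK")` returns a 420-dimensional vector whose first 20 entries sum to 1. In a full pipeline, vectors like this would be concatenated with the other feature groups before any feature selection is applied.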