Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

Cited by: 20
Authors
Zhang, Jian [1, 2]
Chai, Haiting [1]
Yang, Guifu [1]
Ma, Zhiqiang [1]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Jilin Province, Peoples R China
[2] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Henan Province, Peoples R China
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Funding
National Natural Science Foundation of China;
Keywords
Bioluminescent proteins; Sequence-derived; Feature analysis; Lineage-specific; SUPPORT VECTOR MACHINES; COLOR; CLASSIFICATION; RESIDUES;
DOI
10.1186/s12859-017-1709-6
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Bioluminescent proteins (BLPs) occur widely in living organisms. Because BLPs are characterized by their ability to emit light, they can serve as easily detected biomarkers in biomedical research, for example in gene expression analysis and the study of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose a novel, accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs. Results: We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that the combination of all four feature types outperforms any other combination and any individual feature type. To remove potentially irrelevant or redundant features, we also apply the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which is shown to be more effective than traditional universal approaches. Conclusion: Experiments on benchmark datasets demonstrate the robustness of PredBLP. We demonstrate that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on BLPs newly deposited in UniProt. PredBLP outperforms many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
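Two of the feature types named in the abstract, amino acid composition and dipeptide composition, are commonly computed as normalized counts over a protein sequence. The following is a minimal sketch under that common definition, not the PredBLP implementation; the function names, the 20-letter alphabet constant and the handling of non-standard residues (they are skipped) are assumptions made for illustration.

```python
from itertools import product

# Assumption: only the 20 standard residues are counted; anything else
# (e.g. X, B, Z) is ignored rather than raising an error.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in AMINO_ACIDS}
    total = sum(counts.values())
    return {aa: (c / total if total else 0.0) for aa, c in counts.items()}


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    # Slide a window of width 2 over the sequence; pairs containing a
    # non-standard residue are skipped.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}
```

Concatenating the 20 composition values and the 400 dipeptide values gives a fixed-length vector for any sequence, which is the kind of input a feature selection step and a classifier can then work on.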
Pages: 13