Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

Cited by: 20
Authors
Zhang, Jian [1, 2]
Chai, Haiting [1 ]
Yang, Guifu [1 ]
Ma, Zhiqiang [1 ]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Jilin Province, Peoples R China
[2] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Henan Province, Peoples R China
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Funding
National Natural Science Foundation of China;
Keywords
Bioluminescent proteins; Sequence-derived; Feature analysis; Lineage-specific; SUPPORT VECTOR MACHINES; COLOR; CLASSIFICATION; RESIDUES;
DOI
10.1186/s12859-017-1709-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Bioluminescent proteins (BLPs) are found in many living organisms. Because BLPs can emit light, they can serve as easily detected biomarkers in biomedical research, for example in gene expression analysis and the study of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose a novel and accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs. Results: We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that the combination of all four feature types outperforms any other combination or any individual feature type. To remove potentially irrelevant or redundant features, we also use the Fisher-Markov selector together with a sequential backward selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which proves more effective than traditional universal approaches. Conclusion: Experiments on benchmark datasets demonstrate the robustness of PredBLP. We demonstrate that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on newly deposited BLPs in UniProt. PredBLP is shown to outperform many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
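Two of the four feature types named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal Python sketch. This is not PredBLP's implementation: the function names (`amino_acid_composition`, `dipeptide_composition`, `feature_vector`) are hypothetical, the handling of non-standard residues is an assumption, and the sequence motifs, physicochemical properties, feature selection and lineage-specific models are not covered.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values).

    Assumption: non-standard characters are ignored.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return {a: (residues.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values).

    Assumption: pairs containing a non-standard character are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional list."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[a] for a in AMINO_ACIDS] + [dpc[p] for p in DIPEPTIDES]
```

A vector such as `feature_vector("MKTAYIAKQR")` could then be passed to any standard classifier; which classifier and which selected feature subset the authors use per lineage is described in the paper itself, not here.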
Pages: 13