ASAP: a machine learning framework for local protein properties

Cited by: 10
Authors
Brandes, Nadav [1]
Ofer, Dan [1]
Linial, Michal [1]
Affiliation
[1] Hebrew Univ Jerusalem, Alexander Silberman Inst Life Sci, Dept Biol Chem, IL-91904 Jerusalem, Israel
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2016
Keywords
NEUROPEPTIDE CLEAVAGE SITES; PREDICTION; BIOINFORMATICS; IDENTIFICATION; PRECURSORS; SEQUENCES; PEPTIDES; DATABASE; EXPASY; SERVER
DOI
10.1093/database/baw133
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification code
07; 0710; 09
Abstract
Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites that lack substantial similarity to known ones. Machine learning (ML) methods are becoming fundamental for annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder or PSSM (position-specific scoring matrix) profiles. These features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. convertase cleavage sites, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model that predicts protein precursor cleavage sites with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins with minimal sequence similarity. Current rule-based methods suffer from high false-positive rates, which makes them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP serves as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open source, with a flexible Python API.
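The abstract describes the ASAP pipeline only at a high level. Per-residue features are extracted from the raw sequence, optionally combined with external per-residue annotations, and then passed to a classifier. The sketch below shows one common way to do the first two steps: a fixed window is centred on each residue and encoded one-hot. It is a minimal illustration under stated assumptions, not ASAP's actual Python API. The function names `window_features` and `build_dataset`, the window size, and the shared padding/unknown slot are all hypothetical choices.

```python
from typing import Dict, List, Optional, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX: Dict[str, int] = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
# Hypothetical convention: one extra slot per window position marks either
# "outside the sequence" (padding) or a non-standard residue such as X or U.
SLOTS_PER_POSITION = len(AMINO_ACIDS) + 1


def window_features(
    seq: str,
    pos: int,
    half_window: int = 5,
    external: Optional[Sequence[Sequence[float]]] = None,
) -> List[float]:
    """Hypothetical feature vector for residue `pos` (0-based) of `seq`.

    Window part: one-hot encoding of each residue from pos - half_window
    to pos + half_window (21 slots per position).
    External part (optional): caller-supplied per-residue values, e.g.
    predicted disorder or solvent accessibility, taken at the centre residue.
    """
    features: List[float] = []
    for offset in range(-half_window, half_window + 1):
        slot = [0.0] * SLOTS_PER_POSITION
        i = pos + offset
        if 0 <= i < len(seq):
            # Unknown residues fall into the last (padding/unknown) slot.
            slot[AA_INDEX.get(seq[i].upper(), len(AMINO_ACIDS))] = 1.0
        else:
            slot[len(AMINO_ACIDS)] = 1.0  # window extends past a terminus
        features.extend(slot)
    if external is not None:
        features.extend(float(v) for v in external[pos])
    return features


def build_dataset(
    records: Sequence[Tuple[str, Sequence[int]]],
    half_window: int = 5,
) -> Tuple[List[List[float]], List[int]]:
    """One row per residue; label 1 if its index is an annotated site."""
    X: List[List[float]] = []
    y: List[int] = []
    for seq, sites in records:
        site_set = set(sites)
        for pos in range(len(seq)):
            X.append(window_features(seq, pos, half_window))
            y.append(1 if pos in site_set else 0)
    return X, y


if __name__ == "__main__":
    # Toy, made-up record: one sequence whose only annotated site is index 4.
    X, y = build_dataset([("MAGKRSAE", [4])], half_window=2)
    print(len(X), len(X[0]), sum(y))  # → 8 105 1
```

Each row of `X` is one training example. Any standard learner could consume these rows, for instance scikit-learn's `RandomForestClassifier` via `fit(X, y)`. The abstract does not specify which classifiers or feature sets ASAP uses, so this choice is only an example.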
Pages: 10