ASAP: a machine learning framework for local protein properties

Cited by: 10
Authors
Brandes, Nadav [1]
Ofer, Dan [1]
Linial, Michal [1]
Affiliations
[1] Hebrew Univ Jerusalem, Alexander Silberman Inst Life Sci, Dept Biol Chem, IL-91904 Jerusalem, Israel
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2016
Keywords
NEUROPEPTIDE CLEAVAGE SITES; PREDICTION; BIOINFORMATICS; IDENTIFICATION; PRECURSORS; SEQUENCES; PEPTIDES; DATABASE; EXPASY; SERVER
DOI
10.1093/database/baw133
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine learning (ML) methods are becoming fundamental in annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder or PSSM profiles. Features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. convertase cleavage sites, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model to predict protein precursor cleavage sites, with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins with minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP functions as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API.
Pages: 10