Feature Extractions for Computationally Predicting Protein Post-Translational Modifications

Times cited: 25
Authors
Huang, Guohua [1]
Li, Jincheng [1]
Affiliation
[1] Shaoyang University, Department of Information Engineering, Shaoyang 422000, Hunan, People's Republic of China
Funding
Natural Science Foundation of Hunan Province; National Natural Science Foundation of China
Keywords
Machine learning; feature extraction; PSSM; PTMs; pseudo amino acid composition; position-specific amino acid propensity; composition of K-spaced amino acid pairs; autocorrelation functions; intrinsically unstructured proteins; lysine succinylation sites; accessible surface area; phosphorylation sites; methylation sites; pupylation sites; ubiquitination sites; mammalian proteins; online prediction; O-glycosylation
DOI
10.2174/1574893612666170707094916
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Post-translational modifications (PTMs) are a key regulatory mechanism in cellular processes, so it is important to identify them quickly and accurately. Both next-generation sequencing and bioinformatics techniques have greatly facilitated the discovery of PTMs. Most bioinformatics techniques follow a machine learning framework in which feature extraction plays a key role.
Conclusion: This article mainly reviews the feature extraction methods that machine learning-based PTM predictors have derived from protein sequence, structure, function, physicochemical and biochemical properties, and evolutionary conservation. Binary encoding, amino acid composition, pseudo amino acid composition, composition of K-spaced amino acid pairs, autocorrelation functions, position weight amino acid composition, and position-specific amino acid propensity extract features directly from protein sequences. Encoding based on grouped weight is a hybrid approach that integrates physicochemical and biochemical properties with sequence information. Structural information, especially secondary structure, accessible surface area, and disorder, has also been used to encode proteins. Features derived from evolutionary conservation include the position-specific scoring matrix (PSSM) and the k-nearest neighbor score. Finally, we discuss some existing problems in feature extraction.
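Three of the sequence-based encodings listed above (binary encoding, amino acid composition, and composition of K-spaced amino acid pairs) are easiest to grasp in code. The following is a minimal Python sketch, assuming a peptide window centred on a candidate modification site. The function names, the alphabetical 20-letter alphabet, the handling of non-standard residues, and the default k_max = 2 are illustrative assumptions, not the settings of any particular predictor covered by the review.

```python
from itertools import product

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
# All 400 ordered residue pairs ("AA", "AC", ..., "YY"), used as CKSAAP feature order.
AA_PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def binary_encoding(window: str) -> list[int]:
    """Binary (one-hot) encoding: 20 bits per residue, concatenated.

    Non-standard residues (e.g. 'X' padding near a terminus) become all-zero
    vectors; this padding convention is an assumption.
    """
    features: list[int] = []
    for residue in window:
        bits = [0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:
            bits[AA_INDEX[residue]] = 1
        features.extend(bits)
    return features


def amino_acid_composition(window: str) -> list[float]:
    """AAC: relative frequency of each standard amino acid in the window."""
    counts = [0] * len(AMINO_ACIDS)
    for residue in window:
        if residue in AA_INDEX:
            counts[AA_INDEX[residue]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """CKSAAP: frequency of each residue pair separated by exactly k residues.

    For each k in 0..k_max, every pair (window[i], window[i + k + 1]) is counted
    and divided by the number of such pairs in the window (len - k - 1),
    giving 400 features per k value.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(AA_PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):  # empty range when the window is too short
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs containing non-standard residues are skipped
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in AA_PAIRS)
    return features


if __name__ == "__main__":
    window = "GSAKAKL"  # hypothetical 7-residue window centred on a lysine (K)
    print(len(binary_encoding(window)))  # 140 = 7 residues x 20 bits
    print(len(amino_acid_composition(window)))  # 20
    print(len(cksaap(window, k_max=2)))  # 1200 = 3 k values x 400 pairs
```

The three encodings trade off differently: binary encoding keeps residue positions but grows linearly with window length, amino acid composition is fixed at 20 values but discards order, and CKSAAP recovers some local order through residue pairs at the cost of 400 features per gap size.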
Pages: 387-395
Page count: 9