Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation Sites

Cited by: 75
Authors
Chen, Zhen [1]
He, Ningning [1]
Huang, Yu [2]
Qin, Wen Tao [3]
Liu, Xuhan [4]
Li, Lei [1, 2, 5]
Affiliations
[1] Qingdao Univ, Sch Basic Med, Qingdao 266021, Peoples R China
[2] Qingdao Univ, Sch Data Sci & Software Engn, Qingdao 266021, Peoples R China
[3] Univ Western Ontario, Schulich Sch Med & Dent, Dept Biochem, London, ON N6A 5C1, Canada
[4] Beijing Oriental Yamei Gene Technol Inst Co Ltd, Dept Informat Technol, Beijing 100078, Peoples R China
[5] Qingdao Univ, Qingdao Canc Inst, Qingdao 266021, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Recurrent neural network; LSTM; Malonylation; Random forest; LYSINE MALONYLATION; UBIQUITINATION SITES; NEURAL-NETWORKS; PROTEIN; SUCCINYLATION; SETS;
DOI
10.1016/j.gpb.2018.08.004
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates is an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings and better than a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which combines LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to the currently available malonylation predictors. Additionally, it performs well at a low false positive rate, which is highly useful in practical prediction. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
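The abstract contrasts two input representations for the LSTM (word embedding versus one-hot vector) and describes combining the LSTM with a random forest. The following is a minimal illustrative sketch of those ideas, not the authors' implementation: the padding symbol `X`, the 11-residue example window, the helper names, and the weighted-average combination rule in `combine_scores` are all assumptions made here for illustration, since the abstract does not specify them.

```python
# Hypothetical sketch; names and parameters are illustrative assumptions,
# not taken from the LEMP paper or its software.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PAD = "X"                              # assumed placeholder past protein termini
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def to_indices(window):
    """Integer tokens: the form a trainable embedding layer would consume."""
    return [INDEX[aa] for aa in window]


def to_one_hot(window):
    """Fixed sparse encoding: one 21-dimensional indicator vector per residue."""
    size = len(ALPHABET)
    return [[1 if i == INDEX[aa] else 0 for i in range(size)] for aa in window]


def combine_scores(lstm_score, rf_score, weight=0.5):
    """Assumed integration rule: weighted average of two classifier scores."""
    return weight * lstm_score + (1 - weight) * rf_score


window = "XXAGKLMKDEF"                 # made-up window around a candidate lysine
tokens = to_indices(window)            # one integer per residue
one_hot = to_one_hot(window)           # 11 rows of 21 binary values
score = combine_scores(0.8, 0.4)       # made-up scores from the two classifiers
```

With integer tokens, the network learns a dense vector for each residue during training, whereas the one-hot form fixes the representation in advance; the abstract reports that the former performed better for this task.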
Pages: 451-459
Number of pages: 9