CPPred-FL: a sequence-based predictor for large-scale identification of cell-penetrating peptides by feature representation learning

Cited by: 113
Authors
Qiang, Xiaoli [1]
Zhou, Chen [2]
Ye, Xiucai [3]
Du, Pu-feng [4]
Su, Ran [5]
Wei, Leyi [2]
Affiliations
[1] Guangzhou Univ, Inst Comp Sci & Technol, Guangzhou, Guangdong, Peoples R China
[2] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300000, Peoples R China
[3] Univ Tsukuba, Dept Comp Sci, Tsukuba Sci City, Tsukuba, Ibaraki, Japan
[4] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin, Peoples R China
[5] Tianjin Univ, Sch Comp Software, Tianjin, Peoples R China
Keywords
cell-penetrating peptide; feature representation learning; machine learning; sequence analysis; FEATURE-SELECTION; WEB SERVER; PROTEIN; SITES; SPOTS; DNA;
DOI
10.1093/bib/bby091
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Cell-penetrating peptides (CPPs) have been shown to be a transport vehicle for delivering cargoes into live cells, offering great potential as future therapeutics. It is essential to identify CPPs for a better understanding of their functional mechanisms. Machine learning-based methods have recently emerged as a main approach for computational identification of CPPs. However, one of the main challenges is to propose an effective feature representation model that sufficiently exploits the inner difference and relevance between CPPs and non-CPPs, in order to improve the predictive performance. In this paper, we have developed CPPred-FL, a powerful bioinformatics tool for fast, accurate and large-scale identification of CPPs. In our predictor, we introduce a new feature representation learning scheme that enables one to learn feature representations from a total of 45 well-trained random forest models with multiple feature descriptors from different perspectives, such as compositional information, position-specific information and physicochemical properties. We integrate class and probabilistic information into our feature representations. To improve the feature representation ability, we further remove redundant and irrelevant features by feature space optimization. Benchmarking experiments showed that CPPred-FL, using only 19 informative features, is able to achieve better performance than the state-of-the-art predictors. We anticipate that CPPred-FL will be a powerful tool for large-scale identification of CPPs, facilitating the characterization of their functional mechanisms and accelerating their applications in clinical therapy.
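The feature representation learning idea in the abstract (train one model per feature descriptor, then use each model's predicted probability and predicted class as new features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two descriptors (`aac`, `charge_profile`), the helper `learn_representation`, and the toy sequences are hypothetical, and a simple nearest-centroid model (`CentroidModel`) is swapped in for the random forests so that the sketch needs only the Python standard library. The feature space optimization step that reduces the representation to 19 features is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Compositional descriptor: relative frequency of each of the 20 amino acids."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def charge_profile(seq):
    """Hypothetical physicochemical descriptor: fractions of basic and acidic residues."""
    basic = sum(seq.count(a) for a in "KRH") / len(seq)
    acidic = sum(seq.count(a) for a in "DE") / len(seq)
    return [basic, acidic]

class CentroidModel:
    """Stand-in base learner (NOT a random forest): one centroid per class."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        """Pseudo-probability of class 1, from relative distances to the two centroids."""
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        total = dist[0] + dist[1]
        return 0.5 if total == 0 else dist[0] / total

def learn_representation(train_seqs, labels, descriptors):
    """Train one base model per descriptor; return a function that maps a
    sequence to [probability, class] pairs, one pair per base model."""
    models = [
        CentroidModel().fit([d(s) for s in train_seqs], labels)
        for d in descriptors
    ]

    def transform(seq):
        feats = []
        for d, m in zip(descriptors, models):
            p = m.predict_proba(d(seq))
            feats.extend([p, 1 if p >= 0.5 else 0])  # probabilistic + class information
        return feats

    return transform

# Toy data for illustration only; these are not curated CPP or non-CPP sequences.
toy_seqs = ["RRRRRRRR", "KKRRKKRR", "DDEEDDEE", "AAEEDDAA"]
toy_labels = [1, 1, 0, 0]
transform = learn_representation(toy_seqs, toy_labels, [aac, charge_profile])
print(transform("RRKKRRKK"))
```

With two descriptors the learned representation has four values per sequence. Scaling the same pattern to 45 base models would give 90 values, which a final classifier would consume after feature selection.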
Pages: 11-23
Page count: 13