T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors

Times cited: 24
Authors
Hui, Xinjie [1]
Chen, Zewei [1]
Lin, Mingxiong [2]
Zhang, Junya [1]
Hu, Yueming [1]
Zeng, Yingying [1]
Cheng, Xi [1]
Ou-Yang, Le [2]
Sun, Ming-an [3]
White, Aaron P. [4]
Wang, Yejun [1]
Affiliations
[1] Shenzhen Univ Hlth Sci, Sch Basic Med, Dept Cell Biol & Genet, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Informat Engn, Guangdong Key Lab Intelligent Informat Proc, Shenzhen Key Lab Media Secur, Shenzhen, Peoples R China
[3] Yangzhou Univ, Coll Vet Med, Yangzhou, Jiangsu, Peoples R China
[4] Univ Saskatchewan, VIDO InterVac, Saskatoon, SK, Canada
Keywords
effector; machine learning; prediction; T3SEpp; T3SS; type III secretion system; HIDDEN MARKOV MODEL; VIRULENCE FACTORS; SYSTEM; IDENTIFICATION; PROTEINS; TRANSLOCATION; TYPHIMURIUM; BINDING; IV
DOI
10.1128/mSystems.00288-20
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
Many Gram-negative bacteria infect hosts and cause disease by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, and a voting-based ensemble model was then developed to integrate the individual prediction results. We assembled these components into a unified T3SE prediction pipeline, T3SEpp, which integrates the results of the individual modules and achieved high accuracy (~0.94) and a >1-fold reduction in the false-positive rate compared with state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies of host-pathogen interactions.
IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively narrow range of biological features used to design and train the models. In this research, we comprehensively surveyed the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor-binding sites in promoters, and assembled a unified prediction pipeline that integrates these multi-aspect biological features into homology-based and multiple machine learning models. To our knowledge, this research compiles the most comprehensive analysis of biological sequence features for T3SEs. The T3SEpp pipeline, which integrates a variety of features and combines different models, showed high accuracy and should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
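The voting step summarized in the abstract can be illustrated with a minimal sketch. It assumes that each module (signal region, chaperone-binding domain, effector domain, promoter, homology) emits a score, that a per-module threshold turns each score into a binary vote, and that a protein is called a T3SE when at least a chosen number of modules agree. The module names, thresholds, and vote cutoff below are hypothetical and are not the published T3SEpp parameters or weighting scheme; this is a plain hard-voting example, not the authors' implementation.

```python
from typing import Dict


def count_votes(module_scores: Dict[str, float],
                thresholds: Dict[str, float]) -> int:
    """Count modules whose score reaches their (hypothetical) threshold."""
    # Each module casts one binary vote: positive if score >= its threshold.
    return sum(1 for name, score in module_scores.items()
               if score >= thresholds[name])


def is_predicted_t3se(module_scores: Dict[str, float],
                      thresholds: Dict[str, float],
                      min_votes: int) -> bool:
    """Hard-voting call: True when at least `min_votes` modules vote positive."""
    return count_votes(module_scores, thresholds) >= min_votes


if __name__ == "__main__":
    # Hypothetical per-module scores for one candidate protein.
    scores = {
        "signal_region": 0.91,
        "chaperone_binding": 0.40,
        "effector_domain": 0.75,
        "promoter": 0.62,
        "homology": 0.10,
    }
    thresholds = {name: 0.5 for name in scores}  # assumed uniform cutoff
    print(count_votes(scores, thresholds))                     # 3
    print(is_predicted_t3se(scores, thresholds, min_votes=3))  # True
```

Raising `min_votes` trades sensitivity for fewer false positives, which is the kind of balance an ensemble of heterogeneous feature modules is meant to tune.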
Pages: 19