T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors

Cited by: 24
Authors
Hui, Xinjie [1]
Chen, Zewei [1]
Lin, Mingxiong [2]
Zhang, Junya [1]
Hu, Yueming [1]
Zeng, Yingying [1]
Cheng, Xi [1]
Ou-Yang, Le [2]
Sun, Ming-an [3]
White, Aaron P. [4]
Wang, Yejun [1]
Affiliations
[1] Shenzhen Univ Hlth Sci, Sch Basic Med, Dept Cell Biol & Genet, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Informat Engn, Guangdong Key Lab Intelligent Informat Proc, Shenzhen Key Lab Media Secur, Shenzhen, Peoples R China
[3] Yangzhou Univ, Coll Vet Med, Yangzhou, Jiangsu, Peoples R China
[4] Univ Saskatchewan, VIDO InterVac, Saskatoon, SK, Canada
Keywords
effector; machine learning; prediction; T3SEpp; T3SS; type III secretion system; HIDDEN MARKOV MODEL; VIRULENCE FACTORS; SYSTEM; IDENTIFICATION; PROTEINS; TRANSLOCATION; TYPHIMURIUM; BINDING; IV;
DOI
10.1128/mSystems.00288-20
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Many Gram-negative bacteria infect hosts and cause disease by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, followed by development of a voting-based ensemble model integrating the individual prediction results. We assembled these components into a unified T3SE prediction pipeline, T3SEpp, which integrated the results of the individual modules, resulting in high accuracy (~0.94) and a >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions.
IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One causal factor could be the relatively narrow range of biological features used to design and train the models. In this research, we conducted a comprehensive survey of the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor-binding promoter sites, and assembled a unified prediction pipeline integrating multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, this research presents the most comprehensive biological sequence feature analysis for T3SEs to date. The T3SEpp pipeline, which integrates this variety of features and assembles different models, showed high accuracy, which should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
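The voting-based integration of individual module predictions mentioned in the abstract can be illustrated with a minimal sketch. This is not the T3SEpp implementation: the module names, scores, weights, and the 0.5 threshold below are hypothetical and chosen only to show how per-module scores might be combined by a majority vote or by a weighted average.

```python
from typing import Dict


def majority_vote(module_scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """Return True if more than half of the modules score at or above threshold.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    votes = [score >= threshold for score in module_scores.values()]
    return sum(votes) > len(votes) / 2


def weighted_score(module_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weight-normalized average of the module scores."""
    total_weight = sum(weights[name] for name in module_scores)
    weighted_sum = sum(weights[name] * score for name, score in module_scores.items())
    return weighted_sum / total_weight


# Hypothetical per-module probabilities for one candidate protein.
scores = {
    "signal_sequence": 0.9,
    "chaperone_binding": 0.2,
    "effector_domain": 0.7,
    "promoter": 0.6,
}
# Hypothetical equal weights; a real pipeline would tune these on training data.
weights = {name: 1.0 for name in scores}

is_effector = majority_vote(scores)          # three of four modules vote "yes"
combined = weighted_score(scores, weights)   # plain mean when weights are equal
print(is_effector, round(combined, 3))
```

With equal weights the weighted score reduces to the mean of the module scores. Unequal weights would let modules with stronger evidence, for example a homology hit to a known effector domain, count for more than a weak promoter match.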
Pages: 19
Related papers
50 in total
  • [21] Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini
    Yu Wang
    Yanzhi Guo
    Xuemei Pu
    Menglong Li
    Journal of Computer-Aided Molecular Design, 2017, 31 : 1029 - 1038
  • [22] A New Means To Identify Type 3 Secreted Effectors: Functionally Interchangeable Class IB Chaperones Recognize a Conserved Sequence
    Costa, Sonia C. P.
    Schmitz, Alexa M.
    Jahufar, Fathima F.
    Boyd, Justin D.
    Cho, Min Y.
    Glicksman, Marcie A.
    Lesser, Cammie F.
    MBIO, 2012, 3 (01) : 1 - 10
  • [23] Pattern recognition receptors and their interactions with bacterial type III effectors in plants
    Lee, Jae Hoon
    Kim, Hyoungseok
    Chae, Won Byoung
    Oh, Man-Ho
    GENES & GENOMICS, 2019, 41 (05) : 499 - 506
  • [24] T3DB: an integrated database for bacterial type III secretion system
    Wang, Yejun
    Huang, He
    Sun, Ming'an
    Zhang, Qing
    Guo, Dianjing
    BMC BIOINFORMATICS, 2012, 13
  • [25] Bastion3: a two-layer ensemble predictor of type III secreted effectors
    Wang, Jiawei
    Li, Jiahui
    Yang, Bingjiao
    Xie, Ruopeng
    Marquez-Lago, Tatiana T.
    Leier, Andre
    Hayashida, Morihiro
    Akutsu, Tatsuya
    Zhang, Yanju
    Chou, Kuo-Chen
    Selkrig, Joel
    Zhou, Tieli
    Song, Jiangning
    Lithgow, Trevor
    BIOINFORMATICS, 2019, 35 (12) : 2017 - 2028
  • [26] T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm
    Chen, Tianhang
    Wang, Xiangeng
    Chu, Yanyi
    Wang, Yanjing
    Jiang, Mingming
    Wei, Dong-Qing
    Xiong, Yi
    FRONTIERS IN MICROBIOLOGY, 2020, 11
  • [27] Computational prediction of type III secreted proteins from gram-negative bacteria
    Yang, Yang
    Zhao, Jiayuan
    Morgan, Robyn L.
    Ma, Wenbo
    Jiang, Tao
    BMC BIOINFORMATICS, 2010, 11
  • [28] PredT4SE-Stack: Prediction of Bacterial Type IV Secreted Effectors From Protein Sequences Using a Stacked Ensemble Method
    Xiong, Yi
    Wang, Qiankun
    Yang, Junchen
    Zhu, Xiaolei
    Wei, Dong-Qing
    FRONTIERS IN MICROBIOLOGY, 2018, 9
  • [29] Expression of Enteropathogenic Escherichia coli Map Is Significantly Different than That of Other Type III Secreted Effectors In Vivo
    Nguyen, Mai
    Rizvi, Jason
    Hecht, Gail
    INFECTION AND IMMUNITY, 2015, 83 (01) : 130 - 137