Optimized Data Set and Feature Construction for Substrate Prediction of Membrane Transporters

被引:2
作者
Denger, Andreas [1 ]
Helms, Volkhard [1 ]
机构
[1] Saarland Univ, Ctr Bioinformat, D-66123 Saarbrucken, Germany
关键词
PROTEIN; GLUTAMINE-DUMPER1; CLASSIFICATION; BLAST; HIT;
D O I
10.1021/acs.jcim.2c00850
中图分类号
R914 [药物化学];
学科分类号
100701 ;
摘要
alpha-Helical transmembrane proteins termed mem-brane transporters mediate the passage of small hydrophilic substrate molecules across biological lipid bilayer membranes. Annotating the specific substrates of the dozens to hundreds of individual transporters of an organism is an important task. In the past, machine learning classifiers have been successfully trained on pan-organism data sets to predict putative substrates of trans-porters. Here, we critically examine the selection of an optimal data set of protein sequence features for the classification task. We focus on membrane transporters of the three model organisms Escherichia coli, Arabidopsis thaliana, and Saccharomyces cerevisiae, as well as human. We show that organism-specific classifiers can be robustly trained if at least 20 samples are available for each substrate class. If information from position-specific scoring matrices is included, such classifiers have F1 scores between 0.85 and 1.00. For the largest data set (A. thaliana), a 4-class classifier yielded an F -score of 0.97. On a pan-organism data set composed of transporters of all four organisms, amino acid and sugar transporters were predicted with an F1 score of 0.91.
引用
收藏
页码:6242 / 6257
页数:16
相关论文
共 32 条
  • [1] TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information
    Alballa, Munira
    Aplop, Faizah
    Butler, Gregory
    [J]. PLOS ONE, 2020, 15 (01):
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] Transferring functional annotations of membrane transporters on the basis of sequence similarity and sequence motifs
    Barghash, Ahmad
    Helms, Volkhard
    [J]. BMC BIOINFORMATICS, 2013, 14
  • [4] Insertion of X-ray structures of proteins in membranes
    Basyn, F
    Spies, B
    Bouffioux, O
    Thomas, A
    Brasseur, R
    [J]. JOURNAL OF MOLECULAR GRAPHICS & MODELLING, 2003, 22 (01) : 11 - 21
  • [5] UniProt: a worldwide hub of protein knowledge
    Bateman, Alex
    Martin, Maria-Jesus
    Orchard, Sandra
    Magrane, Michele
    Alpi, Emanuele
    Bely, Benoit
    Bingley, Mark
    Britto, Ramona
    Bursteinas, Borisas
    Busiello, Gianluca
    Bye-A-Jee, Hema
    Da Silva, Alan
    De Giorgi, Maurizio
    Dogan, Tunca
    Castro, Leyla Garcia
    Garmiri, Penelope
    Georghiou, George
    Gonzales, Daniel
    Gonzales, Leonardo
    Hatton-Ellis, Emma
    Ignatchenko, Alexandr
    Ishtiaq, Rizwan
    Jokinen, Petteri
    Joshi, Vishal
    Jyothi, Dushyanth
    Lopez, Rodrigo
    Luo, Jie
    Lussi, Yvonne
    MacDougall, Alistair
    Madeira, Fabio
    Mahmoudy, Mahdi
    Menchi, Manuela
    Nightingale, Andrew
    Onwubiko, Joseph
    Palka, Barbara
    Pichler, Klemens
    Pundir, Sangya
    Qi, Guoying
    Raj, Shriya
    Renaux, Alexandre
    Lopez, Milagros Rodriguez
    Saidi, Rabie
    Sawford, Tony
    Shypitsyna, Aleksandra
    Speretta, Elena
    Turner, Edward
    Tyagi, Nidhi
    Vasudev, Preethi
    Volynkin, Vladimir
    Wardell, Tony
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D506 - D515
  • [6] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [7] Machine learning techniques for protein function prediction
    Bonetta, Rosalin
    Valentino, Gianluca
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2020, 88 (03) : 397 - 413
  • [8] BLAST plus : architecture and applications
    Camacho, Christiam
    Coulouris, George
    Avagyan, Vahram
    Ma, Ning
    Papadopoulos, Jason
    Bealer, Kevin
    Madden, Thomas L.
    [J]. BMC BIOINFORMATICS, 2009, 10
  • [9] The Gene Ontology Resource: 20 years and still GOing strong
    Carbon, S.
    Douglass, E.
    Dunn, N.
    Good, B.
    Harris, N. L.
    Lewis, S. E.
    Mungall, C. J.
    Basu, S.
    Chisholm, R. L.
    Dodson, R. J.
    Hartline, E.
    Fey, P.
    Thomas, P. D.
    Albou, L. P.
    Ebert, D.
    Kesling, M. J.
    Mi, H.
    Muruganujian, A.
    Huang, X.
    Poudel, S.
    Mushayahama, T.
    Hu, J. C.
    LaBonte, S. A.
    Siegele, D. A.
    Antonazzo, G.
    Attrill, H.
    Brown, N. H.
    Fexova, S.
    Garapati, P.
    Jones, T. E. M.
    Marygold, S. J.
    Millburn, G. H.
    Rey, A. J.
    Trovisco, V.
    dos Santos, G.
    Emmert, D. B.
    Falls, K.
    Zhou, P.
    Goodman, J. L.
    Strelets, V. B.
    Thurmond, J.
    Courtot, M.
    Osumi-Sutherland, D.
    Parkinson, H.
    Roncaglia, P.
    Acencio, M. L.
    Kuiper, M.
    Laegreid, A.
    Logie, C.
    Lovering, R. C.
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D330 - D338
  • [10] Prediction of transporter targets using efficient RBF networks with PSSM profiles and biochemical properties
    Chen, Shu-An
    Ou, Yu-Yen
    Lee, Tzong-Yi
    Gromiha, M. Michael
    [J]. BIOINFORMATICS, 2011, 27 (15) : 2062 - 2067