TooT-BERT-C: A study on discriminating ion channels from membrane proteins based on the primary sequence's contextual representation from BERT models

Cited by: 1
Authors
Ghazikhani, Hamed [1]
Butler, Gregory [1]
Affiliations
[1] Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada
Source
2022 9TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS RESEARCH AND APPLICATIONS, ICBRA 2022 | 2022
Keywords
ion channel; BERT; transformers; contextual embeddings; neural networks;
DOI
10.1145/3569192.3569196
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Ion channels play a critical role in a variety of physiological processes and are a frequent therapeutic target, yet how their function contributes to disease remains unknown. In recent years, computational techniques have emerged as indispensable tools for understanding ion channels and their function, because their mechanism of action is complex and a static representation of an ion channel is frequently insufficient to explain the underlying process. This article introduces TooT-BERT-C, a technique that uses the BERT contextual representation of the primary sequence to discriminate ion channels from other membrane proteins via a Logistic Regression classifier. Additionally, we compare the frozen and fine-tuned representations of two alternative BERT models, namely ProtBERT-BFD and MembraneBERT. When compared to leading deep learning prediction algorithms, TooT-BERT-C has the highest accuracy (98.24%) and MCC (0.85).
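The abstract reports both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of why the two can differ noticeably when ion channels are a small minority among membrane proteins, the snippet below computes both from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are assumptions of this example.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention,
    e.g. when the classifier predicts only one class).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set (NOT from the paper):
# 50 ion channels (positives) and 950 other membrane proteins (negatives).
good = dict(tp=40, fn=10, tn=945, fp=5)

# A trivial classifier that labels every protein as "not an ion channel".
trivial = dict(tp=0, fn=50, tn=950, fp=0)

good_acc, good_mcc = accuracy(**good), mcc(**good)
trivial_acc, trivial_mcc = accuracy(**trivial), mcc(**trivial)

print(f"good classifier:    accuracy={good_acc:.4f}  MCC={good_mcc:.4f}")
print(f"trivial classifier: accuracy={trivial_acc:.4f}  MCC={trivial_mcc:.4f}")
```

In this hypothetical setting the trivial classifier still reaches 95% accuracy but an MCC of 0, which is one reason an MCC is usually reported alongside accuracy for imbalanced tasks of this kind.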
Pages: 23-29
Number of pages: 7