Protein-Protein Interaction Network Extraction Using Text Mining Methods Adds Insight into Autism Spectrum Disorder

Cited by: 1
Authors
Nezamuldeen, Leena [1, 2]
Jafri, Mohsin Saleet [1, 3]
Affiliations
[1] George Mason Univ, Sch Syst Biol, Fairfax, VA 22030 USA
[2] King Abdulaziz Univ, King Fahd Med Res Ctr, Jeddah 21589, Saudi Arabia
[3] Univ Maryland, Ctr Biomed Engn & Technol, Sch Med, Baltimore, MD 21201 USA
Source
BIOLOGY-BASEL | 2023 / Volume 12 / Issue 10
Keywords
artificial intelligence; PPI; protein-protein interaction; text mining; BiLSTM; recurrent neural network; FILAMIN; GENE; PHOSPHORYLATION; ACTIVATION; COMPLEX; SITE; RSK
DOI
10.3390/biology12101344
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Simple Summary: Research on proteins and their interactions with other proteins yields many new findings that help explain how diseases emerge. However, manual curation of the scientific literature delays new discoveries in the field. Artificial intelligence and deep learning techniques have played a significant part in extracting information from text. In this study, we used text mining and artificial intelligence techniques to extract protein-protein interaction networks from the vast scientific research literature. We created an automated system consisting of three models built with deep learning and natural language processing (NLP) methods. Our first model, which employs recurrent neural networks using sentiment analysis, achieved an accuracy of 95%. Our second model, which employs the named entity recognition technique in NLP, achieved an accuracy of 98%. Compared with the protein interaction network that we discovered by manual curation of more than 30 articles on Autism Spectrum Disorder, the automated system, tested on 6027 abstracts, successfully developed the network of interactions and provided an improved view. Discovering these networks will greatly help physicians and scientists understand how these molecules interact, for physiological, pharmacological, and pathological insight.
Abstract: Text mining methods are being developed to assimilate the continually expanding volume of biomedical textual materials. Understanding protein-protein interaction (PPI) deficits would assist in explaining the genesis of diseases. In this study, we designed an automated system to extract PPIs from the biomedical literature. It combines a deep learning sentence classification model (a pretrained word embedding and a BiLSTM recurrent neural network with additional layers), a conditional random field (CRF) named entity recognition (NER) model, and a shortest-dependency path (SDP) model using the spaCy library in Python. The automated system targets sentences that contain PPIs, and not sentences that merely mention these proteins in the framework of disease discovery or another context. Our first model achieved 13% greater precision on the AIMed/BioInfer benchmark corpus than the previous state-of-the-art BiLSTM neural network models. The NER model presented in this study achieved 98% precision on the AIMed/BioInfer corpus, exceeding previous models. To facilitate the production of an accurate representation of the PPI network, the processes were developed to systematically map the protein interactions in the texts. Overall, evaluating our system on 6027 abstracts pertaining to seven proteins associated with Autism Spectrum Disorder completed the manually curated PPI network for these proteins. For complicated diseases, these networks would assist in understanding how PPI deficits contribute to disease development while also emphasizing the influence of interactions on protein function and biological processes.
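The shortest-dependency path step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states that the SDP model uses spaCy, whereas this example assumes the dependency edges are already available and hand-writes them for a made-up sentence so that it runs without a parser or language model. The function name `shortest_dependency_path`, the placeholder proteins `PROT1` and `PROT2`, and the edge list are all hypothetical.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the shortest token path between source and target, or None."""
    # Treat the dependency tree as an undirected graph of tokens.
    graph = {}
    for head, child in edges:
        graph.setdefault(head, set()).add(child)
        graph.setdefault(child, set()).add(head)
    if source not in graph or target not in graph:
        return None
    # Breadth-first search, remembering each token's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        token = queue.popleft()
        if token == target:
            path = []
            while token is not None:
                path.append(token)
                token = previous[token]
            return path[::-1]
        for neighbour in sorted(graph[token]):
            if neighbour not in previous:
                previous[neighbour] = token
                queue.append(neighbour)
    return None


# Hypothetical (head, child) dependency edges for the made-up sentence
# "PROT1 directly binds PROT2 in neurons". A real pipeline would obtain
# these from a dependency parser rather than writing them by hand.
EDGES = [
    ("binds", "PROT1"),
    ("binds", "directly"),
    ("binds", "PROT2"),
    ("binds", "in"),
    ("in", "neurons"),
]

path = shortest_dependency_path(EDGES, "PROT1", "PROT2")
print(path)
```

In this toy case the path between the two protein mentions passes through the verb "binds" and skips the modifier and the prepositional phrase, which is the reason a shortest dependency path is a compact signal for the relation between two entities.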
Pages: 20