BactInt: A domain driven transfer learning approach for extracting inter-bacterial associations from biomedical text

Cited by: 1
Authors
Das Baksi, Krishanu [1]
Pokhrel, Vatsala [1]
Pudavar, Anand Eruvessi [1]
Mande, Sharmila S. [1]
Kuntal, Bhusan K. [1]
Affiliations
[1] Tata Consultancy Services Ltd, TCS Research, Pune 411057, India
Keywords
Bacterial association; Microbiome; Text mining; Entity relationship; Biomedical text; Transfer learning; Bioinformatics
DOI
10.1016/j.compbiolchem.2023.108012
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: The healthy or dysbiotic state of an ecosystem such as the human body is known to be influenced not only by which bacterial groups are present in it, but also by the associations among those groups. Evidence reported in biomedical text is a reliable source for identifying and confirming such inter-bacterial associations. However, the complexity of the reported text and the ever-increasing volume of information necessitate methods for automated and accurate extraction of this knowledge. Methods: An information extraction model for bacterial associations is presented, based on BioBERT (a biomedical domain-specific language model), that utilizes patterns learned from other publicly available datasets. Additionally, a specialized sentence corpus has been developed to significantly improve the prediction accuracy of the 'transfer learned' model through fine-tuning. Results: The final model outperformed all other variations (non-transfer-learned and non-fine-tuned models) as well as models trained on BioGPT (a domain-trained Generative Pre-trained Transformer). To further demonstrate its utility, a case study was performed using bacterial association network data obtained from experimental studies. Conclusion: This study attempts to demonstrate the applicability of transfer learning in a niche field of life sciences, where understanding inter-bacterial relationships is crucial to obtaining meaningful insights into microbial community structures across different ecosystems. The study also discusses how such a model can be improved by fine-tuning with limited training data. The results presented and the datasets made available are expected to be a valuable addition to the fields of medical informatics and bioinformatics.
Pages: 10