A novel approach to unlocking the synergy of large language models and chemical knowledge in biomedical signal applications

Cited: 0
|
Authors
Yin, Zilong [1 ]
Wang, Haoyu [1 ]
Chen, Bin [1 ,4 ]
Sun, Hangling [2 ]
Li, Anji [3 ]
Zhou, Chenyu [5 ,6 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Shanghai, Peoples R China
[2] Hengtu Imalligent Technol Shanghai Co Ltd, Shanghai, Peoples R China
[3] Abbott Labs Shanghai Co Ltd, Shanghai, Peoples R China
[4] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[5] Xinjiang Univ, Urumqi, Peoples R China
[6] Tsinghua Univ, Beijing, Peoples R China
Keywords
Biomedical signal processing; Supervised chemical knowledge; Large language models; Molecular property prediction; PREDICTION; EXPLORATION; ENTITIES; DATABASE;
DOI
10.1016/j.bspc.2024.107388
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline Code
0831 ;
Abstract
This work explores the potential of the pre-trained large language model Llama2 to address challenges in biomedical signal processing and control (BSPC), particularly in predicting the electronic and functional properties of organic molecules, an area of growing importance in fields such as drug discovery and materials science. Current approaches in BSPC often rely on specialized graph neural network models, which can be limited in their ability to capture the complex relationships inherent in molecular structures. To address this, we demonstrate that a fine-tuned Llama2 model can accurately predict the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies of organic semiconductor molecules, with performance comparable to state-of-the-art specialized models. To further enhance the model's robustness and generalization, we propose several key innovations: optimized simplified molecular-input line-entry system (SMILES) tokenization, incorporation of chemical knowledge as auxiliary supervised tasks, and a low-rank adaptation (LoRA) based fine-tuning strategy. These techniques enable the language model to learn SMILES prediction while acquiring relevant chemical knowledge, improving its handling of incomplete structural information and its ability to generalize to "unseen" molecular classes. The work also discusses the limitations of using large language models for molecular property prediction, such as the lack of interpretability and the need for improved handling of non-standard SMILES representations, highlighting the promise of this approach in BSPC while identifying areas for further improvement.
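The abstract's "optimized SMILES tokenization" is not detailed in this record. As an illustrative sketch only (the regex below is a commonly used pattern, not the authors' exact scheme), SMILES strings are typically split into chemically meaningful tokens rather than raw characters, so that multi-character atoms such as `Cl` or bracketed species such as `[NH2+]` are kept intact:

```python
import re

# Regex splitting a SMILES string into atom, bond, ring-closure and branch
# tokens. Multi-character alternatives (bracketed atoms, Br, Cl) come first
# in the alternation so they are matched before single characters.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|N|O|S|P|F|I|b|c|n|o|s|p|B|C"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%\d{2}|\d)"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any characters are left over."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles!r}")
    return tokens

if __name__ == "__main__":
    # Aspirin-like fragment: chlorine and aromatic carbons tokenize as units.
    print(tokenize_smiles("CC(=O)Oc1ccccc1"))
```

A tokenizer of this shape keeps the vocabulary small and aligned with chemical structure, which is one plausible reading of why tokenization matters for feeding SMILES into a language model.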
Pages: 16
Related Papers
50 records
  • [1] Novel applications of large language models in clinical research
    Abers, Michael S.
    Mathias, Rasika A.
    JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY, 2025, 155 (03) : 813 - 814
  • [2] Unlocking the Potentials of Large Language Models in Orthodontics: A Scoping Review
    Zheng, Jie
    Ding, Xiaoqian
    Pu, Jingya Jane
    Chung, Sze Man
    Ai, Qi Yong H.
    Hung, Kuo Feng
    Shan, Zhiyi
BIOENGINEERING-BASEL, 2024, 11 (11)
  • [3] Enhancing Biomedical Question Answering with Large Language Models
    Yang, Hua
    Li, Shilong
    Goncalves, Teresa
    INFORMATION, 2024, 15 (08)
  • [4] Industrial applications of large language models
    Raza, Mubashar
    Jahangir, Zarmina
    Riaz, Muhammad Bilal
    Saeed, Muhammad Jasim
    Sattar, Muhammad Awais
    SCIENTIFIC REPORTS, 15 (1)
  • [5] Large language models and their applications in bioinformatics
    Sarumi, Oluwafemi A.
    Heider, Dominik
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2024, 23 : 3498 - 3505
  • [6] Large Language Models and Applications: The Rebirth of Enterprise Knowledge Management and the Rise of Prompt Libraries
    O'Leary, Daniel E.
    IEEE INTELLIGENT SYSTEMS, 2024, 39 (02) : 72 - 75
  • [7] Knowledge management in organization and the large language models
    Zelenkov, Yu. A.
    ROSSIISKII ZHURNAL MENEDZHMENTA-RUSSIAN MANAGEMENT JOURNAL, 2024, 22 (03): 573 - 601
  • [8] Evaluating Intelligence and Knowledge in Large Language Models
    Bianchini, Francesco
    TOPOI-AN INTERNATIONAL REVIEW OF PHILOSOPHY, 2025, 44 (01): 163 - 173
  • [9] Knowledge Editing for Large Language Models: A Survey
    Wang, Song
    Zhu, Yaochen
    Liu, Haochen
    Zheng, Zaiyi
    Chen, Chen
    Li, Jundong
    ACM COMPUTING SURVEYS, 2025, 57 (03)
  • [10] Scientific Large Language Models: A Survey on Biological & Chemical Domains
    Zhang, Qiang
    Ding, Keyan
    Lv, Tianwen
    Wang, Xinda
    Yin, Qingyu
    Zhang, Yiwen
    Yu, Jing
    Wang, Yuhao
    Li, Xiaotong
    Xiang, Zhuoyi
    Zhuang, Xiang
    Wang, Zeyuan
    Qin, Ming
    Zhang, Mengyao
    Zhang, Jinlu
    Cui, Jiyu
    Xu, Renjun
    Chen, Hongyang
    Fan, Xiaohui
    Xing, Huabin
    Chen, Huajun
    ACM COMPUTING SURVEYS, 2025, 57 (06)