A novel approach to unlocking the synergy of large language models and chemical knowledge in biomedical signal applications

Cited: 0
|
Authors
Yin, Zilong [1 ]
Wang, Haoyu [1 ]
Chen, Bin [1 ,4 ]
Sun, Hangling [2 ]
Li, Anji [3 ]
Zhou, Chenyu [5 ,6 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Shanghai, Peoples R China
[2] Hengtu Imalligent Technol Shanghai Co Ltd, Shanghai, Peoples R China
[3] Abbott Labs Shanghai Co Ltd, Shanghai, Peoples R China
[4] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[5] Xinjiang Univ, Urumqi, Peoples R China
[6] Tsinghua Univ, Beijing, Peoples R China
Keywords
Biomedical signal processing; Supervised chemical knowledge; Large language models; Molecular property prediction; PREDICTION; EXPLORATION; ENTITIES; DATABASE;
DOI
10.1016/j.bspc.2024.107388
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline Code
0831 ;
Abstract
This work explores the potential of the pre-trained large language model Llama2 to address challenges in biomedical signal processing and control (BSPC), particularly in predicting the electronic and functional properties of organic molecules, an area of growing importance in fields such as drug discovery and materials science. Current approaches in BSPC often rely on specialized graph neural network models, which can be limited in their ability to capture the complex relationships inherent in molecular structures. To address this, we demonstrate that a fine-tuned Llama2 model can accurately predict the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies of organic semiconductor molecules, with performance comparable to state-of-the-art specialized models. To further enhance the model's robustness and generalization, we propose several key innovations: optimized simplified molecular-input line-entry system (SMILES) tokenization, incorporation of chemical knowledge as auxiliary supervised tasks, and a low-rank adaptation (LoRA) based fine-tuning strategy. These techniques enable the language model to learn SMILES prediction while acquiring relevant chemical knowledge, improving its handling of incomplete structural information and its ability to generalize to "unseen" molecular classes. The work also discusses the limitations of using large language models for molecular property prediction, such as the lack of interpretability and the need for improved handling of non-standard SMILES representations, highlighting the promise of this approach in BSPC while identifying areas for further improvement.
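The abstract's "optimized SMILES tokenization" is not detailed in this record. As an illustrative sketch only (the regex below is a commonly used pattern, not the authors' exact scheme), SMILES strings are typically split into chemically meaningful tokens rather than raw characters, so that multi-character atoms such as `Cl` or bracketed species such as `[NH2+]` are kept intact:

```python
import re

# Regex splitting a SMILES string into atom, bond, ring-closure and branch
# tokens. Multi-character alternatives (bracketed atoms, Br, Cl) come first
# in the alternation so they are matched before single characters.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|N|O|S|P|F|I|b|c|n|o|s|p|B|C"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%\d{2}|\d)"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any characters are left over."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles!r}")
    return tokens

if __name__ == "__main__":
    # Aspirin-like fragment: chlorine and aromatic carbons tokenize as units.
    print(tokenize_smiles("CC(=O)Oc1ccccc1"))
```

A tokenizer of this shape keeps the vocabulary small and aligned with chemical structure, which is one plausible reading of why tokenization matters for feeding SMILES into a language model.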
Pages: 16
Related Papers
50 records
  • [1] Novel applications of large language models in clinical research
    Abers, Michael S.
    Mathias, Rasika A.
    JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY, 2025, 155 (03) : 813 - 814
  • [2] Unlocking the Potentials of Large Language Models in Orthodontics: A Scoping Review
    Zheng, Jie
    Ding, Xiaoqian
    Pu, Jingya Jane
    Chung, Sze Man
    Ai, Qi Yong H.
    Hung, Kuo Feng
    Shan, Zhiyi
BIOENGINEERING-BASEL, 2024, 11 (11)
  • [3] Enhancing Biomedical Question Answering with Large Language Models
    Yang, Hua
    Li, Shilong
    Goncalves, Teresa
    INFORMATION, 2024, 15 (08)
  • [4] Industrial applications of large language models
    Raza, Mubashar
    Jahangir, Zarmina
    Riaz, Muhammad Bilal
    Saeed, Muhammad Jasim
    Sattar, Muhammad Awais
    SCIENTIFIC REPORTS, 15 (1)
  • [5] Large language models and their applications in bioinformatics
    Sarumi, Oluwafemi A.
    Heider, Dominik
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2024, 23 : 3498 - 3505
  • [6] Large Language Models and Applications: The Rebirth of Enterprise Knowledge Management and the Rise of Prompt Libraries
    O'Leary, Daniel E.
    IEEE INTELLIGENT SYSTEMS, 2024, 39 (02) : 72 - 75
  • [7] Knowledge management in organization and the large language models
    Zelenkov, Yu. A.
    ROSSIISKII ZHURNAL MENEDZHMENTA-RUSSIAN MANAGEMENT JOURNAL, 2024, 22 (03): 573 - 601
  • [8] Evaluating Intelligence and Knowledge in Large Language Models
    Bianchini, Francesco
    TOPOI-AN INTERNATIONAL REVIEW OF PHILOSOPHY, 2025, 44 (01): 163 - 173
  • [9] Knowledge Editing for Large Language Models: A Survey
    Wang, Song
    Zhu, Yaochen
    Liu, Haochen
    Zheng, Zaiyi
    Chen, Chen
    Li, Jundong
    ACM COMPUTING SURVEYS, 2025, 57 (03)
  • [10] Scientific Large Language Models: A Survey on Biological & Chemical Domains
    Zhang, Qiang
    Ding, Keyan
    Lv, Tianwen
    Wang, Xinda
    Yin, Qingyu
    Zhang, Yiwen
    Yu, Jing
    Wang, Yuhao
    Li, Xiaotong
    Xiang, Zhuoyi
    Zhuang, Xiang
    Wang, Zeyuan
    Qin, Ming
    Zhang, Mengyao
    Zhang, Jinlu
    Cui, Jiyu
    Xu, Renjun
    Chen, Hongyang
    Fan, Xiaohui
    Xing, Huabin
    Chen, Huajun
    ACM COMPUTING SURVEYS, 2025, 57 (06)