Foundation models for bioinformatics

Cited by: 3
Authors
Chen, Ziyu [1 ,2 ,3 ]
Wei, Lin [1 ,2 ,3 ]
Gao, Ge [1 ,2 ,3 ]
Affiliations
[1] Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing, China
[2] Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing, China
[3] Changping Laboratory, Beijing, China
Keywords
ChatGPT; foundation models; large language models; transformer
DOI
10.1002/qub2.69
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Transformer-based foundation models such as ChatGPT have revolutionized daily life and affected many fields, including bioinformatics. In this perspective, we first discuss the direct application of textual foundation models to bioinformatics tasks, focusing on how to make the most of canonical large language models while mitigating their inherent flaws. We then survey transformer-based, bioinformatics-tailored foundation models for both sequence and non-sequence data. Finally, we envision further development directions, as well as open challenges, for bioinformatics foundation models.
Pages: 339-344 (6 pages)
Cited references (80 in total; first 10 shown)
[1] Abnar S, 2020, 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), p4190
[2] Azam M, Chen Y, Arowolo MO, Liu H, Popescu M, Xu D. A comprehensive evaluation of large language models in mining gene relations and pathway knowledge. Quantitative Biology, 2024, 12(4): 360-374
[3] Bahdanau D, 2016, arXiv preprint, DOI: 10.48550/arXiv.1409.0473
[4] Benegas G, Batra SS, Song YS. DNA language models are powerful predictors of genome-wide variant effects. Proceedings of the National Academy of Sciences of the United States of America, 2023, 120(44)
[5] Bommasani R, 2021, preprint
[6] Borgeaud S, 2021, preprint
[7] Brown TB, 2020, preprint
[8] Chase H, 2022, LangChain
[9] Chen J, 2022, preprint
[10] Chen M, 2020, Proceedings of Machine Learning Research, Vol 119