BioCoder: a benchmark for bioinformatics code generation with large language models

Cited by: 1
Authors
Tang, Xiangru [1 ]
Qian, Bill [1 ]
Gao, Rick [1 ]
Chen, Jiakang [1 ]
Chen, Xinyun [2 ]
Gerstein, Mark B. [1 ,3 ,4 ,5 ,6 ]
Affiliations
[1] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[2] Google DeepMind, Mountain View, CA 94043 USA
[3] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[4] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[5] Yale Univ, Dept Stat & Data Sci, New Haven, CT 06520 USA
[6] Yale Univ, Dept Biomed Informat & Data Sci, New Haven, CT 06520 USA
Keywords
DOI
10.1093/bioinformatics/btae230
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
摘要
Pretrained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1026 Python functions and 1243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance the performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations and always >3%). The results highlight two key aspects of successful models: (i) Successful models accommodate a long prompt (>2600 tokens) with full context, including functional dependencies. (ii) They contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% versus up to 25%).
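The abstract reports gains in terms of Pass@K. For reference, the sketch below shows the standard unbiased pass@k estimator commonly used for code-generation benchmarks (n samples generated per problem, c of which pass all tests). This is a generic illustration under those assumptions, not necessarily BioCoder's exact evaluation harness, and the function name is illustrative.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of P(at least one of k sampled completions passes),
    # given n total samples for the problem, c of which pass the tests.
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    # Numerically stable form of 1 - C(n-c, k) / C(n, k)
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 20 samples per problem, 5 of them pass the tests
print(round(pass_at_k(n=20, c=5, k=1), 3))   # 0.25
print(round(pass_at_k(n=20, c=5, k=10), 3))  # ~0.984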
Pages: i266-i276
Page count: 12
Related papers
50 records in total
  • [1] JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
    Cao, Jialun
    Chen, Zhiyong
    Wu, Jiarong
    Cheung, Shing-Chi
    Xu, Chang
    PROCEEDINGS OF 2024 39TH ACM/IEEE INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2024, 2024, : 870 - 882
  • [2] A Comparative Analysis of Large Language Models for Code Documentation Generation
    Dvivedi, Shubhang Shekhar
    Vijay, Vyshnav
    Pujari, Sai Leela Rahul
    Lodh, Shoumik
    Kumar, Dhruv
    PROCEEDINGS OF THE 1ST ACM INTERNATIONAL CONFERENCE ON AI-POWERED SOFTWARE, AIWARE 2024, 2024, : 65 - 73
  • [3] Knowledge-Aware Code Generation with Large Language Models
    Huang, Tao
    Sun, Zhihong
    Jin, Zhi
    Li, Ge
    Lyu, Chen
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 52 - 63
  • [4] Self-Planning Code Generation with Large Language Models
    Jiang, Xue
    Dong, Yihong
    Wang, Lecheng
    Fang, Zheng
    Shang, Qiwei
    Li, Ge
    Jin, Zhi
    Jiao, Wenpin
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (07)
  • [5] Framework for evaluating code generation ability of large language models
    Yeo, Sangyeop
    Ma, Yu-Seung
    Kim, Sang Cheol
    Jun, Hyungkook
    Kim, Taeho
    ETRI JOURNAL, 2024, 46 (01) : 106 - 117
  • [6] Large language models and their applications in bioinformatics
    Sarumi, Oluwafemi A.
    Heider, Dominik
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2024, 23 : 3498 - 3505
  • [7] CodeT5+: Open Code Large Language Models for Code Understanding and Generation
    Wang, Yue
    Le, Hung
    Gotmare, Akhilesh Deepak
    Bui, Nghi D. Q.
    Li, Junnan
    Hoi, Steven C. H.
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1069 - 1088
  • [8] GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation
    Ilager, Shashikant
    Briem, Lukas Florian
    Brandic, Ivona
    arXiv,
  • [9] Automatic Unit Test Code Generation Using Large Language Models
    Ocal, Akdeniz Kutay
    Keskinoz, Mehmet
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,