Open-source large language models in action: A bioinformatics chatbot for PRIDE database

Cited by: 5
Authors
Bai, Jingwen [1]
Kamatchinathan, Selvakumar [1]
Kundu, Deepti J. [1]
Bandla, Chakradhar [1]
Vizcaino, Juan Antonio [1]
Perez-Riverol, Yasset [1,2]
Affiliations
[1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, England
[2] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
Funding
Wellcome Trust (UK); Biotechnology and Biological Sciences Research Council (BBSRC, UK);
Keywords
bioinformatics; dataset discoverability; documentation; large language models; proteomics; public data; software architectures; training; SPECTROMETRY-BASED PROTEOMICS;
DOI
10.1002/pmic.202400005
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Here we present a chatbot assistant infrastructure () that simplifies user interaction with the PRIDE database's documentation and dataset search functionality. The framework supports multiple large language models (LLMs): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (application programming interface), a web interface, and components for indexing and managing vector databases. A benchmark component based on the Elo rating system is also part of the framework; it allows the performance of each LLM to be evaluated and helps improve the PRIDE documentation. Beyond answering questions about PRIDE documentation, the chatbot can be used to search for and find PRIDE datasets through an LLM-based recommendation system, improving dataset discoverability. Importantly, although the infrastructure is demonstrated here in the context of the PRIDE database, its modular and adaptable design makes it a valuable tool for improving user experience across a range of bioinformatics and proteomics tools and resources, as well as in other domains. Together, the integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation forms a robust and transferable chatbot assistant infrastructure. The framework is open source ().
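The abstract mentions an Elo-ranking benchmark for comparing the LLMs. As a rough illustration of how Elo ratings turn pairwise preferences into a ranking, the Python sketch below applies the standard Elo update. The K-factor of 32, the starting rating of 1000, and the names `EloLeaderboard`, `record`, and `expected_score` are hypothetical choices made for this example; they are not taken from the PRIDE chatbot code, and the paper's benchmark may differ in detail.

```python
"""Minimal Elo rating sketch for pairwise LLM comparisons (illustrative only)."""
from dataclasses import dataclass, field

# Assumed constants: the abstract does not state the K-factor or starting rating.
K_FACTOR = 32
INITIAL_RATING = 1000.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


@dataclass
class EloLeaderboard:
    """Hypothetical leaderboard mapping model names to Elo ratings."""
    ratings: dict = field(default_factory=dict)

    def rating(self, model: str) -> float:
        # Unseen models start at the assumed initial rating.
        return self.ratings.setdefault(model, INITIAL_RATING)

    def record(self, model_a: str, model_b: str, score_a: float) -> None:
        """Update both ratings after one head-to-head comparison.

        score_a is 1.0 if the evaluator preferred A's answer,
        0.0 if B's answer was preferred, and 0.5 for a tie.
        """
        ra, rb = self.rating(model_a), self.rating(model_b)
        ea = expected_score(ra, rb)
        # Zero-sum update: whatever A gains, B loses.
        self.ratings[model_a] = ra + K_FACTOR * (score_a - ea)
        self.ratings[model_b] = rb - K_FACTOR * (score_a - ea)

    def ranking(self) -> list:
        """Models sorted from highest to lowest rating."""
        return sorted(self.ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    board = EloLeaderboard()
    board.record("mixtral", "llama2", 1.0)      # mixtral's answer preferred
    board.record("openhermes", "chatglm", 0.5)  # tie
    for name, score in board.ranking():
        print(f"{name}: {score:.1f}")
```

Because each comparison only adjusts the two models involved, ratings can be updated incrementally as user or expert judgements arrive, which suits a benchmark where answers are compared pair by pair rather than scored against a fixed answer key.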
Pages: 7