In-context learning enables multimodal large language models to classify cancer pathology images

Cited by: 4
Authors
Ferber, Dyke [1,2,3]
Woelflein, Georg [4]
Wiest, Isabella C. [3,5]
Ligero, Marta [3]
Sainath, Srividhya [3]
Ghaffari Laleh, Narmin [3]
El Nahhas, Omar S. M. [3]
Mueller-Franzes, Gustav [6]
Jaeger, Dirk [1,2]
Truhn, Daniel [6]
Kather, Jakob Nikolas [1,2,3,7]
Affiliations
[1] Heidelberg Univ Hosp, Natl Ctr Tumor Dis NCT, Heidelberg, Germany
[2] Heidelberg Univ Hosp, Dept Med Oncol, Heidelberg, Germany
[3] Tech Univ Dresden, Else Kroener Fresenius Ctr Digital Hlth, Dresden, Germany
[4] Univ St Andrews, Sch Comp Sci, St Andrews, Scotland
[5] Heidelberg Univ, Med Fac Mannheim, Dept Med 2, Mannheim, Germany
[6] Univ Hosp Aachen, Dept Diagnost & Intervent Radiol, Aachen, Germany
[7] Univ Hosp Dresden, Dept Med 1, Dresden, Germany
Keywords
DOI
10.1038/s41467-024-51465-9
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline Classification Codes
07; 0710; 09
Abstract
Medical image classification requires labeled, task-specific datasets, which are used to train deep learning networks de novo or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from examples within the prompt, bypassing the need for parameter updates. Yet in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) with in-context learning on three cancer histopathology tasks of high importance: classification of tissue subtypes in colorectal cancer, colon polyp subtyping, and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while requiring only a minimal number of samples. In summary, this study demonstrates that large vision-language models trained on non-domain-specific data can be applied out of the box to solve medical image-processing tasks in histopathology. This gives medical experts without a technical background access to generalist AI models, especially in areas where annotated data are scarce.

Medical image classification remains a challenging process in deep learning. Here, the authors evaluate a large vision-language foundation model (GPT-4V) with in-context learning for cancer image processing and show that such models can learn from examples and reach performance similar to specialized neural networks, while reducing the gap to current state-of-the-art pathology foundation models.
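As a purely illustrative aid, the sketch below shows the general pattern of few-shot in-context image classification with a vision-language model: a handful of labeled example tiles are placed directly in the prompt alongside a query tile, and the model is asked for a label without any fine-tuning or parameter updates. This is a minimal sketch, not the authors' pipeline; it assumes the OpenAI Python SDK (v1.x) chat-completions interface with base64-encoded image inputs, and the file names, class labels, and model name are placeholders.

    import base64
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def to_data_url(path: str) -> str:
        # Encode a local image tile as a base64 data URL for the chat API.
        with open(path, "rb") as f:
            return "data:image/png;base64," + base64.b64encode(f.read()).decode()

    # A few labeled example tiles are supplied inside the prompt itself;
    # no model parameters are updated (this is the in-context learning idea).
    examples = [("tumor_tile.png", "tumor"), ("normal_tile.png", "normal")]  # placeholders
    query_tile = "query_tile.png"  # placeholder

    content = [{"type": "text",
                "text": "Classify each histopathology tile as 'tumor' or 'normal'. "
                        "Labeled examples follow, then a query tile."}]
    for path, label in examples:
        content.append({"type": "image_url", "image_url": {"url": to_data_url(path)}})
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append({"type": "image_url", "image_url": {"url": to_data_url(query_tile)}})
    content.append({"type": "text", "text": "Label:"})

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder for any vision-capable chat model
        messages=[{"role": "user", "content": content}],
    )
    print(response.choices[0].message.content)  # predicted label for the query tile

The essential point of the pattern is that the labeled examples live in the prompt rather than in the model weights, which is why no task-specific training is required.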
Pages: 12
Related papers
50 items in total
  • [31] Customizing Language Model Responses with Contrastive In-Context Learning
    Gao, Xiang
    Das, Kamalika
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18039 - 18046
  • [32] From System Models to Class Models: An In-Context Learning Paradigm
    Forgione, Marco
    Pura, Filippo
    Piga, Dario
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 3513 - 3518
  • [33] Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models
    Han, Dongrui
    Cui, Mingyu
    Kang, Jiawen
    Wu, Xixin
    Liu, Xunying
    Meng, Helen
    2024 IEEE 14TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, ISCSLP 2024, 2024, : 631 - 635
  • [34] Explore the Textual Perception Ability on the Images for Multimodal Large Language Models
    Kuang, Jiayi
    Ouyang, Jiarui
    Shen, Ying
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT V, NLPCC 2024, 2025, 15363 : 300 - 311
  • [35] Capability of multimodal large language models to interpret pediatric radiological images
    Reith, Thomas P.
    D'Alessandro, Donna M.
    D'Alessandro, Michael P.
    PEDIATRIC RADIOLOGY, 2024, : 1729 - 1737
  • [36] Diluie: constructing diverse demonstrations of in-context learning with large language model for unified information extraction
    Guo Q.
    Guo Y.
    Zhao J.
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (22) : 13491 - 13512
  • [37] In-context Learning for Few-shot Multimodal Named Entity Recognition
    Cai, Chenran
    Wang, Qianlong
    Liang, Bin
    Qin, Bing
    Yang, Min
    Wong, Kam-Fai
    Xu, Ruifeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2969 - 2979
  • [38] Multimodal large language models for inclusive collaboration learning tasks
    Lewis, Armanda
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 202 - 210
  • [39] Meta-in-context learning in large language models
    Coda-Forno, Julian
    Binz, Marcel
    Akata, Zeynep
    Botvinick, Matthew
    Wang, Jane X.
    Schulz, Eric
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] UniCode: Learning a Unified Codebook for Multimodal Large Language Models
    Zheng, Sipeng
    Zhou, Bohan
    Feng, Yicheng
    Wang, Ye
    Lu, Zongqing
    COMPUTER VISION - ECCV 2024, PT VIII, 2025, 15066 : 426 - 443