Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

Cited by: 0
Authors
Lee, Gyeonggeon [1, 2]
Zhai, Xiaoming [2, 3, 4]
Affiliations
[1] Natl Inst Educ, Nat Sci & Sci Educ Dept, Nat Sci & Sci Educ, 1 Nanyang Walk, Singapore 637616, Singapore
[2] Univ Georgia, AI4STEM Educ Ctr, 110 Carlton St, Athens, GA 30602 USA
[3] Univ Georgia, Natl GENIUS Ctr, 110 Carlton St, Athens, GA 30602 USA
[4] Univ Georgia, Dept Math Sci & Social Studies Educ, 110 Carlton St, Athens, GA 30602 USA
Funding
National Science Foundation (USA)
Keywords
Artificial intelligence (AI); GPT-4V(ision); Visual question answering; Vision language model; Multimodality;
DOI
10.1007/s11528-024-01035-z
CLC classification number
G40 [Education]
Subject classification codes
040101; 120403
Abstract
Educators and researchers have analyzed various image data acquired from teaching and learning, such as images of learning materials, classroom dynamics, and students' drawings. However, this approach is labour-intensive and time-consuming, limiting its scalability and efficiency. Recent developments in the Visual Question Answering (VQA) technique have streamlined this process by allowing users to pose questions about images and receive accurate, automatic answers, both in natural language, thereby enhancing efficiency and reducing the time required for analysis. State-of-the-art Vision Language Models (VLMs) such as GPT-4V(ision) have extended the applications of VQA to a wide range of educational purposes. This report uses GPT-4V as an example to demonstrate the potential of VLMs in enabling and advancing VQA for education. Specifically, we demonstrate that GPT-4V enables VQA for educational scholars without requiring technical expertise, thereby lowering accessibility barriers for general users. In addition, we contend that GPT-4V spotlights the transformative potential of VQA for educational research, representing a milestone for visual data analysis in education.
Pages: 271-287
Number of pages: 17