Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

Cited by: 0
Authors
Lee, Gyeonggeon [1 ,2 ]
Zhai, Xiaoming [2 ,3 ,4 ]
Affiliations
[1] Natl Inst Educ, Nat Sci & Sci Educ Dept, Nat Sci & Sci Educ, 1 Nanyang Walk, Singapore 637616, Singapore
[2] Univ Georgia, AI4STEM Educ Ctr, 110 Carlton St, Athens, GA 30602 USA
[3] Univ Georgia, Natl GENIUS Ctr, 110 Carlton St, Athens, GA 30602 USA
[4] Univ Georgia, Dept Math Sci & Social Studies Educ, 110 Carlton St, Athens, GA 30602 USA
Funding
U.S. National Science Foundation;
Keywords
Artificial intelligence (AI); GPT-4V(ision); Visual question answering; Vision language model; Multimodality;
DOI
10.1007/s11528-024-01035-z
Chinese Library Classification (CLC) number
G40 [Education];
Discipline classification codes
040101; 120403;
Abstract
Educators and researchers have analyzed various image data acquired from teaching and learning, such as images of learning materials, classroom dynamics, and students' drawings. However, this approach is labour-intensive and time-consuming, limiting its scalability and efficiency. Recent developments in Visual Question Answering (VQA) have streamlined this process by allowing users to pose questions about images and receive accurate, automatic answers, both in natural language, thereby enhancing efficiency and reducing the time required for analysis. State-of-the-art Vision Language Models (VLMs) such as GPT-4V(ision) have extended the applications of VQA to a wide range of educational purposes. This report employs GPT-4V as an example to demonstrate the potential of VLMs in enabling and advancing VQA for education. Specifically, we demonstrate that GPT-4V enables VQA for educational scholars without requiring technical expertise, thereby reducing accessibility barriers for general users. In addition, we contend that GPT-4V spotlights the transformative potential of VQA for educational research, representing a milestone for visual data analysis in education.
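The abstract describes posing a natural-language question about an educational image and receiving a natural-language answer. As an illustration only (not code from the paper), the following minimal Python sketch shows one way such a VQA query could be issued through the OpenAI Chat Completions API; the model name, image file, and question below are placeholder assumptions.

    # Minimal VQA sketch, assuming the OpenAI Python SDK (>= 1.x) and a
    # vision-capable model; file name and question are illustrative only.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Encode a local image (e.g., a student's drawing) as a base64 data URL.
    with open("student_drawing.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable chat model
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "What scientific concept is the student illustrating, "
                             "and are any misconceptions visible?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }
        ],
    )

    # The model's natural-language answer to the visual question.
    print(response.choices[0].message.content)

In this sketch the image and the question travel in a single user message, which is the pattern the abstract alludes to: no task-specific model training or computer-vision expertise is required of the user.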
Pages: 271-287
Number of pages: 17