A Schema-Based Approach to the Linkage of Multimodal Learning Sources with Generative AI

Cited by: 1

Authors:

Kwon, Christine [1]

King, James [2]

Carney, John [2]

Stamper, John [1]

Affiliations:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

[2] MARi LLC, Alexandria, VA 22314 USA

Source:

ARTIFICIAL INTELLIGENCE IN EDUCATION: POSTERS AND LATE BREAKING RESULTS, WORKSHOPS AND TUTORIALS, INDUSTRY AND INNOVATION TRACKS, PRACTITIONERS, DOCTORAL CONSORTIUM AND BLUE SKY, AIED 2024 | 2024 / Vol. 2151

Keywords:

Generative AI; Large Language Models (LLMs); Multimodal Learning; Video Training

DOI:

10.1007/978-3-031-64312-5_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning how to execute a complex, hands-on task in a domain such as auto maintenance, cooking, or guitar playing while relying exclusively on text instruction from a manual is often frustrating and ineffective. Despite the need for multimedia instruction to enable the learning of complex, manual tasks, learners often rely exclusively on text instruction. However, through widespread usage of user-generated content platforms, such as YouTube and TikTok, learners are no longer limited to standard text and are able to watch videos from easily accessible platforms to learn such procedural tasks. As YouTube consists of a large corpus of diverse instructional videos, the accuracy of videos on sensitive and complex tasks has yet to be validated in comparison to "golden standard" manuals. Our work provides a unique LLM-based multimodal pipeline to interpret and verify task-related key steps in a video within organized knowledge schemas, in which demonstrated video steps are automatically extracted, systematized, and validated in comparison to a text manual of official steps. Applied to a dataset of twenty-four videos on the task of flat tire replacement on a car, the LLM-based pipeline achieved high performance on our metrics, identifying an average of 98% of key task steps, with 86% precision and 92% recall across all videos.

Citation

Pages: 3-10

Number of pages: 8

16 references in total

[1] Unsupervised Learning from Narrated Instruction Videos [J]. Alayrac, Jean-Baptiste; Bojanowski, Piotr; Agrawal, Nishant; Sivic, Josef; Laptev, Ivan; Lacoste-Julien, Simon. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 4575-4583

[2] Ampel BM, 2024, arXiv, DOI arXiv:2312.17278, 10.1145/3682069

[3] Video- or text-based e-learning when teaching clinical procedures? A randomized controlled trial [J]. Buch, Steen Vigh; Treschow, Frederik Philip; Svendsen, Jesper Brink; Worm, Bjarne Skjodt. ADVANCES IN MEDICAL EDUCATION AND PRACTICE, 2014, 5: 257-262

[4] Chase H, LangChain

[5] Dennen V.P., 2008, Handbook of research on educational communications and technology, V3, P425

[6] Goel A, 2023, PR MACH LEARN RES, V225, P82

[7] Malmaud J, 2015, arXiv, DOI arXiv:1503.01558

[8] Manju A., 2015, 2015 International Conference on Soft-Computing and Networks Security (ICSNS), P1, DOI 10.1109/ICSNS.2015.7292370

[9] Navarrete E, 2023, arXiv, DOI arXiv:2301.13617

[10] Stamper J, 2010, LECT NOTES COMPUT SC, V6095, P31
