Finding video shots for immersive journalism through text-to-video search

Cited by: 0
Authors
Nixon, Lyndon [1]
Galanopoulos, Damianos [2]
Mezaris, Vasileios [2]
Affiliations
[1] Modul Technol, Vienna, Austria
[2] CERTH ITI, Thessaloniki, Greece
Source
2024 INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI | 2024
Keywords
video segmentation; video embedding; multimodal search; Generative AI; video to 3D models; VIRTUAL-REALITY;
DOI
10.1109/CBMI62980.2024.10859220
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video assets from archives or online platforms can provide relevant content for embedding into immersive scenes or for the generation of 3D objects and scenes. However, XR content creators lack tools to find relevant video segments for their chosen topic. In this paper, we explore the use case of journalists who create immersive experiences for news stories and need to find related video material with which to create and populate a 3D scene. An innovative approach creates text and video embeddings and matches textual input queries to relevant video shots. This is provided through a Web dashboard for search and retrieval across video collections; the selected shots form the input to content creation tools that generate and populate an immersive scene, so journalists do not need specialist knowledge to communicate stories via XR.
Pages: 269-274
Number of pages: 6
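The retrieval step described in the abstract, matching a textual query against video shots via embeddings, can be illustrated with a minimal sketch: query and shots are assumed to live in a joint embedding space, and shots are ranked by cosine similarity to the query. The sketch below is not the authors' implementation; the function names (rank_shots, cosine_similarity), the 512-dimensional space, and the random placeholder vectors standing in for real text and video encoders are all illustrative assumptions.

# Minimal sketch of text-to-video shot retrieval via joint embeddings.
# Placeholder random vectors stand in for real text/video encoders; names
# and dimensions are illustrative assumptions, not the paper's system.
import numpy as np

def cosine_similarity(query_vec, shot_matrix):
    # Cosine similarity between one query vector and each shot embedding row.
    query_norm = query_vec / np.linalg.norm(query_vec)
    shot_norms = shot_matrix / np.linalg.norm(shot_matrix, axis=1, keepdims=True)
    return shot_norms @ query_norm

def rank_shots(query_vec, shot_matrix, shot_ids, top_k=5):
    # Return the top_k shot ids ranked by similarity to the text query embedding.
    scores = cosine_similarity(query_vec, shot_matrix)
    order = np.argsort(-scores)[:top_k]
    return [(shot_ids[i], float(scores[i])) for i in order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 512                                       # hypothetical joint embedding dimension
    shot_ids = [f"video01_shot{i:03d}" for i in range(100)]
    shot_embeddings = rng.normal(size=(100, dim))   # stand-in for precomputed video shot embeddings
    text_embedding = rng.normal(size=dim)           # stand-in for the encoded text query
    for shot_id, score in rank_shots(text_embedding, shot_embeddings, shot_ids):
        print(shot_id, round(score, 3))

In a real pipeline the shot embeddings would be precomputed once per video collection and only the query would be encoded at search time, which is what makes dashboard-style interactive retrieval feasible.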