A Dual Rig Approach for Multi-View Video and Spatialized Audio Capture in Medical Training

Times cited: 0
Authors
Maraval, Joshua [1]
Wei, Bangning [1,2]
Pesce, David [1]
Gayral, Yann [1]
Outtas, Meriem [1,2]
Ramin, Nicolas [1]
Zhang, Lu [1,2]
Affiliations
[1] IRT B Com, Rennes, France
[2] Univ Rennes, INSA Rennes, CNRS, IETR UMR 6164, Rennes, France
Source
2024 16TH INTERNATIONAL CONFERENCE ON QUALITY OF MULTIMEDIA EXPERIENCE, QOMEX 2024 | 2024
Keywords
database; multi-view video; spatial audio; free navigation; volumetric video; QoE;
DOI
10.1109/QoMEX61742.2024.10598273
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
We present a multi-view camera and spatialized audio microphone capture system designed for computer vision applications in free navigation immersive experiences. We propose a dataset of two long and complex in-situ training situations in the medical field. The scenarios in the dataset feature precise gestures for the learner to reproduce during complex situations with multiple simultaneous visual and auditory cues important for training. 3D computer vision techniques are used to reconstruct a 4D scene model from a set of videos to render novel views from unseen viewpoints. However, the quality of the rendered objects is directly dependent on the density of coverage by reference views. To ensure maximum Quality of Experience, we propose a dual rig of cameras: a central rig that captures the details of the gesture zone of the training scenarios, and a peripheral rig that captures the environment of the room and the interactions occurring around the gesture zone. The central rig provides dense coverage of the central content, facilitating high-quality reconstruction on novel views of the captured gestures. Recordings include audio interactions of multiple actors, captured by Ambisonic microphones spatially distributed around the scene. The captured scenes are real-world educational content for medical courses, so this dataset provides a rare opportunity to assess the Quality of Experience of volumetric video techniques on realistic content, and to compare their pedagogical capabilities with standard multi-view video content.
Pages: 274-277
Page count: 4