Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras

Cited by: 5

Authors:

Kim, Hansung [1]

Remaggi, Luca [2]

Dourado, Aloisio [3]

de Campos, Teofilo [3]

Jackson, Philip J. B. [4]

Hilton, Adrian [4]

Affiliations:

[1] Univ Southampton, ECS, Southampton, Hants, England

[2] Creat Labs UK, London, England

[3] Univ Brasilia, Brasilia, DF, Brazil

[4] Univ Surrey, CVSSP, Guildford, Surrey, England

Source:

VIRTUAL REALITY | 2022 / Volume 26 / Issue 03

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Audio-visual scene reproduction; Scene understanding; 3D reconstruction and completion; Spatial audio; VIRTUAL-REALITY; IMPLEMENTATION; PERCEPTION; FUTURE;

DOI:

10.1007/s10055-021-00594-3

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

As personalised immersive display systems have been intensely explored in virtual reality (VR), plausible 3D audio corresponding to the visual content is required to provide more realistic experiences to users. It is well known that spatial audio synchronised with visual information improves a sense of immersion but limited research progress has been achieved in immersive audio-visual content production and reproduction. In this paper, we propose an end-to-end pipeline to simultaneously reconstruct 3D geometry and acoustic properties of the environment from a pair of omnidirectional panoramic images. A semantic scene reconstruction and completion method using a deep convolutional neural network is proposed to estimate the complete semantic scene geometry in order to adapt spatial audio reproduction to the scene. Experiments provide objective and subjective evaluations of the proposed pipeline for plausible audio-visual VR reproduction of real scenes.

Pages: 823 - 838

Number of pages: 16

14 items in total

[1] Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras
Hansung Kim
Luca Remaggi
Aloisio Dourado
Teofilo de Campos
Philip J. B. Jackson
Adrian Hilton
Virtual Reality, 2022, 26 : 823 - 838
[2] AVSU: Workshop on Audio-Visual Scene Understanding for Immersive Multimedia
Hilton, Adrian
Kang, Hong-Goo
Kim, Hansung
Sohn, Kwanghoon
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 2122 - 2124
[3] Effect of Acoustic Scene Complexity and Visual Scene Representation on Auditory Perception in Virtual Audio-Visual Environments
Fichna, Stefan
Biberger, Thomas
Seeber, Bernhard U.
Ewert, Stephan D.
2021 IMMERSIVE AND 3D AUDIO: FROM ARCHITECTURE TO AUTOMOTIVE (I3DA), 2021,
[4] Unsupervised Synthetic Acoustic Image Generation for Audio-Visual Scene Understanding
Sanguineti, Valentina
Morerio, Pietro
Del Bue, Alessio
Murino, Vittorio
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 7102 - 7115
[5] Semantic Scene Reconstruction using the DenseCRF Model
Ma, Zhixin
Cao, Chong
Shen, Xukun
2017 INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV 2017), 2017, : 456 - 457
[6] Audio-visual scene understanding utilizing text information for a cooking support robot
Kojima, Ryosuke
Sugiyama, Osamu
Nakadai, Kazuhiro
2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 4210 - 4215
[7] Audio-visual speech scene analysis: Characterization of the dynamics of unbinding and rebinding the McGurk effect
Nahorna, Olha
Berthommier, Frederic
Schwartz, Jean-Luc
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2015, 137 (01) : 362 - 377
[8] Semantic Scene Completion from a Single 360-Degree Image and Depth Map
Dourado, Aloisio
Kim, Hansung
de Campos, Teofilo E.
Hilton, Adrian
PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 5: VISAPP, 2020, : 36 - 46
[9] Importance of binaural cues of depth in low-resolution audio-visual 3D scene reproductions
Salvati, Daniele
Drioli, Carlo
Fontana, Federico
Foresti, Gian Luca
2018 IEEE 4TH VR WORKSHOP ON SONIC INTERACTIONS FOR VIRTUAL ENVIRONMENTS (SIVE), 2018,
[10] Modelling human visual navigation using multi-view scene reconstruction
Lyndsey C. Pickup
Andrew W. Fitzgibbon
Andrew Glennerster
Biological Cybernetics, 2013, 107 : 449 - 464
