Perceptual Audio Object Coding Using Adaptive Subband Grouping with CNN and Residual Block

Cited by: 1
Authors
Wu, Yulin [1]
Hu, Ruimin [1]
Wang, Xiaochen [1]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
spatial audio object coding (SAOC); perceptual coding; adaptive subband grouping; aliasing distortion
DOI
10.1109/ICME55011.2023.00433
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial audio content is becoming increasingly popular and is commonly represented as a set of object signals with associated metadata. This object-based representation is independent of the loudspeaker layout and provides high spatial resolution when reproduced on more loudspeakers. The traditional spatial audio object coding (SAOC) method, however, suffers from severe aliasing distortion, which impairs the immersive listening experience. In this study, we reduce aliasing distortion with a perceptual adaptive subband grouping strategy and use a convolutional neural network (CNN) and a residual block to build the side-information compression model. Both objective and subjective experiments on benchmark datasets at different bitrates show that the proposed method performs favorably against state-of-the-art methods.
Pages: 2543-2548
Page count: 6