Audio Mixing Inversion via Embodied Self-supervised Learning

Times cited: 1

Authors:

Zhou, Haotian [1, 2]

Yu, Feng [1, 2]

Wu, Xihong [1, 2, 3]

Affiliations:

[1] Central Conservatory of Music, Department of AI Music and Music Information Technology, Beijing 100031, People's Republic of China

[2] Ministry of Education, Laboratory of Music Artificial Intelligence, Laboratory of Philosophy and Social Sciences, Beijing 100031, People's Republic of China

[3] Peking University, School of Intelligence Science and Technology, Beijing 100871, People's Republic of China

Source:

MACHINE INTELLIGENCE RESEARCH | 2024 / Vol. 21 / No. 1

Keywords:

Audio mixing inversion; intelligent audio mixing; self-supervised learning; audio signal processing; deep learning; separation

DOI:

10.1007/s11633-023-1441-9

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Audio mixing is a crucial part of music production. Estimating the mixing parameters used to create a mixdown from the music recordings, i.e., audio mixing inversion, is important for analyzing or recreating an audio mix. However, approaches to audio mixing inversion have rarely been explored. This work presents a method, based on embodied self-supervised learning, that estimates mixing parameters from raw tracks and a stereo mixdown. Several commonly used audio effects are considered: gain, pan, equalization, reverb, and compression. Through iterative sampling and training, the method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown. In the sampling step, the inference network predicts a set of mixing parameters; these are sampled and fed to an audio-processing framework to generate audio data for the training step. In the training step, the same network is optimized on the data generated in the sampling step. Rather than relying on a black-box neural network, the method models the mixing process explicitly and interpretably. The method is evaluated with a set of objective measures, and the experimental results show that it outperforms current state-of-the-art methods.
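The sampling and training steps described in the abstract can be illustrated with a deliberately small Python sketch. It is a hypothetical toy, not the authors' implementation: it assumes only gain and constant-power pan (no equalization, reverb, or compression), replaces the inference neural network with two linear regressors over hand-crafted per-track features (level in dB and stereo angle), and uses arbitrary sampling noise scales. The function names render_mix, features, fit_line, predict, and invert_mix are illustrative only.

```python
# Hypothetical toy sketch of an iterative "sample, render, train" loop for
# mixing inversion. NOT the paper's implementation: only gain and
# constant-power pan are modelled, and two linear regressors on hand-crafted
# features stand in for the inference neural network.
import math
import random


def render_mix(tracks, gains_db, pans):
    """Toy audio-processing framework: per-track gain (dB) and pan in [-1, 1]."""
    n = len(tracks[0])
    left, right = [0.0] * n, [0.0] * n
    for track, g_db, p in zip(tracks, gains_db, pans):
        g = 10.0 ** (g_db / 20.0)
        theta = (p + 1.0) * math.pi / 4.0  # -1 (hard left) .. +1 (hard right)
        gl, gr = g * math.cos(theta), g * math.sin(theta)
        for i, x in enumerate(track):
            left[i] += gl * x
            right[i] += gr * x
    return left, right


def features(tracks, left, right):
    """Per-track (level in dB, stereo angle) from the mixdown and raw sources.

    Each channel is projected onto each raw track; this is exact only when the
    raw tracks are mutually orthogonal, as in the toy example below.
    """
    feats = []
    for s in tracks:
        energy = sum(x * x for x in s)
        a = sum(u * x for u, x in zip(left, s)) / energy
        b = sum(v * x for v, x in zip(right, s)) / energy
        feats.append((20.0 * math.log10(math.hypot(a, b)), math.atan2(b, a)))
    return feats


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + c."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return w, my - w * mx


def predict(model, feats):
    """Apply the (gain, pan) regressors to every track's features."""
    (wg, cg), (wp, cp) = model
    gains = [wg * fg + cg for fg, _ in feats]
    pans = [max(-1.0, min(1.0, wp * fp + cp)) for _, fp in feats]
    return gains, pans


def invert_mix(tracks, left, right, iterations=3, samples=16, seed=0):
    rng = random.Random(seed)
    model = ((0.0, 0.0), (0.0, 0.0))  # untrained: predicts 0 dB, centre pan
    target = features(tracks, left, right)
    for _ in range(iterations):
        # Sampling step: perturb the current prediction for the target mix and
        # render new mixes with the toy framework (noise scales are arbitrary).
        gains, pans = predict(model, target)
        gain_data, pan_data = [], []
        for _ in range(samples):
            g_s = [g + rng.gauss(0.0, 3.0) for g in gains]
            p_s = [max(-1.0, min(1.0, p + rng.gauss(0.0, 0.3))) for p in pans]
            sampled = features(tracks, *render_mix(tracks, g_s, p_s))
            for (fg, fp), g, p in zip(sampled, g_s, p_s):
                gain_data.append((fg, g))
                pan_data.append((fp, p))
        # Training step: refit the same model on the freshly sampled data.
        model = (fit_line(*zip(*gain_data)), fit_line(*zip(*pan_data)))
    return predict(model, target)


if __name__ == "__main__":
    n = 1024
    # Two sinusoids with distinct integer frequencies are orthogonal over n samples.
    tracks = [[math.sin(2 * math.pi * f * i / n) for i in range(n)] for f in (5, 13)]
    left, right = render_mix(tracks, [-6.0, 2.0], [-0.5, 0.25])
    print(invert_mix(tracks, left, right))
```

Because these toy features are an exact linear function of gain and pan, a single sample-and-train round already recovers the parameters; with effects such as reverb or compression the relation is nonlinear, which is presumably where a learned network and repeated iterations matter.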

Pages: 55-62

Number of pages: 8
