A MULTI-DILATION AND MULTI-RESOLUTION FULLY CONVOLUTIONAL NETWORK FOR SINGING MELODY EXTRACTION

Cited by: 0

Authors:

Gao, Ping ^{[1]}

You, Cheng-You ^{[1]}

Chi, Tai-Shih ^{[1]}

Affiliation:

[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu 300, Taiwan

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

Melody extraction; multi-resolution; fully convolutional network

DOI:

10.1109/icassp40776.2020.9053059

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Each human cognitive function involves bottom-up and top-down processes. Several methods have been proposed for singing melody extraction by emphasizing either the bottom-up or top-down processes. For hearing, the bottom-up processes include spectral and spectro-temporal decomposition of the sound by the cochlea and the auditory cortex. In this paper, we propose a neural network, which includes spectro-temporal multi-resolution decomposition of the log-spectrogram of the sound and a semantic segmentation model to respectively address the bottom-up and top-down processing of hearing, for singing melody extraction. Simulation results show the proposed model outperforms all previously proposed methods, emphasizing either bottom-up or top-down processing, in almost all objective evaluation metrics.
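The abstract names two ingredients: a log-spectrogram input and a multi-resolution spectro-temporal decomposition of it. The following is a minimal sketch of that idea only, not the authors' architecture. It assumes NumPy, uses a fixed kernel in place of learned weights, and the function names (`log_spectrogram`, `dilated_conv2d`, `multi_dilation_features`), the STFT settings, and the dilation rates are all hypothetical choices made for illustration.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128, eps=1e-8):
    """Log-magnitude STFT with a Hann window, shape (freq_bins, frames).

    n_fft, hop and eps are illustrative values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=0)) + eps)


def dilated_conv2d(x, kernel, dilation):
    """Same-size 2-D cross-correlation with zero padding.

    dilation = (d_f, d_t) spaces the kernel taps along frequency and time.
    Only odd kernel sizes are supported so the padding stays symmetric.
    """
    d_f, d_t = dilation
    k_f, k_t = kernel.shape
    assert k_f % 2 == 1 and k_t % 2 == 1, "odd kernel sizes only"
    pad_f = d_f * (k_f - 1) // 2
    pad_t = d_t * (k_t - 1) // 2
    padded = np.pad(x, ((pad_f, pad_f), (pad_t, pad_t)))
    out = np.zeros_like(x, dtype=float)
    n_f, n_t = x.shape
    for i in range(k_f):
        for j in range(k_t):
            out += kernel[i, j] * padded[i * d_f: i * d_f + n_f,
                                         j * d_t: j * d_t + n_t]
    return out


def multi_dilation_features(x, kernel, dilations):
    """Stack one feature map per dilation rate, shape (len(dilations), F, T)."""
    return np.stack([dilated_conv2d(x, kernel, d) for d in dilations], axis=0)


if __name__ == "__main__":
    # Hypothetical input: a 440 Hz tone sampled at 8 kHz.
    tone = np.sin(2 * np.pi * 440 * np.arange(1024) / 8000.0)
    spec = log_spectrogram(tone)
    smooth = np.full((3, 3), 1.0 / 9.0)  # fixed averaging kernel, not learned
    feats = multi_dilation_features(spec, smooth, [(1, 1), (2, 2), (4, 4)])
    print(spec.shape, feats.shape)
```

Larger dilation rates widen the spectro-temporal context of each output bin without adding kernel taps, which is one common way to view a signal at several resolutions. In a trained network the kernels would be learned and the stacked maps passed to a segmentation model.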


Pages: 551-555

Page count: 5

