A MULTI-DILATION AND MULTI-RESOLUTION FULLY CONVOLUTIONAL NETWORK FOR SINGING MELODY EXTRACTION

Cited by: 0
Authors
Gao, Ping [1]
You, Cheng-You [1]
Chi, Tai-Shih [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu 300, Taiwan
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
Melody extraction; multi-resolution; fully convolutional network
DOI
10.1109/icassp40776.2020.9053059
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Each human cognitive function involves bottom-up and top-down processes. Several methods have been proposed for singing melody extraction by emphasizing either the bottom-up or top-down processes. For hearing, the bottom-up processes include spectral and spectro-temporal decomposition of the sound by the cochlea and the auditory cortex. In this paper, we propose a neural network, which includes spectro-temporal multi-resolution decomposition of the log-spectrogram of the sound and a semantic segmentation model to respectively address the bottom-up and top-down processing of hearing, for singing melody extraction. Simulation results show the proposed model outperforms all previously proposed methods, emphasizing either bottom-up or top-down processing, in almost all objective evaluation metrics.
Pages: 551-555
Page count: 5