FREQUENCY-TEMPORAL ATTENTION NETWORK FOR SINGING MELODY EXTRACTION

Cited by: 17
Authors
Yu, Shuai [1 ]
Sun, Xiaoheng [1 ]
Yu, Yi [2 ]
Li, Wei [1 ,3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China
[2] Natl Inst Informat NII, Tokyo, Japan
[3] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
frequency-temporal attention network; singing melody extraction; music information retrieval;
DOI
10.1109/ICASSP39728.2021.9413444
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Musical audio is generally characterized by three physical properties: frequency, time and magnitude. Interestingly, the human auditory periphery also provides neural codes for each of these dimensions to perceive music. Inspired by these intrinsic characteristics, a frequency-temporal attention network is proposed to mimic the human auditory system for singing melody extraction. In particular, the proposed model contains frequency-temporal attention modules and a selective fusion module corresponding to these three physical properties. The frequency attention module selects the same activated frequency bands as the cochlea does, and the temporal attention module is responsible for analyzing temporal patterns. Finally, the selective fusion module recalibrates magnitudes and fuses the raw information for prediction. In addition, we propose to use another branch to simultaneously predict the presence of singing voice melody. The experimental results show that the proposed model outperforms existing state-of-the-art methods.
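The data flow described in the abstract can be illustrated with a minimal, parameter-free sketch. This is not the authors' network: the published model presumably uses learned layers, whereas everything here (the function names, the softmax-based weighting, the `voicing_threshold` parameter, and the use of `-1` for unvoiced frames) is a hypothetical stand-in chosen only to show how attention over frequency, attention over time, a gated fusion with the raw input, and a separate voicing decision could fit together.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frequency_attention(spec):
    # spec has shape (freq_bins, frames).
    # Weight each frame's bins by a softmax over the frequency axis
    # (assumed stand-in for a learned frequency attention module).
    return spec * softmax(spec, axis=0)

def temporal_attention(spec):
    # Weight each bin's frames by a softmax over the time axis
    # (assumed stand-in for a learned temporal attention module).
    return spec * softmax(spec, axis=1)

def selective_fusion(raw, freq_branch, time_branch):
    # Gate the three branches per time-frequency cell with a softmax
    # across branches, then sum. Hypothetical fusion rule.
    stacked = np.stack([raw, freq_branch, time_branch])  # (3, F, T)
    gates = softmax(stacked, axis=0)
    return (gates * stacked).sum(axis=0)

def extract_melody(spec, voicing_threshold=0.5):
    # Returns one frequency-bin index per frame, or -1 where the
    # (hypothetical) voicing branch decides no singing melody is present.
    fused = selective_fusion(
        spec, frequency_attention(spec), temporal_attention(spec)
    )
    pitch_bins = fused.argmax(axis=0)
    voiced = fused.max(axis=0) > voicing_threshold
    return np.where(voiced, pitch_bins, -1)

# Toy input: 4 frequency bins, 6 frames, energy in bin 2 for frames 0-4.
spec = np.zeros((4, 6))
spec[2, :5] = 5.0
melody = extract_melody(spec)
```

In a trained model the softmax weights and the voicing decision would come from learned parameters rather than from the input magnitudes directly; the sketch only mirrors the module layout named in the abstract.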
Pages: 251 - 255
Number of pages: 5