FREQUENCY-TEMPORAL ATTENTION NETWORK FOR SINGING MELODY EXTRACTION

Cited by: 17
Authors
Yu, Shuai [1]
Sun, Xiaoheng [1]
Yu, Yi [2]
Li, Wei [1, 3]
Affiliations
[1] Fudan Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China
[2] Natl Inst Informat NII, Tokyo, Japan
[3] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
frequency-temporal attention network; singing melody extraction; music information retrieval;
DOI
10.1109/ICASSP39728.2021.9413444
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Musical audio is generally characterized by three physical properties: frequency, time, and magnitude. Interestingly, the human auditory periphery also provides neural codes for each of these dimensions when perceiving music. Inspired by these intrinsic characteristics, a frequency-temporal attention network is proposed to mimic the human auditory system for singing melody extraction. In particular, the proposed model contains frequency-temporal attention modules and a selective fusion module corresponding to these three physical properties. The frequency attention module selects the same activated frequency bands as the cochlea does, and the temporal attention module analyzes temporal patterns. Finally, the selective fusion module recalibrates magnitudes and fuses the raw information for prediction. In addition, we propose using another branch to simultaneously predict the presence of the singing voice melody. The experimental results show that the proposed model outperforms existing state-of-the-art methods.
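The three modules named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection matrices `w_f` and `w_t`, the per-frequency gating scheme in `selective_fusion`, and all shapes are assumptions chosen only to show the idea of attending over the frequency axis, attending over the time axis, and then reweighting and fusing the results with the raw input. The voice-presence branch is not covered.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frequency_attention(spec, w_f):
    # spec: (F, T) magnitude spectrogram; w_f: (F, F) hypothetical projection.
    # Within each frame, the weights over frequency bins sum to 1.
    weights = softmax(w_f @ spec, axis=0)
    return spec * weights, weights

def temporal_attention(spec, w_t):
    # w_t: (T, T) hypothetical projection.
    # Within each frequency bin, the weights over frames sum to 1.
    weights = softmax(spec @ w_t, axis=1)
    return spec * weights, weights

def selective_fusion(branches, gate_logits):
    # branches: list of n arrays of shape (F, T).
    # gate_logits: (n, F) hypothetical scores; the gates sum to 1 across
    # branches for every frequency bin, so each bin picks its own mix.
    gates = softmax(gate_logits, axis=0)
    stacked = np.stack(branches)                      # (n, F, T)
    return (gates[:, :, None] * stacked).sum(axis=0)  # (F, T)

# Toy usage with random data (assumed sizes: 8 frequency bins, 5 frames).
rng = np.random.default_rng(0)
F, T = 8, 5
spec = np.abs(rng.normal(size=(F, T)))
freq_out, freq_w = frequency_attention(spec, rng.normal(size=(F, F)))
time_out, time_w = temporal_attention(spec, rng.normal(size=(T, T)))
# The raw spectrogram is kept as a third branch so it is fused as well.
fused = selective_fusion([freq_out, time_out, spec], rng.normal(size=(3, F)))
```

In a trained model the projections and gate scores would be learned rather than random; the sketch only fixes the axes along which each module normalizes.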
Pages: 251-255
Page count: 5
Related papers
50 in total
  • [1] Singing Melody Extraction Based on Combined Frequency-Temporal Attention and Attentional Feature Fusion with Self-Attention
    Qi, Xi
    Tian, Lihua
    Li, Chen
    Song, Hui
    Yan, Jiahui
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2022, : 220 - 227
  • [2] HANME: Hierarchical Attention Network for Singing Melody Extraction
    Yu, Shuai
    Yu, Yi
    Chen, Xi
    Li, Wei
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1006 - 1010
  • [3] Frequency-Temporal Attention Network for Remote Sensing Imagery Change Detection
    Yu, Chunyan
    Li, Haobo
    Hu, Yabin
    Zhang, Qiang
    Song, Meiping
    Wang, Yulei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [4] MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music
    Gao, Yuan
    Hu, Ying
    Wang, Liusong
    Huang, Hao
    He, Liang
    INTERSPEECH 2023, 2023, : 5396 - 5400
  • [5] HIERARCHICAL GRAPH-BASED NEURAL NETWORK FOR SINGING MELODY EXTRACTION
    Yu, Shuai
    Chen, Xi
    Li, Wei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 626 - 630
  • [6] A Multi-Scale Fully Convolutional Network for Singing Melody Extraction
    Gao, Ping
    You, Cheng-You
    Chi, Tai-Shih
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1288 - 1293
  • [7] A HYBRID NEURAL NETWORK BASED ON THE DUPLEX MODEL OF PITCH PERCEPTION FOR SINGING MELODY EXTRACTION
    Chou, Hsin
    Chen, Ming-Tso
    Chi, Tai-Shih
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 381 - 385
  • [8] TONET: TONE-OCTAVE NETWORK FOR SINGING MELODY EXTRACTION FROM POLYPHONIC MUSIC
    Chen, Ke
    Yu, Shuai
    Wang, Cheng-I
    Li, Wei
    Berg-Kirkpatrick, Taylor
    Dubnov, Shlomo
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 621 - 625
  • [9] A neural harmonic-aware network with gated attentive fusion for singing melody extraction
    Yu, Shuai
    Yu, Yi
    Sun, Xiaoheng
    Li, Wei
    NEUROCOMPUTING, 2023, 521 : 160 - 171
  • [10] Interactive Singing Melody Extraction Based on Active Adaptation
    Saxena, Kavya Ranjan
    Arora, Vipul
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2729 - 2738