Voice activity detection using a local-global attention model

Cited by: 4
Authors
Li, Shu [1]
Li, Ye [1]
Feng, Tao [1]
Shi, Jinze [1]
Zhang, Peng [1]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Shandong Prov Key Lab Com, Jinan 250014, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Voice activity detection; Long short-term memory network; Attention mechanism; Deep learning; NOISE;
DOI
10.1016/j.apacoust.2022.108802
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Voice activity detection (VAD) is an essential initial step of speech signal processing and greatly affects the timeliness and accuracy of the system. Although many novel methods have been proposed to improve the performance of VAD under low signal-to-noise ratios (SNRs), robustness to very low SNRs and unknown noisy environments has yet to be enhanced. In this paper, we propose a fusion model of local attention and global attention to strengthen the attention mechanism currently applied in VAD methods. First, based on self-attention, the local attention cooperates with long short-term memory networks (LSTMs) to make efficient use of local contextual information; then the global attention measures global contextual information to focus on the most appropriate area of contextual frames. The experimental results show that, compared with state-of-the-art VAD methods, the proposed approach achieves better performance under low SNRs such as -15 dB and under non-stationary noisy conditions. (c) 2022 Elsevier Ltd. All rights reserved.
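The two-stage idea in the abstract (attention over a local window of frames, followed by attention over the whole context) can be illustrated roughly as follows. This is only a minimal sketch and not the authors' model: the LSTM stage is left out, and the function names, the window `radius`, the scaled dot-product scoring, and the toy feature values are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def local_attention(frames, radius):
    # Hypothetical local stage: each frame attends only to neighbours
    # within +/- radius frames (scaled dot-product self-attention).
    scale = math.sqrt(len(frames[0]))
    out = []
    for t, query in enumerate(frames):
        lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
        window = frames[lo:hi]
        weights = softmax([dot(query, key) / scale for key in window])
        out.append(weighted_sum(weights, window))
    return out

def global_attention(frames, query):
    # Hypothetical global stage: one query vector scores every frame,
    # giving per-frame weights and a pooled context vector.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in frames])
    return weights, weighted_sum(weights, frames)

# Toy 2-dimensional features for four frames (made-up values).
frames = [[0.1, 0.0], [0.9, 0.8], [1.0, 0.9], [0.0, 0.1]]
local = local_attention(frames, radius=1)
weights, context = global_attention(local, query=[1.0, 1.0])
```

In this toy run the global weights sum to one and give more weight to the high-energy middle frames than to the first frame. In a real system the query, the features, and any LSTM layers would be learned from data.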
Pages: 10
Related papers
50 in total
  • [1] A Voice Activity Detection Model Composed of Bidirectional LSTM and Attention Mechanism
    Yu, Yeonguk
    Kim, Yoon-Joong
    2018 IEEE 10TH INTERNATIONAL CONFERENCE ON HUMANOID, NANOTECHNOLOGY, INFORMATION TECHNOLOGY, COMMUNICATION AND CONTROL, ENVIRONMENT AND MANAGEMENT (HNICEM), 2018,
  • [2] Voice Activity Detection Using an Adaptive Context Attention Model
    Kim, Juntae
    Hahn, Minsoo
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (08) : 1181 - 1185
  • [3] COMPLEX IRM-AWARE TRAINING FOR VOICE ACTIVITY DETECTION USING ATTENTION MODEL
    Zhao, Yifei
    Attabi, Yazid
    Champagne, Benoit
    Zhu, Wei-Ping
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 3698 - 3702
  • [4] Local-Global Attentive Adaptation for Object Detection
    Zhang, Dan
    Li, Jingjing
    Li, Xingpeng
    Du, Zhekai
    Xiong, Lin
    Ye, Mao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 100
  • [5] Weakly Supervised Local-Global Attention Network for Facial Expression Recognition
    Zhang, Haifeng
    Su, Wen
    Wang, Zengfu
    IEEE ACCESS, 2020, 8 (08) : 37976 - 37987
  • [6] Local-global Semantic Feature Enhancement Model for Remote Sensing Imagery Change Detection
    Gao J.
    Guan H.
    Peng D.
    Xu Z.
    Kang J.
    Ji Y.
    Zhai R.
    Journal of Geo-Information Science, 2023, 25 (03) : 625 - 637
  • [7] Spectro-Temporal Attention-Based Voice Activity Detection
    Lee, Younglo
    Min, Jeongki
    Han, David K.
    Ko, Hanseok
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 131 - 135
  • [8] A novel voice activity detection algorithm using modified global thresholding
    Elton, R. Johny
    Mohanalin, J.
    Vasuki, P.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (01) : 127 - 142
  • [9] A novel voice activity detection algorithm using modified global thresholding
    R. Johny Elton
    J. Mohanalin
    P. Vasuki
    International Journal of Speech Technology, 2021, 24 : 127 - 142
  • [10] Lung nodule classification using deep Local-Global networks
    Al-Shabi, Mundher
    Lan, Boon Leong
    Chan, Wai Yee
    Ng, Kwan-Hoong
    Tan, Maxine
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2019, 14 (10) : 1815 - 1819