Voice activity detection using a local-global attention model

Cited by: 6
Authors
Li, Shu [1]
Li, Ye [1]
Feng, Tao [1]
Shi, Jinze [1]
Zhang, Peng [1]
Affiliation
[1] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Shandong Prov Key Lab Com, Jinan 250014, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Voice activity detection; Long short-term memory network; Attention mechanism; Deep learning; Noise
DOI
10.1016/j.apacoust.2022.108802
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice activity detection (VAD) is an essential first step in speech signal processing and strongly affects the timeliness and accuracy of the overall system. Although many methods have been proposed to improve VAD performance under low signal-to-noise ratios (SNRs), robustness to very low SNRs and unseen noise environments still needs improvement. In this paper, we propose a model that fuses local attention and global attention to strengthen the attention mechanism currently applied in VAD methods. First, the local attention, which is based on self-attention, works with long short-term memory networks (LSTMs) to make efficient use of local contextual information; the global attention then measures global contextual information to focus on the most appropriate region of the contextual frames. Experimental results show that, compared with state-of-the-art VAD methods, the proposed approach performs better under low SNRs such as -15 dB and under non-stationary noise conditions. © 2022 Elsevier Ltd. All rights reserved.
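The local-then-global idea in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the LSTM stage is omitted, there are no learned projections, and the function names, window radius, query vector, and toy feature values are all assumptions chosen for illustration. It only shows how attention restricted to neighbouring frames differs from attention over the whole utterance.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    """Combine vectors using the given weights, one weight per vector."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def local_attention(frames, radius):
    """Self-attention where each frame attends only to frames within +/- radius."""
    scale = math.sqrt(len(frames[0]))
    out = []
    for t, query in enumerate(frames):
        lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
        window = frames[lo:hi]
        weights = softmax([dot(query, key) / scale for key in window])
        out.append(weighted_sum(weights, window))
    return out

def global_attention(frames, query):
    """Score every frame against one query; return the weights and pooled context."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in frames])
    return weights, weighted_sum(weights, frames)

# Toy 2-dimensional features for 5 frames (made-up values, not real audio features).
frames = [[0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [0.1, 0.2]]
local_out = local_attention(frames, radius=1)
# The query here is a hypothetical fixed vector; a trained model would learn it.
weights, context = global_attention(local_out, query=[1.0, 1.0])
```

In this toy setup the global weights end up largest on the high-energy middle frames, which is the sense in which a global stage can "focus on the most appropriate area" of the context. A real VAD model would feed learned features through recurrent layers before or between these two stages.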
Pages: 10