Voice activity detection using a local-global attention model

Times cited: 6

Authors:

Li, Shu [1]

Li, Ye [1]

Feng, Tao [1]

Shi, Jinze [1]

Zhang, Peng [1]

Affiliation:

[1] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Shandong Prov Key Lab Com, Jinan 250014, Peoples R China

Source:

APPLIED ACOUSTICS | 2022 / Vol. 195

Funding:

National Key Research and Development Program of China;

Keywords:

Voice activity detection; Long short-term memory network; Attention mechanism; Deep learning; NOISE;

DOI:

10.1016/j.apacoust.2022.108802

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Voice activity detection (VAD) is an essential initial step in speech signal processing and strongly affects the timeliness and accuracy of the overall system. Although many novel methods have been proposed to improve VAD performance at low signal-to-noise ratios (SNRs), robustness at very low SNRs and in unknown noise environments still needs to be improved. In this paper, we propose a model that fuses local attention and global attention to strengthen the attention mechanisms currently applied in VAD methods. First, the local attention, which is based on self-attention, cooperates with long short-term memory networks (LSTMs) to make efficient use of local contextual information; then, the global attention measures global contextual information in order to focus on the most appropriate region of the contextual frames. Experimental results show that, compared with state-of-the-art VAD methods, the proposed approach achieves better performance at low SNRs such as -15 dB and under non-stationary noise conditions. (c) 2022 Elsevier Ltd. All rights reserved.
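The abstract describes the model only at a high level. As a rough illustration, the NumPy sketch below shows one generic way to combine windowed (local) self-attention with a global attention over all frames to produce per-frame speech probabilities; it is not the authors' architecture. The window size, the identity query/key/value projections, the tanh-scored global attention vector `v`, the concatenation-based fusion, and the output layer `w_out`/`b_out` are all assumptions, and the input `x` simply stands in for frame-level LSTM outputs (the LSTM itself is omitted).

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def local_self_attention(x, window):
    """Scaled dot-product self-attention restricted to +/- `window` frames.

    x: (T, d) frame-level features, assumed here to be LSTM outputs.
    Projections are identity for brevity; a trained model would learn Q/K/V.
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                       # (T, T) frame-to-frame similarity
    idx = np.arange(T)
    outside = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(outside, -np.inf, scores)         # mask frames outside the local window
    weights = softmax(scores, axis=-1)                  # each row sums to 1 over its window
    return weights @ x, weights


def global_attention(h, v):
    """Global attention: one weight per frame over the whole segment.

    h: (T, d) locally attended features; v: (d,) hypothetical scoring vector
    (learned in practice). Returns a context vector (d,) and frame weights (T,).
    """
    alpha = softmax(np.tanh(h) @ v)                     # (T,) weights over all frames
    return alpha @ h, alpha


def vad_scores(x, window, v, w_out, b_out):
    """Per-frame speech probabilities from local + global context (illustrative only)."""
    h, _ = local_self_attention(x, window)
    context, alpha = global_attention(h, v)
    # Hypothetical fusion: append the global context vector to every frame.
    fused = np.concatenate([h, np.broadcast_to(context, h.shape)], axis=1)  # (T, 2d)
    logits = fused @ w_out + b_out                      # (T,)
    return 1.0 / (1.0 + np.exp(-logits)), alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 20, 8                                        # 20 frames, 8-dim features (arbitrary)
    x = rng.standard_normal((T, d))
    probs, alpha = vad_scores(x, window=3, v=rng.standard_normal(d),
                              w_out=rng.standard_normal(2 * d), b_out=0.0)
    print(probs.shape, float(alpha.sum()))
```

The local step lets each frame draw only on its immediate neighbours, while the global step assigns a single weight distribution across the entire segment, which loosely mirrors the division of labour the abstract describes. With random weights the probabilities are meaningless; in practice every parameter would be trained on labelled speech/non-speech frames.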


Pages: 10
