Multi-Task Joint-Learning for Robust Voice Activity Detection

Cited by: 0
Authors
Zhuang, Yimeng [1]
Tong, Sibo [1]
Yin, Maofan [1]
Qian, Yanmin [1]
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Brain Sci & Technol Res Ctr, SpeechLab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
voice activity detection; multi-task learning; multi-frame predictions; deep neural networks
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Model-based VAD approaches have been widely used and have achieved success in practice. These approaches usually cast VAD as a frame-level classification problem and employ statistical classifiers, such as a Gaussian Mixture Model (GMM) or a Deep Neural Network (DNN), to assign a speech/silence label to each frame. Because each frame is classified under a frame-independence assumption, the VAD results tend to be fragile. To address this problem, this paper proposes a new structured multi-frame prediction DNN approach to improve segment-level VAD performance. During DNN training, the VAD labels of multiple consecutive frames are concatenated as targets, and the network is jointly trained with a speech enhancement task to achieve robustness under noisy conditions. During testing, the VAD label for each frame is obtained by merging the prediction results from neighbouring frames. Experiments on the Aurora 4 dataset showed that conventional DNN-based VAD has poor and unstable prediction performance, while the proposed multi-task trained VAD is much more robust.
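The test-time merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how neighbouring predictions are merged, so the simple averaging below, the symmetric context window, the 0.5 threshold, and the function names `merge_multiframe_predictions` and `to_labels` are all assumptions made for illustration, not the authors' method.

```python
def merge_multiframe_predictions(window_probs, context):
    """Average overlapping multi-frame speech posteriors into one score per frame.

    window_probs[t] is assumed to hold 2*context+1 speech probabilities that a
    network centred on frame t predicted for frames t-context .. t+context.
    Averaging is an assumed merging rule; the abstract does not specify one.
    """
    num_frames = len(window_probs)
    width = 2 * context + 1
    totals = [0.0] * num_frames
    counts = [0] * num_frames
    for t, row in enumerate(window_probs):
        if len(row) != width:
            raise ValueError("each row must contain 2*context+1 predictions")
        for k, prob in enumerate(row):
            target = t + k - context  # frame that this prediction refers to
            if 0 <= target < num_frames:  # skip predictions past the utterance edges
                totals[target] += prob
                counts[target] += 1
    # Every frame receives at least its own centre prediction, so counts > 0.
    return [total / count for total, count in zip(totals, counts)]


def to_labels(merged, threshold=0.5):
    """Turn merged posteriors into speech (1) / silence (0) labels (assumed threshold)."""
    return [1 if prob >= threshold else 0 for prob in merged]


if __name__ == "__main__":
    # Three frames, context of one frame on each side (hypothetical values).
    probs = [
        [0.1, 0.9, 0.8],
        [0.7, 0.6, 0.2],
        [0.4, 0.3, 0.1],
    ]
    merged = merge_multiframe_predictions(probs, context=1)
    print(merged, to_labels(merged))
```

Because each frame's score pools several predictions made from different centre frames, a single outlier prediction has less influence on the final label than in a purely frame-independent classifier, which is the intuition behind the segment-level robustness the abstract claims.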
Pages: 5
Related papers
50 in total
  • [1] DNN-Based Voice Activity Detection with Multi-Task Learning
    Kang, Tae Gyoon
    Kim, Nam Soo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (02): : 550 - 553
  • [2] MULTI-TASK LEARNING FOR VOICE TRIGGER DETECTION
    Sigtia, Siddharth
    Clark, Pascal
    Haynes, Rob
    Richards, Hywel
    Bridle, John
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7449 - 7453
  • [3] VOICE TOXICITY DETECTION USING MULTI-TASK LEARNING
    Nandwana, Mahesh Kumar
    He, Yifan
    Liu, Joseph
    Yu, Xiao
    Shang, Charles
    Du Bois, Eloi
    McGuire, Morgan
    Bhat, Kiran
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 331 - 335
  • [4] Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay
    Larsen, Claus M.
    Koch, Peter
    Tan, Zheng-Hua
    INTERSPEECH 2022, 2022, : 3759 - 3763
  • [5] SPEECH ENHANCEMENT AIDED END-TO-END MULTI-TASK LEARNING FOR VOICE ACTIVITY DETECTION
    Tan, Xu
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6823 - 6827
  • [6] ADAPTIVE AND ROBUST MULTI-TASK LEARNING
    Duan, Yaqi
    Wang, Kaizheng
    ANNALS OF STATISTICS, 2023, 51 (05) : 2015 - 2039
  • [7] Joint Disaster Classification and Victim Detection using Multi-Task Learning
    Tham, Mau-Luen
    Wong, Yi Jie
    Kwan, Ban Hoe
    Owada, Yasunori
    Sein, Myint Myint
    Chang, Yoong Choon
    2021 IEEE 12TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2021, : 407 - 412
  • [8] Multi-Task Learning Based Joint Pulse Detection and Modulation Classification
    Akyon, Fatih Cagatay
    Nuhoglu, Mustafa Atahan
    Alp, Yasar Kemal
    Arikan, Orhan
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [9] Multi-task Attribute Joint Feature Learning
    Chang, Lu
    Fang, Yuchun
    Jiang, Xiaoda
    BIOMETRIC RECOGNITION, CCBR 2015, 2015, 9428 : 193 - 200
  • [10] Multi-Task Learning for Voice Related Recognition Tasks
    Montalvo, Ana
    Calvo, Jose R.
    Bonastre, Jean-Francois
    INTERSPEECH 2020, 2020, : 2997 - 3001