Coarse-to-fine Optimization for Speech Enhancement

Cited by: 4
Authors
Yao, Jian [1]
Al-Dahle, Ahmad [1]
Affiliations
[1] Apple Inc, Cupertino, CA 95014 USA
Source
Keywords
speech enhancement; coarse-to-fine; deep learning; generative model; discriminative model; dynamic perceptual loss
DOI
10.21437/Interspeech.2019-2792
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
In this paper, we propose coarse-to-fine optimization for the task of speech enhancement. Cosine similarity loss [1] has proven to be an effective metric for measuring the similarity of speech signals. However, in a high-dimensional space, enhanced speech signals with the same cosine similarity loss can still vary widely, so a deep neural network trained with this loss might not be able to predict enhanced speech of good quality. Our coarse-to-fine strategy optimizes the cosine similarity loss at different granularities, so that more constraints are added to the prediction, from high dimension to relatively low dimension. In this way, the enhanced speech better resembles the clean speech. Experimental results show the effectiveness of the proposed coarse-to-fine optimization in both discriminative and generative models. Moreover, we apply the coarse-to-fine strategy to the adversarial loss in a generative adversarial network (GAN) and propose the dynamic perceptual loss, which dynamically computes the adversarial loss from coarse resolution to fine resolution. Dynamic perceptual loss further improves the accuracy and achieves state-of-the-art results compared with other generative models.
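The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of applying a cosine similarity loss at several granularities. It assumes that "granularity" means splitting the waveform into contiguous, equal-length segments and averaging the per-segment loss; the function names (`cosine_loss`, `split_even`, `coarse_to_fine_loss`) and the `num_segments` parameter are hypothetical and are not taken from the paper.

```python
import math


def cosine_loss(clean, enhanced, eps=1e-8):
    """Negative cosine similarity between two equal-length signals."""
    dot = sum(c * e for c, e in zip(clean, enhanced))
    norm = math.sqrt(sum(c * c for c in clean)) * math.sqrt(
        sum(e * e for e in enhanced)
    )
    return -dot / (norm + eps)


def split_even(signal, k):
    """Split a signal into k contiguous segments of near-equal length."""
    n = len(signal)
    bounds = [i * n // k for i in range(k + 1)]
    return [signal[bounds[i]:bounds[i + 1]] for i in range(k)]


def coarse_to_fine_loss(clean, enhanced, num_segments=(1, 2, 4)):
    """Sum of mean segment-wise cosine losses over several granularities.

    Assumption for illustration: k = 1 is the coarsest level (whole signal),
    and larger k values constrain shorter, lower-dimensional segments.
    """
    total = 0.0
    for k in num_segments:
        pairs = zip(split_even(clean, k), split_even(enhanced, k))
        losses = [cosine_loss(c, e) for c, e in pairs]
        total += sum(losses) / len(losses)
    return total


# Two toy "enhanced" signals that score the same at the coarsest level
# but are separated once the finer segment-level terms are added.
clean = [1.0, 1.0, 1.0, 1.0]
enhanced_a = [2.0, 0.0, 0.0, 2.0]
enhanced_b = [2.0, 2.0, 0.0, 0.0]

print(cosine_loss(clean, enhanced_a), cosine_loss(clean, enhanced_b))
print(coarse_to_fine_loss(clean, enhanced_a), coarse_to_fine_loss(clean, enhanced_b))
```

In this toy case the whole-signal loss cannot tell the two candidates apart, while the multi-granularity sum can, which illustrates the extra constraints the abstract describes.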
Pages: 2743-2747
Number of pages: 5
Related papers
50 in total
  • [1] A Coarse-to-Fine Optimization for Hyperspectral Band Selection
    Jiang, Xuefeng
    Lin, Jianzhe
    Liu, Junrui
    Li, Shuying
    Zhang, Yanning
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2019, 16 (04) : 638 - 642
  • [2] A coarse-to-fine deformable contour optimization framework
    Akgul, YS
    Kambhamettu, C
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2003, 25 (02) : 174 - 186
  • [3] Active speech source localization by a dual coarse-to-fine search
    Duraiswami, R
    Dmitry, Z
    Davis, LS
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 3309 - 3312
  • [4] Coarse-to-fine speech separation method in the time-frequency domain
    Yang, Xue
    Bao, Changchun
    Chen, Xianhong
    SPEECH COMMUNICATION, 2023, 155
  • [5] One-pass coarse-to-fine segmental speech decoding algorithm
    Tang, Yun
    Liu, Wen-Ju
    Zhang, Hua
    Xu, Bo
    Ding, Guo-Hong
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 441 - 444
  • [6] A global cuckoo optimization algorithm using coarse-to-fine search
    Ma, Wei
    Sun, Zheng-Xing
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2015, 43 (12): : 2429 - 2439
  • [7] 'Coarse-to-fine' cyclopean processing
    Popple, AV
    Findlay, JM
    PERCEPTION, 1999, 28 (02) : 155 - 165
  • [8] Coarse-to-fine face detection
    Fleuret, F
    Geman, D
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2001, 41 (1-2) : 85 - 107
  • [9] Coarse-to-fine manifold learning
    Castro, R
    Willett, R
    Nowak, R
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 992 - 995
  • [10] Feature enhancement and coarse-to-fine detection for RGB-D tracking
    Zhu, Xue-Feng
    Xu, Tianyang
    Wu, Xiao-Jun
    Kittler, Josef
    PATTERN RECOGNITION LETTERS, 2024, 179 : 130 - 136