mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra

Cited by: 2
Authors
Shuai, Chenhao [1, 3, 4]
Shi, Chaohua [2, 3, 4]
Gan, Lu [3]
Liu, Hongqing [4]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Xidian Univ, Xian, Shaanxi, Peoples R China
[3] Brunel Univ London, London, England
[4] Chongqing Univ Posts & Telecommun, Chongqing, Peoples R China
Source
Keywords
speech super-resolution; phase information; GAN;
DOI
10.21437/Interspeech.2023-113
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech super-resolution (SSR) aims to recover high-resolution (HR) speech from its low-resolution (LR) counterpart. Recent SSR methods focus mainly on reconstructing the magnitude spectrogram and ignore phase reconstruction, which limits recovery quality. To address this issue, we propose mdctGAN, a novel SSR framework based on the modified discrete cosine transform (MDCT). Through adversarial learning in the MDCT domain, our method reconstructs HR speech in a phase-aware manner without vocoders or additional post-processing. Furthermore, by learning frequency-consistent features with a self-attentive mechanism, mdctGAN guarantees high-quality speech reconstruction. On the VCTK corpus, the experimental results show that our model produces natural auditory quality with high MOS and PESQ scores. It also achieves state-of-the-art log-spectral-distance (LSD) performance at a 48 kHz target resolution from various input rates. Code is available at https://github.com/neoncloud/mdctGAN
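The abstract's claim that the MDCT domain needs no vocoder rests on a general property of the transform: its coefficients are real-valued, and windowed overlap-add inverts them back to the waveform exactly, so no separate phase estimate is required. The following is a minimal NumPy sketch of that property only, not the authors' implementation. The function names, the sine window, and the frame size are illustrative assumptions.

```python
import numpy as np

def mdct_basis(N):
    """Cosine basis of shape (N, 2N) for frames of 2N samples."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def sine_window(N):
    """Sine window; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct(x, N):
    """Forward MDCT with hop N; returns real coefficients (n_frames, N)."""
    pad = (-len(x)) % N
    # Pad N zeros on each side so every input sample lies in two frames.
    xp = np.concatenate([np.zeros(N), x, np.zeros(pad + N)])
    n_frames = len(xp) // N - 1
    frames = np.stack([xp[i * N : i * N + 2 * N] for i in range(n_frames)])
    return (frames * sine_window(N)) @ mdct_basis(N).T

def imdct(X, N, length):
    """Inverse MDCT by windowed overlap-add; aliasing cancels between frames."""
    frames = (2.0 / N) * (X @ mdct_basis(N)) * sine_window(N)
    out = np.zeros((X.shape[0] + 1) * N)
    for i, frame in enumerate(frames):
        out[i * N : i * N + 2 * N] += frame
    return out[N : N + length]

rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)   # stand-in for a speech waveform
coeffs = mdct(signal, N=64)          # real-valued, one row per frame
restored = imdct(coeffs, N=64, length=len(signal))
```

Here `restored` matches `signal` up to floating-point error, which is why a network that predicts MDCT coefficients can return a waveform directly through the inverse transform.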
Pages: 5112 - 5116
Number of pages: 5
Related papers
50 in total
  • [31] S2R: Exploring a Double-Win Transformer-Based Framework for Ideal and Blind Super-Resolution
    She, Minghao
    Mao, Wendong
    Shi, Huihong
    Wang, Zhongfeng
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI, 2023, 14259 : 522 - 537
  • [32] Iris Recognition for Biometrics Based on CNN with Super-resolution GAN
    Kashihara, Koji
    2020 IEEE INTERNATIONAL CONFERENCE ON EVOLVING AND ADAPTIVE INTELLIGENT SYSTEMS (EAIS), 2020,
  • [33] Zoom based image super-resolution using DCT with LBP as characteristic model
    Doshi, Meera
    Gajjar, Prakash
    Kothari, Ashish
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (02) : 72 - 85
  • [34] Reference-Based Image Super-Resolution with Deformable Attention Transformer
    Cao, Jiezhang
    Liang, Jingyun
    Zhang, Kai
    Li, Yawei
    Zhang, Yulun
    Wang, Wenguan
    Van Gool, Luc
    COMPUTER VISION - ECCV 2022, PT XVIII, 2022, 13678 : 325 - 342
  • [35] LSwinSR: UAV Imagery Super-Resolution Based on Linear Swin Transformer
    Li, Rui
    Zhao, Xiaowei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [36] Dual-Path Transformer-Based GAN for Co-speech Gesture Synthesis
    Qian, Xinyuan
    Tang, Hao
    Yang, Jichen
    Zhu, Hongxu
    Yin, Xu-Cheng
    INTERNATIONAL JOURNAL OF SOCIAL ROBOTICS, 2024,
  • [37] ROBUST SUPER-RESOLUTION GAN, WITH MANIFOLD-BASED AND PERCEPTION LOSS
    Upadhyay, Uddeshya
    Awate, Suyash P.
    2019 IEEE 16TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2019), 2019 : 1372 - 1376
  • [38] GAN-Based Image Super-Resolution with a Novel Quality Loss
    Zhu, Xining
    Zhang, Lin
    Zhang, Lijun
    Liu, Xiao
    Shen, Ying
    Zhao, Shengjie
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [39] Image super-resolution reconstruction based on self-attention GAN
    Wang X.-S.
    Chao J.
    Cheng Y.-H.
    Kongzhi yu Juece/Control and Decision, 2021, 36 (06) : 1324 - 1332
  • [40] CBCT Tooth Images Super-Resolution Method Based on GAN Prior
    Song Q.
    Li Y.
    Fan Y.
    Lu S.
    Zhou Y.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (11) : 1751 - 1759