mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra

Cited by: 2
Authors
Shuai, Chenhao [1, 3, 4]
Shi, Chaohua [2, 3, 4]
Gan, Lu [3]
Liu, Hongqing [4]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Xidian Univ, Xian, Shaanxi, Peoples R China
[3] Brunel Univ London, London, England
[4] Chongqing Univ Posts & Telecommun, Chongqing, Peoples R China
Source
Interspeech 2023
Keywords
speech super-resolution; phase information; GAN
DOI
10.21437/Interspeech.2023-113
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech super-resolution (SSR) aims to recover high-resolution (HR) speech from its low-resolution (LR) counterpart. Recent SSR methods focus mainly on reconstructing the magnitude spectrogram and neglect phase reconstruction, which limits recovery quality. To address this issue, we propose mdctGAN, a novel SSR framework based on the modified discrete cosine transform (MDCT). Through adversarial learning in the MDCT domain, our method reconstructs HR speech in a phase-aware manner without vocoders or additional post-processing. Furthermore, by learning frequency-consistent features with a self-attentive mechanism, mdctGAN guarantees high-quality speech reconstruction. On the VCTK corpus, experimental results show that our model produces natural auditory quality with high MOS and PESQ scores. It also achieves state-of-the-art log-spectral distance (LSD) performance at a 48 kHz target resolution from various input rates. Code is available at https://github.com/neoncloud/mdctGAN
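
The abstract's claim that working in the MDCT domain is phase-aware can be illustrated with a textbook MDCT: the coefficients are purely real, so there is no separate phase spectrum to estimate, and the windowed transform with 50% overlap inverts exactly. The sketch below is a minimal pure-Python illustration under stated assumptions, not the mdctGAN implementation: the sine window, the frame size, and every function name (`sine_window`, `mdct_frame`, `imdct_frame`, `mdct`, `imdct`) are hypothetical choices made for this example.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    size = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct_frame(frame, n_coeffs):
    """Map 2N real samples to N real MDCT coefficients."""
    shift = 0.5 + n_coeffs / 2
    return [
        sum(x * math.cos(math.pi / n_coeffs * (n + shift) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_coeffs)
    ]


def imdct_frame(coeffs, n_coeffs):
    """Map N coefficients to 2N time-aliased samples (2/N scaling, windowed case)."""
    shift = 0.5 + n_coeffs / 2
    return [
        2.0 / n_coeffs * sum(c * math.cos(math.pi / n_coeffs * (n + shift) * (k + 0.5))
                             for k, c in enumerate(coeffs))
        for n in range(2 * n_coeffs)
    ]


def mdct(signal, n_coeffs):
    """Windowed MDCT with hop size N; returns a list of N-coefficient frames."""
    window = sine_window(n_coeffs)
    tail = (-len(signal)) % n_coeffs
    # Pad by N on both sides so every input sample is covered by two frames.
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * (tail + n_coeffs)
    frames = []
    for start in range(0, len(padded) - n_coeffs, n_coeffs):
        block = [w * x for w, x in zip(window, padded[start:start + 2 * n_coeffs])]
        frames.append(mdct_frame(block, n_coeffs))
    return frames


def imdct(frames, n_coeffs, length):
    """Windowed overlap-add inverse; aliasing cancels between adjacent frames."""
    window = sine_window(n_coeffs)
    out = [0.0] * ((len(frames) + 1) * n_coeffs)
    for i, coeffs in enumerate(frames):
        block = imdct_frame(coeffs, n_coeffs)
        for n in range(2 * n_coeffs):
            out[i * n_coeffs + n] += window[n] * block[n]
    # Drop the leading padding and trim to the original length.
    return out[n_coeffs:n_coeffs + length]


# Toy signal (an assumption for illustration, not speech data).
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(50)]
spectrum = mdct(signal, 8)
restored = imdct(spectrum, 8, len(signal))
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

Here `max_error` stays at floating-point rounding level, which is the sense in which a real-valued MDCT spectrum carries both magnitude and phase information. A practical system would use an FFT-based transform rather than this direct summation, which costs O(N^2) per frame.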
Pages: 5112-5116
Number of pages: 5
Related papers
50 in total
  • [1] Transformer-based image super-resolution and its lightweight
    Zhang, Dongxiao
    Qi, Tangyao
    Gao, Juhao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (26) : 68625 - 68649
  • [2] A Transformer-Based Model for Super-Resolution of Anime Image
    Xu, Shizhuo
    Dutta, Vibekananda
    He, Xin
    Matsumaru, Takafumi
    SENSORS, 2022, 22 (21)
  • [3] Combined Medical Image Super-Resolution and Modality Translation Using GAN Transformer-Based Model
    Abdollahi, Melika
    Davoudi, Heidar
    Ebrahimi, Mehran
    2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023, : 1133 - 1138
  • [4] STGAN: Swin Transformer-Based GAN to Achieve Remote Sensing Image Super-Resolution Reconstruction
    Huo, Wei
    Zhang, Xiaodan
    You, Shaojie
    Zhang, Yongkun
    Zhang, Qiyuan
    Hu, Naihao
    APPLIED SCIENCES-BASEL, 2025, 15 (01):
  • [5] Transformer-Based Selective Super-resolution for Efficient Image Refinement
    Zhang, Tianyi
    Kasichainula, Kishore
    Zhuo, Yaoxin
    Li, Baoxin
    Seo, Jae-Sun
    Cao, Yu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7305 - 7313
  • [6] Transformer-Based Multistage Enhancement for Remote Sensing Image Super-Resolution
    Lei, Sen
    Shi, Zhenwei
    Mo, Wenjing
    IEEE Transactions on Geoscience and Remote Sensing, 2022, 60
  • [8] Fusformer: A Transformer-Based Fusion Network for Hyperspectral Image Super-Resolution
    Hu, Jin-Fan
    Huang, Ting-Zhu
    Deng, Liang-Jian
    Dou, Hong-Xia
    Hong, Danfeng
    Vivone, Gemine
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [9] Study on Image Super-Resolution with Transformer-Based Encoder-Decoder Models
    Wang, Qing-You
    Lin, Yih-Lon
    2024 11TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN, ICCE-TAIWAN 2024, 2024, : 213 - 214
  • [10] Super-resolution reconstruction of turbulent flows with a transformer-based deep learning framework
    Xu, Qin
    Zhuang, Zijian
    Pan, Yongcai
    Wen, Binghai
    PHYSICS OF FLUIDS, 2023, 35 (05)