Development of a Large-Scale Mandarin Radio Speech Corpus

Cited by: 0
Authors
Chang, Yung-hsiang Shawn [1]
Liao, Yuan-fu [1]
Wang, Sheng-ming [1]
Wang, Jenq-haur [1]
Wang, Sing-yue [1]
Chen, Jhih-wei [1]
Chen, You-dian [1]
Affiliation
[1] Natl Taipei Univ Technol, Taipei, Taiwan
Keywords
DOI
Not available
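Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract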
The Taiwan Mandarin Radio Speech Corpus consists of roughly 300 (and growing) hours of audio recordings, selected from Taiwan's National Education Radio (NER) archive. The corpus includes speech from hundreds of speakers and various speech styles (spontaneous conversational and read news). This corpus provides a rich resource for research in speech and automatic speech recognition (ASR). In this paper, we briefly introduce the corpus development approach and report two preliminary experimental results using this corpus.
Pages: 2