Development of a Large-Scale Mandarin Radio Speech Corpus

Authors
Chang, Yung-hsiang Shawn [1]
Liao, Yuan-fu [1]
Wang, Sheng-ming [1]
Wang, Jenq-haur [1]
Wang, Sing-yue [1]
Chen, Jhih-wei [1]
Chen, You-dian [1]
Affiliations
[1] National Taipei University of Technology, Taipei, Taiwan
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
The Taiwan Mandarin Radio Speech Corpus consists of roughly 300 (and growing) hours of audio recordings, selected from Taiwan's National Education Radio (NER) archive. The corpus includes speech from hundreds of speakers and various speech styles (spontaneous conversational and read news). This corpus provides a rich resource for research in speech and automatic speech recognition (ASR). In this paper, we briefly introduce the corpus development approach and report two preliminary experimental results using this corpus.
Pages: 2