Development of a Large-Scale Mandarin Radio Speech Corpus

Authors
Chang, Yung-hsiang Shawn [1]
Liao, Yuan-fu [1]
Wang, Sheng-ming [1]
Wang, Jenq-haur [1]
Wang, Sing-yue [1]
Chen, Jhih-wei [1]
Chen, You-dian [1]
Affiliation
[1] Natl Taipei Univ Technol, Taipei, Taiwan
DOI
Not available
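Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809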
Abstract
The Taiwan Mandarin Radio Speech Corpus consists of roughly 300 (and growing) hours of audio recordings, selected from Taiwan's National Education Radio (NER) archive. The corpus includes speech from hundreds of speakers and various speech styles (spontaneous conversational and read news). This corpus provides a rich resource for research in speech and automatic speech recognition (ASR). In this paper, we briefly introduce the corpus development approach and report two preliminary experimental results using this corpus.
Pages: 2