Development of a Large-Scale Mandarin Radio Speech Corpus

Cited by: 0

Authors:

Chang, Yung-hsiang Shawn [1]

Liao, Yuan-fu [1]

Wang, Sheng-ming [1]

Wang, Jenq-haur [1]

Wang, Sing-yue [1]

Chen, Jhih-wei [1]

Chen, You-dian [1]

Affiliations:

[1] Natl Taipei Univ Technol, Taipei, Taiwan

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW) | 2017

Keywords:

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

The Taiwan Mandarin Radio Speech Corpus consists of roughly 300 (and growing) hours of audio recordings, selected from Taiwan's National Education Radio (NER) archive. The corpus includes speech from hundreds of speakers and various speech styles (spontaneous conversational and read news). This corpus provides a rich resource for research in speech and automatic speech recognition (ASR). In this paper, we briefly introduce the corpus development approach and report two preliminary experimental results using this corpus.


Pages: 2
