Building MEDISCO: Indonesian Speech Corpus for Medical Domain

Times cited: 0
Authors
Qorib, Muhammad Reza [1]
Adriani, Mirna [1]
Affiliation
[1] Universitas Indonesia, Faculty of Computer Science, Depok, Indonesia
Source
2018 International Conference on Asian Language Processing (IALP) | 2018
Keywords
Indonesian Automatic Speech Recognition; Medical Speech Corpus; Text Corpus; Recognition
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
In this paper, we report our work on building MEDISCO, a Medical Indonesian Speech Corpus. The medical text corpus was collected from five Indonesian online medical consultation websites. From this text corpus, we created a speech corpus of 360 sentences read by 13 speakers. In total, the speech corpus contains 731 medical terms and consists of 4,680 utterances with a total duration of 10 hours.
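The utterance count in the abstract is consistent with every one of the 13 speakers reading all 360 sentences once (360 × 13 = 4,680). The abstract does not state this recording design explicitly, so the short Python sketch below treats it as an assumption. The function and constant names are hypothetical. The sketch checks the arithmetic and, treating the reported 10 hours as an approximate figure, derives a rough average utterance length.

```python
# Hypothetical sketch: reproduce MEDISCO's headline figures from the abstract.
# Assumption (not stated in the abstract): each speaker read every sentence once.
SENTENCES = 360
SPEAKERS = 13
TOTAL_HOURS = 10  # reported as "10 hours"; treated here as approximate


def corpus_stats(sentences: int, speakers: int, total_hours: float) -> dict:
    """Return the utterance count and the mean utterance length in seconds."""
    utterances = sentences * speakers              # one recording per sentence per speaker
    avg_seconds = total_hours * 3600 / utterances  # spread total audio evenly
    return {"utterances": utterances, "avg_seconds_per_utterance": avg_seconds}


stats = corpus_stats(SENTENCES, SPEAKERS, TOTAL_HOURS)
print(stats["utterances"])                              # 4680
print(round(stats["avg_seconds_per_utterance"], 2))     # 7.69
```

Under these assumptions, an average utterance lasts about 7.7 seconds. The real per-utterance lengths will vary around this value, and the figure shifts if the 10-hour total was rounded.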
Pages: 133–138
Number of pages: 6