Punctuation Prediction Model for Conversational Speech

Cited by: 26
Authors
Zelasko, Piotr [1, 2]
Szymanski, Piotr [1, 3]
Mizgajski, Jan [1]
Szymczak, Adrian [1]
Carmiel, Yishay [1]
Dehak, Najim [4]
Affiliations
[1] Intelligent Wire, Seattle, WA 98121 USA
[2] AGH Univ Sci & Technol, Dept Comp Sci Elect & Telecommun, Al Mickiewicza 30, Krakow, Poland
[3] Wroclaw Univ Technol, Dept Computat Intelligence, Wybrzeze Stanislawa Wyspianskiego 27, PL-50370 Wroclaw, Poland
[4] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
punctuation prediction; speech recognition
DOI
10.21437/Interspeech.2018-1096
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An ASR system usually does not predict any punctuation or capitalization. The lack of punctuation causes problems in result presentation and confuses both human readers and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models - a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutional Neural Network (CNN) - to predict the punctuation. The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Web Crawl GloVe embeddings of the words in Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision, while the BLSTMs tend to have better recall. While BLSTMs make fewer mistakes overall, the punctuation predicted by the CNN is more accurate, especially in the case of question marks. Our results constitute significant evidence that the distribution of words in time, as well as pre-trained embeddings, can be useful in the punctuation prediction task.
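The abstract mentions combining time-aligned and punctuated transcripts with a sequence alignment algorithm but does not name the algorithm. As a rough illustration only, the sketch below uses the longest-matching-block alignment from Python's standard `difflib` as a stand-in. The function names, the `(word, start, duration)` tuple layout, and the sample data are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher
import string

PUNCT_LABELS = ".,?!"  # assumed label set, for illustration only


def strip_punct(token):
    """Lowercase a token and drop leading/trailing punctuation."""
    return token.lower().strip(string.punctuation)


def trailing_punct(token):
    """Return the final character if it is a punctuation label, else ''."""
    return token[-1] if token and token[-1] in PUNCT_LABELS else ""


def transfer_punctuation(timed_words, punctuated_tokens):
    """Copy punctuation from a punctuated transcript onto time-aligned words.

    timed_words: list of (word, start_seconds, duration_seconds)
    punctuated_tokens: list of tokens such as "you?" or "Fine."
    Words with no match in the punctuated transcript get an empty label.
    """
    a = [word.lower() for word, _, _ in timed_words]
    b = [strip_punct(tok) for tok in punctuated_tokens]
    labels = [""] * len(timed_words)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            labels[block.a + k] = trailing_punct(punctuated_tokens[block.b + k])
    return [(w, s, d, p) for (w, s, d), p in zip(timed_words, labels)]


# Hypothetical example: the time-aligned side has an extra filler word "uh".
timed = [("how", 0.0, 0.2), ("are", 0.2, 0.1), ("you", 0.3, 0.2),
         ("uh", 0.6, 0.1), ("fine", 0.8, 0.3)]
punctuated = ["How", "are", "you?", "Fine."]
for row in transfer_punctuation(timed, punctuated):
    print(row)
```

Each output row keeps the word timing and adds a punctuation label, which is the kind of per-word input (word identity, timing, following punctuation) a sequence labelling model could be trained on. Words that appear in only one transcript, such as the filler above, are simply left unlabelled in this sketch.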
Pages: 2633-2637
Page count: 5
Related papers
50 in total
  • [1] Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech
    Sunkara, Monica
    Ronanki, Srikanth
    Bekal, Dhanush
    Bodapati, Sravan
    Kirchhoff, Katrin
    INTERSPEECH 2020, 2020, : 4911 - 4915
  • [2] Leveraging Prosody for Punctuation Prediction of Spontaneous Speech
    Cho, Jenny Yeonjin
    Ng, Sara
    Tran, Trang
    Ostendorf, Mari
    INTERSPEECH 2022, 2022, : 555 - 559
  • [3] Joint prediction of punctuation and disfluency in speech transcripts
    Lin, Binghuai
    Wang, Liyuan
    INTERSPEECH 2020, 2020, : 716 - 720
  • [4] Investigating for Punctuation Prediction in Chinese Speech Transcriptions
    Liu, Xin
    Liu, Yi
    Song, Xiao
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 74 - 78
  • [5] PUNCTUATION PREDICTION FOR STREAMING ON-DEVICE SPEECH RECOGNITION
    Zhou, Zhikai
    Tan, Tian
    Qian, Yanmin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7277 - 7281
  • [6] SELF-ATTENTION BASED MODEL FOR PUNCTUATION PREDICTION USING WORD AND SPEECH EMBEDDINGS
    Yi, Jiangyan
    Tao, Jianhua
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7270 - 7274
  • [7] Focal Loss for Punctuation Prediction
    Yi, Jiangyan
    Tao, Jianhua
    Tian, Zhengkun
    Bai, Ye
    Fan, Cunhang
    INTERSPEECH 2020, 2020, : 721 - 725
  • [8] Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
    Kim, Hanbyul
    Seo, Seunghyun
    Lee, Lukas
    Baek, Seolki
    INTERSPEECH 2023, 2023, : 1653 - 1657
  • [9] Punctuation Prediction using a Bidirectional Recurrent Neural Network with Part-of-Speech Tagging
    Juin, Chin Char
    Wei, Richard Xiong Jun
    D'Haro, Luis Fernando
    Banchs, Rafael E.
    TENCON 2017 - 2017 IEEE REGION 10 CONFERENCE, 2017, : 1806 - 1811
  • [10] A 43 Language Multilingual Punctuation Prediction Neural Network Model
    Li, Xinxing
    Lin, Edward
    INTERSPEECH 2020, 2020, : 1067 - 1071