Punctuation Prediction Model for Conversational Speech

Cited by: 26
Authors
Zelasko, Piotr [1, 2]
Szymanski, Piotr [1, 3]
Mizgajski, Jan [1]
Szymczak, Adrian [1]
Carmiel, Yishay [1]
Dehak, Najim [4]
Affiliations
[1] Intelligent Wire, Seattle, WA 98121 USA
[2] AGH Univ Sci & Technol, Dept Comp Sci Elect & Telecommun, Al Mickiewicza 30, Krakow, Poland
[3] Wroclaw Univ Technol, Dept Computat Intelligence, Wybrzeze Stanislawa Wyspianskiego 27, PL-50370 Wroclaw, Poland
[4] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
punctuation prediction; speech recognition
DOI
10.21437/Interspeech.2018-1096
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An ASR system usually does not predict any punctuation or capitalization. The lack of punctuation causes problems in result presentation and confuses both the human reader and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models, a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutional Neural Network (CNN), to predict the punctuation. The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Web Crawl GloVe embeddings of the words in Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision, while the BLSTMs tend to have better recall. While BLSTMs make fewer mistakes overall, the punctuation predicted by the CNN is more accurate, especially in the case of question marks. Our results constitute significant evidence that the distribution of words in time, as well as pre-trained embeddings, can be useful in the punctuation prediction task.
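The abstract does not say which sequence alignment algorithm combines the punctuated and time-aligned transcripts. As a minimal sketch only, the Python below uses the standard library's `difflib.SequenceMatcher` as a stand-in to copy punctuation labels from a punctuated transcript onto time-stamped words. The function names, the `(word, start, end)` tuple layout, and the sample data are all assumptions made for illustration, not details from the paper.

```python
import re
from difflib import SequenceMatcher


def strip_punct(token):
    """Split a token into (lowercased word, trailing punctuation)."""
    m = re.match(r"^(.*?)([.,?!]*)$", token)
    return m.group(1).lower(), m.group(2)


def align(punctuated_text, timed_words):
    """Attach punctuation from punctuated_text to (word, start, end) tuples.

    Words present only in the time-aligned transcript (for example fillers)
    receive an empty punctuation label.
    """
    pairs = [strip_punct(t) for t in punctuated_text.split()]
    ref_words = [w for w, _ in pairs]
    hyp_words = [w.lower() for w, _, _ in timed_words]
    labels = [""] * len(timed_words)

    # Stand-in alignment: copy labels across every matching run of words.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            labels[block.b + k] = pairs[block.a + k][1]

    return [(w, s, e, p) for (w, s, e), p in zip(timed_words, labels)]


# Hypothetical sample data: the time-aligned side contains an extra "um".
punctuated = "how are you? i am fine, thanks."
timed = [
    ("how", 0.0, 0.2), ("are", 0.2, 0.35), ("you", 0.35, 0.6),
    ("i", 1.1, 1.2), ("um", 1.2, 1.5), ("am", 1.5, 1.6),
    ("fine", 1.6, 2.0), ("thanks", 2.4, 2.9),
]
for word, start, end, punct in align(punctuated, timed):
    print(f"{start:5.2f} {end:5.2f} {word}{punct}")
```

Each output row pairs a word's start and end time with its punctuation label, which is the kind of per-word input (word identity plus timing) the abstract describes feeding to the sequence labelling models.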
Pages: 2633-2637
Page count: 5
Related papers
50 in total
  • [41] Evaluating Spoken Language Model Based on Filler Prediction Model in Speech Recognition
    Ohta, Kengo
    Tsuchiya, Masatoshi
    Nakagawa, Seiichi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1558 - +
  • [42] Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging
    Binh Nguyen
    Vu Bao Hung Nguyen
    Hien Nguyen
    Pham Ngoc Phuong
    The-Loc Nguyen
    Quoc Truong Do
    Luong Chi Mai
    2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019, : 29 - 33
  • [43] Conversational Speech Recognition Needs Data? Experiments with Austrian German
    Linke, Julian
    Garner, Philip N.
    Kubin, Gernot
    Schuppler, Barbara
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4684 - 4691
  • [44] EFFECTIVE KEYWORD SEARCH FOR LOW-RESOURCED CONVERSATIONAL SPEECH
    Lileikyte, Rasa
    Fraga-Silva, Thiago
    Lamel, Lori
    Gauvain, Jean-Luc
    Laurent, Antoine
    Huang, Guangpu
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5785 - 5789
  • [45] Techniques for Rapid and Robust Topic Identification of Conversational Telephone Speech
    Wintrode, Jonathan
    Kulp, Scott
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1515 - 1518
  • [46] A combined punctuation generation and speech recognition system and its performance enhancement using prosody
    Kim, JH
    Woodland, PC
    SPEECH COMMUNICATION, 2003, 41 (04) : 563 - 577
  • [47] Analysis of Punctuation Prediction Models for Automated Transcript Generation in MOOC Videos
    Garg, Bhrigu
    Anika
    PROCEEDINGS OF THE 2018 IEEE 6TH INTERNATIONAL CONFERENCE ON MOOCS, INNOVATION AND TECHNOLOGY IN EDUCATION (MITE 2018), 2018, : 19 - 26
  • [48] Gaze-contingent ASR for spontaneous, conversational speech: An evaluation
    Cooke, Neil
    Russell, Martin
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4433 - 4436
  • [49] Speech Intelligibility Prediction for Hearing-Impaired Listeners with the LEAP Model
    Rossbach, Jana
    Huber, Rainer
    Roettges, Saskia
    Hauth, Christopher F.
    Biberger, Thomas
    Brand, Thomas
    Meyer, Bernd T.
    Rennies, Jan
    INTERSPEECH 2022, 2022, : 3498 - 3502
  • [50] Efficient Ensemble of Deep Neural Networks for Multimodal Punctuation Restoration and the Spontaneous Informal Speech Dataset
    Beigi, Homayoon
    Liu, Xing Yi
    ELECTRONICS, 2025, 14 (05):