Punctuation Prediction Model for Conversational Speech

Cited by: 26

Authors:

Zelasko, Piotr [1, 2]

Szymanski, Piotr [1, 3]

Mizgajski, Jan [1]

Szymczak, Adrian [1]

Carmiel, Yishay [1]

Dehak, Najim [4]

Affiliations:

[1] Intelligent Wire, Seattle, WA 98121 USA

[2] AGH Univ Sci & Technol, Dept Comp Sci Elect & Telecommun, Al Mickiewicza 30, Krakow, Poland

[3] Wroclaw Univ Technol, Dept Computat Intelligence, Wybrzeze Stanislawa Wyspianskiego 27, PL-50370 Wroclaw, Poland

[4] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

punctuation prediction; speech recognition

DOI:

10.21437/Interspeech.2018-1096

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

An ASR system usually does not predict any punctuation or capitalization. The lack of punctuation causes problems in result presentation and confuses both human readers and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models, a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutional Neural Network (CNN), to predict punctuation. The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Web Crawl GloVe embeddings of the words in the Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision, while the BLSTMs tend to have better recall. While the BLSTMs make fewer mistakes overall, the punctuation predicted by the CNN is more accurate, especially in the case of question marks. Our results constitute significant evidence that the distribution of words in time, as well as pre-trained embeddings, can be useful in the punctuation prediction task.


Pages: 2633-2637

Number of pages: 5

