Punctuation Prediction Model for Conversational Speech

Cited by: 26

Authors:

Zelasko, Piotr [1,2]

Szymanski, Piotr [1,3]

Mizgajski, Jan [1]

Szymczak, Adrian [1]

Carmiel, Yishay [1]

Dehak, Najim [4]

Affiliations:

[1] Intelligent Wire, Seattle, WA 98121 USA

[2] AGH Univ Sci & Technol, Dept Comp Sci Elect & Telecommun, Al Mickiewicza 30, Krakow, Poland

[3] Wroclaw Univ Technol, Dept Computat Intelligence, Wybrzeze Stanislawa Wyspianskiego 27, PL-50370 Wroclaw, Poland

[4] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

punctuation prediction; speech recognition

DOI:

10.21437/Interspeech.2018-1096

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

An automatic speech recognition (ASR) system usually does not predict any punctuation or capitalization. The lack of punctuation causes problems in result presentation and confuses both human readers and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models, a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutional Neural Network (CNN), to predict the punctuation. The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Web Crawl GloVe embeddings of the words in Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision, while the BLSTMs tend to have better recall. Although the BLSTMs make fewer mistakes overall, the punctuation predicted by the CNN is more accurate, especially in the case of question marks. Our results constitute significant evidence that the distribution of words in time, as well as pre-trained embeddings, can be useful in the punctuation prediction task.
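The abstract mentions combining time-aligned transcripts with punctuated transcripts through a sequence alignment algorithm, but does not say which one. As a rough illustration only, the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner. The function name `transfer_punctuation`, the tuple layout `(word, start, duration)`, and the sample utterance are all hypothetical and are not taken from the paper or from the Fisher corpus.

```python
from difflib import SequenceMatcher

PUNCT = ".,?!"  # assumed label set, chosen for illustration only


def transfer_punctuation(timed_words, punctuated_tokens):
    """Copy trailing punctuation from punctuated tokens onto timed words.

    timed_words: list of (word, start_sec, duration_sec) tuples.
    punctuated_tokens: list of strings such as "you?" or "Fine,".
    Returns a list of (word, start_sec, duration_sec, label) tuples,
    where label is "" when no punctuation could be transferred.
    """
    plain, marks = [], []
    for tok in punctuated_tokens:
        core = tok.rstrip(PUNCT)          # token without trailing punctuation
        plain.append(core.lower())
        marks.append(tok[len(core):])     # the stripped punctuation, or ""

    timed_plain = [w.lower() for w, _, _ in timed_words]

    # difflib stands in for the (unspecified) alignment algorithm.
    matcher = SequenceMatcher(a=timed_plain, b=plain, autojunk=False)

    labels = [""] * len(timed_words)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            labels[block.a + k] = marks[block.b + k]

    return [(w, s, d, lab) for (w, s, d), lab in zip(timed_words, labels)]


# Hypothetical example: the timed transcript contains a filler word ("uh")
# that the punctuated transcript omits.
timed = [("how", 0.0, 0.2), ("are", 0.2, 0.1), ("you", 0.3, 0.2),
         ("uh", 0.6, 0.1), ("fine", 0.9, 0.3), ("thanks", 1.2, 0.4)]
punctuated = ["How", "are", "you?", "Fine,", "thanks."]

labeled = transfer_punctuation(timed, punctuated)
for row in labeled:
    print(row)
```

Words that appear in only one transcript, such as the filler in this example, simply receive an empty label. Each output row pairs a word and its timing with a punctuation label, which is the general shape of input a sequence labelling model of the kind described in the abstract would need.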

Pages: 2633-2637

Page count: 5

50 items in total

[1] Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech
Sunkara, Monica
Ronanki, Srikanth
Bekal, Dhanush
Bodapati, Sravan
Kirchhoff, Katrin
INTERSPEECH 2020, 2020, pp. 4911-4915
[2] Leveraging Prosody for Punctuation Prediction of Spontaneous Speech
Cho, Jenny Yeonjin
Ng, Sara
Trang Tran
Ostendorf, Mari
INTERSPEECH 2022, 2022, pp. 555-559
[3] Joint prediction of punctuation and disfluency in speech transcripts
Lin, Binghuai
Wang, Liyuan
INTERSPEECH 2020, 2020, pp. 716-720
[4] Investigating for Punctuation Prediction in Chinese Speech Transcriptions
Liu, Xin
Liu, Yi
Song, Xiao
2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, pp. 74-78
[5] Punctuation Prediction for Streaming On-Device Speech Recognition
Zhou, Zhikai
Tan, Tian
Qian, Yanmin
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, pp. 7277-7281
[6] Self-Attention Based Model for Punctuation Prediction Using Word and Speech Embeddings
Yi, Jiangyan
Tao, Jianhua
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pp. 7270-7274
[7] Focal Loss for Punctuation Prediction
Yi, Jiangyan
Tao, Jianhua
Tian, Zhengkun
Bai, Ye
Fan, Cunhang
INTERSPEECH 2020, 2020, pp. 721-725
[8] Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
Kim, Hanbyul
Seo, Seunghyun
Lee, Lukas
Baek, Seolki
INTERSPEECH 2023, 2023, pp. 1653-1657
[9] Punctuation Prediction using a Bidirectional Recurrent Neural Network with Part-of-Speech Tagging
Juin, Chin Char
Wei, Richard Xiong Jun
D'Haro, Luis Fernando
Banchs, Rafael E.
TENCON 2017 - 2017 IEEE REGION 10 CONFERENCE, 2017, pp. 1806-1811
[10] A 43 Language Multilingual Punctuation Prediction Neural Network Model
Li, Xinxing
Lin, Edward
INTERSPEECH 2020, 2020, pp. 1067-1071
