Leveraging Prosody for Punctuation Prediction of Spontaneous Speech

Cited by: 1
Authors
Cho, Jenny Yeonjin [1]
Ng, Sara [2]
Tran, Trang [3]
Ostendorf, Mari [1]
Affiliations
[1] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[2] Univ Washington, Dept Linguist, Seattle, WA 98195 USA
[3] Univ Southern Calif, Inst Creat Technol, Los Angeles, CA 90007 USA
Source
INTERSPEECH 2022 | 2022
Keywords
Automatic punctuation; speech recognition; prosody; RECOGNITION; CUES;
DOI
10.21437/Interspeech.2022-11061
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
This paper introduces a new neural model that incorporates prosodic features to improve automatic punctuation prediction in transcriptions of spontaneous speech. We explore the benefit of intonation and energy features over simply using pauses. In addition, the work poses the question of how to represent interruption points associated with disfluencies in spontaneous speech. In experiments on the Switchboard corpus, we find that prosodic information improves punctuation prediction fidelity for both hand transcripts and ASR output. Explicit modeling of interruption points can benefit prediction of standard punctuation, particularly if the convention associates interruptions with commas.
Pages: 555-559
Page count: 5