Leveraging Prosody for Punctuation Prediction of Spontaneous Speech

Cited by: 1
Authors
Cho, Jenny Yeonjin [1]
Ng, Sara [2]
Tran, Trang [3]
Ostendorf, Mari [1]
Affiliations
[1] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[2] Univ Washington, Dept Linguist, Seattle, WA 98195 USA
[3] Univ Southern Calif, Inst Creat Technol, Los Angeles, CA 90007 USA
Source
INTERSPEECH 2022 | 2022
Keywords
Automatic punctuation; speech recognition; prosody; RECOGNITION; CUES
DOI
10.21437/Interspeech.2022-11061
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper introduces a new neural model that incorporates prosodic features to improve automatic punctuation prediction in transcriptions of spontaneous speech. We explore the benefit of intonation and energy features over simply using pauses. In addition, the work poses the question of how to represent interruption points associated with disfluencies in spontaneous speech. In experiments on the Switchboard corpus, we find that prosodic information improves punctuation prediction fidelity for both hand transcripts and ASR output. Explicit modeling of interruption points can benefit prediction of standard punctuation, particularly if the convention associates interruptions with commas.
Pages: 555-559
Page count: 5