An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features

Cited by: 0
Authors
Weng, Shi-Yan [1 ]
Lo, Tien-Hong [1 ]
Chen, Berlin [2 ]
Affiliations
[1] Natl Taiwan Normal Univ, Taipei, Taiwan
[2] Natl Taiwan Normal Univ, ASUS AICS, Taipei, Taiwan
Source
28th European Signal Processing Conference (EUSIPCO 2020) | 2021
Keywords
Extractive speech summarization; BERT; speech recognition; confidence score
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
The tremendous volume of multimedia containing speech is driving an urgent need for efficient and effective automatic summarization methods. To this end, rapid progress has been made in applying supervised deep neural network-based methods to extractive speech summarization. More recently, the Bidirectional Encoder Representations from Transformers (BERT) model has achieved record-breaking success on many natural language processing (NLP) tasks, such as question answering and language understanding. In view of this, this paper contextualizes and enhances the state-of-the-art BERT-based model for speech summarization, with contributions that are at least three-fold. First, we explore the incorporation of confidence scores into sentence representations to alleviate the negative effects of imperfect automatic speech recognition (ASR). Second, we augment the sentence embeddings obtained from BERT with extra structural and linguistic features, such as sentence position and inverse document frequency (IDF) statistics. Finally, we validate the effectiveness of the proposed method on a benchmark dataset, comparing it against several classic and celebrated speech summarization methods.
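To make the feature-augmentation idea concrete, here is a minimal Python sketch (not the authors' code) of scoring an ASR-transcribed sentence for extractive selection: a BERT [CLS] embedding is concatenated with an ASR confidence score, a normalized sentence position, and an average IDF value, then passed through a linear layer. The model name, the exact feature set, and the pooling choice are illustrative assumptions, not details confirmed by the paper.

```python
# Sketch of BERT sentence scoring with augmented features (assumptions noted).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AugmentedSentenceScorer(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", n_extra=3):
        super().__init__()
        # Assumed checkpoint; the paper's benchmark is not specified here.
        self.tokenizer = AutoTokenizer.from_pretrained(bert_name)
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Score from the [CLS] vector concatenated with the extra features.
        self.classifier = nn.Linear(hidden + n_extra, 1)

    def forward(self, sentence, asr_confidence, position, avg_idf):
        inputs = self.tokenizer(sentence, return_tensors="pt",
                                truncation=True, max_length=128)
        cls = self.bert(**inputs).last_hidden_state[:, 0]  # [CLS] embedding
        # Extra features: ASR confidence in [0, 1], sentence position
        # normalized to [0, 1], and mean IDF over the sentence's terms.
        extra = torch.tensor([[asr_confidence, position, avg_idf]],
                             dtype=cls.dtype)
        return torch.sigmoid(self.classifier(torch.cat([cls, extra], dim=-1)))

# Usage: score one (hypothetical) transcribed sentence.
scorer = AugmentedSentenceScorer()
p = scorer("今天 的 新聞 重點 如下", asr_confidence=0.87,
           position=0.0, avg_idf=2.3)
print(f"summary-worthiness: {p.item():.3f}")
```

In this sketch the extra features enter only at the final scoring layer; an alternative, equally plausible reading of the paper is to inject confidence scores earlier, at the token-representation level.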
Pages: 316-320
Page count: 5
相关论文
共 33 条
[1]  
[Anonymous], 2005, International journal of computational linguistics & Chinese language processing
[2]  
[Anonymous], 2016, P SLT
[3]  
[Anonymous], 2015, ARXIV150606726
[4]   An Information Distillation Framework for Extractive Summarization [J].
Chen, Kuan-Yu ;
Liu, Shih-Hung ;
Chen, Berlin ;
Wang, Hsin-Min .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (01) :161-170
[5]  
Cheng JP, 2016, PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, P484
[6]   Hierarchical Pitman-Yor-Dirichlet Language Model [J].
Chien, Jen-Tzung .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (08) :1259-1272
[7]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[8]  
Goyal N., 2019, CORR
[9]  
Gu Yang, 2019, INT J ARTIFICIAL INT, V10, P27
[10]  
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.8.1735, 10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]