Attention-Based Neural Text Segmentation

Cited by: 27
Authors
Badjatiya, Pinkesh [1]
Kurisinkel, Litton J. [1]
Gupta, Manish [1,2]
Varma, Vasudeva [1]
Affiliations
[1] IIIT-H, Hyderabad, India
[2] Microsoft, Redmond, WA, USA
Source
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018) | 2018, Vol. 10772
DOI
10.1007/978-3-319-76941-7_14
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Text segmentation plays an important role in various Natural Language Processing (NLP) tasks such as summarization, context understanding, document indexing, and document noise removal. Previous methods for this task rely on manual feature engineering and suffer from large memory requirements and long execution times. To the best of our knowledge, this paper is the first to present a supervised neural approach for text segmentation. Specifically, we propose an attention-based bidirectional LSTM model in which sentence embeddings are learned using CNNs and segments are predicted based on contextual information. The model automatically handles variable-sized context information. Compared to existing competitive baselines, the proposed model shows a performance improvement of ~7% in WinDiff score on three benchmark datasets.
Pages: 180-193
Page count: 14
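
The abstract above describes a CNN sentence encoder feeding an attention-based bidirectional LSTM that predicts segment boundaries from context. The following is a minimal, hypothetical PyTorch sketch of that kind of architecture; the layer sizes, kernel widths, attention formulation, and boundary classifier are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of a CNN sentence encoder + attention-based BiLSTM
# boundary predictor, in the spirit of the architecture the abstract describes.
# All hyperparameters and the exact attention/classifier design are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNSentenceEncoder(nn.Module):
    """Encodes a sentence (sequence of word embeddings) with 1-D convolutions
    and max-over-time pooling, yielding a fixed-size sentence embedding."""

    def __init__(self, word_dim=100, num_filters=64, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(word_dim, num_filters, k, padding=k - 1) for k in kernel_sizes
        )
        self.out_dim = num_filters * len(kernel_sizes)

    def forward(self, words):            # words: (batch, seq_len, word_dim)
        x = words.transpose(1, 2)        # (batch, word_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)  # (batch, out_dim)


class AttentionBiLSTMSegmenter(nn.Module):
    """Runs a BiLSTM over the sequence of sentence embeddings, attends over the
    hidden states of the surrounding context, and predicts for every sentence
    the probability that a segment boundary follows it."""

    def __init__(self, sent_dim, hidden=128):
        super().__init__()
        self.bilstm = nn.LSTM(sent_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.classifier = nn.Linear(4 * hidden, 1)

    def forward(self, sents):                         # sents: (batch, n_sents, sent_dim)
        h, _ = self.bilstm(sents)                     # (batch, n_sents, 2*hidden)
        scores = self.attn(h).squeeze(-1)             # (batch, n_sents)
        weights = torch.softmax(scores, dim=1)        # attention over the context
        context = (weights.unsqueeze(-1) * h).sum(1)  # (batch, 2*hidden)
        # Concatenate each sentence's hidden state with the attended context.
        context = context.unsqueeze(1).expand_as(h)
        logits = self.classifier(torch.cat([h, context], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)                  # boundary prob. per sentence


if __name__ == "__main__":
    # Toy run: 2 documents, 10 sentences each, 20 words per sentence.
    words = torch.randn(2 * 10, 20, 100)
    encoder = CNNSentenceEncoder()
    sent_emb = encoder(words).view(2, 10, -1)
    model = AttentionBiLSTMSegmenter(encoder.out_dim)
    print(model(sent_emb).shape)                      # torch.Size([2, 10])
```

In a setup like this, training would treat segmentation as per-sentence binary classification (boundary vs. no boundary) with a cross-entropy loss, and evaluation would use WinDiff/Pk as in the paper; those details are likewise assumptions here.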