Persian Automatic Text Summarization Based on Named Entity Recognition

Cited by: 10
Authors
Khademi, Mohammad Ebrahim [1]
Fakhredanesh, Mohammad [1]
Affiliation
[1] Malek Ashtar University of Technology, Tehran, Iran
Keywords
Extractive summarization; Named entity recognition; Continuous vector space; Word embedding
DOI
10.1007/s40998-020-00352-2
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
In this paper, we propose an unsupervised method for summarizing Farsi texts based on our neural named entity recognition (NER) system. The method consists of three phases: training a supervised NER model, recognizing the named entities in the text, and generating a summary. The proposed method is an unsupervised, extractive, single-document summarization method. Although it is language independent, this work focuses on Farsi text summarization. First, we train word embeddings on the Hamshahri2 corpus. Second, we train a neural network on the Arman NER corpus. Then, the proposed algorithm ranks the sentences of the text according to the named entities each sentence contains and produces the summary. Finally, we evaluate the method on the Pasokh single-document dataset using the ROUGE measure. Without using any handcrafted features, the proposed method achieves state-of-the-art results: compared with the best supervised Farsi summarization methods, our unsupervised method improves the overall ROUGE-2 recall score by 10.2%.
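The abstract says only that sentences are ranked "based on the named entities in each sentence" and that results are reported as ROUGE-2 recall. The Python sketch below illustrates both ideas under stated assumptions, not as the authors' implementation: NER output is taken as given in a BIO tag scheme (a hypothetical input format), each sentence is scored simply by its number of entity mentions (a stand-in scoring rule; the paper's actual ranking may differ), and ROUGE-2 recall is computed as clipped bigram overlap against a single reference, without the stemming, tokenization, or multi-reference handling that official ROUGE tooling may apply. The function names `entity_count`, `summarize`, `bigrams`, and `rouge2_recall` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical input format: each sentence is a list of (token, tag) pairs
# produced by an NER tagger using BIO tags such as "B-PER", "I-LOC", "O".
TaggedSentence = List[Tuple[str, str]]


def entity_count(sentence: TaggedSentence) -> int:
    """Count entity mentions: each mention starts with exactly one B- tag."""
    return sum(1 for _, tag in sentence if tag.startswith("B-"))


def summarize(sentences: List[TaggedSentence], k: int) -> List[int]:
    """Pick the k sentences with the most entity mentions (assumed scoring
    rule) and return their indices in original document order.
    Ties are broken in favour of earlier sentences."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-entity_count(sentences[i]), i),
    )
    return sorted(ranked[:k])


def bigrams(tokens: List[str]) -> Counter:
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate: List[str], reference: List[str]) -> float:
    """Simplified ROUGE-2 recall against one reference: overlapping bigrams
    (counts clipped by Counter intersection) over total reference bigrams."""
    ref = bigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((bigrams(candidate) & ref).values())
    return overlap / total


if __name__ == "__main__":
    doc = [
        [("The", "O"), ("meeting", "O"), ("ended", "O")],
        [("Ali", "B-PER"), ("visited", "O"), ("Tehran", "B-LOC")],
        [("Hamshahri", "B-ORG"), ("reported", "O"), ("it", "O")],
    ]
    print(summarize(doc, 2))  # [1, 2]
    print(rouge2_recall(["Ali", "visited", "Tehran"],
                        ["Ali", "visited", "Tehran", "yesterday"]))  # 2/3
```

With entity counts of 0, 2, and 1 for the three example sentences, `summarize(doc, 2)` keeps sentences 1 and 2 and returns them in document order, which keeps an extractive summary readable. A real Farsi pipeline would also need Persian sentence splitting and tokenization, which this sketch omits.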
Pages: 12
References (52 in total)
[1] AleAhmad, Abolfazl; Amiri, Hadi; Darrudi, Ehsan; Rahgozar, Masoud; Oroumchian, Farhad. Hamshahri: A standard Persian text collection. Knowledge-Based Systems, 2009, 22(5): 382-387.
[2] [Anonymous]. JIA.
[3] [Anonymous]. 12th Annual Conference of the International Speech Communication Association, 2011.
[4] Asef, P. SDP J, 2014, 11: 33.
[5] Auer, Soeren; Bizer, Christian; Kobilarov, Georgi; Lehmann, Jens; Cyganiak, Richard; Ives, Zachary. DBpedia: A nucleus for a web of open data. Semantic Web, Proceedings, 2007, 4825: 722 ff.
[6] Baxendale, P. B. Machine-made index for technical literature: An experiment. IBM Journal of Research and Development, 1958, 2(4): 354-361.
[7] Bengio, Y. Advances in Neural Information Processing Systems, 2001, 13: 932.
[8] Berger, A. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000: 294.
[9] Brants, T. Large Language Models in Machine Translation, 2007.
[10] Chen, Z. G. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol. 1, 2015: 106.