Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents

Cited by: 18
Authors
Alami, Nabil [1]
En-nahnahi, Noureddine [1]
Ouatik, Said Alaoui [1]
Meknassi, Mohammed [1]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Fac Sci Dhar EL Mahraz, LIM, Fes, Morocco
Keywords
Arabic text summarization; Deep learning; Unsupervised feature learning; Variational auto-encoder; Graph-based summarization; Query-based summarization; RECOGNITION; CLASSIFICATION; ALGORITHM
DOI
10.1007/s13369-018-3198-y
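Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract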
Traditional Arabic text summarization (ATS) systems are based on a bag-of-words representation, which yields sparse, high-dimensional input data. Dimensionality reduction is therefore needed to increase the discriminative power of the features. In this paper, we present a new method for ATS that uses a variational auto-encoder (VAE) model to learn a feature space from high-dimensional input data. We explore several input representations, including term frequency (tf) and tf-idf, with both local and global vocabularies. All sentences are ranked according to the latent representation produced by the VAE. We investigate the impact of using the VAE with two summarization approaches: graph-based and query-based. Experiments on two benchmark datasets specifically designed for ATS show that the VAE using the tf-idf representation of global vocabularies clearly provides a more discriminative feature space and improves the recall of other models. The experimental results confirm that the proposed method performs better than most state-of-the-art extractive summarization approaches for both graph-based and query-based summarization.
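The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the VAE is omitted, and sentences are ranked directly on their tf-idf vectors, whereas in the proposed method those vectors would first be mapped to VAE latent codes. The function names (`tfidf_vectors`, `rank_graph`, `rank_query`), the smoothed idf formula, and the use of summed cosine similarity as the graph score are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one tf-idf vector per tokenized sentence (hypothetical helper).

    Each sentence is treated as a 'document'. The idf is smoothed with +1,
    an assumption, so that words shared by every sentence keep some weight.
    """
    vocab = sorted({w for s in sentences for w in s})
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_graph(vectors):
    """Graph-based ranking: score each sentence by its summed similarity
    to every other sentence, then return indices from best to worst."""
    n = len(vectors)
    scores = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def rank_query(vectors, query_vector):
    """Query-based ranking: order sentences by similarity to the query."""
    sims = [cosine(v, query_vector) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)


# Toy tokenized sentences; a real system would use Arabic preprocessing.
sentences = [["a", "b"], ["a", "b", "c"], ["d"]]
query = ["d"]

graph_order = rank_graph(tfidf_vectors(sentences))

# Vectorize the query together with the sentences so they share a vocabulary.
all_vectors = tfidf_vectors(sentences + [query])
query_order = rank_query(all_vectors[:-1], all_vectors[-1])
```

In this toy input, the third sentence shares no words with the others, so the graph-based score places it last, while the query-based score places it first because it matches the query. Replacing the tf-idf vectors with a learned low-dimensional representation leaves both ranking functions unchanged, which is where a VAE encoder would plug in.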
Pages: 7803-7815
Page count: 13