BiDETS: Binary Differential Evolutionary based Text Summarization

Cited by: 0
Authors
Aljahdali, Hani Moetque [1]
Ahmed, Ahmed Hamza Osman [1]
Abuobieda, Albaraa [2]
Affiliations
[1] King Abdulaziz Univ Rabigh, Fac Comp & Informat Technol, Dept Informat Syst, Rabigh, Saudi Arabia
[2] Int Univ Africa, Fac Comp Studies, Khartoum 2469, Sudan
Keywords
Differential evolution; text summarization; PSO; GA; evolutionary algorithms; optimization techniques; feature weighting; ROUGE; DUC; GLOBAL OPTIMIZATION; DOCUMENTS; ALGORITHM; SELECTION; MODELS; COLONY
DOI
10.14569/IJACSA.2021.0120132
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In extraction-based automatic text summarization (ATS) applications, feature scoring is the cornerstone of the summarization process, since it is used to select the candidate summary sentences. Treating all features equally leads to low-quality summaries. Feature weighting (FW) is an important approach that weights feature scores according to each feature's importance in the current context. Some ATS researchers have therefore proposed evolutionary machine learning methods, such as Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA), to learn better weights for their selected features. The learned weights are then used to tune the feature scores in order to generate a high-quality summary. In this paper, the Differential Evolution (DE) algorithm is proposed as a feature weighting machine learning method for extraction-based ATS problems. To enable DE to represent and control the assigned features in a binary space, it was adapted to a binary-coded format. Features that are simple to compute were selected from the literature and employed in this study. The sentences in the documents are first clustered according to a multi-objective clustering concept. The DE approach simultaneously optimizes two objective functions, which measure the compactness and the separation of the sentence clusters, in order to automatically detect the number of sentence clusters contained in a document. Representative sentences from the various clusters are then chosen using certain sentence scoring features to produce the summary. The method was trained and tested on the DUC 2002 dataset to learn the weight of each feature. For a comparative evaluation, the proposed DE method was compared with two evolutionary methods, PSO and GA, and against the best and worst benchmark systems in DUC 2002. The BiDETS model scored 49% in ROUGE-1, close to human performance (52%); 26% in ROUGE-2, exceeding human performance (23%); and 45% in ROUGE-L, close to human performance (48%). These results show that the proposed method outperformed all other methods in terms of F-measure using the ROUGE evaluation tool.
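The abstract describes a binary-coded DE without giving its operators, so the following is only a minimal sketch of how a binary differential evolution loop might look, not the BiDETS implementation. The thresholding rule used to map the real-valued mutant back to a bit, the parameter values (`f`, `cr`, `pop_size`, `generations`), and the toy fitness function `toy_fitness` with its hidden mask `TARGET` are all assumptions made for illustration. In the paper's setting the fitness would presumably be a summary-quality score such as a ROUGE F-measure.

```python
import random

# Hypothetical target mask, used only by the toy fitness below.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(bits):
    """Toy objective (assumption): number of bits that match TARGET."""
    return sum(1 for b, t in zip(bits, TARGET) if b == t)


def binary_de(fitness, n_features, pop_size=20, generations=50,
              f=0.8, cr=0.5, seed=0):
    """Maximize `fitness` over bit vectors with a DE/rand/1/bin-style loop.

    Binarization is an assumption of this sketch: the real-valued mutant
    a + f * (b - c) is thresholded at 0.5 to obtain a bit.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample(
                [j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_features)  # forces one crossed gene
            trial = pop[i][:]
            for d in range(n_features):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial[d] = 1 if v >= 0.5 else 0
            # Greedy selection: keep the trial if it is at least as good.
            s = fitness(trial)
            if s >= scores[i]:
                pop[i], scores[i] = trial, s

    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    mask, score = binary_de(toy_fitness, n_features=len(TARGET), seed=1)
    print(mask, score)
```

Each bit in the returned mask can be read as switching one sentence-scoring feature on or off. Because selection is greedy, an individual's score never decreases from one generation to the next, but the sketch gives no guarantee of reaching the global optimum.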
Pages: 259-271
Number of pages: 13