Deep Learning Based Semantic Similarity Detection Using Text Data

Cited by: 17
Authors
Mansoor, Muhammad [1]
Rehman, Zahoor Ur [1]
Shaheen, Muhammad [2]
Khan, Muhammad Attique [3]
Habib, Mohamed [4, 5]
Affiliations
[1] COMSATS Univ Islamabad, Comp Sci Dept, Attock Campus, Islamabad, Pakistan
[2] Fdn Univ Islamabad, Fac Engn & IT, Islamabad, Pakistan
[3] HITEC Univ Taxila, Dept Comp Sci, Taxila, Pakistan
[4] Saudi Elect Univ, Coll Comp & Informat, Riyadh, Saudi Arabia
[5] Port Said Univ, Fac Engn, Port Fuad City, Egypt
Source
INFORMATION TECHNOLOGY AND CONTROL | 2020 / Vol. 49 / No. 04
Keywords
Deep Learning; Semantics; Similarity; Quora; question duplication; LSTM and CNN; CONTRAST ENHANCEMENT; NEURAL-NETWORK; RECOGNITION; SELECTION; MODEL;
DOI
10.5755/j01.itc.49.4.27118
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Similarity detection in text is a core task in a number of Natural Language Processing (NLP) applications. Because textual data are larger in quantity and volume than numeric data, measuring textual similarity is an important problem. Most similarity detection algorithms are based on word-to-word matching, sentence/paragraph matching, or matching of the whole document. In this research, a novel approach is proposed that uses deep learning models, combining a Long Short-Term Memory (LSTM) network with a Convolutional Neural Network (CNN), to measure the semantic similarity between two questions. The proposed model takes sentence pairs as input and measures the similarity between them. The model is tested on the publicly available Quora dataset. The proposed model achieved 87.50% accuracy, which is better than the previous approaches.
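To make the contrast in the abstract concrete, the following is a minimal sketch of the kind of word-to-word matching baseline that the abstract says most similarity detection algorithms rely on. It is not the authors' LSTM and CNN model. The tokenizer, the Jaccard overlap measure, the 0.5 threshold, and all function names are assumptions chosen for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (illustrative tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(q1, q2):
    """Word-overlap similarity: |intersection| / |union| of the two token sets."""
    a, b = tokenize(q1), tokenize(q2)
    if not a and not b:
        return 1.0  # two empty questions are treated as identical
    return len(a & b) / len(a | b)


def is_duplicate(q1, q2, threshold=0.5):
    """Label a question pair as duplicate when overlap reaches the (assumed) threshold."""
    return jaccard_similarity(q1, q2) >= threshold


def accuracy(pairs, labels, threshold=0.5):
    """Fraction of question pairs whose predicted label matches the gold label."""
    correct = sum(
        is_duplicate(q1, q2, threshold) == bool(label)
        for (q1, q2), label in zip(pairs, labels)
    )
    return correct / len(labels)


# Hypothetical question pairs, both intended as duplicates (label 1).
pairs = [
    ("How do I learn Python?", "How can I learn Python?"),
    ("What is the best way to lose weight?", "How can I slim down quickly?"),
]
labels = [1, 1]
baseline_accuracy = accuracy(pairs, labels)
```

The second pair shares no words, so a lexical baseline of this kind misses it even though the two questions ask the same thing. Gaps of that sort are what a semantic model operating on sentence pairs is meant to close.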
Pages: 495-510
Number of pages: 16