Short-Text Semantic Similarity (STSS): Techniques, Challenges and Future Perspectives

Cited by: 15
Authors
Amur, Zaira Hassan [1 ]
Hooi, Yew Kwang [1 ]
Bhanbhro, Hina [1 ]
Dahri, Kamran [2 ]
Soomro, Gul Muhammad [3 ]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Seri Iskandar 32160, Malaysia
[2] Univ Sindh, Dept Informat Technol, Jamshoro 71000, Pakistan
[3] Tomas Bata Univ, Dept Artificial Intelligence, Zlin 76001, Czech Republic
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 6
Keywords
short text; semantic similarity; natural language processing; deep learning; STSS; neural network; movie reviews; BERT; classification; identification; trends; model
DOI
10.3390/app13063911
CLC Number
O6 [Chemistry]
Discipline Code
0703
Abstract
Short-text semantic similarity (STSS) is a prominent field in natural language processing, with a significant impact on a broad range of applications such as question-answering systems, information retrieval, entity recognition, text analytics, and sentiment classification. Despite their widespread use, many traditional machine learning techniques struggle to capture the semantics of short text. Traditional methods rely on ontologies, knowledge graphs, and corpus-based measures, and their performance depends on manually defined rules; applying such measures therefore remains difficult and raises various semantic challenges. Existing surveys do not cover the most recent advances in STSS research. This study presents a systematic literature review (SLR) that aims to (i) explain the barriers short sentences pose to semantic similarity, (ii) identify the standard deep learning techniques best suited to short-text semantics, (iii) classify the language models that produce high-level contextual semantic information, (iv) determine datasets intended specifically for short text, and (v) highlight research challenges and proposed future improvements. To the best of our knowledge, this is an in-depth, comprehensive, and systematic review of short-text semantic similarity trends, which will help researchers reuse and enhance semantic information.
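As a minimal illustration of the limitation the abstract describes, a bag-of-words cosine baseline (the kind of purely lexical measure that traditional corpus-based methods build on) assigns zero similarity to short paraphrases that share no surface words. This is a hedged sketch, not any method from the reviewed paper; the sentence pairs are invented examples.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: a classic lexical-overlap baseline."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)          # shared-word weight
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Paraphrases with no shared words score 0.0 — the short-text failure mode:
print(cosine_similarity("how old are you", "what is your age"))  # 0.0
# Two words shared out of four on each side scores 0.5:
print(cosine_similarity("how old are you", "how old is she"))    # 0.5
```

Contextual language models such as BERT address exactly this gap by embedding sentences so that paraphrases land near each other even without lexical overlap.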
Pages: 30