Short-Text Semantic Similarity (STSS): Techniques, Challenges and Future Perspectives

Cited by: 15
Authors
Amur, Zaira Hassan [1 ]
Hooi, Yew Kwang [1 ]
Bhanbhro, Hina [1 ]
Dahri, Kamran [2 ]
Soomro, Gul Muhammad [3 ]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Seri Iskandar 32160, Malaysia
[2] Univ Sindh, Dept Informat Technol, Jamshoro 71000, Pakistan
[3] Tomas Bata Univ, Dept Artificial Intelligence, Zlin 76001, Czech Republic
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 6
Keywords
short text; semantic similarity; natural language processing; deep learning; STSS; neural network; movie reviews; BERT; classification; identification; trends; model
DOI
10.3390/app13063911
CLC Number
O6 [Chemistry]
Discipline Code
0703
Abstract
Short-text semantic similarity (STSS) is a prominent field in natural language processing, with a significant impact on a broad range of applications such as question-answering systems, information retrieval, entity recognition, text analytics, and sentiment classification. Despite their widespread use, many traditional machine learning techniques struggle to capture the semantics of short text. Traditional methods rely on ontologies, knowledge graphs, and corpus-based measures, and their performance depends on manually defined rules; applying such measures therefore remains difficult and raises various semantic challenges. Existing surveys do not cover the most recent advances in STSS research. This study presents a systematic literature review (SLR) that aims to (i) explain the barriers short sentences pose to semantic similarity, (ii) identify the standard deep learning techniques best suited to short-text semantics, (iii) classify the language models that produce high-level contextual semantic information, (iv) determine datasets intended specifically for short text, and (v) highlight research challenges and proposed future improvements. To the best of our knowledge, this is an in-depth, comprehensive, and systematic review of short-text semantic similarity trends, which will help researchers reuse and enhance semantic information.
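As a minimal illustration of the limitation the abstract describes, a bag-of-words cosine baseline (the kind of purely lexical measure that traditional corpus-based methods build on) assigns zero similarity to short paraphrases that share no surface words. This is a hedged sketch, not any method from the reviewed paper; the sentence pairs are invented examples.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: a classic lexical-overlap baseline."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)          # shared-word weight
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Paraphrases with no shared words score 0.0 — the short-text failure mode:
print(cosine_similarity("how old are you", "what is your age"))  # 0.0
# Two words shared out of four on each side scores 0.5:
print(cosine_similarity("how old are you", "how old is she"))    # 0.5
```

Contextual language models such as BERT address exactly this gap by embedding sentences so that paraphrases land near each other even without lexical overlap.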
Pages: 30