A Deep Learning Model Based on BERT and Sentence Transformer for Semantic Keyphrase Extraction on Big Social Data

Cited by: 27
Authors
Devika, R. [1 ]
Vairavasundaram, Subramaniyaswamy [1 ]
Mahenthar, C. Sakthi Jay [1 ]
Varadarajan, Vijayakumar [2 ]
Kotecha, Ketan [3 ]
Affiliations
[1] SASTRA Deemed Univ, Sch Comp, Thanjavur 613401, India
[2] Univ New South Wales, Sch Comp Sci & Engn, Kensington, NSW 2052, Australia
[3] Symbiosis Int Deemed Univ, Symbiosis Ctr Appl Artificial Intelligence, Pune 412115, Maharashtra, India
Keywords
Transformers; Feature extraction; Data mining; Bit error rate; Social networking (online); Task analysis; Blogs; Attention layer; BERT; deep learning; keyphrase extraction; social data;
DOI
10.1109/ACCESS.2021.3133651
CLC Number
TP [Automation technology; computer technology];
Subject Classification Code
0812;
Abstract
With the evolution of the Internet, social media platforms such as Twitter have allowed users to share information about current affairs, events, opinions, news, and experiences. Extracting and analyzing keyphrases from Twitter content is an essential yet challenging task. Keyphrases concisely capture the main contribution of Twitter content, and keyphrase extraction is a vital problem in many Natural Language Processing (NLP) applications. Extracting keyphrases manually is not only time-consuming but also labor-intensive. Existing approaches are based on graph models or machine learning models, whose performance depends on hand-crafted feature extraction or statistical measures. In recent years, applying deep learning algorithms to Twitter data has gained attention because automatic feature extraction can improve performance on several tasks. This work extracts keyphrases from big social data using a sentence transformer with a Bidirectional Encoder Representations from Transformers (BERT) deep learning model. The BERT representation retains the semantic and syntactic connectivity between tweets, enhancing performance on NLP tasks over large data sets, and can automatically extract the most representative phrases in tweets. The proposed Semkey-BERT model, combining BERT with a sentence transformer, achieves an accuracy of 86%, higher than existing models.
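The record does not reproduce the paper's implementation, but the core idea of embedding-based keyphrase extraction described in the abstract is to embed the document and its candidate phrases in a shared vector space and rank candidates by similarity to the document. The sketch below illustrates only that ranking step, using small hand-made toy vectors in place of real BERT sentence embeddings; the `rank_keyphrases` helper and all numeric values are hypothetical and not taken from the paper.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_keyphrases(doc_vec, candidates, top_n=2):
    """Rank candidate phrases by cosine similarity to the document embedding.

    In the actual model these vectors would come from a BERT-based
    sentence transformer; here they are toy 3-d vectors for illustration.
    """
    scored = sorted(candidates.items(),
                    key=lambda kv: cosine(doc_vec, kv[1]),
                    reverse=True)
    return [phrase for phrase, _ in scored[:top_n]]

# Hypothetical embeddings: the document vector and three candidate phrases.
doc = [0.9, 0.1, 0.2]
cands = {
    "keyphrase extraction": [0.8, 0.2, 0.1],
    "weather report":       [0.1, 0.9, 0.3],
    "deep learning":        [0.7, 0.1, 0.4],
}

print(rank_keyphrases(doc, cands))
```

In a real pipeline the toy vectors would be replaced by sentence-transformer encodings of the tweet and of n-gram candidates drawn from it; the ranking logic itself stays the same.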
Pages: 165252-165261
Page count: 10