Performance Comparison of Transformer-Based Models on Twitter Health Mention Classification

Cited by: 9

Authors:

Khan, Pervaiz Iqbal ^{[1,2]}

Razzak, Imran ^{[3]}

Dengel, Andreas ^{[1,2]}

Ahmed, Sheraz ^{[1]}

Affiliations:

[1] German Res Ctr Artificial Intelligence DFKI, D-67663 Kaiserslautern, Germany

[2] TU Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany

[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 2600, Australia

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2023 / Vol. 10 / Issue 03

Keywords:

Diseases; Transformers; Task analysis; Social networking (online); Blogs; Computational modeling; Predictive models; Health mention classification; public health surveillance (PHS); tweet classification;

DOI:

10.1109/TCSS.2022.3143768

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

Health mention classification classifies a given piece of text as a health mention or not. However, figurative usage of disease words makes the classification task challenging. To address this challenge, consideration of emojis and surrounding words of the disease names in the text can be helpful. Transformer-based methods are better at capturing the meaning of a word based on its surrounding words compared to traditional methods. However, there are numerous transformer-based methods available and pretrained on natural language processing (NLP) data that are inherently different from Twitter data. Moreover, the size of these models varies in terms of the number of parameters. Hence, it is challenging to decide and choose one of these methods for fine-tuning it on the downstream tasks such as tweet classification. In this work, we experiment with nine widely used transformer methods and compare their performance on the personal health mention classification of tweet data. Furthermore, we analyze the impact of model size on the classification task and provide a brief interpretation of the classification decision made by the best performing classifier. Experimental results show that RoBERTa outperforms all other models by achieving an F1 score of 93%, while two other models perform similarly by achieving an F1 score of 92.5%.

Citation

Pages: 1140-1149

Number of pages: 10

References: 36 in total
