An aspect-level sentiment analysis dataset for therapies on Twitter

Cited by: 8
Authors
Guo, Yuting [1]
Das, Sudeshna [1]
Lakamana, Sahithi [1]
Sarker, Abeed [1]
Affiliations
[1] Emory Univ, Atlanta, GA 30322 USA
Source
DATA IN BRIEF | 2023 / Vol. 50
Funding
National Institutes of Health (USA);
Keywords
Text classification; Sentiment analysis; Therapy; Natural language processing; Machine learning; Biomedical informatics;
DOI
10.1016/j.dib.2023.109618
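Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes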
07 ; 0710 ; 09 ;
Abstract
The dataset described is an aspect-level sentiment analysis dataset for therapies, including medication, behavioral, and other therapies, created from user-generated text on Twitter. The dataset was constructed by collecting Twitter posts using keywords associated with the therapies (often referred to as treatments). Subsets of the collected posts were then manually reviewed, and annotation guidelines were developed to categorize the posts as positive, negative, or neutral. The dataset contains a total of 5364 posts mentioning 32 therapies. These posts were manually categorized as 998 (18.6%) positive, 619 (11.5%) negative, and 3747 (69.9%) neutral. Inter-annotator agreement was evaluated using Cohen's kappa, yielding a score of 0.82. The dataset is intended to support the development of automatic systems that detect users' sentiments toward therapies based on their posts. While other sentiment analysis datasets are available, this is the first that encodes sentiments associated with specific therapies. Researchers and developers can use this dataset to train sentiment analysis models, natural language processing algorithms, or machine learning systems to identify and analyze the sentiments expressed by consumers on social media platforms such as Twitter.
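The class proportions and the agreement statistic mentioned in the abstract can be illustrated with a short sketch. The label counts are taken from the abstract; the function names (`class_percentages`, `cohen_kappa`) and the toy annotator labels are hypothetical and are not part of the published dataset or the authors' code.

```python
from collections import Counter

# Label counts as reported in the abstract.
COUNTS = {"positive": 998, "negative": 619, "neutral": 3747}


def class_percentages(counts):
    """Return each class's share of the total, in percent, rounded to 1 decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(sum(COUNTS.values()))        # total number of posts
    print(class_percentages(COUNTS))   # per-class percentages
    # Toy labels for illustration only (not drawn from the dataset).
    annotator_1 = ["pos", "pos", "neg", "neu"]
    annotator_2 = ["pos", "neg", "neg", "neu"]
    print(cohen_kappa(annotator_1, annotator_2))
```

With roughly 70% of posts labelled neutral, a classifier trained on this data would likely need class weighting or a macro-averaged metric; that is an observation about the reported distribution, not a recommendation made in the abstract.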
Pages: 5