Using a Three-step Social Media Similarity (TSMS) Mapping Method to Analyze Controversial Speech Relating to COVID-19 in Twitter Collections

Cited by: 5

Authors:

Yin, Zhanyuan [1,2]

Fan, Lizhou [3]

Yu, Huizi [2,4]

Gilliland, Anne J. [5]

Affiliations:

[1] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA

[2] Univ Calif Los Angeles, Dept Econ, Los Angeles, CA 90024 USA

[3] Univ Calif Los Angeles, Program Digital Humanities, Los Angeles, CA USA

[4] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA USA

[5] Univ Calif Los Angeles, Dept Informat Studies, Los Angeles, CA USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020

Keywords:

archival appraisal; archival description; COVID-19; social media; text mining

DOI:

10.1109/BigData50022.2020.9377930

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Addressing increasing calls to surface hidden and counter-narratives from within archival collections, this paper reports on a study that provides proof-of-concept of automatic methods that could be used on archived social media collections. Using a test collection of 3,457,434 unique tweets relating to COVID-19, China and Chinese people, it sought to identify instances of Hate Speech as well as hard-to-pinpoint trends in anti-Chinese racist sentiment. The study, part of a larger archival research effort investigating automatic methods for appraisal and description of very large digital archival collections, used a Three-step Social Media Similarity (TSMS) mapping method that aggregates Hashtag Mapping, TF-IDF Similarity Selection, and Emotion Similarity Calculation on the test collection. Compared to using a purely lexicon-based method to identify and analyze controversial speech, this method expanded the amount of controversial content detected from 21,050 tweets to 212,605, and the detection rate from 0.6% to 6.1%. We argue that the TSMS method could be similarly applied by archives in automatically identifying, analyzing, and describing other controversial content on social media and in other rapidly evolving and complex contexts in order to increase public awareness and facilitate public policy responses.
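The abstract names TF-IDF Similarity Selection as the second TSMS step but does not say how it is implemented. The sketch below is therefore only an illustration of the general idea, under stated assumptions: tweets are tokenized by whitespace, similarity is cosine similarity between smoothed TF-IDF vectors, and a candidate tweet is kept when its best match against a small seed set reaches a cutoff. The function names, the sample texts, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_similar(seed_tweets, candidate_tweets, threshold=0.3):
    """Keep candidates whose best cosine similarity to any seed tweet meets the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    docs = [t.lower().split() for t in seed_tweets + candidate_tweets]
    vectors = tfidf_vectors(docs)
    seed_vecs = vectors[:len(seed_tweets)]
    cand_vecs = vectors[len(seed_tweets):]
    selected = []
    for tweet, vec in zip(candidate_tweets, cand_vecs):
        score = max((cosine(vec, s) for s in seed_vecs), default=0.0)
        if score >= threshold:
            selected.append((tweet, score))
    return selected


# Placeholder texts standing in for lexicon-matched seeds and unlabeled candidates.
seeds = ["virus blame rumor example"]
candidates = ["virus blame rumor again", "lovely weather today"]
selected = select_similar(seeds, candidates)
```

In this toy run, only the first candidate shares vocabulary with the seed, so it is the only one retained. A real collection of several million tweets would call for a vectorized implementation rather than pairwise Python loops.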

Pages: 1949-1953

Page count: 5
