Investigating Memorization of Conspiracy Theories in Text Generation

Cited: 0
Authors
Levy, Sharon [1]
Saxon, Michael [1]
Wang, William Yang [1]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021 | 2021
Funding
U.S. National Science Foundation
Keywords
EXPOSURE; SCIENCE; BELIEF;
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The adoption of natural language generation (NLG) models can leave individuals vulnerable to the generation of harmful information memorized by the models, such as conspiracy theories. While previous studies examine conspiracy theories in the context of social media, they have not evaluated their presence in the new space of generative language models. In this work, we investigate the capability of language models to generate conspiracy theory text. Specifically, we aim to answer: can we test pretrained generative language models for the memorization and elicitation of conspiracy theories without access to the models' training data? We highlight the difficulties of this task and discuss it in the context of memorization, generalization, and hallucination. Using a new dataset of conspiracy theory topics and machine-generated conspiracy theories, we discover that many conspiracy theories are deeply rooted in the pretrained language models. Our experiments demonstrate a relationship between model parameters such as size and temperature and the models' propensity to generate conspiracy theory text. These results indicate the need for a more thorough review of NLG applications before release and an in-depth discussion of the drawbacks of memorization in generative language models.
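The sampling temperature mentioned in the abstract controls how sharply a model's next-token distribution is peaked, which is why it affects the chance of eliciting memorized low-probability text. The following is a minimal illustrative sketch of temperature-scaled softmax sampling, not the authors' actual experimental code; the toy logits are invented for illustration:

```python
import math

def temperature_softmax(logits, temperature=1.0):
    """Convert raw next-token logits to probabilities, scaled by temperature.

    Higher temperature flattens the distribution, giving more probability
    mass to otherwise unlikely continuations (which can surface memorized
    fringe text); lower temperature concentrates mass on the top token.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens
logits = [4.0, 2.0, 0.5]

low_t = temperature_softmax(logits, temperature=0.5)
high_t = temperature_softmax(logits, temperature=2.0)

# At low temperature the top token dominates; at high temperature
# probability mass shifts toward the less likely tokens.
```

A probe in the spirit the abstract describes would prompt a pretrained model with a conspiracy theory topic, sweep this temperature setting, and inspect how often the sampled continuations reproduce conspiratorial claims.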
Pages: 4718 - 4729
Page count: 12