Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet

Times cited: 4

Authors:

Martinez, Gonzalo [1]

Watson, Lauren [2]

Reviriego, Pedro [3]

Alberto Hernandez, Jose [1]

Juarez, Marc [2]

Sarkar, Rik [2]

Affiliations:

[1] Univ Carlos III Madrid, Madrid, Spain

[2] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland

[3] Univ Politecn Madrid, Madrid, Spain

Source:

EPISTEMIC UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, EPI UAI 2023 | 2024, Vol. 14523

Keywords:

Generative AI; Internet; Degeneration;

DOI:

10.1007/978-3-031-57963-9_5

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The rapid adoption of generative Artificial Intelligence (AI) tools that can generate realistic images or text, such as DALL-E, Midjourney, or ChatGPT, has put the societal impacts of these technologies at the center of public debate. These tools are made possible by the massive amount of data (text and images) that is publicly available through the Internet. At the same time, these generative AI tools have become content creators that are already contributing to the data available to train future models. Therefore, future versions of generative AI tools will be trained with a mix of human-created and AI-generated content, causing a potential feedback loop between generative AI and public data repositories. This interaction raises many questions: how will future versions of generative AI tools behave when trained on a mixture of real and AI-generated data? Will they evolve and improve with the new datasets or, on the contrary, will they degrade? Will evolution introduce biases or reduce diversity in subsequent generations of generative AI tools? What are the societal implications of the possible degradation of these models? Can we mitigate the effects of this feedback loop? In this work, we explore the effect of this interaction and report some initial results using simple diffusion models trained with various image datasets. Our results show that the quality and diversity of the generated images can degrade over time, suggesting that incorporating AI-created data can have undesired effects on future versions of generative models.
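The feedback loop described in the abstract can be illustrated with a deliberately tiny stand-in. The sketch below is not the authors' method: it swaps their diffusion models for a hypothetical "generative model" that is just a 1-D Gaussian fitted by maximum likelihood, and uses the fitted standard deviation as a crude proxy for output diversity. The function name `run_feedback_loop` and its parameters (`generations`, `n_samples`, `real_fraction`, `seed`) are illustrative assumptions. Each generation is retrained on a mix of fresh real data and samples drawn from the previous generation's model.

```python
import random
import statistics


def run_feedback_loop(generations, n_samples, real_fraction, seed=0):
    """Hypothetical toy model of the generative-AI/Internet feedback loop.

    The "model" is a 1-D Gaussian fitted by maximum likelihood (an assumption
    standing in for a real generative model). The true data distribution is
    N(0, 1). Each generation trains on `real_fraction` fresh real samples and
    the rest sampled from the previous generation's fitted model.
    Returns the fitted standard deviation after every generation.
    """
    rng = random.Random(seed)
    n_real = round(real_fraction * n_samples)

    # Generation 0 is trained on purely human-created (real) data.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mu, sigma = statistics.fmean(data), statistics.pstdev(data)
    history = [sigma]

    for _ in range(generations):
        # Fresh real data still available on the "Internet".
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        # Content produced by the previous generation of the model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_samples - n_real)]
        data = real + synthetic
        # Refit: the maximum-likelihood fit uses the population std (pstdev).
        mu, sigma = statistics.fmean(data), statistics.pstdev(data)
        history.append(sigma)

    return history


synthetic_only = run_feedback_loop(200, 20, real_fraction=0.0)
mixed = run_feedback_loop(200, 20, real_fraction=0.5)
print(f"fully synthetic: std {synthetic_only[0]:.3f} -> {synthetic_only[-1]:.3g}")
print(f"50% real data:   std {mixed[0]:.3f} -> {mixed[-1]:.3g}")
```

In this toy setting, a model retrained only on its own outputs tends to lose spread generation after generation, because each refit on a finite sample slightly underestimates the variance and the errors compound, whereas keeping a share of fresh real data anchors the fit near the true distribution. This mirrors, in a very simplified form, the loss of diversity the abstract reports for diffusion models.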


Pages: 59–73

Page count: 15

References: 36 in total
