Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet

Cited by: 4
Authors
Martinez, Gonzalo [1]
Watson, Lauren [2]
Reviriego, Pedro [3]
Alberto Hernandez, Jose [1]
Juarez, Marc [2]
Sarkar, Rik [2]
Affiliations
[1] Univ Carlos III Madrid, Madrid, Spain
[2] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[3] Univ Politecn Madrid, Madrid, Spain
Source
EPISTEMIC UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, EPI UAI 2023 | 2024 / Vol. 14523
Keywords
Generative AI; Internet; Degeneration;
DOI
10.1007/978-3-031-57963-9_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid adoption of generative Artificial Intelligence (AI) tools that can generate realistic images or text, such as DALL-E, Midjourney, or ChatGPT, has put the societal impacts of these technologies at the center of public debate. These tools are made possible by the massive amount of data (text and images) that is publicly available on the Internet. At the same time, these generative AI tools have become content creators that are already contributing to the data available to train future models. Therefore, future versions of generative AI tools will be trained on a mix of human-created and AI-generated content, creating a potential feedback loop between generative AI and public data repositories. This interaction raises many questions: How will future versions of generative AI tools behave when trained on a mixture of real and AI-generated data? Will they evolve and improve with the new datasets or, on the contrary, will they degrade? Will this evolution introduce biases or reduce diversity in subsequent generations of generative AI tools? What are the societal implications of the possible degradation of these models? Can we mitigate the effects of this feedback loop? In this work, we explore the effect of this interaction and report some initial results using simple diffusion models trained on various image datasets. Our results show that the quality and diversity of the generated images can degrade over time, suggesting that incorporating AI-created data can have undesired effects on future versions of generative models.
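The feedback loop described in the abstract can be illustrated with a deliberately simple simulation. This is a minimal sketch, not the authors' method (they train diffusion models on image datasets). It assumes a toy "model" that is just a categorical distribution fitted by counting, a hypothetical Zipf-like "human" distribution over 50 categories, and illustrative names and parameters (`fit`, `generate`, `simulate`, `n`, `real_fraction`, `generations`) chosen only for readability. Diversity is measured as the number of categories the model can still produce.

```python
import random
from collections import Counter


def fit(samples, num_categories):
    """Toy 'training' (hypothetical): estimate category probabilities by relative frequency."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(c, 0) / n for c in range(num_categories)]


def generate(probs, k, rng):
    """Toy 'generation' (hypothetical): draw k items from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=k)


def simulate(generations=30, num_categories=50, n=200, real_fraction=0.0, seed=0):
    """Retrain a toy model over several generations.

    Each generation is fitted to n samples: a fraction `real_fraction` drawn
    from the fixed "human" distribution and the rest drawn from the previous
    generation's model. Returns the support size (number of categories the
    model can still produce) after each generation.
    """
    rng = random.Random(seed)
    # Assumed "human" data: a long-tailed, Zipf-like distribution.
    weights = [1.0 / (c + 1) for c in range(num_categories)]
    total = sum(weights)
    true_probs = [w / total for w in weights]

    # Generation 0 is trained on human data only.
    model = fit(generate(true_probs, n, rng), num_categories)
    support = [sum(p > 0 for p in model)]
    for _ in range(generations):
        n_real = int(real_fraction * n)
        # Mix fresh human data with samples from the previous model.
        data = generate(true_probs, n_real, rng) + generate(model, n - n_real, rng)
        model = fit(data, num_categories)
        support.append(sum(p > 0 for p in model))
    return support


if __name__ == "__main__":
    print("synthetic only:", simulate(real_fraction=0.0)[-1])
    print("50% real data: ", simulate(real_fraction=0.5)[-1])
```

When each generation is trained only on the previous generation's output (`real_fraction=0.0`), a category that happens to draw zero samples gets probability zero and can never be generated again, so the support can only shrink; this is loosely analogous to the loss of diversity the abstract reports. Mixing in fresh real data (for example `real_fraction=0.5`) lets lost categories reappear, which hints at one possible answer to the mitigation question the abstract raises. The exact numbers printed depend on the seed.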
Pages: 59-73
Number of pages: 15