Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry

Times cited: 148
Authors
Köbis, Nils [1,2,3]
Mossink, Luca D. [1,2]
Affiliations
[1] Univ Amsterdam, Dept Econ, Amsterdam, Netherlands
[2] Univ Amsterdam, Ctr Expt Econ & Polit Decis Making CREED, Amsterdam, Netherlands
[3] Max Planck Inst Human Dev, Ctr Humans & Machines, Berlin, Germany
Funding
European Research Council
Keywords
Natural language generation; Computational creativity; Turing Test; Creativity; Machine behavior; ACCOUNTABILITY; TRANSPARENCY; PSYCHOLOGY; ALGORITHMS
DOI
10.1016/j.chb.2020.106553
Chinese Library Classification (CLC)
B84 [Psychology]
Discipline classification codes
04; 0402
Abstract
The release of openly available, robust natural language generation (NLG) algorithms has spurred much public attention and debate, partly because of the algorithms' purported ability to generate human-like text across various domains. Yet empirical evidence from incentivized tasks assessing whether people (a) can distinguish and (b) prefer algorithm-generated versus human-written text has been lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art NLG algorithm GPT-2 (N_total = 830). Using the identical starting lines of human poems, GPT-2 produced samples of poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop), and it was then paired with a human-written poem. In a new incentivized version of the Turing Test, participants failed to reliably detect the algorithm-generated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, people revealed a slight aversion to algorithm-generated poetry, independent of whether they were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the ability of NLG algorithms to produce human-like text and propose methodologies for studying such learning algorithms in human-agent experimental settings.
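To make the generation-and-selection pipeline described above concrete, the sketch below reproduces its logic under stated assumptions: it uses the Hugging Face transformers wrapper for GPT-2 (the authors worked with OpenAI's released model, not necessarily this library), the prompt is an illustrative Maya Angelou line, and the in-the-loop selection step is a placeholder for the human judge used in the study.

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import random

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Identical starting line taken from a human-written poem (illustrative choice).
prompt = "A free bird leaps on the back of the wind"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample several candidate continuations from the same starting line.
outputs = model.generate(
    **inputs,
    do_sample=True,               # stochastic sampling yields varied poems
    max_new_tokens=80,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
poems = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Human-out-of-the-loop: one poem is drawn at random from the sample.
human_out_of_the_loop = random.choice(poems)
# Human-in-the-loop: a person picks the best poem; max-by-length is only a
# stand-in so the sketch runs end to end, not the study's selection rule.
human_in_the_loop = max(poems, key=len)

Each selected poem would then be paired with the human-written poem sharing its starting line before being shown to participants.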
Pages: 13