A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Cited by: 2201
Authors
Silver, David [1 ,2 ]
Hubert, Thomas [1 ]
Schrittwieser, Julian [1 ]
Antonoglou, Ioannis [1 ]
Lai, Matthew [1 ]
Guez, Arthur [1 ]
Lanctot, Marc [1 ]
Sifre, Laurent [1 ]
Kumaran, Dharshan [1 ]
Graepel, Thore [1 ]
Lillicrap, Timothy [1 ]
Simonyan, Karen [1 ]
Hassabis, Demis [1 ]
Affiliations
[1] DeepMind, 6 Pancras Sq, London N1C 4AG, England
[2] UCL, Gower St, London WC1E 6BT, England
Keywords
GAME
DOI
10.1126/science.aar6404
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
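As a reading aid only: the "reinforcement learning from self-play" described in the abstract trains a single neural network on positions generated by the program playing itself, and the published training objective combines a value regression term, a policy cross-entropy term against the search probabilities, and L2 weight regularization. The NumPy sketch below illustrates that loss on a toy batch; it is not the authors' implementation, and the array names and example values (z, v, pi, p, theta, the batch of two positions) are assumptions made purely for illustration.

import numpy as np

def alphazero_style_loss(z, v, pi, p, theta, c=1e-4):
    # z     : (B,)   game outcomes from self-play (+1 win, 0 draw, -1 loss)
    # v     : (B,)   value predictions of the network
    # pi    : (B, A) search (visit-count) policies from self-play
    # p     : (B, A) network policy probabilities
    # theta : flat vector of network parameters, used only for L2 regularization
    value_loss = np.mean((z - v) ** 2)                               # (z - v)^2
    policy_loss = -np.mean(np.sum(pi * np.log(p + 1e-8), axis=1))    # -pi^T log p
    l2 = c * np.sum(theta ** 2)                                      # c * ||theta||^2
    return value_loss + policy_loss + l2

# Toy batch: 2 positions with 3 legal moves each (values made up for illustration).
z = np.array([1.0, -1.0])
v = np.array([0.8, -0.5])
pi = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
p = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
theta = np.zeros(10)
print(alphazero_style_loss(z, v, pi, p, theta))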
Pages: 1140 / +
Page count: 30