Scalable photonic reinforcement learning by time-division multiplexing of laser chaos

Cited by: 44
Authors
Naruse, Makoto [1]
Mihana, Takatomo [2]
Hori, Hirokazu [3]
Saigo, Hayato [4]
Okamura, Kazuya [5]
Hasegawa, Mikio [6]
Uchida, Atsushi [2]
Affiliations
[1] Natl Inst Informat & Commun Technol, Network Syst Res Inst, 4-2-1 Nukui Kita, Koganei, Tokyo 1848795, Japan
[2] Saitama Univ, Dept Informat & Comp Sci, Sakura Ku, 255 Shimo Okubo, Saitama, Saitama 3388570, Japan
[3] Univ Yamanashi, Interdisciplinary Grad Sch, Kofu, Yamanashi 4008510, Japan
[4] Nagahama Inst Biosci & Technol, 1266 Tamura, Nagahama, Shiga 5260829, Japan
[5] Nagoya Univ, Grad Sch Informat, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[6] Tokyo Univ Sci, Dept Elect Engn, 6-3-1 Niijuku, Tokyo 1258585, Japan
Source
SCIENTIFIC REPORTS | 2018 / Volume 8
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
SINGLE-PHOTON; IMPLEMENTATION; OPTIMIZATION; ALGORITHM;
DOI
10.1038/s41598-018-29117-y
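Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code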
07 ; 0710 ; 09 ;
Abstract
Reinforcement learning involves decision-making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to efficiently solve the two-armed bandit problem, which requires decision-making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. However, only two selections were employed in that research; hence, the scalability of laser-chaos-based reinforcement learning needed to be clarified. In this study, we demonstrate a scalable, pipelined principle for resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillating ultrafast time series. We present experimental demonstrations in which bandit problems with up to 64 arms were successfully solved and in which laser chaos time series significantly outperform quasiperiodic signals, computer-generated pseudorandom numbers, and coloured noise. Detailed analyses are also provided, including performance comparisons among laser chaos signals generated under different physical conditions, the results of which coincide with the diffusivity inherent in the time series. This study paves the way for ultrafast reinforcement learning by taking advantage of the ultrahigh bandwidths of light waves and practical enabling technologies.
Pages: 16