57 entries in total
[52]
Sutton RS, 2018, ADAPT COMPUT MACH LE, P1
[54]
Watkins C. J. C. H, 1989, Learning from delayed rewards
[55]
WATKINS CJCH, 1992, MACH LEARN, V8, P279, DOI 10.1007/BF00992698
[56]
Wilcox D, 2014, HIERARCHICAL CAUSALI, DOI 10.2139/SSRN.2544327
[57]
Winker P., 2007, J ECON INTERACT COOR, V2, P125, DOI 10.1007/S11403-007-0020-4