38 entries in total
- [1] [Anonymous], 1993, MACHINE LEARNING MET
- [2] [Anonymous], 2010, P 6 INT WIRELESS COM, DOI 10.1145/1815396.1815448
- [3] Bellman R. E., 1957, Dynamic programming. Princeton landmarks in mathematics
- [4] A MINIMUM-TIME TRAJECTORY PLANNING METHOD FOR 2 ROBOTS [J]. IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, 1992, 8 (03): 414 - 418
- [5] Busoniu L, 2010, AUTOM CONTROL ENG SE, P1, DOI 10.1201/9781439821091-f
- [7] Hybrid Q-learning Algorithm About Cooperation in MAS [J]. CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009: 3943 - 3947
- [8] Chen Z, 2011, IEEE SOUTHEASTCON, P409, DOI 10.1109/SECON.2011.5752976
- [9] A production technique for a Q-table with an influence map for speeding up Q-learning [J]. 2007 INTERNATIONAL CONFERENCE ON INTELLIGENT PERVASIVE COMPUTING, PROCEEDINGS, 2007: 72 - +
- [10] The knowledge gradient policy for offline learning with independent normal rewards [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007: 143 - +