43 entries in total
- [1] Achiam OJ, 2023, arXiv, DOI 10.48550/arXiv.2303.08774
- [2] Bang Y, 2023, arXiv, arXiv:2302.04023
- [4] Brown TB, 2020, arXiv, DOI 10.48550/arXiv.2005.14165
- [5] Chen M, 2021, arXiv, DOI 10.48550/arXiv.2107.03374
- [6] Chen WH, 2023, arXiv, arXiv:2211.12588
- [7] Chen X, 2023, arXiv, DOI 10.48550/arXiv.2303.00293
- [8] Shao CC, 2019, arXiv, arXiv:1806.00920
- [9] Christiano PF, 2017, Deep reinforcement learning from human preferences, P4299
- [10] Cui YM, 2019, 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), P5883