Preference-based online learning with dueling bandits: A survey

Cited by: 0
Authors
Bengs, Viktor [1]
Busa-Fekete, Robert [2]
Mesaoudi-Paul, Adil El [1]
Hullermeier, Eyke [1]
Affiliations
[1] Heinz Nixdorf Institute, Department of Computer Science, Paderborn University, Germany
[2] Google Research, New York, NY, United States
DOI: not available
Related papers
50 in total
[21]   Active Preference-Based Learning of Reward Functions [J].
Sadigh, Dorsa ;
Dragan, Anca D. ;
Sastry, Shankar ;
Seshia, Sanjit A. .
ROBOTICS: SCIENCE AND SYSTEMS XIII, 2017
[22]   Learning solution similarity in preference-based CBR [J].
Abdel-Aziz, Amira ;
Strickert, Marc ;
Hüllermeier, Eyke .
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8765 :17-31
[23]   Versatile Dueling Bandits: Best-of-both World Analyses for Online Learning from Relative Preferences [J].
Saha, Aadirupa ;
Gaillard, Pierre .
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, :19011-19026
[24]   Inverse Preference Learning: Preference-based RL without a Reward Function [J].
Hejna, Joey ;
Sadigh, Dorsa .
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[25]   Online Certification of Preference-Based Fairness for Personalized Recommender Systems [J].
Do, Virginie ;
Corbett-Davies, Sam ;
Atif, Jamal ;
Usunier, Nicolas .
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, :6532-6540
[26]   Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach [J].
Szorenyi, Balazs ;
Busa-Fekete, Robert ;
Paul, Adil ;
Huellermeier, Eyke .
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
[27]   A Generalized Acquisition Function for Preference-based Reward Learning [J].
Ellis, Evan ;
Ghosal, Gaurav R. ;
Russell, Stuart J. ;
Dragan, Anca ;
Biyik, Erdem .
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, :2814-2821
[28]   Model-Free Preference-Based Reinforcement Learning [J].
Wirth, Christian ;
Fuernkranz, Johannes ;
Neumann, Gerhard .
THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, :2222-2228
[29]   Embedding Learning for Preference-based Speech Quality Assessment [J].
Hu, Cheng-Hung ;
Yasuda, Yusuke ;
Toda, Tomoki .
INTERSPEECH 2024, 2024, :2685-2689
[30]   Learning to Identify Top Elo Ratings: A Dueling Bandits Approach [J].
Yan, Xue ;
Du, Yali ;
Ru, Binxin ;
Wang, Jun ;
Zhang, Haifeng ;
Chen, Xu .
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, :8797-8805