Online Personalized Preference Learning Method Based on Informative Query for Lane Centering Control Trajectory

Cited by: 3
Authors
Ran, Wei [1 ]
Chen, Hui [1 ]
Xia, Taokai [1 ]
Nishimura, Yosuke [2 ]
Guo, Chaopeng [2 ]
Yin, Youyu [3 ]
Affiliations
[1] Tongji Univ, Sch Automot Studies, Shanghai 201804, Peoples R China
[2] JTEKT Corp, Nara 6348555, Japan
[3] JTEKT Res & Dev Ctr WUXI Co Ltd, Wuxi 214161, Peoples R China
Keywords
online learning; preference learning; utility theory; Bayesian approach; LCC trajectory; advanced driver assistance; driving style recognition; model
DOI
10.3390/s23115246
CLC Classification Number
O65 [Analytical Chemistry]
Subject Classification Codes
070302; 081704
Abstract
The personalization of autonomous vehicles and advanced driver assistance systems has been widely researched, with many proposed methods aiming for human-like or driver-imitating behavior. However, these approaches rest on the implicit assumption that drivers prefer the vehicle to drive as they do themselves, which may not hold for every driver. To address this issue, this study proposes an online personalized preference learning method (OPPLM) that utilizes pairwise comparison group preference queries and a Bayesian approach. The proposed OPPLM adopts a two-layer hierarchical structure model based on utility theory to represent driver preferences over trajectories. To improve learning accuracy, the uncertainty in the driver's query answers is modeled. In addition, informative query generation and greedy query selection are used to speed up learning. To determine when the driver's preferred trajectory has been found, a convergence criterion is proposed. To evaluate the effectiveness of the OPPLM, a user study is conducted in which the driver's preferred trajectory in curves is learned for the lane centering control (LCC) system. The results show that the OPPLM converges quickly, requiring only about 11 queries on average. Moreover, it accurately learns the driver's preferred trajectory, and the utility estimated by the driver preference model is highly consistent with the subjective evaluation scores.
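The abstract names the core ingredients of the OPPLM: a utility-based preference representation, Bayesian updating from pairwise comparison answers, a model of answer uncertainty, and greedy selection of the most informative query. The sketch below illustrates that query loop under simplifying assumptions not taken from the paper: a linear (single-layer) utility over two hypothetical trajectory features, a Bradley-Terry-style answer model, a particle approximation of the posterior, and a fixed query budget in place of the paper's convergence criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not the paper's): each candidate LCC curve trajectory
# is summarized by a small feature vector (e.g., peak lateral acceleration,
# deviation from the lane center), and the driver's utility is assumed to be
# linear in those features. The paper's two-layer hierarchical utility model
# is not reproduced here.
N_FEATURES = 2
N_PARTICLES = 2000  # weight samples approximating the Bayesian posterior


def answer_probs(W, phi_a, phi_b, beta=5.0):
    """P(driver answers 'A over B') for each weight sample in W, under a
    Bradley-Terry-style response model. beta encodes answer uncertainty:
    a smaller beta models noisier, less reliable driver answers."""
    return 1.0 / (1.0 + np.exp(-beta * (W @ (phi_a - phi_b))))


def binary_entropy(q):
    q = np.clip(q, 1e-9, 1.0 - 1e-9)
    return -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))


def information_gain(W, phi_a, phi_b):
    """Mutual information between the driver's answer and the utility
    weights, approximated over the particle set; the greedy picker below
    maximizes this to select the most informative pairwise query."""
    p = answer_probs(W, phi_a, phi_b)
    return binary_entropy(p.mean()) - binary_entropy(p).mean()


# Toy run: recover a hidden preference vector from noisy pairwise answers.
true_w = np.array([0.8, -0.6])                        # simulated driver
W = rng.normal(size=(N_PARTICLES, N_FEATURES))        # prior samples
trajectories = rng.uniform(-1, 1, size=(30, N_FEATURES))

for step in range(12):  # fixed budget; the paper reports ~11 queries needed
    # Greedy query selection over all candidate pairs.
    pairs = [(i, j) for i in range(len(trajectories))
             for j in range(i + 1, len(trajectories))]
    i, j = max(pairs, key=lambda ij: information_gain(
        W, trajectories[ij[0]], trajectories[ij[1]]))
    # Simulated noisy driver answer to the pairwise comparison query.
    p_true = answer_probs(true_w[None, :], trajectories[i], trajectories[j])[0]
    prefers_a = rng.random() < p_true
    # Bayesian update via importance resampling of the weight particles.
    lik = answer_probs(W, trajectories[i], trajectories[j])
    post = lik if prefers_a else 1.0 - lik
    post /= post.sum()
    W = W[rng.choice(N_PARTICLES, size=N_PARTICLES, p=post)]
    W += rng.normal(scale=0.02, size=W.shape)  # jitter against degeneracy

w_hat = W.mean(axis=0)
print("estimated preference direction:", w_hat / np.linalg.norm(w_hat))
```

In this toy version, the estimated weight direction approaches the simulated driver's within the fixed query budget; the paper instead stops via its proposed convergence criterion and uses an informative query generation step rather than exhaustive pair enumeration.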
Pages: 22