Leveraging explainability for understanding object descriptions in ambiguous 3D environments

Cited by: 4
Authors
Dogan, Fethiye Irmak [1]
Melsion, Gaspar I. [1]
Leite, Iolanda [1]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Div Robot Percept & Learning, Stockholm, Sweden
Funding
Swedish Research Council
Keywords
explainability; resolving ambiguities; depth; referring expression comprehension (REC); real-world environments;
DOI
10.3389/frobt.2022.937772
Chinese Library Classification number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
For effective human-robot collaboration, it is crucial for robots to understand requests from users perceiving the three-dimensional space and ask reasonable follow-up questions when there are ambiguities. While comprehending the users' object descriptions in the requests, existing studies have focused on this challenge for limited object categories that can be detected or localized with existing object detection and localization modules. Further, they have mostly focused on comprehending the object descriptions using flat RGB images without considering the depth dimension. On the other hand, in the wild, it is impossible to limit the object categories that can be encountered during the interaction, and 3-dimensional space perception that includes depth information is fundamental in successful task completion. To understand described objects and resolve ambiguities in the wild, for the first time, we suggest a method leveraging explainability. Our method focuses on the active areas of an RGB scene to find the described objects without putting the previous constraints on object categories and natural language instructions. We further improve our method to identify the described objects considering depth dimension. We evaluate our method in varied real-world images and observe that the regions suggested by our method can help resolve ambiguities. When we compare our method with a state-of-the-art baseline, we show that our method performs better in scenes with ambiguous objects which cannot be recognized by existing object detectors. We also show that using depth features significantly improves performance in scenes where depth data is critical to disambiguate the objects and across our evaluation dataset that contains objects that can be specified with and without the depth dimension.
Pages: 19