Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation

Cited by: 2

Authors:

Dworakowski, Daniel [1]

Fung, Angus [1]

Nejat, Goldie [1]

Affiliations:

[1] Univ Toronto, Dept Mech & Ind Engn, Autonomous Syst & Biomechatron Lab ASBLab, 5 Kings Coll Rd, Toronto, ON M5S 3G8, Canada

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2023 / Vol. 131 / Issue 02

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Weakly supervised learning for robots; Environment context identification; Segmentation and labeling; Robot navigation and exploration; ORIENTED TEXT; OBJECT; SEGMENTATION; RECOGNITION; ATTENTION;

DOI:

10.1007/s11263-022-01706-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Contextual information contained within human environments, such as text on signs, symbols, and objects, provides important information for robots to use for exploration and navigation. To identify and segment contextual information from images obtained in these environments, data-driven methods such as Convolutional Neural Networks (CNNs) can be used. However, these methods require significant amounts of human-labeled data, which is time-consuming to obtain. In this paper, we present the novel Weakly Supervised Mask Data Distillation (WeSuperMaDD) architecture for autonomously generating pseudo segmentation labels (PSLs) using CNNs not specifically trained for the task of text segmentation, e.g., CNNs alternatively trained for object classification or image captioning. WeSuperMaDD is uniquely able to generate PSLs using learned image features from datasets that are sparse and have limited diversity, which are common in robot navigation tasks in human-centred environments (e.g., malls, stores). Our proposed architecture uses a new mask refinement system which automatically searches for the PSL with the fewest foreground pixels that satisfies cost constraints. This removes the need for handcrafted heuristic rules. Extensive experiments were conducted to validate the performance of WeSuperMaDD in generating PSLs for datasets containing text of various scales, fonts, orientations, curvatures, and perspectives in several indoor/outdoor environments. A detailed comparison study conducted with existing approaches found a significant improvement in PSL quality. Furthermore, an instance segmentation CNN trained using the WeSuperMaDD architecture achieved measurable improvements in accuracy when compared to an instance segmentation CNN trained with Naive PSLs. We also found our method to have comparable performance to existing text detection methods.
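
The mask refinement idea in the abstract (search for the label with the fewest foreground pixels that still satisfies a cost constraint) can be illustrated with a minimal sketch. This is not the WeSuperMaDD implementation: the saliency grid, the `retained_mass_cost` function, and the `max_cost` budget are all hypothetical stand-ins chosen for illustration, and the search is restricted to masks obtained by thresholding a single score map.

```python
from typing import Callable, List, Optional

Mask = List[List[int]]
Saliency = List[List[float]]


def threshold_mask(saliency: Saliency, t: float) -> Mask:
    """Binarize: a pixel is foreground when its score is at least t."""
    return [[1 if v >= t else 0 for v in row] for row in saliency]


def foreground_count(mask: Mask) -> int:
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def retained_mass_cost(saliency: Saliency, mask: Mask) -> float:
    """Hypothetical cost: fraction of total saliency left outside the mask."""
    total = sum(sum(row) for row in saliency)
    kept = sum(
        v
        for s_row, m_row in zip(saliency, mask)
        for v, m in zip(s_row, m_row)
        if m
    )
    return 1.0 - kept / total if total > 0 else 0.0


def smallest_feasible_mask(
    saliency: Saliency,
    cost_fn: Callable[[Saliency, Mask], float],
    max_cost: float,
) -> Optional[Mask]:
    """Return the threshold mask with the fewest foreground pixels whose
    cost is within max_cost, or None if no threshold mask qualifies."""
    # Distinct scores, highest first: each lower threshold can only add
    # pixels, so the first feasible mask is the smallest in this family.
    thresholds = sorted({v for row in saliency for v in row}, reverse=True)
    for t in thresholds:
        mask = threshold_mask(saliency, t)
        if cost_fn(saliency, mask) <= max_cost:
            return mask
    return None


# Toy 3x3 score map (assumed values, not data from the paper).
saliency = [
    [0.9, 0.8, 0.1],
    [0.7, 0.1, 0.0],
    [0.0, 0.1, 0.0],
]
mask = smallest_feasible_mask(saliency, retained_mass_cost, max_cost=0.15)
```

Scanning thresholds from high to low replaces hand-tuned cut-off rules with a single cost budget, which is the spirit of the constraint-driven search the abstract describes; the cost used in the paper itself is not specified in this record.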

Citation

Pages: 407-430

Number of pages: 24
