Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation

Cited by: 2

Authors:

Dworakowski, Daniel [1]

Fung, Angus [1]

Nejat, Goldie [1]

Affiliation:

[1] Univ Toronto, Dept Mech & Ind Engn, Autonomous Syst & Biomechatron Lab ASBLab, 5 Kings Coll Rd, Toronto, ON M5S 3G8, Canada

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2023 / Volume 131 / Issue 02

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Weakly supervised learning for robots; Environment context identification; Segmentation and labeling; Robot navigation and exploration; ORIENTED TEXT; OBJECT; SEGMENTATION; RECOGNITION; ATTENTION;

DOI:

10.1007/s11263-022-01706-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Contextual information contained within human environments, such as text on signs, symbols, and objects, provides important information for robots to use for exploration and navigation. To identify and segment contextual information from images obtained in these environments, data-driven methods such as Convolutional Neural Networks (CNNs) can be used. However, these methods require significant amounts of human-labeled data, which is time-consuming to obtain. In this paper, we present the novel Weakly Supervised Mask Data Distillation (WeSuperMaDD) architecture for autonomously generating pseudo segmentation labels (PSLs) using CNNs not specifically trained for the task of text segmentation, e.g., CNNs trained instead for object classification or image captioning. WeSuperMaDD is uniquely able to generate PSLs using learned image features from datasets that are sparse and have limited diversity, which are common in robot navigation tasks in human-centered environments (e.g., malls, stores). Our proposed architecture uses a new mask refinement system which automatically searches for the PSL with the fewest foreground pixels that satisfies cost constraints. This removes the need for handcrafted heuristic rules. Extensive experiments were conducted to validate the performance of WeSuperMaDD in generating PSLs for datasets containing text of various scales, fonts, orientations, curvatures, and perspectives in several indoor/outdoor environments. A detailed comparison study conducted with existing approaches found a significant improvement in PSL quality. Furthermore, an instance segmentation CNN trained using the WeSuperMaDD architecture achieved measurable improvements in accuracy when compared to an instance segmentation CNN trained with Naive PSLs. We also found our method to have comparable performance to existing text detection methods.
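
The mask refinement idea in the abstract (searching for the PSL with the fewest foreground pixels that still satisfies cost constraints) can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate masks are assumed to come from thresholding a per-pixel score map, and the cost function `lost_score_fraction`, the bound `max_cost`, and all other names are hypothetical choices made only for illustration.

```python
from typing import Callable, List, Optional

Mask = List[List[bool]]
Scores = List[List[float]]


def binarize(scores: Scores, threshold: float) -> Mask:
    """Mark pixels whose score is at least `threshold` as foreground."""
    return [[value >= threshold for value in row] for row in scores]


def foreground_count(mask: Mask) -> int:
    """Number of foreground (True) pixels in a mask."""
    return sum(sum(row) for row in mask)


def lost_score_fraction(scores: Scores, mask: Mask) -> float:
    """Hypothetical cost: share of the total score left outside the mask."""
    total = sum(sum(row) for row in scores)
    kept = sum(
        value
        for score_row, mask_row in zip(scores, mask)
        for value, inside in zip(score_row, mask_row)
        if inside
    )
    return 1.0 - kept / total if total > 0 else 0.0


def smallest_feasible_mask(
    scores: Scores,
    max_cost: float,
    cost: Callable[[Scores, Mask], float] = lost_score_fraction,
) -> Optional[Mask]:
    """Return the thresholded mask with the fewest foreground pixels
    whose cost does not exceed `max_cost`, or None if none qualifies."""
    # Higher thresholds give smaller (nested) masks, so scanning from the
    # highest threshold down visits candidates from fewest to most pixels.
    thresholds = sorted({value for row in scores for value in row}, reverse=True)
    for threshold in thresholds:
        mask = binarize(scores, threshold)
        if cost(scores, mask) <= max_cost:
            return mask
    return None


scores = [[0.9, 0.8, 0.1], [0.1, 0.1, 0.0]]
tight = smallest_feasible_mask(scores, max_cost=0.2)
loose_bound = smallest_feasible_mask(scores, max_cost=0.6)
```

Under these assumptions, a stricter cost bound forces a larger mask: with `max_cost=0.2` the two high-score pixels are kept, while `max_cost=0.6` allows a mask with a single pixel. The search replaces a hand-tuned threshold with an explicit constraint, which is the general idea the abstract describes.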

Pages: 407-430

Page count: 24
