Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation

Cited by: 2
Authors
Dworakowski, Daniel [1]
Fung, Angus [1]
Nejat, Goldie [1]
Affiliations
[1] Univ Toronto, Dept Mech & Ind Engn, Autonomous Syst & Biomechatron Lab ASBLab, 5 Kings Coll Rd, Toronto, ON M5S 3G8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Weakly supervised learning for robots; Environment context identification; Segmentation and labeling; Robot navigation and exploration; ORIENTED TEXT; OBJECT; SEGMENTATION; RECOGNITION; ATTENTION;
DOI
10.1007/s11263-022-01706-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Contextual information contained within human environments, such as text on signs, symbols, and objects, provides important information for robots to use for exploration and navigation. To identify and segment contextual information from images obtained in these environments, data-driven methods such as Convolutional Neural Networks (CNNs) can be used. However, these methods require significant amounts of human-labeled data, which is time-consuming to obtain. In this paper, we present the novel Weakly Supervised Mask Data Distillation (WeSuperMaDD) architecture for autonomously generating pseudo segmentation labels (PSLs) using CNNs not specifically trained for the task of text segmentation, e.g., CNNs trained instead for object classification or image captioning. WeSuperMaDD is uniquely able to generate PSLs using learned image features from datasets that are sparse and have limited diversity, which are common in robot navigation tasks in human-centred environments (e.g., malls, stores). Our proposed architecture uses a new mask refinement system which automatically searches for the PSL with the fewest foreground pixels that satisfies cost constraints. This removes the need for handcrafted heuristic rules. Extensive experiments were conducted to validate the performance of WeSuperMaDD in generating PSLs for datasets containing text of various scales, fonts, orientations, curvatures, and perspectives in several indoor/outdoor environments. A detailed comparison study conducted with existing approaches found a significant improvement in PSL quality. Furthermore, an instance segmentation CNN trained using the WeSuperMaDD architecture achieved measurable improvements in accuracy when compared to an instance segmentation CNN trained with Naive PSLs. We also found our method to have comparable performance to existing text detection methods.
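The mask refinement idea in the abstract (search for the PSL with the fewest foreground pixels that still satisfies a cost constraint) can be illustrated with a toy sketch. This is not the WeSuperMaDD algorithm: the abstract does not specify the cost or the search procedure, so the saliency map, the cost function `missed_saliency_cost`, the threshold sweep, and every name below are assumptions made purely for illustration.

```python
import numpy as np


def smallest_mask_under_cost(saliency, cost_fn, max_cost, num_thresholds=50):
    """Return the candidate mask with the fewest foreground pixels whose
    cost is at most max_cost, or None if no candidate qualifies.

    Candidates are produced by thresholding a saliency map (an assumption;
    the abstract does not describe how candidates are generated).
    """
    best = None
    # Sweep thresholds from the lowest to the highest saliency value.
    for t in np.linspace(saliency.min(), saliency.max(), num_thresholds):
        mask = saliency >= t
        if cost_fn(mask) <= max_cost:
            if best is None or mask.sum() < best.sum():
                best = mask
    return best


def missed_saliency_cost(saliency):
    """Hypothetical cost: fraction of total saliency left outside the mask."""
    total = saliency.sum()

    def cost(mask):
        return 1.0 - saliency[mask].sum() / total

    return cost


# Toy input: a bright 2x2 "text" region on a dim background.
saliency = np.full((4, 4), 0.05)
saliency[1:3, 1:3] = 1.0

mask = smallest_mask_under_cost(
    saliency, missed_saliency_cost(saliency), max_cost=0.2
)
print(int(mask.sum()))  # 4 foreground pixels
```

In this toy case the search keeps only the bright 2x2 region, because it is the smallest thresholded mask that leaves no more than 20% of the saliency uncovered. The point of the sketch is only that the mask size is chosen by a constrained search rather than by a handcrafted rule.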
Pages: 407-430
Page count: 24
Source
International Journal of Computer Vision, 2023, 131: 407-430