Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation

Cited by: 2
Authors
Dworakowski, Daniel [1]
Fung, Angus [1]
Nejat, Goldie [1]
Affiliation
[1] Univ Toronto, Dept Mech & Ind Engn, Autonomous Syst & Biomechatron Lab ASBLab, 5 Kings Coll Rd, Toronto, ON M5S 3G8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Weakly supervised learning for robots; Environment context identification; Segmentation and labeling; Robot navigation and exploration; ORIENTED TEXT; OBJECT; SEGMENTATION; RECOGNITION; ATTENTION;
DOI
10.1007/s11263-022-01706-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Contextual information contained within human environments, such as text on signs, symbols, and objects, provides important information for robots to use for exploration and navigation. To identify and segment contextual information from images obtained in these environments, data-driven methods such as Convolutional Neural Networks (CNNs) can be used. However, these methods require significant amounts of human-labeled data, which is time-consuming to obtain. In this paper, we present the novel Weakly Supervised Mask Data Distillation (WeSuperMaDD) architecture for autonomously generating pseudo segmentation labels (PSLs) using CNNs not specifically trained for the task of text segmentation, e.g., CNNs trained instead for object classification or image captioning. WeSuperMaDD is uniquely able to generate PSLs using learned image features from datasets that are sparse and have limited diversity, which are common in robot navigation tasks in human-centered environments (e.g., malls, stores). Our proposed architecture uses a new mask refinement system which automatically searches for the PSL with the fewest foreground pixels that satisfies cost constraints. This removes the need for handcrafted heuristic rules. Extensive experiments were conducted to validate the performance of WeSuperMaDD in generating PSLs for datasets containing text of various scales, fonts, orientations, curvatures, and perspectives in several indoor/outdoor environments. A detailed comparison study conducted with existing approaches found a significant improvement in PSL quality. Furthermore, an instance segmentation CNN trained using the WeSuperMaDD architecture achieved measurable improvements in accuracy when compared to an instance segmentation CNN trained with Naive PSLs. We also found our method to have comparable performance to existing text detection methods.
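The mask refinement idea in the abstract (find the pseudo label with the fewest foreground pixels that still satisfies a cost constraint) can be illustrated with a toy sketch. This is not the WeSuperMaDD implementation: the per-pixel score map, the thresholding scheme, the cost function `uncovered_evidence`, and every function name below are assumptions made purely for illustration. The sketch tries candidate masks from smallest to largest and returns the first one whose cost is within the allowed bound.

```python
from typing import Callable, List, Optional

Mask = List[List[bool]]
Scores = List[List[float]]


def threshold_mask(scores: Scores, t: float) -> Mask:
    """Mark as foreground every pixel whose score is at least t."""
    return [[s >= t for s in row] for row in scores]


def foreground_count(mask: Mask) -> int:
    """Number of foreground pixels in a mask."""
    return sum(cell for row in mask for cell in row)


def smallest_feasible_mask(
    scores: Scores,
    cost_fn: Callable[[Mask], float],
    max_cost: float,
) -> Optional[Mask]:
    """Return the threshold mask with the fewest foreground pixels whose
    cost is <= max_cost, or None if no threshold mask qualifies.

    Thresholds are visited from highest to lowest, so the candidate masks
    are nested and grow monotonically; the first feasible mask is therefore
    the smallest feasible one among these candidates.
    """
    thresholds = sorted({s for row in scores for s in row}, reverse=True)
    for t in thresholds:
        mask = threshold_mask(scores, t)
        if cost_fn(mask) <= max_cost:
            return mask
    return None


# Hypothetical 2x2 score map standing in for CNN-derived per-pixel evidence.
toy_scores: Scores = [[0.9, 0.1], [0.8, 0.2]]


def uncovered_evidence(mask: Mask) -> float:
    """Hypothetical cost: fraction of total score left outside the mask."""
    total = sum(s for row in toy_scores for s in row)
    covered = sum(
        s
        for score_row, mask_row in zip(toy_scores, mask)
        for s, keep in zip(score_row, mask_row)
        if keep
    )
    return 1.0 - covered / total


best = smallest_feasible_mask(toy_scores, uncovered_evidence, max_cost=0.2)
```

In this toy setup the search settles on the two high-scoring pixels, because the single highest-scoring pixel alone leaves too much evidence uncovered. An exhaustive scan over thresholds is used rather than a binary search so that the sketch does not depend on the cost being monotonic in mask size.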
Pages: 407-430
Page count: 24