Towards real-time embodied AI agent: a bionic visual encoding framework for mobile robotics

Cited by: 3
Authors
Hou, Xueyu [1 ]
Guan, Yongjie [1 ]
Han, Tao [2 ]
Wang, Cong [2 ]
Affiliations
[1] Univ Maine, ECE Dept, Orono, ME 04469 USA
[2] New Jersey Inst Technol, ECE Dept, Newark, NJ USA
Keywords
Mobile robotics; Visual encoding; Embodied AI; Computer vision; Iconic memory
DOI
10.1007/s41315-024-00363-w
Chinese Library Classification (CLC)
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Embodied artificial intelligence (AI) agents, which navigate and interact with their environment using sensors and actuators, are being applied to mobile robotic platforms with limited computing power, such as autonomous vehicles, drones, and humanoid robots. These systems make decisions through environmental perception from deep neural network (DNN)-based visual encoders. However, the constrained computational resources and the large amounts of visual data to be processed can create bottlenecks, such as taking almost 300 milliseconds per decision on an embedded GPU board (Jetson Xavier). Existing DNN acceleration methods require model retraining and can still reduce accuracy. To address these challenges, our paper introduces a bionic visual encoder framework, Robye, to support the real-time requirements of embodied AI agents. The proposed framework complements existing DNN acceleration techniques. Specifically, we integrate motion data to identify overlapping areas between consecutive frames, which reduces the DNN workload by propagating encoding results. We bifurcate processing into high resolution for task-critical areas and low resolution for less significant regions. This dual-resolution approach allows us to maintain task performance while lowering the overall computational demands. We evaluate Robye across three robotic scenarios: autonomous driving, vision-and-language navigation, and drone navigation, using various DNN models and mobile platforms. Robye outperforms baselines in speed (1.2-3.3x), task performance (+4% to +29%), and power consumption (-36% to -47%).
Pages: 1038-1056
Number of pages: 19
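The abstract describes two mechanisms: reusing ("propagating") encoding results for regions that overlap between consecutive frames, located with the help of motion data, and encoding task-critical regions at high resolution while the rest gets a cheaper low-resolution path. The following is a minimal, illustrative Python sketch of that control flow only, not the authors' implementation; the tile grid, toy encoder stub, simple translational motion model, and all function and variable names are assumptions introduced here for illustration.

import numpy as np

TILE = 32  # process the image as a grid of 32x32-pixel tiles (assumption)

def toy_encoder(tile, high_res=True):
    # Stand-in for a DNN visual encoder; returns a per-tile feature vector.
    if not high_res:
        tile = tile[::2, ::2]  # cheap low-resolution path
    return tile.astype(np.float32).mean(axis=(0, 1))

def encode_frame(frame, motion_shift, prev_features, critical_mask):
    # frame: HxWx3 image; motion_shift: (dy, dx) pixel offset between
    # consecutive frames estimated from motion data (e.g., IMU/odometry);
    # prev_features: {(tile_row, tile_col): feature} from the previous frame;
    # critical_mask: HxW boolean array marking task-critical pixels.
    h, w = frame.shape[:2]
    dy, dx = motion_shift
    features = {}
    for r in range(0, h, TILE):
        for c in range(0, w, TILE):
            key = (r // TILE, c // TILE)
            critical = bool(critical_mask[r:r + TILE, c:c + TILE].any())
            # Tile index this region occupied in the previous frame under a
            # purely translational motion model (a deliberate simplification).
            prev_key = ((r + dy) // TILE, (c + dx) // TILE)
            if not critical and prev_key in prev_features:
                # Overlapping, non-critical region: propagate the previous
                # encoding instead of re-running the DNN.
                features[key] = prev_features[prev_key]
            else:
                # New or task-critical region: run the encoder, at high
                # resolution only where the task needs it.
                features[key] = toy_encoder(frame[r:r + TILE, c:c + TILE],
                                            high_res=critical)
    return features

# Example: a 480x640 frame that shifted 8 pixels right since the last frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=bool)
mask[200:260, 300:400] = True                   # pretend this area is task-critical
prev = encode_frame(frame, (0, 0), {}, mask)    # first frame: encode everything
curr = encode_frame(frame, (0, 8), prev, mask)  # later frame: mostly propagated

In this toy version, only the propagated tiles reduce per-frame encoder work; the actual speed, accuracy, and power figures quoted in the abstract come from the paper's own encoders and benchmarks, which this sketch does not reproduce.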