Visual-Tactile Cross-Modal Data Generation Using Residue-Fusion GAN With Feature-Matching and Perceptual Losses

Cited by: 34
Authors
Cai, Shaoyu [1 ]
Zhu, Kening [1 ,2 ]
Ban, Yuki [3 ]
Narumi, Takuji [4 ,5 ]
Affiliations
[1] City Univ Hong Kong, Sch Creat Media, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen, Peoples R China
[3] Univ Tokyo, Grad Sch Frontier Sci, Chiba, Japan
[4] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
[5] JST PRESTO, Tokyo, Japan
Funding
National Natural Science Foundation of China;
Keywords
Visual learning; deep learning for visual perception; haptics and haptic interfaces; VISION; SHAPE;
DOI
10.1109/LRA.2021.3095925
Chinese Library Classification (CLC)
TP24 [Robotics];
Subject classification codes
080202 ; 1405 ;
Abstract
Existing psychophysical studies have revealed that cross-modal visual-tactile perception is common in humans performing daily activities. However, it remains challenging to build an algorithmic mapping from one modality space to the other, namely cross-modal visual-tactile data translation/generation, which could be potentially important for robotic operation. In this letter, we propose a deep-learning-based approach for cross-modal visual-tactile data generation by leveraging the framework of generative adversarial networks (GANs). Our approach takes the visual image of a material surface as the visual data, and the accelerometer signal induced by a pen-sliding movement on the surface as the tactile data. We adopt the conditional-GAN (cGAN) structure together with a residue-fusion (RF) module, and train the model with additional feature-matching (FM) and perceptual losses to achieve cross-modal data generation. The experimental results show that including the RF module and the FM and perceptual losses significantly improves the generation performance, in terms of both the classification accuracy on the generated data and the visual similarity between the ground-truth and the generated data.
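For illustration only, the sketch below shows one way the combined training objective described in the abstract (adversarial + feature-matching + perceptual terms) could be written in PyTorch. The function name, the discriminator/feature-extractor interfaces, and the loss weights are assumptions made for exposition; this is not the authors' released implementation.

```python
# Illustrative sketch (assumed structure, not the paper's official code):
# a conditional-GAN generator loss combining adversarial, feature-matching,
# and perceptual terms, as described in the abstract.
import torch
import torch.nn.functional as F

def generator_loss(disc, feat_net, real_tactile, fake_tactile, cond_img,
                   lambda_fm=10.0, lambda_perc=10.0):
    """Combined generator loss.

    disc         -- discriminator returning (logits, list of intermediate features)
    feat_net     -- fixed feature extractor used for the perceptual loss
    real_tactile -- ground-truth tactile representation (e.g., a spectrogram)
    fake_tactile -- generated tactile representation
    cond_img     -- conditioning visual image of the material surface
    lambda_*     -- loss weights (placeholder values)
    """
    # Adversarial term: push the discriminator to label the generated
    # sample (conditioned on the surface image) as real.
    fake_logits, fake_feats = disc(fake_tactile, cond_img)
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))

    # Feature-matching term: match discriminator intermediate features
    # between real and generated samples.
    with torch.no_grad():
        _, real_feats = disc(real_tactile, cond_img)
    fm = sum(F.l1_loss(f_fake, f_real)
             for f_fake, f_real in zip(fake_feats, real_feats))

    # Perceptual term: L1 distance in the feature space of a fixed network.
    perc = F.l1_loss(feat_net(fake_tactile), feat_net(real_tactile))

    return adv + lambda_fm * fm + lambda_perc * perc
```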
Pages: 7525-7532
Page count: 8