Visual-Tactile Cross-Modal Data Generation Using Residue-Fusion GAN With Feature-Matching and Perceptual Losses

Cited by: 34
Authors
Cai, Shaoyu [1 ]
Zhu, Kening [1 ,2 ]
Ban, Yuki [3 ]
Narumi, Takuji [4 ,5 ]
Affiliations
[1] City Univ Hong Kong, Sch Creat Media, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen, Peoples R China
[3] Univ Tokyo, Grad Sch Frontier Sci, Chiba, Japan
[4] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
[5] JST PRESTO, Tokyo, Japan
Funding
National Natural Science Foundation of China;
Keywords
Visual learning; deep learning for visual perception; haptics and haptic interfaces; VISION; SHAPE;
DOI
10.1109/LRA.2021.3095925
Chinese Library Classification (CLC)
TP24 [Robotics];
Subject classification codes
080202 ; 1405 ;
Abstract
Existing psychophysical studies have revealed that cross-modal visual-tactile perception is common in humans performing daily activities. However, it remains challenging to build an algorithmic mapping from one modality space to the other, namely cross-modal visual-tactile data translation/generation, which could be potentially important for robotic operation. In this letter, we propose a deep-learning-based approach for cross-modal visual-tactile data generation by leveraging the framework of generative adversarial networks (GANs). Our approach takes the visual image of a material surface as the visual data, and the accelerometer signal induced by a pen-sliding movement on the surface as the tactile data. We adopt the conditional-GAN (cGAN) structure together with a residue-fusion (RF) module, and train the model with additional feature-matching (FM) and perceptual losses to achieve cross-modal data generation. The experimental results show that including the RF module and the FM and perceptual losses significantly improves the generation performance, in terms of both the classification accuracy on the generated data and the visual similarity between the ground-truth and the generated data.
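For illustration only, the sketch below shows one way the combined training objective described in the abstract (adversarial + feature-matching + perceptual terms) could be written in PyTorch. The function name, the discriminator/feature-extractor interfaces, and the loss weights are assumptions made for exposition; this is not the authors' released implementation.

```python
# Illustrative sketch (assumed structure, not the paper's official code):
# a conditional-GAN generator loss combining adversarial, feature-matching,
# and perceptual terms, as described in the abstract.
import torch
import torch.nn.functional as F

def generator_loss(disc, feat_net, real_tactile, fake_tactile, cond_img,
                   lambda_fm=10.0, lambda_perc=10.0):
    """Combined generator loss.

    disc         -- discriminator returning (logits, list of intermediate features)
    feat_net     -- fixed feature extractor used for the perceptual loss
    real_tactile -- ground-truth tactile representation (e.g., a spectrogram)
    fake_tactile -- generated tactile representation
    cond_img     -- conditioning visual image of the material surface
    lambda_*     -- loss weights (placeholder values)
    """
    # Adversarial term: push the discriminator to label the generated
    # sample (conditioned on the surface image) as real.
    fake_logits, fake_feats = disc(fake_tactile, cond_img)
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))

    # Feature-matching term: match discriminator intermediate features
    # between real and generated samples.
    with torch.no_grad():
        _, real_feats = disc(real_tactile, cond_img)
    fm = sum(F.l1_loss(f_fake, f_real)
             for f_fake, f_real in zip(fake_feats, real_feats))

    # Perceptual term: L1 distance in the feature space of a fixed network.
    perc = F.l1_loss(feat_net(fake_tactile), feat_net(real_tactile))

    return adv + lambda_fm * fm + lambda_perc * perc
```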
Pages: 7525-7532
Page count: 8