Cross-Modal Generation of Tactile Friction Coefficient From Audio and Visual Measurements by Transformer

Cited by: 5
Authors
Song, Rui [1,2]
Sun, Xiaoying [1 ]
Liu, Guohong [1 ]
Affiliations
[1] Jilin Univ, Coll Commun Engn, Changchun 130022, Peoples R China
[2] Jilin Univ, Int Ctr Future Sci, Changchun 130022, Peoples R China
Keywords
Audio data; electrovibration; surface haptics; tactile friction coefficient; Transformer; visual data; FUSION;
DOI
10.1109/TIM.2023.3311071
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Generating tactile data (e.g., the friction coefficient) from audio and visual modalities can avoid time-consuming physical measurements and ensure high-fidelity haptic rendering of surface textures. In this article, we present a Transformer-based method for cross-modal generation of the tactile friction coefficient. Using the self-attention mechanism, we jointly encode the amplitude and phase information of audio spectra and RGB images to extract global and local features. A Transformer module in a bottleneck converter then maps these joint encoding features to tactile decoding features, which are progressively decoded and reconstructed into the amplitude and phase information of the tactile friction coefficient. Finally, this information is converted into 1-D friction coefficients using the inverse short-time Fourier transform (ISTFT). Evaluations on the LMT Haptic Material Database confirm a clear performance improvement over competing methods. Furthermore, using the friction coefficients generated by the Transformer and a custom-designed electrovibration device, we propose a novel rendering method that simultaneously modulates the amplitude and frequency of the driving signals to display tactile textures on touchscreens. User experiments were conducted to evaluate the rendering fidelity of the generated friction coefficients. A two-way repeated-measures analysis of variance (ANOVA) indicates that the rendering fidelity of the Transformer method is significantly higher than that of the baseline methods (p < 0.05).
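The final reconstruction step described in the abstract (recombining predicted amplitude and phase spectra and applying the ISTFT) can be illustrated with a minimal Python sketch. This is not the authors' code: the sampling rate FS, the window parameters NPERSEG and NOVERLAP, and the helper friction_from_spectra are illustrative assumptions, not values from the paper.

    import numpy as np
    from scipy.signal import stft, istft

    FS = 1000        # assumed sampling rate of the friction signal (Hz)
    NPERSEG = 256    # assumed STFT window length
    NOVERLAP = 192   # assumed 75% overlap, which permits exact inversion

    def friction_from_spectra(amplitude, phase):
        # Recombine amplitude and phase (shape: freq_bins x frames)
        # into a complex spectrum, then invert it with the ISTFT.
        spectrum = amplitude * np.exp(1j * phase)
        _, mu = istft(spectrum, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)
        return mu    # 1-D friction-coefficient signal

    # Round-trip check with a synthetic friction trace:
    t = np.arange(2 * FS) / FS
    mu_true = 0.5 + 0.05 * np.sin(2 * np.pi * 30 * t)
    _, _, Z = stft(mu_true, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    mu_rec = friction_from_spectra(np.abs(Z), np.angle(Z))
    print(np.max(np.abs(mu_rec[:mu_true.size] - mu_true)))   # ~0

In the method itself, the amplitude and phase fed to the ISTFT come from the Transformer decoder rather than from a forward STFT as in this round-trip check.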
Pages: 12