Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-Training

Cited: 0
Authors
Shen, Junxiao [1 ]
Khaldi, Khadija [2 ]
Zhou, Enmin [2 ]
Surale, Hemant Bhaskar [2 ]
Karlson, Amy [2 ]
Affiliations
[1] Univ Bristol, Bristol, England
[2] Meta, Reality Labs Research, Burlingame, CA, USA
Keywords
Decoding; Trajectory; Keyboards; Accuracy; Sharks; Extended reality; Training; Pre-trained models; text entry; word-gesture keyboard; discretization
DOI
10.1109/TVCG.2024.3456198
Chinese Library Classification
TP31 [Computer software];
Discipline Code
081202 ; 0835 ;
Abstract
Text entry with word-gesture keyboards (WGK) is emerging as a popular method and is becoming a key interaction for Extended Reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments introduces divergent word-gesture trajectory data patterns, complicating the decoding of trajectories into text. Template-matching decoding methods, such as SHARK(2) [32], are commonly used in these WGK systems because they are easy to implement and configure, but they are susceptible to decoding inaccuracies on noisy trajectories. Conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data have been proposed to improve accuracy, yet they have their own limitations: they require extensive training data and deep-learning expertise to implement. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale coarsely discretized word-gesture trajectories. This approach produces a ready-to-use WGK decoder that generalizes across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It outperforms SHARK(2) by 37.2% and surpasses the conventional neural decoder by 7.4%. Moreover, the Pre-trained Neural Decoder occupies only 4 MB after quantization without sacrificing accuracy, and it runs in real time, executing in just 97 milliseconds on Quest 3.
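The abstract's central idea, coarse discretization of word-gesture trajectories, can be illustrated with a minimal sketch (not the paper's implementation; grid size and normalization are assumptions): continuous (x, y) trajectory points over the keyboard area are snapped to a coarse grid, and consecutive duplicate cells are collapsed, yielding a short token sequence that a pre-trained decoder can consume regardless of keyboard size or input modality.

```python
# Hypothetical sketch of coarse trajectory discretization.
# Input: a gesture trajectory as (x, y) points normalized to [0, 1]
# over the keyboard area. Output: a sequence of grid-cell tokens.

def discretize(trajectory, rows=3, cols=10):
    """Snap normalized (x, y) points to a rows x cols grid and
    collapse consecutive duplicate cells into one token."""
    tokens = []
    for x, y in trajectory:
        col = min(int(x * cols), cols - 1)   # clamp x = 1.0 into last column
        row = min(int(y * rows), rows - 1)   # clamp y = 1.0 into last row
        cell = row * cols + col              # flatten (row, col) to one token id
        if not tokens or tokens[-1] != cell:
            tokens.append(cell)
    return tokens

# A noisy left-to-right swipe along the top keyboard row collapses
# to a short, noise-tolerant token sequence:
path = [(0.05, 0.10), (0.12, 0.12), (0.31, 0.08), (0.55, 0.15), (0.92, 0.10)]
print(discretize(path))  # [0, 1, 3, 5, 9]
```

Because many nearby raw samples map to the same coarse cell, small jitter in the trajectory leaves the token sequence unchanged, which is what makes such a representation robust across mid-air and on-surface input.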
Pages: 7118-7128
Page count: 11
References
65 entries
[1]  
Agarwal Aman, 2018, arXiv
[2]  
Alsharif O, 2015, INT CONF ACOUST SPEE, P2076, DOI 10.1109/ICASSP.2015.7178336
[3]  
[Anonymous], 2015, Technical Report
[4]  
[Anonymous], 2023, Use Your Mac with Apple Vision Pro
[5]  
Apple, Apple Vision Pro
[6]  
Bergstra J., 2011, ADV NEURAL INFORM PR, V24, P2546, DOI 10.5555/2986459.2986743
[7]  
Brown TB, 2020, ADV NEUR IN, V33
[8]  
Chen K, 2019, arXiv, arXiv:1906.07155
[9]   Exploring Word-gesture Text Entry Techniques in Virtual Reality [J].
Chen, Sibo ;
Wang, Junce ;
Guerra, Santiago ;
Mittal, Neha ;
Prakkamakul, Soravis .
CHI EA '19 EXTENDED ABSTRACTS: EXTENDED ABSTRACTS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2019,
[10]  
DAVID PA, 1985, AM ECON REV, V75, P332