Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-Training

Cited: 0
Authors
Shen, Junxiao [1 ]
Khaldi, Khadija [2 ]
Zhou, Enmin [2 ]
Surale, Hemant Bhaskar [2 ]
Karlson, Amy [2 ]
Affiliations
[1] University of Bristol, Bristol, England
[2] Meta, Reality Labs Research, Burlingame, CA, USA
Keywords
Decoding; Trajectory; Keyboards; Accuracy; Sharks; Extended reality; Training; Pre-trained models; text entry; word-gesture keyboard; discretization
DOI
10.1109/TVCG.2024.3456198
CLC number (Chinese Library Classification)
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Text entry with word-gesture keyboards (WGK) is emerging as a popular method and is becoming a key interaction for extended reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments produces divergent word-gesture trajectory data patterns, complicating the decoding of trajectories into text. Template-matching decoding methods, such as SHARK2 [32], are commonly used in WGK systems because they are easy to implement and configure, but they are susceptible to decoding inaccuracies on noisy trajectories. Conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data improve accuracy, yet they have their own limitations: they require extensive training data and deep-learning expertise to implement. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale, coarsely discretized word-gesture trajectories. This approach yields a ready-to-use WGK decoder that generalizes across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It outperforms SHARK2 by 37.2% and surpasses the conventional neural decoder by 7.4%. Moreover, the pre-trained neural decoder is only 4 MB after quantization, with no loss of accuracy, and runs in real time, executing in just 97 milliseconds on Quest 3.
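The abstract's core idea of "coarse discretization" can be illustrated with a minimal sketch: continuous (x, y) gesture points are snapped to a coarse grid over the keyboard area and collapsed into a token sequence that a pre-trained model could consume. The grid resolution, keyboard bounds, and function names below are illustrative assumptions, not the paper's actual parameters.

```python
# Hypothetical sketch of coarse trajectory discretization (assumed parameters,
# not the paper's actual configuration): normalized (x, y) gesture points are
# snapped to a coarse grid over the keyboard, and consecutive duplicate cells
# are collapsed into one token.

def discretize(trajectory, grid_w=8, grid_h=3, kb_w=1.0, kb_h=0.4):
    """Map (x, y) points in a kb_w-by-kb_h keyboard area to grid-cell tokens."""
    tokens = []
    for x, y in trajectory:
        col = min(int(x / kb_w * grid_w), grid_w - 1)
        row = min(int(y / kb_h * grid_h), grid_h - 1)
        cell = row * grid_w + col
        if not tokens or tokens[-1] != cell:  # collapse consecutive repeats
            tokens.append(cell)
    return tokens

# Example: a noisy left-to-right swipe along the top keyboard row
traj = [(0.05, 0.05), (0.06, 0.06), (0.40, 0.05), (0.90, 0.10)]
print(discretize(traj))  # → [0, 3, 7]
```

Collapsing repeated cells is what makes the representation robust to sampling-rate and jitter differences across mid-air and on-surface input, which is consistent with the generalization claim in the abstract.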
Pages: 7118-7128
Page count: 11
References
66 in total
  • [1] Abadi M., 2015, TENSORFLOW LARGE SCA
  • [2] Agarwal Aman, 2018, arXiv
  • [3] Alsharif O, 2015, INT CONF ACOUST SPEE, P2076, DOI 10.1109/ICASSP.2015.7178336
  • [4] [Anonymous], 2023, Use your mac with apple vision pro
  • [5] Apple, Apple Vision Pro
  • [6] Ba Jimmy Lei, 2016, arXiv
  • [7] Bergstra J., 2011, NIPS 11 P 24 INT C N, V24, P2546
  • [8] Brown TB, 2020, ADV NEUR IN, V33
  • [9] Chen K, 2019, Arxiv, DOI [arXiv:1906.07155, DOI 10.48550/ARXIV.1906.07155]
  • [10] Chen S., Wang J., Guerra S., Mittal N., Prakkamakul S., Exploring Word-gesture Text Entry Techniques in Virtual Reality, CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019