Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-Training

Cited: 0
Authors
Shen, Junxiao [1 ]
Khaldi, Khadija [2 ]
Zhou, Enmin [2 ]
Surale, Hemant Bhaskar [2 ]
Karlson, Amy [2 ]
Affiliations
[1] University of Bristol, Bristol, England
[2] Meta, Reality Labs Research, Burlingame, CA, USA
Keywords
Decoding; Trajectory; Keyboards; Accuracy; Sharks; Extended reality; Training; Pre-trained models; text entry; word-gesture keyboard; discretization
DOI
10.1109/TVCG.2024.3456198
CLC number (Chinese Library Classification)
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Text entry with word-gesture keyboards (WGK) is emerging as a popular method and is becoming a key interaction for extended reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments produces divergent word-gesture trajectory data patterns, complicating the decoding of trajectories into text. Template-matching decoding methods, such as SHARK2 [32], are commonly used in WGK systems because they are easy to implement and configure, but they are susceptible to decoding inaccuracies on noisy trajectories. Conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data improve accuracy, yet they have their own limitations: they require extensive training data and deep-learning expertise to implement. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale, coarsely discretized word-gesture trajectories. This approach yields a ready-to-use WGK decoder that generalizes across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It outperforms SHARK2 by 37.2% and surpasses the conventional neural decoder by 7.4%. Moreover, the pre-trained neural decoder is only 4 MB after quantization, with no loss of accuracy, and runs in real time, executing in just 97 milliseconds on Quest 3.
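The abstract's core idea of "coarse discretization" can be illustrated with a minimal sketch: continuous (x, y) gesture points are snapped to a coarse grid over the keyboard area and collapsed into a token sequence that a pre-trained model could consume. The grid resolution, keyboard bounds, and function names below are illustrative assumptions, not the paper's actual parameters.

```python
# Hypothetical sketch of coarse trajectory discretization (assumed parameters,
# not the paper's actual configuration): normalized (x, y) gesture points are
# snapped to a coarse grid over the keyboard, and consecutive duplicate cells
# are collapsed into one token.

def discretize(trajectory, grid_w=8, grid_h=3, kb_w=1.0, kb_h=0.4):
    """Map (x, y) points in a kb_w-by-kb_h keyboard area to grid-cell tokens."""
    tokens = []
    for x, y in trajectory:
        col = min(int(x / kb_w * grid_w), grid_w - 1)
        row = min(int(y / kb_h * grid_h), grid_h - 1)
        cell = row * grid_w + col
        if not tokens or tokens[-1] != cell:  # collapse consecutive repeats
            tokens.append(cell)
    return tokens

# Example: a noisy left-to-right swipe along the top keyboard row
traj = [(0.05, 0.05), (0.06, 0.06), (0.40, 0.05), (0.90, 0.10)]
print(discretize(traj))  # → [0, 3, 7]
```

Collapsing repeated cells is what makes the representation robust to sampling-rate and jitter differences across mid-air and on-surface input, which is consistent with the generalization claim in the abstract.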
Pages: 7118-7128
Page count: 11
References
66 in total
  • [1] Abadi M., 2015, TENSORFLOW LARGE SCA
  • [2] Agarwal Aman, 2018, arXiv
  • [3] Alsharif O, 2015, INT CONF ACOUST SPEE, P2076, DOI 10.1109/ICASSP.2015.7178336
  • [4] [Anonymous], 2023, Use your mac with apple vision pro
  • [5] Apple, Apple Vision Pro
  • [6] Ba Jimmy Lei, 2016, arXiv
  • [7] Bergstra J., 2011, NIPS 11 P 24 INT C N, V24, P2546
  • [8] Brown TB, 2020, ADV NEUR IN, V33
  • [9] Chen K, 2019, Arxiv, DOI [arXiv:1906.07155, DOI 10.48550/ARXIV.1906.07155]
  • [10] Chen S., Wang J., Guerra S., Mittal N., Prakkamakul S., Exploring Word-gesture Text Entry Techniques in Virtual Reality, CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019