An End-to-End Air Writing Recognition Method Based on Transformer

Times Cited: 0
Authors
Tan, Xuhang [1 ]
Tong, Jicheng [1 ]
Matsumaru, Takafumi [1 ]
Dutta, Vibekananda [2 ]
He, Xin [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka 8080135, Japan
[2] Warsaw Univ Technol, Inst Micromech & Photon, Fac Mechatron, PL-00661 Warsaw, Poland
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
Writing; Character recognition; Task analysis; Visualization; Transformers; Trajectory; Data augmentation; Human computer interaction; Air writing recognition; transformer model; human-computer interaction (HCI);
DOI
10.1109/ACCESS.2023.3321807
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The air-writing recognition task entails the computer's ability to directly recognize and interpret user input generated by finger movements in the air. This form of interaction between humans and computers is considered natural, cost-effective, and immersive within the domain of human-computer interaction (HCI). While conventional air-writing recognition has primarily focused on recognizing individual characters, a recent advancement in 2022 introduced the concept of writing in the air (WiTA) to address continuous air-writing tasks. In this context, we assert that the Transformer-based approach can offer improved performance for the WiTA task. To solve the WiTA task, this study formulated an end-to-end air-writing recognition method called TR-AWR, which leverages the Transformer model. Our proposed method adopts a holistic approach by utilizing video frame sequences as input and generating letter sequences as outputs. To enhance the performance of the WiTA task, our method combines the vision transformer model with the traditional transformer model, while introducing data augmentation techniques for the first time. Our approach achieves a character error rate (CER) of 29.86% and a decoding frames per second (D-fps) value of 194.67 fps. Notably, our method outperforms the baseline models in terms of recognition accuracy while maintaining a certain level of real-time performance. The contributions of this paper are as follows: Firstly, this study is the first to incorporate the Transformer method into continuous air-writing recognition research, thereby reducing overall complexity and attaining improved results. Additionally, we adopt an end-to-end approach that streamlines the entire recognition process. Lastly, we propose specific data augmentation guidelines tailored explicitly for the WiTA task. In summary, our study presents a promising direction for effectively addressing the WiTA task and holds potential for further advancements in this domain.
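The abstract describes an encoder-decoder pipeline: a vision transformer encodes the video frame sequence and a traditional transformer decoder emits the letter sequence. The following is a minimal sketch of such a pipeline, not the authors' TR-AWR implementation; the module layout, frame/patch sizes, per-frame average pooling, vocabulary size, and all other hyperparameters are illustrative assumptions.

```python
# Minimal ViT-encoder + transformer-decoder sketch for continuous air-writing
# recognition (video frames in, letter sequence out). Assumed hyperparameters.
import torch
import torch.nn as nn


class AirWritingTransformer(nn.Module):
    def __init__(self, vocab_size=30, d_model=256, nhead=8,
                 num_layers=4, patch=16, img=64, max_frames=512):
        super().__init__()
        # ViT-style patch embedding applied to every frame independently
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.frame_pos = nn.Parameter(torch.zeros(1, max_frames, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tgt_tokens):
        # frames: (B, T, 3, H, W) video clip; tgt_tokens: (B, L) shifted-right letter ids
        B, T = frames.shape[:2]
        x = self.patch_embed(frames.flatten(0, 1))    # (B*T, d_model, H/p, W/p)
        x = x.flatten(2).mean(dim=2).view(B, T, -1)   # average patches -> one vector per frame
        memory = self.encoder(x + self.frame_pos[:, :T])
        y = self.tok_embed(tgt_tokens)                # (B, L, d_model)
        L = y.size(1)
        causal = torch.triu(torch.full((L, L), float('-inf')), diagonal=1)
        h = self.decoder(y, memory, tgt_mask=causal)  # autoregressive letter decoding
        return self.out(h)                            # (B, L, vocab_size) logits
```

The character error rate (CER) reported above (29.86%) is conventionally the edit distance between the predicted and reference letter sequences divided by the reference length, and D-fps measures how many input frames the full recognition pipeline decodes per second.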
Pages: 109885-109898
Page count: 14