Lightweight Text Spotting for Interactive User Experience in Mixed Reality

Cited by: 0
Authors
Chen, Xi-Wen [1]
Chen, Jian-Yu [2]
Lin, Yu-Kai [3]
Huang, Chih-Wei [2]
Chern, Jann-Long [4]
Affiliations
[1] Natl Cent Univ, Dept Math, Taoyuan, Taiwan
[2] Natl Cent Univ, Dept Commun Engn, Taoyuan, Taiwan
[3] Natl Yang Ming Chiao Tung Univ, Coll Comp Sci, Hsinchu, Taiwan
[4] Natl Taiwan Normal Univ, Dept Math, Taipei, Taiwan
Source
2023 IEEE International Conference on Consumer Electronics (ICCE) | 2023
Keywords
Augmented reality; mixed reality; text spotting; client-server model
DOI
10.1109/ICCE56470.2023.10043519
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We propose a machine learning-aided framework for semantic understanding of surrounding scenes, aimed at intelligent human-computer interaction in mixed reality (MR). The framework extracts semantic information from the front-view camera of MR glasses using fast and accurate machine learning-based scene text spotting models. It also allows the MR glasses to automatically generate virtual objects that match the surrounding scene, without further user intervention. To achieve near real-time computation, the scene text spotting models run as a remote service under the client-server model, which overcomes the computing bottleneck of wearable devices. We demonstrate the framework on Microsoft HoloLens 2, and experimental results on self-collected real-world scenarios show its feasibility in improving user experience. The proposed client-server architecture takes 0.77 seconds of computation per frame on average, which is on average 11.8 times faster than the client-only architecture and achieves near real-time computation. To investigate the usability of text spotting algorithms in real-world applications, we also compare several state-of-the-art scene text spotting approaches in terms of recognition precision and computational time.
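As a rough reading of the timing figures in the abstract, the short Python sketch below converts the reported 0.77 seconds per frame into an approximate frame rate and estimates the client-only time implied by the 11.8x speedup. The function and constant names are illustrative, not taken from the paper. The client-only estimate assumes the average speedup can be applied directly to the average frame time, which the abstract does not state, so it is only an approximation.

```python
# Figures quoted in the abstract (averages reported by the authors).
SERVER_SECONDS_PER_FRAME = 0.77  # client-server architecture
REPORTED_SPEEDUP = 11.8          # relative to the client-only architecture


def frames_per_second(seconds_per_frame: float) -> float:
    """Convert a per-frame processing time into a throughput in frames/s."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


def implied_client_only_seconds(server_seconds: float, speedup: float) -> float:
    """Estimate the client-only time per frame.

    Assumption (not stated in the abstract): the average speedup can be
    multiplied by the average client-server time.
    """
    return server_seconds * speedup


client_server_fps = frames_per_second(SERVER_SECONDS_PER_FRAME)
client_only_seconds = implied_client_only_seconds(
    SERVER_SECONDS_PER_FRAME, REPORTED_SPEEDUP
)

print(f"client-server: about {client_server_fps:.2f} frames per second")
print(f"client-only (estimated): about {client_only_seconds:.1f} s per frame")
```

Under these assumptions, the client-server pipeline processes a little over one frame per second, while the client-only pipeline would need roughly nine seconds per frame.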
Pages: 5