Real-Time Deep Learning-Based Object Recognition in Augmented Reality

Cited by: 0

Authors:

Egipko, V. [1]

Zhdanova, M. [1, 2]

Gapon, N. [1, 2]

Voronin, V. [1]

Semenishchev, E. [1]

Affiliations:

[1] Moscow State Univ Technol STANKIN, Ctr Cognit Technol & Machine Vis, Moscow, Russia

[2] Don State Tech Univ, Rostov Na Donu, Russia

Source:

REAL-TIME PROCESSING OF IMAGE, DEPTH, AND VIDEO INFORMATION 2024 | 2024 / Vol. 13000

Funding:

Russian Science Foundation;

Keywords:

deep learning; augmented reality; computer vision; object recognition; robotic systems; real-time processing;

DOI:

10.1117/12.3024957

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Augmented reality is a visualization technology that displays information by overlaying virtual images on the real world. Effective implementation of augmented reality requires recognition of the current scene, but identifying objects in real-time video on computationally limited hardware is demanding. One way to address this problem is a hybrid system that uses machine learning and computer vision to process and analyze visual data in order to identify and classify real-world objects. The proposed architecture is built around the Vuforia augmented reality system, which provides good performance by balancing prediction accuracy and efficiency. The Vuforia neural network architecture allows convenient interaction with AR in Unity and provides initial conditions for detecting 3D objects. The augmented reality construction algorithm is based on the ARCore framework and the OpenGL interface for embedded systems. The system integrates recognition data with an AR platform to display the corresponding 3D models, which users can interact with through the AR application. The method also includes an enhanced user interface for AR that makes the augmented environment easier to navigate and control. Experimental research has shown that the proposed method significantly improves the accuracy of object recognition and the ease of working with 3D models in AR.


Pages: 7
