From a Visual Scene to a Virtual Representation: A Cross-Domain Review

Cited by: 2
Authors
Pereira, Americo [1, 2]
Carvalho, Pedro [1, 3]
Pereira, Nuno [1, 3]
Viana, Paula [1, 3]
Corte-Real, Luis [1, 2]
Affiliations
[1] Inst Syst & Comp Engn Technol & Sci INESC TEC, Ctr Telecommun & Multimedia, P-4200465 Porto, Portugal
[2] Univ Porto, Fac Engn, P-4099002 Porto, Portugal
[3] Polytech Porto, ISEP, P-4249015 Porto, Portugal
Keywords
Computer vision; datasets; scene analysis; scene reconstruction; visual scene understanding; 3D reconstruction; object detection; human pose; attribute; tracking; recognition; ontology; database
DOI
10.1109/ACCESS.2023.3283495
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities have made visual data a must in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or on specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to set up a structure for discussing challenges and opportunities in different steps of the entire process, making it possible to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.
Pages: 57916-57933
Page count: 18