Multi-sensor data fusion across dimensions: A novel approach to synopsis generation using sensory data

Cited by: 0
Authors
Ingle, Palash Yuvraj [1 ,2 ]
Kim, Young-Gab [1 ,2 ]
Affiliations
[1] Sejong Univ, Dept Comp & Informat Secur, Seoul 05006, South Korea
[2] Sejong Univ, Convergence Engn Intelligent Drone, Seoul 05006, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Drone surveillance; Deep learning; Video fusion; Video synopsis; Attention; Devices
DOI
10.1016/j.jii.2025.100876
Chinese Library Classification
TP39 [Applications of computers];
Subject classification codes
081203; 0835;
Abstract
Unmanned aerial vehicles (UAVs) and autonomous ground vehicles are increasingly outfitted with advanced sensors such as LiDAR, cameras, and GPS, enabling real-time object detection, tracking, localization, and navigation. These platforms generate high-volume sensory data, such as video streams and point clouds, that require efficient processing to support timely and informed decision-making. Although video synopsis techniques are widely used for visual data summarization, they encounter significant challenges in multi-sensor environments due to disparities in sensor modalities. To address these limitations, we propose a novel sensory data synopsis framework designed for both UAV and autonomous vehicle applications. The proposed system integrates a dual-task learning model with a real-time sensor fusion module to jointly perform abnormal object segmentation and depth estimation by combining LiDAR and camera data. The framework comprises a sensory fusion algorithm, a 3D-to-2D projection mechanism, and a Metropolis-Hastings-based trajectory optimization strategy to refine object tubes and construct concise, temporally shifted synopses. This design selectively preserves and repositions salient information across space and time, enhancing synopsis clarity while reducing computational overhead. Experimental evaluations conducted on standard datasets (i.e., KITTI, Cityscapes, and DVS) demonstrate that our framework achieves a favorable balance between segmentation accuracy and inference speed. In comparison with existing studies, it yields superior performance in terms of frame reduction, recall, and F1 score. The results highlight the robustness, real-time capability, and broad applicability of the proposed approach to intelligent surveillance, smart infrastructure, and autonomous mobility systems.
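The abstract's Metropolis-Hastings-based trajectory optimization repositions object "tubes" (per-object sequences of detections) in time so that a long recording collapses into a short synopsis with little overlap. The record does not include the authors' code or their exact energy function; the sketch below is only a minimal Python illustration of that general idea, assuming an IoU-based collision penalty, a synopsis-length term, and illustrative weights and function names (iou, energy, metropolis_hastings_layout) that do not come from the paper.

```python
# Minimal sketch (not the authors' code): Metropolis-Hastings-style temporal
# rearrangement of object tubes for synopsis generation. A tube is the list of
# per-frame bounding boxes of one tracked object; the sampler searches for
# start-time shifts that pack tubes into a short synopsis with little overlap.
# The energy terms, weights, and proposal width are illustrative assumptions.

import random
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def energy(tubes, shifts, w_collision=10.0, w_length=1.0):
    """Layout cost: pairwise collisions of shifted tubes plus synopsis length."""
    length = max(shifts[i] + len(t) for i, t in enumerate(tubes))
    cost = 0.0
    for i in range(len(tubes)):
        for j in range(i + 1, len(tubes)):
            # Frames where the two shifted tubes coexist in the synopsis.
            lo = max(shifts[i], shifts[j])
            hi = min(shifts[i] + len(tubes[i]), shifts[j] + len(tubes[j]))
            for t in range(lo, hi):
                cost += w_collision * iou(tubes[i][t - shifts[i]],
                                          tubes[j][t - shifts[j]])
    return cost + w_length * length

def metropolis_hastings_layout(tubes, n_iters=5000, temperature=1.0,
                               max_shift=300, seed=0):
    """Propose random start-time shifts; accept with prob min(1, exp(-dE / T))."""
    rng = random.Random(seed)
    shifts = [0] * len(tubes)
    e_cur = energy(tubes, shifts)
    for _ in range(n_iters):
        k = rng.randrange(len(tubes))
        proposal = shifts.copy()
        proposal[k] = max(0, min(max_shift, shifts[k] + rng.randint(-10, 10)))
        e_new = energy(tubes, proposal)
        if e_new < e_cur or rng.random() < np.exp(-(e_new - e_cur) / temperature):
            shifts, e_cur = proposal, e_new
    return shifts, e_cur

if __name__ == "__main__":
    # Two toy tubes of per-frame boxes; the sampler staggers them in time.
    tube_a = [(10, 10, 60, 60)] * 40
    tube_b = [(20, 20, 70, 70)] * 40
    shifts, cost = metropolis_hastings_layout([tube_a, tube_b])
    print("start-time shifts:", shifts, "energy:", round(cost, 2))
```

The acceptance rule always keeps proposals that lower the energy and occasionally keeps worse ones (scaled by the temperature), which lets the sampler escape layouts where tubes are locked into overlapping positions; the paper's actual objective also accounts for the fused LiDAR/camera saliency of each tube, which this toy example omits.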
Pages: 24