Traffic Scenario Understanding and Video Captioning via Guidance Attention Captioning Network

Cited by: 0
Authors
Liu, Chunsheng [1 ]
Zhang, Xiao [1 ]
Chang, Faliang [1 ]
Li, Shuang [2 ]
Hao, Penghui [1 ]
Lu, Yansha [1 ]
Wang, Yinhai [3 ]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Sch Informat & Automat Engn, Jinan 250353, Peoples R China
[3] Univ Washington, Dept Civil & Environm Engn, Seattle, WA 98195 USA
Funding
National Natural Science Foundation of China;
Keywords
Traffic scenario understanding; video captioning; guidance captioning; attention mechanism;
DOI
10.1109/TITS.2023.3323085
CLC Classification
TU [Building Science];
Subject Classification Code
0813;
Abstract
Describing a traffic scenario from the driver's perspective is a challenging task for an Advanced Driving Assistance System (ADAS), involving sub-tasks such as detection, tracking, and segmentation. Previous methods mainly focus on independent sub-tasks and struggle to describe incidents comprehensively. In this study, the problem is treated, for the first time, as a video captioning task, and a Guidance Attention Captioning Network (GAC-Network) is proposed to describe incidents in a single concise sentence. In GAC-Network, an Attention-based Encoder-Decoder Net (AED-Net) serves as the main network; its temporal-spatial attention mechanisms make it possible to effectively reject unimportant traffic behaviors and redundant background. To handle varied driving scenarios, Spatio-Temporal Layer Normalization is used to improve generalization. To generate captions for driving incidents, a novel Guidance Module is proposed that pushes the encoder-decoder model to generate words that relate better to the past and future words in a caption. Because there is no public dataset for captioning driving scenarios, the Traffic Video Captioning (TVC) dataset is released for this task. Experimental results show that the proposed method can fulfill the captioning task for complex driving scenarios and outperforms the comparison methods by at least 2.5%, 1.8%, 3.6%, and 13.1% on BLEU_1, METEOR, ROUGE_L, and CIDEr, respectively.
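The paper itself does not publish code here; as a rough illustration of how a temporal-attention encoder-decoder video captioner of the kind the abstract describes is typically wired, a minimal sketch follows. All module names, dimensions, and the additive-attention formulation are assumptions for exposition, not the paper's AED-Net, Guidance Module, or Spatio-Temporal Layer Normalization.

```python
# Illustrative sketch only: a generic temporal-attention encoder-decoder
# captioner, NOT the paper's AED-Net. Names and dimensions are assumptions.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Additive (Bahdanau-style) attention over per-frame features."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, T, feat_dim) frame features; hidden: (B, hidden_dim)
        e = self.score(torch.tanh(self.w_feat(feats)
                                  + self.w_hidden(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)        # (B, T, 1) frame weights
        context = (alpha * feats).sum(dim=1)   # (B, feat_dim) weighted summary
        return context, alpha


class CaptionDecoder(nn.Module):
    """GRU decoder that re-attends over frames before emitting each word."""

    def __init__(self, vocab_size: int, feat_dim: int = 512,
                 hidden_dim: int = 512, embed_dim: int = 300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = TemporalAttention(feat_dim, hidden_dim)
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, tokens):
        # feats: (B, T, feat_dim); tokens: (B, L) captions (teacher forcing)
        B, L = tokens.shape
        h = feats.new_zeros(B, self.gru.hidden_size)
        logits = []
        for t in range(L):
            context, _ = self.attn(feats, h)   # attend at every decode step
            x = torch.cat([self.embed(tokens[:, t]), context], dim=-1)
            h = self.gru(x, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)      # (B, L, vocab_size)


# Smoke test with random frame features and token ids.
if __name__ == "__main__":
    decoder = CaptionDecoder(vocab_size=1000)
    feats = torch.randn(2, 20, 512)            # 2 clips, 20 frames each
    tokens = torch.randint(0, 1000, (2, 12))   # 12-word captions
    print(decoder(feats, tokens).shape)        # torch.Size([2, 12, 1000])
```

The re-attention at every decoding step is what lets such a model down-weight redundant background frames per word, which is the role the abstract attributes to AED-Net's temporal-spatial attention.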
Pages: 3615-3627
Page count: 13