Enhancing video temporal grounding with large language model-based data augmentation

Citations: 0

Authors:

Tian, Yun [1]

Guo, Xiaobo [1]

Wang, Jinsong [1]

Li, Bin [2]

Affiliations:

[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China

[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2025, Volume 81, Issue 05

Keywords:

Video temporal grounding; Large language model; Data augmentation; Video description; Semantic enrichment; ANNOTATION; QUALITY;

DOI:

10.1007/s11227-025-07159-0

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Given an untrimmed video and a natural language query, the task of video temporal grounding (VTG) aims to precisely identify the temporal segment in the video that semantically matches the query. Existing datasets for this task often provide manually annotated natural language queries that are overly simplistic and lack the semantic richness needed to fully capture the video's content. This limitation hinders a model's ability to comprehend complex semantic scenarios and degrades its overall performance. To address these challenges, we introduce a novel, low-cost, large language model-based data augmentation method that enriches the original samples and expands the dataset without requiring external data. We propose a fine-grained image captioning module with a noise filter to extract unexploited information from videos. Additionally, we design a hierarchical semantic prompting framework to guide GPT-3.5 in producing semantically rich and contextually coherent natural language queries. Our method outperforms the state-of-the-art (SOTA) method MRTNet when combined with 2D-TAN and VSLNet across three public VTG datasets, particularly excelling in complex semantics and long-duration segment localization.

Pages: 31
