Enhancing video temporal grounding with large language model-based data augmentation

Cited by: 0

Authors:

Tian, Yun [1]

Guo, Xiaobo [1]

Wang, Jinsong [1]

Li, Bin [2]

Affiliations:

[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China

[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 05

Keywords:

Video temporal grounding; Large language model; Data augmentation; Video description; Semantic enrichment; ANNOTATION; QUALITY;

DOI:

10.1007/s11227-025-07159-0

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Given an untrimmed video and a natural language query, the task of video temporal grounding (VTG) aims to precisely identify the temporal segment in the video that semantically matches the query. Existing datasets for this task often provide manually annotated natural language queries that are overly simplistic and lack the semantic richness needed to fully capture the video's content. This limitation hinders a model's ability to comprehend complex semantic scenarios and degrades its overall performance. To address these challenges, we introduce a novel, low-cost, large language model-based data augmentation method that enriches the original samples and expands the dataset without requiring external data. We propose a fine-grained image captioning module with a noise filter to extract unexploited information from videos. Additionally, we design a hierarchical semantic prompting framework to guide GPT-3.5 in producing semantically rich and contextually coherent natural language queries. When combined with 2D-TAN and VSLNet, our method outperforms the state-of-the-art (SOTA) method MRTNet across three public VTG datasets, particularly excelling in complex semantics and long-duration segment localization.

Pages: 31
