Enhancing video temporal grounding with large language model-based data augmentation

Cited by: 0
Authors
Tian, Yun [1]
Guo, Xiaobo [1]
Wang, Jinsong [1]
Li, Bin [2]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Keywords
Video temporal grounding; Large language model; Data augmentation; Video description; Semantic enrichment; Annotation; Quality
DOI
10.1007/s11227-025-07159-0
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Given an untrimmed video and a natural language query, video temporal grounding (VTG) aims to locate the temporal segment of the video that semantically matches the query. The queries in existing VTG datasets are manually annotated and often overly simplistic, so they lack the semantic richness needed to fully capture the video's content. This limits a model's ability to understand complex semantic scenarios and degrades its overall performance. To address this, we introduce a novel, low-cost data augmentation method based on a large language model that enriches the original samples and expands the dataset without requiring external data. We propose a fine-grained image captioning module with a noise filter to extract previously unexploited information from the videos. In addition, we design a hierarchical semantic prompting framework that guides GPT-3.5 to produce semantically rich and contextually coherent natural language queries. Combined with 2D-TAN and VSLNet, our method outperforms the state-of-the-art (SOTA) method MRTNet on three public VTG datasets, performing particularly well on queries with complex semantics and on localizing long-duration segments.
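The abstract describes the augmentation pipeline only at a high level, so the following is a hypothetical sketch of its data flow, not the authors' implementation. Several pieces are assumptions: the noise filter is swapped for a simple cross-caption word-overlap check (a caption that no other caption from the same segment agrees with is treated as noise), the three prompt levels (video context, segment captions, original query) are guessed, and `llm` is a placeholder callable standing in for a GPT-3.5 request. It also assumes each generated query reuses the original segment's timestamps, which is one way new samples can be produced without external data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    video_id: str
    query: str
    start: float  # ground-truth segment start (seconds)
    end: float    # ground-truth segment end (seconds)


def tokens(text: str) -> set:
    """Lower-cased word set with simple punctuation stripped."""
    return {w.strip(".,").lower() for w in text.split() if w.strip(".,")}


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two captions."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def filter_captions(captions: List[str], min_support: float = 0.2) -> List[str]:
    """Hypothetical noise filter (an assumption, not the paper's filter):
    keep a frame caption only if at least one other caption from the same
    segment shares enough words with it. A lone caption is always dropped."""
    kept = []
    for i, c in enumerate(captions):
        support = max(
            (jaccard(c, o) for j, o in enumerate(captions) if j != i),
            default=0.0,
        )
        if support >= min_support:
            kept.append(c)
    return kept


def build_prompt(video_summary: str, captions: List[str], query: str) -> str:
    """Hypothetical three-level prompt: video context, segment details, query."""
    details = "\n".join(f"- {c}" for c in captions)
    return (
        f"Video context: {video_summary}\n"
        f"Segment details:\n{details}\n"
        f"Original query: {query}\n"
        "Rewrite the query so it describes the same segment in richer detail."
    )


def augment(
    sample: Sample,
    video_summary: str,
    captions: List[str],
    llm: Callable[[str, int], str],  # placeholder for a GPT-3.5 call
    n_variants: int = 2,
) -> List[Sample]:
    """Create new samples that keep the original timestamps but use
    LLM-generated queries built from filtered frame captions."""
    clean = filter_captions(captions)
    prompt = build_prompt(video_summary, clean, sample.query)
    return [
        Sample(sample.video_id, llm(prompt, k), sample.start, sample.end)
        for k in range(n_variants)
    ]
```

For example, with the captions "a man opens the fridge", "a man opens the fridge door" and "purple elephant on a kite", the filter keeps the first two and drops the third, which shares only the word "a" with the others. Passing a stub such as `lambda p, k: f"variant {k}"` as `llm` lets the flow be exercised offline.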
Pages: 31