Enhancing video temporal grounding with large language model-based data augmentation

Authors
Tian, Yun [1]
Guo, Xiaobo [1]
Wang, Jinsong [1]
Li, Bin [2]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Keywords
Video temporal grounding; Large language model; Data augmentation; Video description; Semantic enrichment; ANNOTATION; QUALITY
DOI
10.1007/s11227-025-07159-0
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Given an untrimmed video and a natural language query, video temporal grounding (VTG) aims to precisely identify the temporal segment of the video that semantically matches the query. Existing datasets for this task often provide manually annotated natural language queries that are overly simplistic and lack the semantic richness needed to fully capture the video's content. This limitation hinders a model's ability to comprehend complex semantic scenarios and degrades its overall performance. To address these challenges, we introduce a novel, low-cost, large language model-based data augmentation method that enriches the original samples and expands the dataset without requiring external data. We propose a fine-grained image captioning module with a noise filter to extract unexploited information from videos. Additionally, we design a hierarchical semantic prompting framework to guide GPT-3.5 in producing semantically rich and contextually coherent natural language queries. Combined with 2D-TAN and VSLNet, our method outperforms the state-of-the-art method MRTNet on three public VTG datasets, and it excels particularly on queries with complex semantics and on long-duration segment localization.
Pages: 31