Video Moment Retrieval With Noisy Labels

Cited by: 35
Authors
Pan, Wenwen [1]
Zhao, Zhou [1]
Huang, Wencan [1]
Zhang, Zhu [1]
Fu, Liyong [2, 3]
Pan, Zhigeng [1]
Yu, Jun [4]
Wu, Fei [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[2] Chinese Acad Forestry, Inst Forest Resource Informat Tech, Beijing 100091, Peoples R China
[3] Natl Forestry & Grassland Adm, Key Lab Forest Management & Growth Modeling, Beijing 100091, Peoples R China
[4] Hangzhou Dianzi Univ, Coll Comp Sci, Hangzhou 310018, Peoples R China
Funding
Zhejiang Provincial Natural Science Foundation; National Natural Science Foundation of China
Keywords
Noise measurement; Annotations; Training; Manuals; Feature extraction; Deep learning; Task analysis; Co-teaching; feature pyramid network; multilevel losses; noisy label learning; video moment retrieval (VMR); RECURRENT NEURAL-NETWORK; SYLVESTER EQUATION; TIME CONVERGENCE; MATRIX EQUATIONS; FINITE-TIME; DESIGN; ALGORITHM; DYNAMICS; MODELS;
DOI
10.1109/TNNLS.2022.3212900
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video moment retrieval (VMR) aims to localize the target moment in an untrimmed video according to a given natural language query. Existing algorithms typically rely on clean annotations to train their models. However, manual annotation may introduce considerable noise, so VMR models may not be trained well in practice. In this article, we present a simple yet effective VMR framework based on a bottom-up schema, which is trained end to end and is robust to noisy labels. Specifically, we extract multimodal features with syntactic graph convolutional networks and multihead attention layers, and fuse them with cross gates and a bilinear approach. Feature pyramid networks are then constructed to encode rich scene relationships and capture high-level semantics. Furthermore, to mitigate the effects of noisy annotations, we devise multilevel losses at two levels: a frame-level loss that improves noise tolerance and an instance-level loss that reduces the adverse effects of negative instances. At the frame level, we adopt Gaussian smoothing to treat noisy labels as soft labels through partial fitting. At the instance level, we use a pair of structurally identical models that teach each other during training iterations. The resulting robust VMR model significantly outperforms state-of-the-art approaches in experiments on the standard public datasets ActivityCaption and textually annotated cooking scene (TACoS). We also evaluate the proposed approach under different manual annotation noises to further demonstrate its effectiveness.
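The two noise-handling ideas named in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's actual formulas, so everything here is an assumption for illustration: the function names, the `sigma_scale` and `keep_ratio` parameters, centring the Gaussian on the segment midpoint, and the small-loss selection rule (borrowed from the generic co-teaching recipe) are all hypothetical and are not taken from the authors' implementation.

```python
import math


def gaussian_soft_labels(num_frames, start, end, sigma_scale=0.25):
    """Turn a hard (start, end) frame annotation into per-frame soft labels.

    Assumption: the Gaussian is centred on the midpoint of the annotated
    segment and its width is proportional to the segment length.
    """
    center = (start + end) / 2.0
    sigma = max(sigma_scale * (end - start + 1), 1e-6)
    return [
        math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))
        for t in range(num_frames)
    ]


def small_loss_indices(losses, keep_ratio):
    """Return indices of the keep_ratio fraction of instances with smallest loss."""
    num_keep = max(1, int(round(keep_ratio * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:num_keep])


def co_teaching_exchange(losses_a, losses_b, keep_ratio):
    """Each model selects its small-loss instances; its peer trains on them.

    Assumption: this is the generic co-teaching exchange between two
    structurally identical models, not the paper's exact procedure.
    """
    train_b_on = small_loss_indices(losses_a, keep_ratio)  # chosen by model A
    train_a_on = small_loss_indices(losses_b, keep_ratio)  # chosen by model B
    return train_a_on, train_b_on
```

In this sketch, frames near the annotated midpoint receive labels close to 1 and the boundaries receive smaller values, so a slightly misplaced boundary changes the target only a little. The exchange step means each model is updated only on instances its peer considers likely to be clean.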
Pages: 6779-6791
Page count: 13