GENERATING COHERENT NATURAL LANGUAGE ANNOTATIONS FOR VIDEO STREAMS

Cited by: 0
Authors
Khan, Muhammad Usman Ghani [1]
Zhang, Lei [2]
Gotoh, Yoshihiko [1]
Affiliations
[1] Univ Sheffield, Sheffield, S Yorkshire, England
[2] Harbin Engn Univ, Harbin, Heilongjiang, Peoples R China
Source
2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012) | 2012
Keywords
Video processing; Video annotation; Natural language description; Video feature units
DOI
Not available
Chinese Library Classification number
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
This contribution addresses generation of natural language annotations for human actions, behaviour and their interactions with other objects observed in video streams. The work starts with implementation of conventional image processing techniques to extract high level features for individual frames. Natural language description of the frame contents is produced based on high level features. Although feature extraction processes are erroneous at various levels, we explore approaches to put them together to produce a coherent description. For extending the approach to description of video streams, units of features are introduced to present coherent, smooth and well phrased descriptions by incorporating spatial and temporal information. Evaluation is made by calculating ROUGE scores between human annotated and machine generated descriptions.
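The evaluation step mentioned in the abstract compares machine-generated descriptions with human annotations using ROUGE. The record does not state which ROUGE variant or tokenization was used, so the following is only a minimal sketch of ROUGE-N under stated assumptions: lowercased whitespace tokenization, a single reference, and clipped n-gram overlap. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list as a multiset."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Return ROUGE-N recall, precision and F1 for one reference/candidate pair.

    Assumption: lowercased whitespace tokenization; the paper's actual
    preprocessing is not given in this record.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)

    # Multiset intersection clips each n-gram count to the smaller of the two.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Hypothetical human annotation and machine description of one clip.
    human = "a man is sitting on a chair"
    machine = "a man is standing near a chair"
    print(rouge_n(human, machine, n=1))
    print(rouge_n(human, machine, n=2))
```

Recall is the figure usually reported for ROUGE, since it measures how much of the human annotation is covered by the generated description; precision and F1 are included here only for completeness.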
Pages: 2893-2896
Number of pages: 4