GENERATING COHERENT NATURAL LANGUAGE ANNOTATIONS FOR VIDEO STREAMS

Cited by: 0

Authors:

Khan, Muhammad Usman Ghani [1]

Zhang, Lei [2]

Gotoh, Yoshihiko [1]

Affiliations:

[1] Univ Sheffield, Sheffield, S Yorkshire, England

[2] Harbin Engn Univ, Harbin, Heilongjiang, Peoples R China

Source:

2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012) | 2012

Keywords:

Video processing; Video annotation; Natural language description; Video feature units

DOI:

Not available

Chinese Library Classification (CLC) number:

TB8 [Photographic technology];

Discipline classification code:

0804;

Abstract:

This contribution addresses the generation of natural language annotations for human actions, behaviour and their interactions with other objects observed in video streams. The work starts with the implementation of conventional image processing techniques to extract high-level features from individual frames. A natural language description of the frame contents is produced based on these high-level features. Although the feature extraction processes are erroneous at various levels, we explore approaches to put them together to produce a coherent description. To extend the approach to the description of video streams, units of features are introduced to present coherent, smooth and well-phrased descriptions by incorporating spatial and temporal information. Evaluation is made by calculating ROUGE scores between human-annotated and machine-generated descriptions.

Citation:

Pages: 2893-2896

Page count: 4
