Semantic Video Retrieval using Deep Learning Techniques

Times Cited: 0
Authors
Yasin, Danish
Sohail, Ashbal
Siddiqi, Imran
Affiliations
Source
PROCEEDINGS OF 2020 17TH INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGY (IBCAST) | 2020
Keywords
Semantic Retrieval; Deep Convolutional Neural Networks (CNNs); Long Short-Term Memory Networks (LSTMs); IMAGE;
DOI
10.1109/ibcast47879.2020.9044601
Chinese Library Classification (CLC)
T [Industrial Technology];
Discipline Classification Code
08;
Abstract
Content-based video retrieval has been an active research area for many decades. Unlike tag-based search engines, which rely on user-assigned annotations to retrieve the desired content, content-based retrieval systems match the actual content of a video against the provided query to fetch the required set of videos. Thanks to recent advancements in deep learning, the traditional pipeline of content-based systems (pre-processing, segmentation, object classification, action recognition, etc.) is being replaced by end-to-end trainable systems which are not only effective and robust but also avoid the complex processing of conventional image-based techniques. The present study exploits these developments to build a semantic video retrieval system that accepts natural language queries and retrieves the relevant videos. We focus on key individuals appearing in certain scenarios as queries. Persons appearing in a video are recognized by fine-tuning FaceNet on our set of images, while caption generation is exploited to make sense of the scenario within a given video frame. The outputs of the two modules are combined to generate a description of the frame. During the retrieval phase, natural language queries are provided to the system and word embeddings are employed to find words similar to those appearing in the query text. For a given query, all videos in which the queried individuals and scenarios appear are returned by the system. A preliminary experimental study on a collection of 50 videos yielded promising retrieval results.
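The retrieval step described in the abstract — expanding a query with embedding-space neighbours before matching it against frame descriptions — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not specify which embedding model it uses, so the tiny hand-made 3-d vectors and the `similar_words` helper below are assumptions for demonstration; a real system would load pretrained word2vec/GloVe-style vectors.

```python
import numpy as np

# Toy word vectors (assumption: illustrative stand-ins for pretrained
# word embeddings; real vectors would have hundreds of dimensions).
embeddings = {
    "car":     np.array([0.90, 0.10, 0.00]),
    "vehicle": np.array([0.85, 0.15, 0.05]),
    "beach":   np.array([0.00, 0.20, 0.95]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similar_words(query_word, vocab, threshold=0.9):
    """Return vocabulary words whose embedding lies close to the query word's,
    so a query for 'car' can also match frame captions mentioning 'vehicle'."""
    q = vocab[query_word]
    return [w for w, v in vocab.items()
            if w != query_word and cosine(q, v) >= threshold]

print(similar_words("car", embeddings))  # → ['vehicle']
```

The query term is then replaced by the set containing itself plus its neighbours, and any frame description containing a word from that set is treated as a match.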
Pages: 338 - 343
Page count: 6