Clustering and retrieval of video shots based on natural stimulus fMRI

Times cited: 9

Authors:

Han, Junwei [1]

Ji, Xiang [1]

Hu, Xintao [1]

Han, Jungong [2]

Liu, Tianming [3,4]

Affiliations:

[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China

[2] Civolut Technol, Eindhoven, Netherlands

[3] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA

[4] Univ Georgia, Bioimaging Res Ctr, Athens, GA 30602 USA

Source:

NEUROCOMPUTING | 2014 / Vol. 144

Funding:

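U.S. National Science Foundation; U.S. National Institutes of Health; National Natural Science Foundation of China; China Postdoctoral Science Foundation;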

Keywords:

Video clustering; Video retrieval; Functional magnetic resonance imaging; Feature integration; IMAGE RETRIEVAL; CONNECTIVITY; FEATURES;

DOI:

10.1016/j.neucom.2013.11.052

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Functional magnetic resonance imaging (fMRI) is a powerful tool for probing human perception and cognition. Besides its extensive use in clinical applications, fMRI is also useful in everyday life. In this paper, we investigate a novel application of fMRI techniques to video clustering and retrieval. In the proposed work, we integrate semantic, human-centric features derived from natural stimulus fMRI data with low-level visual-audio features to facilitate video clustering and retrieval; this is a significant innovation over previous work, which relied on either fMRI-derived features or low-level visual-audio features alone. Our system consists of several algorithmic modules. First, fMRI data are acquired while subjects watch sample video shots. Then, a newly developed brain network localization system is employed to locate the cortical regions of interest (ROIs) for each individual subject. The functional interactions between these ROIs are quantified by wavelet transform coherence, and the human-centric features are derived from them. Next, a Gaussian process regression model that maps the visual-audio feature space to the fMRI-derived feature space is trained on the training samples. The trained model is then used to predict fMRI-derived features for videos that have no fMRI data. Finally, multi-modal spectral clustering is adopted and a multi-modal ranking algorithm is proposed to integrate these two heterogeneous features for video clustering and retrieval, respectively. Our experiments on the TRECVID database demonstrate that integrating visual-audio features with fMRI-derived features substantially improves the precision of both video clustering and retrieval. (C) 2014 Elsevier B.V. All rights reserved.
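Two steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch: predicting fMRI-derived features for shots that have no fMRI data via Gaussian process regression, and ranking shots using both feature types. This is not the authors' implementation. The zero-mean prior, the squared-exponential (RBF) kernel, the fixed `length_scale` and `noise` hyperparameters, fitting each output dimension independently, and the weighted-sum `fused_ranking` (a simple late-fusion stand-in, not the paper's multi-modal ranking algorithm) are all assumptions, and every function name is hypothetical. Spectral clustering and the wavelet-coherence feature extraction are not covered.

```python
# Hypothetical sketch: the RBF kernel, fixed hyperparameters and weighted-sum
# fusion below are assumptions for illustration, not the paper's method.
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rbf_kernel(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low: Matrix, b: Vector) -> Vector:
    """Solve (L L^T) x = b by forward substitution, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(train_x: Matrix, train_y: Matrix, test_x: Matrix,
               length_scale: float = 1.0, noise: float = 1e-2) -> Matrix:
    """Zero-mean GP posterior mean k_*^T (K + noise*I)^{-1} y, per output dimension.

    train_x: visual-audio features of shots with fMRI data (assumed layout)
    train_y: fMRI-derived features of the same shots
    test_x:  visual-audio features of shots without fMRI data
    """
    n = len(train_x)
    k = [[rbf_kernel(train_x[i], train_x[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    n_out = len(train_y[0])
    # alphas[d] = (K + noise*I)^{-1} y_d, one solve per fMRI feature dimension d
    alphas = [cholesky_solve(low, [row[d] for row in train_y]) for d in range(n_out)]
    preds = []
    for x in test_x:
        k_star = [rbf_kernel(x, xi, length_scale) for xi in train_x]
        preds.append([sum(ks * a for ks, a in zip(k_star, alphas[d])) for d in range(n_out)])
    return preds


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def fused_ranking(query_av: Vector, query_fmri: Vector, db_av: Matrix,
                  db_fmri: Matrix, weight: float = 0.5) -> List[int]:
    """Return database indices sorted by weight*sim_av + (1-weight)*sim_fmri, best first."""
    scores = [weight * cosine(query_av, av) + (1.0 - weight) * cosine(query_fmri, fm)
              for av, fm in zip(db_av, db_fmri)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In use, `gp_predict` would fill in fMRI-derived features for shots never shown in the scanner, after which every shot has both feature types and `fused_ranking` can score them; the `weight` parameter trades off the two modalities. A small `noise` value makes the posterior mean nearly interpolate the training targets, while a larger one smooths the mapping.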


Pages: 128-137

Number of pages: 10

Number of references: 40
