Lip Segmentation under MAP-MRF Framework with Automatic Selection of Local Observation Scale and Number of Segments

Cited by: 15

Authors:

Cheung, Yiu-ming [1,2]

Li, Meng [3]

Cao, Xiaochun [4]

You, Xinge [5]

Affiliations:

[1] Hong Kong Baptist Univ, Inst Computat & Theoret Studies, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China

[2] Hong Kong Baptist Univ, Beijing Normal Univ, United Int Coll, Zhuhai, Peoples R China

[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China

[4] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100093, Peoples R China

[5] Huazhong Univ Sci & Technol, Dept Elect & Informat Engn, Wuhan 430074, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2014, Vol. 23, No. 8

Funding:

National Science Foundation (USA);

Keywords:

Lip segmentation; MAP-MRF framework; number of segments; local scale selection; SPEAKER IDENTIFICATION; FEATURES; IMAGES; EXTRACTION; TRACKING;

DOI:

10.1109/TIP.2014.2331137

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper addresses the problem of segmenting the lip region from a frontal human face image. Supposing that each pixel of the target image has an optimal local scale from the segmentation viewpoint, we treat the lip segmentation problem as a combination of observation scale selection and observed data classification. Accordingly, we propose a hierarchical multiscale Markov random field (MRF) model to represent the membership map of each input pixel to a specific segment and the local-scale map simultaneously. Subsequently, lip segmentation can be formulated as an optimization problem in the maximum a posteriori (MAP)-MRF framework. Then, we present a rival-penalized iterative algorithm to implement the segmentation, which is independent of the number of predefined segments. The proposed method has two main features: 1) its performance is independent of the predefined number of segments, and 2) it takes into account the local optimal observation scale for each pixel. Finally, we conduct experiments on four benchmark databases, namely AR, CVL, GTAV, and VidTIMIT. Experimental results show that the proposed method is robust to the segment number, which changes with a speaker's appearance, and can enhance the segmentation accuracy by taking advantage of the local optimal observation scale information.
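As an illustration of the general MAP-MRF idea mentioned in the abstract, a minimal sketch using iterated conditional modes (ICM) with a Potts smoothness prior might look like the following. This is not the paper's rival-penalized algorithm: it assumes a fixed, known number of segments and a single observation scale, whereas the paper selects both automatically. The function name `icm_segment`, the Gaussian data term, and the parameters `means`, `sigma`, and `beta` are assumptions made for this example.

```python
import numpy as np


def icm_segment(image, means, sigma=0.1, beta=1.0, n_iters=5):
    """Approximate MAP labelling of a 2-D image under a Potts MRF prior (ICM).

    Hypothetical helper for illustration only; the class means are assumed known.
    """
    image = np.asarray(image, dtype=float)
    means = np.asarray(means, dtype=float)
    # Data term: negative Gaussian log-likelihood up to a constant, shape (K, H, W).
    data_cost = (image[None, :, :] - means[:, None, None]) ** 2 / (2.0 * sigma ** 2)
    labels = np.argmin(data_cost, axis=0)  # maximum-likelihood initialisation
    h, w = image.shape
    for _ in range(n_iters):
        for i in range(h):
            for j in range(w):
                best_label, best_energy = labels[i, j], np.inf
                for k in range(len(means)):
                    # Potts term: number of 4-neighbours whose label differs from k.
                    disagree = 0
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w and labels[ni, nj] != k:
                            disagree += 1
                    energy = data_cost[k, i, j] + beta * disagree
                    if energy < best_energy:
                        best_label, best_energy = k, energy
                labels[i, j] = best_label  # greedy local update
    return labels
```

Each pixel update lowers (or keeps) the posterior energy, so ICM converges to a local optimum; larger `beta` favours smoother label maps at the cost of fine detail.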


Pages: 3397-3411

Page count: 15

