Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement, and Retrieval

Cited by: 100
Authors
Li, Xirong [1]
Uricchio, Tiberio [3]
Ballan, Lamberto [3]
Bertini, Marco [3]
Snoek, Cees G. M. [2, 4]
Del Bimbo, Alberto [3]
Affiliations
[1] Renmin Univ China, Key Lab Data Engn & Knowledge Engn, Sch Informat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[2] Univ Amsterdam, Intelligent Syst Lab Amsterdam, Sci Pk 904, NL-1012 WX Amsterdam, Netherlands
[3] Univ Florence, Media Integrat & Commun Ctr, Viale Morgagni 65, I-50139 Florence, Italy
[4] Qualcomm Res Netherlands, Amsterdam, Netherlands
Keywords
Algorithms; Documentation; Performance; Social media; social tagging; tag relevance; content-based image retrieval; tag assignment; tag refinement; tag retrieval; RELEVANCE; DISCOVERY; FEATURES; SEARCH
DOI
10.1145/2906152
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems (i.e., image tag assignment, refinement, and tag-based image retrieval) is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, that is, estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this article introduces a two-dimensional taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison with the state of the art, a new experimental protocol is presented, with training sets containing 10,000, 100,000, and 1 million images, and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.
Pages: 39