Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

Cited by: 1293
Authors
Zhu, Yukun [1]
Kiros, Ryan [1]
Zemel, Richard [1]
Salakhutdinov, Ruslan [1]
Urtasun, Raquel [1]
Torralba, Antonio [2]
Fidler, Sanja [1]
Affiliations
[1] Univ Toronto, Toronto, ON M5S 1A1, Canada
[2] MIT, Cambridge, MA 02139 USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2015
Keywords
VIDEOS;
DOI
10.1109/ICCV.2015.11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books we exploit a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
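The similarity step mentioned in the abstract can be illustrated with a minimal sketch. It assumes clips and sentences have already been mapped into a shared vector space; the toy vectors, the function names (`cosine`, `similarity_matrix`, `best_sentence_per_clip`), and the independent best-match rule are all hypothetical and chosen for illustration. The paper's learned embeddings and its context-aware CNN for combining multiple sources are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_u * norm_v)


def similarity_matrix(clip_vecs, sentence_vecs):
    """Rows index clips, columns index sentences."""
    return [[cosine(c, s) for s in sentence_vecs] for c in clip_vecs]


def best_sentence_per_clip(sim):
    """Index of the most similar sentence for each clip (independent argmax).

    Assumption: each clip is matched on its own, ignoring neighbouring clips.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


# Toy embeddings (hypothetical values, not from the paper).
clips = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
sentences = [[0.0, 2.0, 2.0], [3.0, 0.1, 0.0], [0.0, 0.0, 1.0]]

sim = similarity_matrix(clips, sentences)
matches = best_sentence_per_clip(sim)
print(matches)
```

Matching each clip independently ignores the order of events in the story; the abstract's context-aware model is presumably meant to address that kind of limitation, but the sketch above makes no attempt to do so.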
Pages: 19-27
Number of pages: 9