32 references in total
[1]
ANTOL S, AGRAWAL A, LU J, Et al., VQA: visual question answering, Proceedings of the IEEE International Conference on Computer Vision, pp. 2425-2433, (2015)
[2]
MIKOLOV T, CHEN K, CORRADO G, Et al., Efficient estimation of word representations in vector space, Proceedings of the 1st International Conference on Learning Representations, pp. 1-12, (2013)
[3]
PENNINGTON J, SOCHER R, MANNING C D., GloVe: global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, (2014)
[4]
DEVLIN J, CHANG M W, LEE K, Et al., BERT: pre-training of deep bidirectional transformers for language understanding, Proceedings of NAACL-HLT, pp. 4171-4186, (2019)
[5]
SIMONYAN K, ZISSERMAN A., Very deep convolutional networks for large-scale image recognition, Proceedings of the 3rd International Conference on Learning Representations, pp. 1-14, (2015)
[6]
HE K, ZHANG X, REN S, Et al., Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, (2016)
[7]
REN S, HE K, GIRSHICK R, Et al., Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 6, pp. 1137-1149, (2017)
[8]
MALINOWSKI M, ROHRBACH M, FRITZ M., Ask your neurons: a neural-based approach to answering questions about images, Proceedings of the IEEE International Conference on Computer Vision, pp. 1-9, (2015)
[9]
GRAVES A., Long short-term memory, Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37-45, (2012)
[10]
REN M, KIROS R, ZEMEL R., Image question answering: a visual semantic embedding model and a new dataset, Advances in Neural Information Processing Systems, (2015)