- [1] HE K M, ZHANG X Y, REN S Q, et al., Deep residual learning for image recognition, The IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, (2016)
- [2] LIN Y O, LEI H, LI X Y, et al., Deep learning in NLP: Methods and application, Journal of University of Electronic Science and Technology of China, 46, 6, pp. 913-919, (2017)
- [3] TAKMAZ E, PEZZELLE S, BEINBORN L, et al., Generating image descriptions via sequential cross-modal alignment guided by human gaze, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pp. 4664-4677, (2020)
- [4] ZHOU Y E, WANG M, LIU D Q, et al., More grounded image captioning by distilling image-text matching model, The IEEE Conference on Computer Vision and Pattern Recognition, pp. 4776-4785, (2020)
- [5] LI X P, SONG J K, GAO L L, et al., Beyond RNNs: Positional self-attention with co-attention for video question answering, The 31st Innovative Applications of Artificial Intelligence Conference, pp. 8658-8665, (2019)
- [6] LE T M, LE V, VENKATESH S, et al., Hierarchical conditional relation networks for video question answering, The IEEE Conference on Computer Vision and Pattern Recognition, pp. 9969-9978, (2020)
- [7] KOTTUR S, MOURA J, PARIKH D, et al., Visual coreference resolution in visual dialog using neural module networks, The 15th European Conference on Computer Vision, pp. 160-178, (2018)
- [8] KANG G, LIM J, ZHANG B, Dual attention networks for visual reference resolution in visual dialog, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 2024-2033, (2019)
- [9] NIU Y L, ZHANG H W, ZHANG M L, et al., Recursive visual attention in visual dialog, The IEEE Conference on Computer Vision and Pattern Recognition, pp. 6679-6688, (2019)
- [10] DAS A, KOTTUR S, GUPTA K, et al., Visual dialog, The IEEE Conference on Computer Vision and Pattern Recognition, pp. 1080-1089, (2017)