26 references in total
[1]
Kojima A., Izumi M., Tamura T., et al., Generating natural language description of human behavior from video images, Proceedings of 2000 International Conference on Pattern Recognition, pp. 728-731, (2000)
[2]
Aradhye H., Toderici G., Yagnik J., Video2Text: learning to annotate video content, Proceedings of 2009 IEEE International Conference on Data Mining Workshops, pp. 144-151, (2009)
[3]
Krishnamoorthy N., Malkarnenkar G., Mooney R., et al., Generating natural-language video descriptions using text-mined knowledge, Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 541-547, (2013)
[4]
Szegedy C., Liu W., Jia Y., et al., Going deeper with convolutions, Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, (2015)
[5]
Venugopalan S., Xu H., Donahue J., et al., Translating videos to natural language using deep recurrent neural networks, Proceedings of the 2015 Annual Conference of the North American Chapter of the ACL, pp. 1494-1504, (2015)
[6]
Donahue J., Hendricks L.A., Rohrbach M., et al., Long-term recurrent convolutional networks for visual recognition and description, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 4, pp. 677-691, (2017)
[7]
Sutskever I., Vinyals O., Le Q.V., Sequence to sequence learning with neural networks, Proceedings of 2014 International Conference on Neural Information Processing Systems, pp. 3104-3112, (2014)
[8]
Hochreiter S., Schmidhuber J., Long short-term memory, Neural Computation, 9, 8, pp. 1735-1780, (1997)
[9]
Yao L., Torabi A., Cho K., et al., Describing videos by exploiting temporal structure, Proceedings of 2015 IEEE International Conference on Computer Vision, pp. 4507-4515, (2015)
[10]
Zhang H.-G., Li H., Chinese word segmentation method on the basis of bidirectional long-short term memory model, Journal of South China University of Technology (Natural Science Edition), 45, 3, pp. 61-67, (2017)