- [11] Bahdanau D, Cho K, Bengio Y., Neural Machine Translation by Jointly Learning to Align and Translate
- [12] Schuster M, Paliwal KK., Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, 45, 11, pp. 2673-2681, (1997)
- [13] Soomro K, Zamir AR, Shah M., UCF101: A Dataset of 101 Human Actions Classes From Videos in the Wild
- [14] Kuehne H, Jhuang H, Garrote E, et al., HMDB: a large video database for human motion recognition, International Conference on Computer Vision, pp. 2556-2563, (2011)
- [15] Deng J, Dong W, Socher R, et al., ImageNet: A large-scale hierarchical image database, Computer Vision and Pattern Recognition, pp. 248-255, (2009)
- [16] Long X, Gan C, et al., Multimodal keyless attention fusion for video classification, 32nd AAAI Conference on Artificial Intelligence, pp. 7202-7209, (2018)
- [17] Yuan Y, Wang D, Wang Q., Memory-Augmented Temporal Dynamic Learning for Action Recognition
- [18] Fan L, Huang W, Gan C, et al., End-to-end learning of motion representation for video understanding, Computer Vision and Pattern Recognition, pp. 6016-6025, (2018)
- [19] Sengupta B, Qian Y., Pillar Networks++: Distributed Non-parametric Deep and Wide Networks
- [20] Peng X, Wang L, Wang X, et al., Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice, Computer Vision and Image Understanding, 150, pp. 109-125, (2016)