41 entries in total
- [31] Jin Q., Chen J., Chen S.Z., et al., Describing videos using multi-modal fusion, Proceedings of the 24th ACM International Conference on Multimedia, pp. 1087-1091, (2016)
- [32] Krishna R., Hata K., Ren F., et al., Dense-captioning events in videos, Proceedings of the IEEE International Conference on Computer Vision, 1, pp. 706-715, (2017)
- [33] Liu S., Ou X.Y., Qian R.H., et al., Makeup like a superstar: deep localized makeup transfer network, Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 2568-2575, (2016)
- [34] Liu L.Q., Xing J.L., Liu S., et al., Wow! You are so beautiful today!, ACM Transactions on Multimedia Computing, Communications, and Applications, 11, 1, (2014)
- [35] wikiHow-How to do anything
- [36] Tran D., Bourdev L., Fergus R., et al., Learning spatiotemporal features with 3D convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497, (2015)
- [37] Karpathy A., Toderici G., Shetty S., et al., Large-scale video classification with convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725-1732, (2014)
- [38] Bojanowski P., Lajugie R., Bach F., et al., Weakly supervised action labeling in videos under ordering constraints, Proceedings of European Conference on Computer Vision, pp. 628-643, (2014)
- [39] Szegedy C., Ioffe S., Vanhoucke V., et al., Inception-v4, Inception-ResNet and the impact of residual connections on learning
- [40] Davis S.B., Mermelstein P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, Readings in Speech Recognition, pp. 65-74, (1990)