53 references in total
[1]
Li S., Xiao T., Li H., Yang W., Wang X., Identity-aware textual-visual matching with latent co-attention, Proceedings of the IEEE International Conference on Computer Vision, pp. 1890-1899, (2017)
[2]
Wang Z., Zhu A., Zheng Z., Jin J., Xue Z., Hua G., IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification, J Electron Imaging, 29, 4, (2020)
[3]
Zhu A., Wang Z., Li Y., Wan X., Jin J., Wang T., Hu F., Hua G., DSSL: Deep surroundings-person separation learning for text-based person retrieval, Proceedings of the 29th ACM International Conference on Multimedia, pp. 209-217, (2021)
[4]
Ding Z., Ding C., Shao Z., Tao D., Semantically self-aligned network for text-to-image part-aware person re-identification, arXiv, 2107, (2021)
[5]
Chen Y., Zhang G., Lu Y., Wang Z., Zheng Y., TIPCB: A simple but effective part-based convolutional baseline for text-based person search, Neurocomputing, 494, pp. 171-181, (2022)
[6]
Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale, (2020)
[7]
Devlin J., Chang M.-W., Lee K., Toutanova K., BERT: Pre-training of deep bidirectional transformers for language understanding, (2018)
[8]
Zhang Y., Lu H., Deep cross-modal projection learning for image-text matching, Proceedings of the European Conference on Computer Vision (ECCV), pp. 686-701, (2018)
[9]
Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z., Rethinking the inception architecture for computer vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, (2016)
[10]
Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention is all you need, Adv Neural Inf Process Syst, 30, (2017)