共 72 条
[1]
Vaswani A(2017)Attention is all you need Adv Neural Inf Process Syst (NIPS) 30 5998-6008
[2]
Shazeer N(2022)Contextual ensemble network for semantic segmentation Pattern Recogn 122 108290-16
[3]
Parmar N(2022)Image captioning model using attention and object features to mimic human image understanding J Big Data 9 1-570
[4]
Uszkoreit J(2019)Multi-scale deep context convolutional neural networks for semantic segmentation World Wide Web 22 555-5959
[5]
Jones L(2018)Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering IEEE Trans Neural Netw Learn Syst 29 5947-15919
[6]
Gomez AN(2021)Transformer in transformer Adv Neural Inf Process Syst (NIPS) 34 15908-796
[7]
Kaiser Ł(2022)CAAN: Context-aware attention network for visual question answering Pattern Recogn 132 108980-73
[8]
Polosukhin I(2021)Dual self-attention with co-attention networks for visual question answering Pattern Recogn 117 107956-1273
[9]
Zhou Q(2022)Dual self-guided attention with sparse question networks for visual question answering IEICE Trans Inf Syst 105 785-3209
[10]
Wu X(2017)Visual genome: Connecting language and vision using crowdsourced dense image annotations Int J Comput Vis (IJCV) 123 32-undefined