63 references in total
[1]
Zhang W., Ma C., Wu Q., Yang X., Language-guided navigation via cross-modal grounding and alternate adversarial learning, IEEE Trans. Circuits Syst. Video Technol., 31, 9, pp. 3469-3481, (2021)
[2]
Wang L., He Z., Dang R., Chen H., Liu C., Chen Q., RES-StS: Referring expression speaker via self-training with scorer for goal-oriented vision-language navigation, IEEE Trans. Circuits Syst. Video Technol., 33, 7, pp. 3441-3454, (2023)
[3]
Wang X.E., Jain V., Ie E., Wang W.Y., Kozareva Z., Ravi S., Environment-agnostic multitask learning for natural language grounded navigation, in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 413-430, (2020)
[4]
Zhang H., Lu Y., Yu C., Hsu D., Lan X., Zheng N., INVIGORATE: Interactive visual grounding and grasping in clutter, (2021)
[5]
Nawaz H.S., Shi Z., Gan Y., Hirpa A., Dong J., Zheng H., Temporal moment localization via natural language by utilizing video question answers as a special variant and bypassing NLP for corpora, IEEE Trans. Circuits Syst. Video Technol., 32, 9, pp. 6174-6185, (2022)
[6]
Zhao L., et al., Towards explainable 3D grounded visual question answering: A new benchmark and strong baseline, IEEE Trans. Circuits Syst. Video Technol., 33, 6, pp. 2935-2949, (2022)
[7]
Chen C., Anjum S., Gurari D., Grounding answers for visual questions asked by visually impaired people, in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 19076-19085, (2022)
[8]
Zhang Y., Ji Z., Pang Y., Li X., Consensus knowledge exploitation for partial query based image retrieval, IEEE Trans. Circuits Syst. Video Technol., 33, 12, pp. 7900-7913, (2023)
[9]
Ji Z., Meng C., Zhang Y., Pang Y., Li X., Knowledge-aided momentum contrastive learning for remote-sensing image text retrieval, IEEE Trans. Geosci. Remote Sens., 61, (2023)
[10]
Liu Z., Chen F., Xu J., Pei W., Lu G., Image-text retrieval with cross-modal semantic importance consistency, IEEE Trans. Circuits Syst. Video Technol., 33, 5, pp. 2465-2476, (2023)