Semantic-Relation Transformer for Visible and Infrared Fused Image Quality Assessment

Cited by: 14
Authors
Chang, Zhihao [1 ]
Yang, Shuyuan [1 ]
Feng, Zhixi [1 ]
Gao, Quanwei [1 ]
Wang, Shengzhe [2 ]
Cui, Yuyong [2 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710000, Peoples R China
[2] Norla Inst Tech Phys, Chengdu 610041, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image quality assessment; Semantic-relation transformer; Multi-head self-evaluation; Visible and infrared images fusion; MULTI-FOCUS; FUSION;
DOI
10.1016/j.inffus.2023.02.021
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Although extensive research has been carried out on visible and infrared image fusion, quality assessment of the fused image remains challenging because no reference image is available. In this paper, a subjective benchmark dataset and a semi-reference objective assessment method based on a Transformer encoder-decoder framework, named Semantic-Relation Transformer (SRT), are developed for Visible and Infrared Fused Image Quality Assessment (VIF-IQA). Unlike existing Transformers, the SRT decoder extracts multi-level source-image features and adopts a Multi-Head Self-Evaluation (MHSE) block constructed to mine latent relation knowledge between the fused image and the source images. The relation knowledge is then injected into 3D tokens together with deep semantic embeddings of receptive regions. Finally, the objective assessment score is obtained from these tokens through a local-to-global linear mapping. Moreover, we meticulously select 4,000 fused images from 200 scenes in the TNO, MSRS, M3FD, and Road Scene datasets and create a Visible and Infrared fuSed qualiTy Assessment (VISTA) dataset, rigorously guided by a Subjective VIF-IQA Specification. The VISTA dataset is used to comprehensively validate the proposed SRT. The experimental results demonstrate that SRT achieves state-of-the-art performance on quantitative metrics when compared with 12 popular methods, and that its output agrees more closely with subjective perception. The VISTA dataset is available at https://github.com/ChangeZH/VISTA-Dataset.
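The abstract describes the mechanism only at a high level: a decoder block that relates fused-image tokens to source-image tokens, followed by a local-to-global linear mapping to a quality score. The snippet below is a minimal, hypothetical PyTorch sketch of such a relation block and readout; it is not the authors' MHSE implementation, and the module names, token shapes, and mean-pooling readout are illustrative assumptions.

# Hypothetical sketch in the spirit of the paper's relation mining: fused-image
# tokens act as queries, source-image (visible + infrared) tokens act as keys
# and values. All names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiHeadRelationBlock(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fused_tokens, source_tokens):
        # fused_tokens:  (B, N_f, dim) tokens of the fused image
        # source_tokens: (B, N_s, dim) concatenated visible + infrared tokens
        relation, _ = self.attn(query=fused_tokens, key=source_tokens, value=source_tokens)
        return self.norm(fused_tokens + relation)  # residual connection + layer norm

if __name__ == "__main__":
    block = MultiHeadRelationBlock()
    fused = torch.randn(2, 196, 256)
    source = torch.randn(2, 392, 256)              # visible + infrared token sequences
    tokens = block(fused, source)
    score = nn.Linear(256, 1)(tokens.mean(dim=1))  # local-to-global pooling, then linear map to a score
    print(score.shape)                             # torch.Size([2, 1])

A real semi-reference assessor would of course learn these weights against the subjective scores in a dataset such as VISTA; the sketch only illustrates how fused and source tokens can be related through cross-attention before a scalar score is regressed.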
Pages: 454-470
Number of pages: 17