Intangible cultural heritage image classification with multimodal attention and hierarchical fusion

Cited by: 7
Authors
Fan, Tao [1, 2]
Wang, Hao [1, 2]
Deng, Sanhong [1, 2]
Affiliations
[1] Nanjing Univ, Sch Informat Management, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Data Engn & Knowledge Serv, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Digital humanities; Intangible cultural heritage; Image classification; Multimodal fusion; Attention mechanism; Sentiment analysis;
DOI
10.1016/j.eswa.2023.120555
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An efficient Intangible Cultural Heritage (ICH) image classification model helps the public recognize ICH and fosters its preservation and dissemination. Existing research on ICH image classification focuses mainly on the visual features of ICH images and ignores the textual descriptions attached to them, even though these descriptions can provide crucial clues for classification. In this study, we therefore propose combining the attached textual descriptions with the images to perform ICH image classification in a multimodal way. To capture intra- and inter-modal interactions between ICH images and their textual descriptions, we propose a novel model named MICMLF, consisting mainly of multimodal attention and hierarchical fusion. Multimodal attention makes the model focus on "important regions" in the ICH image and "important words" in the attached textual description. Hierarchical fusion captures the dynamic interactions between the two modalities. Extensive experiments are conducted on datasets of two Chinese national-level ICH projects, New Year Print (年画) and Clay Figurine (泥塑). The experimental results demonstrate the superiority of MICMLF over several state-of-the-art methods. The proposed model can also handle cases where ICH images or textual descriptions are incomplete.
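The abstract names two components, multimodal attention and hierarchical fusion, but does not give their equations. The following is only a minimal sketch of those two general ideas, not the MICMLF implementation: attention pooling over image-region and word vectors, followed by a fusion step that keeps both unimodal vectors and adds a cross-modal term. All function names, the toy vectors, the query vectors, and the choice of an element-wise product as the interaction term are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(features, query):
    # Score each feature vector against a query, normalize the scores,
    # and return the weighted sum together with the attention weights.
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights

def hierarchical_fuse(image_vec, text_vec):
    # Hypothetical two-level fusion: keep the unimodal vectors and append
    # an element-wise product as a simple cross-modal interaction term.
    interaction = [a * b for a, b in zip(image_vec, text_vec)]
    return image_vec + text_vec + interaction

# Toy inputs (assumed): three image-region vectors and two word vectors.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[0.5, 0.5], [2.0, 0.0]]
image_query = [1.0, 1.0]  # stands in for a learned query
text_query = [1.0, 0.0]   # stands in for a learned query

image_vec, region_weights = attention_pool(regions, image_query)
text_vec, word_weights = attention_pool(words, text_query)
fused = hierarchical_fuse(image_vec, text_vec)
```

In this toy setup the third region and the second word receive the largest attention weights because they align best with their queries, and the fused vector would be passed to a classifier in a full model.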
Pages: 14