Deep learning for visual aesthetics: using convolutional vision transformers and HRNet for classifying anime and human selfies

Cited: 0
Authors
Zhang, Congli [1]
Affiliations
[1] Zhengzhou Academy of Fine Arts, Speciality of Fine Arts, Henan, Zhengzhou
Keywords
artificial intelligence; classification; CNNs; convolutional neural networks; deep learning; feature extraction; vision transformers; visual aesthetics; ViT;
DOI
10.1504/IJICT.2025.146811
Abstract
Visual aesthetics plays a vital role in today's digital media, and harnessing it can drive user engagement and is crucial for personalised content recommendation. Using AI to classify and differentiate human selfies from animated images is difficult because of the subtle stylistic variations and complex feature representations in both categories. In this study, we propose an advanced framework that uses vision transformers (ViT) and high-resolution networks (HRNet) for this classification task. Trained on an online dataset, the proposed models learn both high-level representations and contextual dependencies, classifying test data with 99% accuracy for ViT and 97% for HRNet, more than 10% above what traditional convolutional neural network (CNN) based models achieve. These results support automatic content moderation and provide a solid basis for applying advanced vision models to multimedia and digital content processing. Copyright © 2025 Inderscience Enterprises Ltd.
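The abstract's ViT pipeline rests on tokenising each image into fixed-size patches before the transformer encoder sees them. As a minimal pure-Python sketch of that patch-extraction step (the 4×4 image and 2×2 patch size below are illustrative, not the paper's actual input resolution):

```python
def image_to_patches(image, patch_size):
    """Split a 2-D grayscale image (list of rows) into flattened patch
    vectors, the tokenisation step used by vision transformers."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "dims must divide evenly"
    patches = []
    for top in range(0, h, patch_size):          # walk patch rows
        for left in range(0, w, patch_size):     # walk patch columns
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)                # one token per patch
    return patches

# A 4x4 "image" split into 2x2 patches yields 4 tokens of length 4.
img = [[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]]
tokens = image_to_patches(img, 2)
# tokens[0] -> [0, 1, 4, 5]
```

In a full ViT, each such flattened patch would then be linearly projected to an embedding and combined with a position encoding before entering self-attention layers; this sketch covers only the tokenisation.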
Pages: 75-98
Page count: 23