Vision Transformers (ViTs) for Feature Extraction and Classification of AI-Generated Visual Designs

Cited by: 0
Authors
Yun, Qing [1 ]
Affiliations
[1] Kyungil Univ, Sch Int Exchange, Gyongsan 38428, South Korea
Keywords
Artificial intelligence; Art; Visualization; Deep learning; Transformers; Accuracy; Data models; Computer vision; Feature extraction; Computational modeling; convolutional neural network; deep learning; vision transformer; visual aesthetics;
DOI
Not available
CLC classification
TP [Automation technology, computer technology];
Discipline code
0812 ;
Abstract
Deep learning has become a cornerstone of modern Artificial Intelligence (AI), enabling machines to process and interpret complex visual information with unprecedented accuracy. As AI-generated content becomes more realistic, the ability to distinguish between machine-created and human-created images is increasingly important. This challenge extends beyond technical concerns, influencing digital media credibility, intellectual property rights, and the integrity of visual communication. Developing robust classification models that accurately attribute image origins is crucial for ensuring transparency, preventing misinformation, and upholding artistic authenticity in an era of rapidly evolving generative AI technologies. This study addresses the critical need to differentiate between AI-generated and human-generated aesthetic images through the application of advanced deep learning models. We investigate the effectiveness of advanced deep learning architectures, including High-Resolution Networks (HRNet) and Vision Transformers (ViT), which are generally accurate when used to infer the creative characteristics of human visual art. The proposed ViT model, which employs self-attention to process images as sequences of patches for feature extraction, is examined for its potential to capture global contextual relationships within images, a capability essential for recognizing the nuanced differences between AI and human artistry. ViT achieves 97% accuracy, compared with 95% for the HRNet model; this superior performance validates the ability of its transformer structure to analyze and learn the complex image features that disclose an image's origin. This research highlights the potential of sophisticated deep learning techniques to address the challenges of content authenticity in digital media.
By leveraging the unique strengths of each model, we provide insights into their applicability and effectiveness in distinguishing between different forms of digital creation, marking a significant step forward in the field of digital forensics and content verification.
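The abstract describes ViT's core preprocessing step: treating an image as a sequence of flattened patches before linear projection, positional embedding, and self-attention. As a minimal illustration of that patchification step (not the paper's implementation; patch size 16 and input size 224×224 are the conventional ViT-Base defaults, assumed here), the reshaping can be sketched in NumPy:

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into a sequence of flattened patches,
    the input representation a ViT feeds to its transformer encoder
    (after a learned linear projection and positional embeddings,
    which are omitted in this sketch)."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    return (img.reshape(h // patch, patch, w // patch, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, patch * patch * c))

img = np.zeros((224, 224, 3), dtype=np.float32)  # standard ViT input size
seq = image_to_patches(img)
print(seq.shape)  # (196, 768): 14*14 patches, each 16*16*3 values
```

Each of the 196 patch vectors becomes one token, so self-attention can relate any patch to any other, which is the global-context property the abstract credits for ViT's edge over the more locally biased HRNet.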
Pages: 69459-69477
Page count: 19