Frequency-Domain Guided Image Classification With Large Model Assistance

Cited by: 0

Authors:

Hua, Xia [1]

Han, Lei [1]

Affiliations:

[1] China Univ Petr East China, Dept Phys Educ, Qingdao 266580, Shandong, Peoples R China

Source:

IEEE ACCESS | 2024 / Volume 12

Keywords:

Frequency-domain analysis; Sports; Image classification; Training; Accuracy; Convolution; Random forests; Nearest neighbor methods; Image synthesis; Image recognition; Sport image classification; frequency-domain guided; large model assistance;

DOI:

10.1109/ACCESS.2024.3500099

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Image classification technology has made significant advancements, but methods tailored to specific image domains, such as sport images, remain inadequate. Progress is constrained primarily by two factors. First, the quality of sport images varies greatly, and many rare sports have very few images. Second, most sport image classification techniques directly transfer image classification methods from other domains, and few approaches are designed around the unique features of sport images. To address these problems, we devise a Frequency-domain Guided sport image classification method with Large model Assistance, named FGLA. Specifically, we design two main modules for FGLA. The first module combines the Fourier Transform and the Wavelet Transform to embed frequency-domain information into the original image as additional channels, which converts the image into a three-dimensional cuboid that incorporates more comprehensive information. This allows the model to assign different weights to each channel, thereby enhancing image classification performance. The second module leverages the powerful image generation capabilities of large models to augment the dataset with more images, especially for rare sports. Additionally, it directly generates frequency-domain images to enhance the generalization of the classification model. We analyze the effectiveness of FGLA across many models and several classic sports datasets. The results indicate that FGLA achieves the highest accuracy in sport image classification. Moreover, the method demonstrates strong generalization capabilities and can be adapted to other image classification tasks.
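The channel-stacking idea described for the first module can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Fourier representation or which wavelet is used, so the log-magnitude spectrum, the one-level Haar approximation band, and every function name below (`dft2_log_magnitude`, `haar_approx_upsampled`, `add_frequency_channels`) are assumptions made for illustration. The sketch uses only the Python standard library and a naive DFT, so it is suitable only for tiny images with even height and width.

```python
import cmath
import math


def dft2_log_magnitude(img):
    """Log-magnitude of the 2D DFT of a 2D list (naive O(N^4) loop, tiny images only)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            # log1p compresses the large dynamic range of the spectrum
            out[u][v] = math.log1p(abs(acc))
    return out


def haar_approx_upsampled(img):
    """One-level Haar approximation (LL) band, repeated back to the input size.

    Assumes even height and width (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            # Orthonormal Haar LL coefficient of a 2x2 block: sum / 2
            ll = (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 2.0
            for dy in (0, 1):
                for dx in (0, 1):
                    out[y + dy][x + dx] = ll
    return out


def add_frequency_channels(channels):
    """Append a Fourier channel and a wavelet channel to a list of 2D channels."""
    h, w = len(channels[0]), len(channels[0][0])
    n = len(channels)
    # Grayscale proxy: per-pixel mean over the input channels (assumed choice)
    gray = [[sum(c[y][x] for c in channels) / n for x in range(w)] for y in range(h)]
    return list(channels) + [dft2_log_magnitude(gray), haar_approx_upsampled(gray)]


# Hypothetical 4x4 three-channel image
demo = [[float(x + y) for x in range(4)] for y in range(4)]
stacked = add_frequency_channels([demo, demo, demo])
print(len(stacked))  # 5 channels: 3 original + 2 frequency-domain
```

A classifier that accepts a five-channel input could then learn separate weights for the spatial and frequency-domain channels. In practice a library FFT and wavelet routine would replace the naive loops.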


Pages: 186246-186254

Page count: 9
