Research on Feature Extraction and Classification for Unstructured Data based on Deep Learning

Cited by: 0

Authors:

Yu, Huayan [1]

Affiliations:

[1] Univ Toronto, Dept Math Statist, Toronto, ON, Canada

Source:

2024 5TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER ENGINEERING, ICAICE | 2024

Keywords:

Multi-Modal Learning; Transformer; Feature Extraction; Unstructured Data; Cross-Modal Attention;

DOI:

10.1109/ICAICE63571.2024.10864093

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid growth of unstructured data such as text, images, audio, and video, traditional data analysis techniques struggle with the complexity and high dimensionality of such data. In this study, we propose a multimodal enhanced Transformer model that processes and fuses different types of unstructured data by improving the self-attention mechanism and adopting a multi-stream input architecture. First, each input stream processes one data modality separately and maps it, through a dedicated preprocessing and encoding network, into a feature space of the same dimension, so that all modalities share a unified representation. Next, a cross-modal self-attention mechanism establishes global dependencies between modalities and automatically learns the correlations between their features, so that key features are extracted more accurately during classification. To reduce computational complexity, a sparse-matrix-based optimization algorithm lets the model process long sequences and high-dimensional data efficiently. Experiments on the benchmark dataset show that the proposed model outperforms existing methods in accuracy, precision, and recall.
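The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architectural details, so the dimensions, the random projection weights, the helper names (`linear`, `topk_attention`, `random_matrix`), and the use of top-k score pruning as a stand-in for the unspecified sparse-matrix optimization are all assumptions made for illustration. The sketch also omits learned query, key, and value projections and uses the fused tokens directly.

```python
import math
import random


def linear(x, w):
    """Project each row of x (n x d_in) with the weight matrix w (d_in x d_out)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_attention(tokens, k):
    """Scaled dot-product self-attention in which each query keeps only its
    k highest-scoring keys (an assumed form of sparsity, not the paper's)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        keep = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        probs = softmax([scores[j] for j in keep])
        row = [0.0] * len(tokens)
        for j, p in zip(keep, probs):
            row[j] = p
        weights.append(row)
        outputs.append([sum(row[j] * tokens[j][c] for j in keep) for c in range(d)])
    return outputs, weights


def random_matrix(rows, cols, rng, scale=1.0):
    """Gaussian matrix used as placeholder data or placeholder weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D_MODEL = 8  # shared feature dimension (hypothetical value)

# Two modality streams with different raw feature sizes (hypothetical shapes).
text = random_matrix(5, 16, rng)   # 5 text tokens, 16 features each
image = random_matrix(4, 32, rng)  # 4 image patches, 32 features each

# Each stream has its own projection into the shared D_MODEL-dimensional space.
w_text = random_matrix(16, D_MODEL, rng, scale=16 ** -0.5)
w_image = random_matrix(32, D_MODEL, rng, scale=32 ** -0.5)

# Concatenating the projected tokens lets attention span both modalities.
fused = linear(text, w_text) + linear(image, w_image)
attended, weights = topk_attention(fused, k=3)
```

Because the text and image tokens sit in one sequence, a text query can place weight on image keys and vice versa, which is one simple way to obtain cross-modal dependencies. Note that this sketch still computes every score before pruning; an efficient sparse implementation would avoid materializing the full score matrix.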


Pages: 203-207

Page count: 5
