Variational Autoencoder Model Combining Deep Learning and Probability Statistics and Its Application in Large-scale Data Analysis

Cited by: 0

Authors:

Zou, Lingguo [1]

Zhang, Meihua [2]

Affiliations:

[1] School of Public Education, Xiamen Ocean Vocational College, Xiamen

[2] College of General Education, Xiamen Huatian International Vocation Institute, Xiamen

Source:

Informatica (Slovenia) | 2024 / Vol. 48 / No. 22

Keywords:

Bayesian; deep learning; layer-by-layer learning strategy; probability statistics; variational autoencoder

DOI:

10.31449/inf.v48i22.6921

Abstract:

A multi-layer generative model is proposed to improve the accuracy of large-scale data analysis. It addresses two weaknesses of existing topic models: limited feature extraction capability and weak association with label information. The model has three main modules: text encoding, autoencoder inference, and layer-by-layer learning. It combines a hierarchical Bayesian model with a deterministic-upward, stochastic-downward network structure and uses a Poisson gamma belief network as the decoder to capture hierarchical latent features in text data. Stochastic gradient Monte Carlo sampling is used for posterior inference to improve efficiency. In addition, the Fisher information matrix is used to adaptively adjust the learning rates of different layers and topic parameters, and a layer-by-layer learning strategy is introduced to construct the learning network. On this basis, text data and label information are combined for feature extraction. The test error rates of the designed model on the 20News, RCV1, and IMDB datasets were 16.52%, 18.72%, and 11.67%, respectively, the lowest among the compared models. Its testing times were also the shortest, at 0.020 s, 0.017 s, and 0.015 s, respectively. The perplexity values on the 20News, RCV1, and Wiki datasets were 590.23, 953.12, and 982.67, respectively, significantly lower than those of the comparison models. The designed model therefore offers strong data analysis and interpretation capabilities with relatively high computational efficiency, and can serve as a tool for accurate batch analysis of large-scale data. © 2024 Slovene Society Informatika. All rights reserved.
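
The abstract names a Poisson gamma belief network decoder and reports perplexity as an evaluation metric. The sketch below is not the authors' implementation. It is a minimal NumPy illustration of the standard top-down generative process of such a network and of per-word perplexity, under stated assumptions: the function names `sample_pgbn` and `perplexity`, the layer sizes, the Dirichlet concentration of 0.5, and the gamma parameters `r` and `c` are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_pgbn(n_docs, vocab_size, layer_sizes, rng, r=1.0, c=1.0):
    """Draw word counts top-down from a toy Poisson gamma belief network.

    layer_sizes lists the number of topics per layer, bottom layer first.
    All hyperparameter values are illustrative assumptions.
    """
    dims = [vocab_size] + list(layer_sizes)
    # phis[l] has shape (dims[l], dims[l + 1]); every column sums to 1.
    phis = [
        rng.dirichlet(np.full(dims[l], 0.5), size=dims[l + 1]).T
        for l in range(len(layer_sizes))
    ]
    # Top layer: one gamma-distributed weight per topic and document.
    theta = rng.gamma(shape=r, scale=1.0 / c, size=(dims[-1], n_docs))
    # Lower layers: the gamma shape is the layer above projected by its Phi.
    for phi in reversed(phis[1:]):
        theta = rng.gamma(shape=phi @ theta, scale=1.0 / c)
    rates = phis[0] @ theta          # Poisson rates, (vocab_size, n_docs)
    counts = rng.poisson(rates)      # observed bag-of-words counts
    return counts, rates


def perplexity(counts, rates, eps=1e-12):
    """Per-word perplexity of counts under column-normalized rates."""
    probs = rates / (rates.sum(axis=0, keepdims=True) + eps)
    log_lik = (counts * np.log(probs + eps)).sum()
    return float(np.exp(-log_lik / counts.sum()))


rng = np.random.default_rng(0)
counts, rates = sample_pgbn(n_docs=50, vocab_size=30, layer_sizes=[8, 4], rng=rng)
print("perplexity under the generating rates:", perplexity(counts, rates))
print("perplexity under a uniform model:", perplexity(counts, np.ones_like(rates)))
```

A uniform model scores a perplexity equal to the vocabulary size, which gives a reference point: lower values mean the model concentrates probability on the words that actually occur.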

Pages: 31-46

Page count: 15
