Using Probabilistic Models for Data Compression

Cited by: 4

Authors:

Iatan, Iuliana [1]

Dragan, Mihaita [2]

Dedu, Silvia [3]

Preda, Vasile [2, 4, 5]

Affiliations:

[1] Tech Univ Civil Engn, Dept Math & Comp Sci, Bucharest 020396, Romania

[2] Univ Bucharest, Fac Math & Comp Sci, Bucharest 010014, Romania

[3] Bucharest Univ Econ Studies, Dept Appl Math, Bucharest 010734, Romania

[4] Gheorghe Mihoc Caius Iacob Inst Math Stat & Appl, Bucharest 050711, Romania

[5] Costin C Kiritescu Natl Inst Econ Res, Bucharest 050711, Romania

Source:

MATHEMATICS | 2022 / Vol. 10 / Issue 20

Keywords:

data compression; descriptors; probabilistic models; entropy; Huffman coding; coding redundancy; coding efficiency; artificial intelligence; ENTROPY; PATTERN;

DOI:

10.3390/math10203847

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Subject classification code:

0701 ; 070101 ;

Abstract:

Our research objective is to improve Huffman coding efficiency by adjusting the data using a Poisson distribution, which also avoids undefined entropies. The scientific contribution of our paper is to minimize the average code word length, which is greater when the Poisson distribution is not applied. Huffman coding is an error-free compression method designed to remove coding redundancy by yielding the smallest number of code symbols per source symbol; in practice, a source symbol can be the intensity of an image or the output of a mapping operation. We use images from the PASCAL Visual Object Classes (VOC) data sets to evaluate our methods: 10,102 randomly chosen images, half for training and half for testing. The VOC data sets display significant variability in object size, orientation, pose, illumination, position and occlusion. They comprise 20 object classes: aeroplane, bicycle, bird, boat, bottle, bus, car, motorbike, train, sofa, table, chair, tv/monitor, potted plant, person, cat, cow, dog, horse and sheep. The descriptors of different objects can be compared to measure their similarity. Image similarity is an important concept in many applications. This paper focuses on similarity measures in the computer science domain, more specifically in information retrieval and data mining. Our approach uses 64 descriptors for each image in the training and test sets; therefore, the number of symbols is 64. Our information source differs from a finite-memory (Markov) source, whose output depends on a finite number of previous outputs. When dealing with large volumes of data, an effective way to increase information retrieval speed is to use neural networks as an artificial intelligence technique.
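
The quantities named in the abstract (entropy, average code word length, coding efficiency, coding redundancy) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how the data are adjusted, so the sketch simply assumes a source of 64 symbols whose probabilities follow a Poisson distribution truncated to 64 values. The rate `lam = 8.0`, the function names, and the definitions efficiency = H / L and redundancy = 1 - H / L (the usual textbook ones) are all assumptions made for illustration.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return the Huffman code word length (in bits) of each symbol."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths


def entropy(probs):
    """Shannon entropy in bits; zero-probability symbols are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_length(probs, lengths):
    """Expected number of code bits per source symbol."""
    return sum(p * n for p, n in zip(probs, lengths))


def truncated_poisson(n_symbols, lam):
    """Poisson probabilities for k = 0..n_symbols-1, renormalized to sum to 1.

    Assumption for illustration only. Every probability is strictly
    positive, so no log2(0) term appears in the entropy.
    """
    weights = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n_symbols)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = truncated_poisson(64, 8.0)  # 64 symbols, hypothetical rate
    lengths = huffman_code_lengths(probs)
    h = entropy(probs)
    avg = average_length(probs, lengths)
    print(f"entropy H          = {h:.4f} bits/symbol")
    print(f"average length L   = {avg:.4f} bits/symbol")
    print(f"efficiency H / L   = {h / avg:.4f}")
    print(f"redundancy 1 - H/L = {1 - h / avg:.4f}")
```

For any source, a Huffman code satisfies H <= L < H + 1, so the efficiency printed by the sketch lies between 0 and 1 and comes closer to 1 the closer the symbol probabilities are to negative powers of two.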


Pages: 29

