Using Probabilistic Models for Data Compression

Cited by: 4
Authors
Iatan, Iuliana [1 ]
Dragan, Mihaita [2 ]
Dedu, Silvia [3 ]
Preda, Vasile [2 ,4 ,5 ]
Affiliations
[1] Tech Univ Civil Engn, Dept Math & Comp Sci, Bucharest 020396, Romania
[2] Univ Bucharest, Fac Math & Comp Sci, Bucharest 010014, Romania
[3] Bucharest Univ Econ Studies, Dept Appl Math, Bucharest 010734, Romania
[4] Gheorghe Mihoc Caius Iacob Inst Math Stat & Appl, Bucharest 050711, Romania
[5] Costin C Kiritescu Natl Inst Econ Res, Bucharest 050711, Romania
Keywords
data compression; descriptors; probabilistic models; entropy; Huffman coding; coding redundancy; coding efficiency; artificial intelligence; ENTROPY; PATTERN;
DOI
10.3390/math10203847
Chinese Library Classification
O1 [Mathematics];
Subject Classification Code
0701 ; 070101 ;
Abstract
Our research objective is to improve Huffman coding efficiency by adjusting the data with a Poisson distribution, which also avoids undefined entropies. The scientific contribution of this paper is the minimization of the average code word length, which is greater when the Poisson distribution is not applied. Huffman coding is an error-free compression method designed to remove coding redundancy by yielding the smallest number of code symbols per source symbol, which in practice may represent the intensity of an image or the output of a mapping operation. We evaluate our methods on images from the PASCAL Visual Object Classes (VOC) data sets, using 10,102 randomly chosen images, half for training and half for testing. The VOC data sets exhibit significant variability in object size, orientation, pose, illumination, position and occlusion. They comprise 20 object classes: aeroplane, bicycle, bird, boat, bottle, bus, car, motorbike, train, sofa, table, chair, tv/monitor, potted plant, person, cat, cow, dog, horse and sheep. The descriptors of different objects can be compared to yield a measure of their similarity. Image similarity is an important concept in many applications; this paper focuses on similarity measures in computer science, specifically in information retrieval and data mining. Our approach uses 64 descriptors for each image in the training and test sets, so the number of symbols is 64. Our information source differs from a finite-memory (Markov) source, whose output depends on a finite number of previous outputs. When dealing with large volumes of data, an effective way to increase information retrieval speed is to use neural networks as an artificial intelligence technique.
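The interplay the abstract describes between Huffman coding, entropy and coding efficiency can be illustrated with a short sketch. The snippet below is not the authors' method; it is a minimal, self-contained illustration assuming a 16-symbol alphabet whose probabilities follow a (renormalized) Poisson distribution with an arbitrarily chosen rate of 4.0. It builds a Huffman code and compares the resulting average code word length against the source entropy, whose ratio is the coding efficiency mentioned in the keywords. Because every Poisson probability is strictly positive, no term of the entropy sum is undefined.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Build a Huffman tree with a min-heap; return the code length of each symbol."""
    # Heap entries: (probability, tie-breaker, list of symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)  # unique tie-breaker so lists are never compared
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 4.0  # illustrative rate parameter (an assumption, not taken from the paper)
n = 16     # illustrative alphabet size; the paper uses 64 descriptors per image
raw = [poisson_pmf(k, lam) for k in range(n)]
total = sum(raw)
probs = [p / total for p in raw]  # renormalize over the finite alphabet

lengths = huffman_code_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)  # all probs > 0, so well-defined
efficiency = entropy / avg_len
print(f"entropy = {entropy:.4f} bits, average code length = {avg_len:.4f} bits")
print(f"coding efficiency = {efficiency:.4f}")
```

By the source coding theorem, the average Huffman code length always lies within one bit of the entropy, so the printed efficiency is between roughly 0.5 and 1; sharpening that ratio by reshaping the symbol distribution is the effect the abstract attributes to the Poisson adjustment.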
Pages: 29